suggester issues

suggester issues

Kuba Krzemień
Hello, I am working on creating an auto-complete functionality for my platform, which indexes large amounts of text (title + contents); there is too much data for a dictionary. I am using the latest version of Solr (3.3) and I am trying to take advantage of the Suggester functionality. Unfortunately, so far the outcome isn't that great.

The Suggester works only for single words or whole phrases (depending on the tokenizer). When using the first option, I am unable to suggest any combined queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for 'new y' will be two separate lists, one for 'new' and one for 'y'. What's worse, querying 'new AND y' gives the same results (also when using collate), which means that the returned suggestion may give no results: what makes sense separately often doesn't work combined. I need a way to find only those suggestions that will return results when doing an AND query (for example 'new AND york', 'new AND year', as long as they give results upon querying; 'new AND yeti' shouldn't be returned as a suggestion).

When I use the second tokenizer and the suggestions return phrases, for 'ne' I will get 'new york' and 'new year', but for 'new y' I will get nothing. Also, for 'y' I will get nothing, so the issue remains.
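
The requirement described above (complete the last word only with terms that still match when ANDed with the words already typed) can be illustrated with a toy, in-memory sketch. Nothing here is Solr code: the class name, the sample documents, and the idea of holding each document as a set of terms are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/** Toy illustration only: an in-memory stand-in for an index, not Solr code. */
class AndFilteredSuggestions {

    // Each "document" is just the set of lowercase terms it contains (hypothetical data).
    private final List<Set<String>> docs = new ArrayList<>();

    void addDocument(String... terms) {
        docs.add(new HashSet<>(Arrays.asList(terms)));
    }

    /**
     * Completes the last, partially typed word so that the completed word
     * co-occurs with every already-typed word in at least one document.
     */
    List<String> suggest(String input) {
        String[] words = input.trim().toLowerCase().split("\\s+");
        String prefix = words[words.length - 1];
        List<String> typed = Arrays.asList(words).subList(0, words.length - 1);

        Set<String> completions = new TreeSet<>(); // sorted and de-duplicated
        for (Set<String> doc : docs) {
            if (!doc.containsAll(typed)) {
                continue; // this document would not match the AND query
            }
            for (String term : doc) {
                if (term.startsWith(prefix) && !typed.contains(term)) {
                    completions.add(term);
                }
            }
        }

        String head = String.join(" ", typed);
        List<String> result = new ArrayList<>();
        for (String completion : completions) {
            result.add(head.isEmpty() ? completion : head + " " + completion);
        }
        return result;
    }

    public static void main(String[] args) {
        AndFilteredSuggestions index = new AndFilteredSuggestions();
        index.addDocument("new", "york", "city");
        index.addDocument("happy", "new", "year");
        index.addDocument("yeti", "sighting");
        // 'yeti' is never suggested after 'new' because no document holds both terms.
        System.out.println(index.suggest("new y"));
    }
}
```

A real implementation would have to get the same guarantee from the index itself (for example by verifying candidate collations against it); this sketch only pins down the expected behaviour.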

If someone has some experience working with the Suggester, or if someone has created a well-working auto-suggester based on Solr, please help me. I've been trying to find a solution for this for quite some time.

Yours sincerely,
Jackob K

Re: suggester issues

Alexei Martchenko
I have the very very very same problem. I could copy+paste your message as
mine. I've discovered so far that bigger dictionaries work better for me, and
controlling the threshold is much better than avoiding indexing one or two fields.
Of course I'm still polishing this.

At this very moment I was looking into Shingles, are you using them?
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory

How are your fields?

--

*Alexei Martchenko* | *CEO* | Superdownloads
[hidden email] | [hidden email] | (11)
5083.1018/5080.3535/5080.3533

Re: suggester issues

Alexei Martchenko
I've been indexing and reindexing stuff here with Shingles. I don't believe
it's the best approach. The results are interesting, but I believe it's not what
the suggester is meant for.

I tried

<fieldType name="textSuggestion" class="solr.TextField"
           positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="true" outputUnigramsIfNoShingles="false" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

but I got compound words in the suggestion itself.

If I query it like http://localhost:8983/solr/{mycore}/suggest/?q=dri I
get:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="dri">
        <int name="numFound">6</int>
        <int name="startOffset">0</int>
        <int name="endOffset">3</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
          <str>drive</str>
          <str>driver</str>
        </arr>
      </lst>
      <str name="collation">drivers</str>
    </lst>
  </lst>
</response>

But when I enter the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
it scrambles everything:

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="drivers">
        <int name="numFound">4</int>
        <int name="startOffset">0</int>
        <int name="endOffset">7</int>
        <arr name="suggestion">
          <str>drivers</str>
          <str>drivers nvidia</str>
          <str>drivers intel</str>
          <str>drivers nvidia geforce</str>
        </arr>
      </lst>
      <lst name="n">
        <int name="numFound">10</int>
        <int name="startOffset">8</int>
        <int name="endOffset">9</int>
        <arr name="suggestion">
          <str>nvidia</str>
          <str>net</str>
          <str>nvidia geforce</str>
          <str>network</str>
          <str>new</str>
          <str>n</str>
          <str>ninja</str>
        </arr>
      </lst>
      <str name="collation">drivers nvidia</str>
    </lst>
  </lst>
</response>

Although the collation seems fine for this, it's not exactly what suggester
is supposed to do.

Any thoughts?


Re: suggester issues

Kuba Krzemien
What happens if you set spellcheck.maxCollations to more than 1?
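
For anyone who wants to try this, here is a minimal sketch of building such a request URL in Java. The host and the /suggest handler path come from the examples earlier in the thread; the core name "mycore", the class name, and the choice to also send spellcheck=true and spellcheck.collate=true are assumptions. Whether the Suggester actually returns more than one collation is exactly the open question, so this only shows how to ask for them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

/** Sketch: builds a suggest request that asks for several collations. */
class SuggestUrlSketch {

    static String buildUrl(String handlerUrl, String query, int maxCollations) {
        String encoded;
        try {
            // URLEncoder produces form encoding, so a space becomes '+'.
            encoded = URLEncoder.encode(query, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
        return handlerUrl
                + "?q=" + encoded
                + "&spellcheck=true"
                + "&spellcheck.collate=true"
                + "&spellcheck.maxCollations=" + maxCollations;
    }

    public static void main(String[] args) {
        // "mycore" is a placeholder core name.
        System.out.println(buildUrl("http://localhost:8983/solr/mycore/suggest", "drivers n", 3));
    }
}
```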


Re: suggester issues

O. Klein
The problem is that the Suggester, like the spellchecker, tokenizes the query on whitespace. So while shingles might give you nice suggestions, this behaviour makes the Suggester unusable with them.

Besides that, I never succeeded in getting the Suggester to show more than one collation. The normal spellchecker on the same fields showed them all right.

Unless I'm missing some hidden features or something, I think the Suggester might need some work to make it behave the way people expect.
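
To make the whitespace splitting concrete, here is a small sketch that mimics the behaviour visible in the responses earlier in the thread, where "drivers n" came back as two entries with offsets 0-7 and 8-9. It is not Solr's actual query converter, just a stand-in with a hypothetical class name that reproduces the same pieces and offsets.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-in for whitespace-based query splitting; not Solr's implementation. */
class WhitespaceSplitSketch {

    /** One piece of the query with its character offsets (end is exclusive). */
    static final class Piece {
        final String text;
        final int start;
        final int end;

        Piece(String text, int start, int end) {
            this.text = text;
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ")";
        }
    }

    static List<Piece> split(String query) {
        List<Piece> pieces = new ArrayList<>();
        int i = 0;
        while (i < query.length()) {
            // Skip any run of whitespace between words.
            while (i < query.length() && Character.isWhitespace(query.charAt(i))) {
                i++;
            }
            int start = i;
            // Consume one word.
            while (i < query.length() && !Character.isWhitespace(query.charAt(i))) {
                i++;
            }
            if (i > start) {
                pieces.add(new Piece(query.substring(start, i), start, i));
            }
        }
        return pieces;
    }

    public static void main(String[] args) {
        // Each piece is looked up on its own, so a shingle such as
        // "drivers nvidia" can never be matched by the input "drivers n".
        System.out.println(split("drivers n"));
    }
}
```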

Re: suggester issues

oberman_cs
I was trying to deal with the exact same issue, with the exact same results.  Is there really no way to feed a phrase into the suggester (spellchecker) without it splitting the input phrase into words?

Re: suggester issues

Alexei Martchenko
It can be done; I did that with shingles, but it's not the way it's meant to
be. The main problem with the suggester is that we want compound words and we
never get them. I try to get "internet explorer", but when I start typing the
second word, "internet e", the suggester never finds "explorer".


Re: suggester issues

oberman_cs
I tried this:

package com.civicscience;

import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;

import org.apache.lucene.analysis.Token;
import org.apache.solr.spelling.QueryConverter;

/**
 * Converts the query string to a Collection of Lucene tokens.
 * Unlike the default converter, it does not split the input on whitespace.
 */
public class SpellingQueryConverter extends QueryConverter {

  /**
   * Converts the original query string to a collection of Lucene Tokens.
   * @param original the original query string
   * @return a Collection of Lucene Tokens
   */
  @Override
  public Collection<Token> convert(String original) {
    if (original == null) {
      return Collections.emptyList();
    }
    // Emit the whole query as a single token so that a phrase such as
    // "new y" is looked up as one prefix instead of word by word.
    Collection<Token> result = new ArrayList<Token>();
    Token token = new Token(original, 0, original.length(), "word");
    result.add(token);
    return result;
  }

}

And added it to the classpath, and now it does what I expect.

will
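
Why passing the query through as a single token helps can be shown with a toy prefix lookup over a sorted dictionary of phrases, such as the shingles discussed earlier. The dictionary contents and class name are made up for illustration, and a TreeSet stands in for whatever lookup structure the Suggester really uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Toy prefix lookup over a sorted phrase dictionary; not the Suggester's own lookup. */
class PhrasePrefixLookupSketch {

    private final TreeSet<String> dictionary = new TreeSet<>();

    PhrasePrefixLookupSketch(String... entries) {
        dictionary.addAll(Arrays.asList(entries));
    }

    /** Returns every entry that starts with the given prefix, in sorted order. */
    List<String> lookup(String prefix) {
        List<String> matches = new ArrayList<>();
        // In sorted order, all entries sharing a prefix form one contiguous run
        // that begins at the first entry >= prefix.
        for (String entry : dictionary.tailSet(prefix, true)) {
            if (!entry.startsWith(prefix)) {
                break;
            }
            matches.add(entry);
        }
        return matches;
    }

    public static void main(String[] args) {
        PhrasePrefixLookupSketch lookup = new PhrasePrefixLookupSketch(
                "internet", "internet explorer", "internet radio", "explorer", "excel");

        // Whole input as one token: the phrase entry is found.
        System.out.println(lookup.lookup("internet e"));

        // Input split on whitespace: two unrelated lists come back.
        System.out.println(lookup.lookup("internet"));
        System.out.println(lookup.lookup("e"));
    }
}
```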



Re: suggester issues

Kuba Krzemien
As far as I can tell, creating a custom query converter is the only way to
make this work.
Unfortunately I have some problems running it. I created a JAR with my class
(I'm using your source code, apart from the package and class names, obviously),
put it into the lib dir, and added
<queryConverter name="queryConverter" class="mypackage.MySpellingQueryConverter"/>
to solrconfig.xml.

I get a "SEVERE: org.apache.solr.common.SolrException: Error Instantiating
QueryConverter, mypackage.MySpellingQueryConverter is not a
org.apache.solr.spelling.QueryConverter".

What am I doing wrong?
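
In Java generally, an "is not a ..." complaint about a class that clearly extends the named type in its source can mean that the class and the expected supertype were defined by different class loaders. Whether that is the cause here is an assumption; nothing in this thread confirms it. A small diagnostic like the sketch below (the class name is hypothetical) prints which loader defined a class and each of its superclasses, and can be pointed at the converter class from inside the running webapp.

```java
import java.util.ArrayList;
import java.util.List;

/** Diagnostic sketch: reports which class loader defined a class and its superclasses. */
class ClassLoaderReport {

    static List<String> describe(Class<?> type) {
        List<String> lines = new ArrayList<>();
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            ClassLoader loader = c.getClassLoader(); // null means the bootstrap loader
            lines.add(c.getName() + " <- " + (loader == null ? "bootstrap" : loader.toString()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Replace ArrayList with the class under suspicion when running inside the webapp.
        for (String line : describe(java.util.ArrayList.class)) {
            System.out.println(line);
        }
    }
}
```

If the converter and its QueryConverter superclass show up with different loaders, that would be worth investigating before anything else.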


Re: suggester issues

oberman_cs
Hard to say, so I'll list the exact steps I took:
-Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
-Untarred it and cd'd into the directory
-Ran ant
-Wrote my class (the one from my earlier message) under a peer directory in apache-solr-3.3.0
-Compiled it: javac -cp ../dist/apache-solr-core-3.3.0.jar:../lucene/build/lucene-core-3.3-SNAPSHOT.jar com/civicscience/SpellingQueryConverter.java
-Packaged it: jar cf cs.jar com
-Unzipped solr.war (under example)
-Added my cs.jar to lib (under WEB-INF)
-Rezipped solr.war
-Added <queryConverter name="queryConverter" class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
-Restarted jetty

And that all seemed to work.

will


Re: suggester issues

Kuba Krzemien
Finally got it working. It turns out you can't just add the JAR to the lib dir as
the wiki suggests. Unfortunately, the only way that worked for me was adding it to solr.war.

Thanks for your help.


Re: suggester issues

oberman_cs


Sent from my iPhone

On Aug 21, 2011, at 5:54 AM, "Kuba Krzemien" <[hidden email]> wrote:

> Finally got it working - turns out you can't just add it to the lib  
> dir as the wiki suggests. Unfortunately the only way is adding it to  
> solr.war.
>
> Thanks for your help.
>
> --------------------------------------------------
> From: "William Oberman" <[hidden email]>
> Sent: Friday, August 19, 2011 5:07 PM
> To: <[hidden email]>
> Subject: Re: suggester issues
>
>> Hard to say, so I'll list the exact steps I took:
>> -Downloaded apache-solr-3.3.0 (I like to stick with releases vs. svn)
>> -Untar and cd
>> -ant
>> -Wrote my class below (under a peer directory in apache-solr-3.3.0)
>> -javac -cp ../dist/apache-solr-core-3.3.0.jar:../lucene/build/
>> lucene-core-3.3-SNAPSHOT.jar com/civicscience/
>> SpellingQueryConverter.java
>> -jar cf cs.jar com
>> -Unzipped solr.war (under example)
>> -Added my cs.jar to lib (under web-inf)
>> -Rezipped solr.war
>> -Added: <queryConverter name="queryConverter"  
>> class="com.civicscience.SpellingQueryConverter"/> to solrconfig.xml
>> -Restarted jetty
>>
>> And, that seemed to all work.
>>
>> will
>>
>> On Aug 19, 2011, at 10:44 AM, Kuba Krzemien wrote:
>>
>>> As far as I checked creating a custom query converter is the only  
>>> way to make this work.
>>> Unfortunately I have some problems with running it - after  
>>> creating a JAR with my class (Im using your source code, obviously  
>>> besides package and class names) and throwing it into the lib dir  
>>> I've added <queryConverter name="queryConverter"  
>>> class="mypackage.MySpellingQueryConverter"/> to solrconfig.xml.
>>>
>>> I get a "SEVERE: org.apache.solr.common.SolrException: Error  
>>> Instantiating QueryConverter, mypackage.MySpellingQueryConverter  
>>> is not a org.apache.solr.spelling.QueryConverter".
>>>
>>> What am I doing wrong?
>>>
>>> --------------------------------------------------
>>> From: "William Oberman" <[hidden email]>
>>> Sent: Thursday, August 18, 2011 10:35 PM
>>> To: <[hidden email]>
>>> Subject: Re: suggester issues
>>>
>>>> I tried this:
>>>> package com.civicscience;
>>>>
>>>> import java.util.ArrayList;
>>>> import java.util.Collection;
>>>> import java.util.Collections;
>>>>
>>>> import org.apache.lucene.analysis.Token;
>>>> import org.apache.solr.spelling.QueryConverter;
>>>>
>>>> /**
>>>>  * Converts the query string to a Collection of Lucene tokens.
>>>>  */
>>>> public class SpellingQueryConverter extends QueryConverter {
>>>>
>>>>   /**
>>>>    * Converts the original query string to a collection of Lucene Tokens.
>>>>    * The whole input is returned as a single token, so a phrase reaches
>>>>    * the suggester without being split into words.
>>>>    *
>>>>    * @param original the original query string
>>>>    * @return a Collection of Lucene Tokens
>>>>    */
>>>>   @Override
>>>>   public Collection<Token> convert(String original) {
>>>>     if (original == null) {
>>>>       return Collections.emptyList();
>>>>     }
>>>>     Collection<Token> result = new ArrayList<Token>();
>>>>     // One token covering the entire input: offsets 0..length, type "word".
>>>>     Token token = new Token(original, 0, original.length(), "word");
>>>>     result.add(token);
>>>>     return result;
>>>>   }
>>>>
>>>> }
>>>>
>>>> And added it to the classpath, and now it does what I expect.
>>>>
>>>> will
>>>>
>>>>
>>>> On Aug 18, 2011, at 2:33 PM, Alexei Martchenko wrote:
>>>>
>>>>> It can be done, I did that with shingles, but it's not the way it's
>>>>> meant to be. The main problem with the suggester is that we want
>>>>> compound words and we never get them. I try to get "internet explorer",
>>>>> but when I start typing the second word, "internet e", the suggester
>>>>> never finds "explorer".
>>>>>
>>>>> 2011/8/18 oberman_cs <[hidden email]>
>>>>>
>>>>>> I was trying to deal with the exact same issue, with the exact same
>>>>>> results.
>>>>>> Is there really no way to feed a phrase into the suggester (spellchecker)
>>>>>> without it splitting the input phrase into words?
>>>>>>
>>>>>> --
>>>>>> View this message in context:
>>>>>> http://lucene.472066.n3.nabble.com/suggester-issues-tp3262718p3265803.html
>>>>>> Sent from the Solr - User mailing list archive at Nabble.com.
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> *Alexei Martchenko* | *CEO* | Superdownloads
>>>>> [hidden email] | [hidden email] | (11)
>>>>> 5083.1018/5080.3535/5080.3533
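
For anyone following the thread, here is a minimal sketch of why splitting the input hurts and why passing it through as one token helps. It is plain Java with no Solr classes: `PhraseSuggestDemo`, its phrase list, and both methods are made up for illustration and only mimic the behaviour described in the messages above, assuming the dictionary holds whole phrases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical stand-in for a suggester dictionary; this is not the Solr API.
final class PhraseSuggestDemo {
    // Whole phrases, roughly what a keyword-tokenized field would hold.
    private static final TreeSet<String> PHRASES = new TreeSet<>(List.of(
            "internet explorer", "internet radio", "excel", "explorer"));

    // Returns every stored phrase that starts with the given prefix.
    static List<String> suggest(String prefix) {
        // In sorted order, the strings in [prefix, prefix + '\uffff') are
        // the ones that begin with the prefix.
        return new ArrayList<>(PHRASES.subSet(prefix, prefix + Character.MAX_VALUE));
    }

    // Mimics a converter that splits on whitespace: one lookup per word.
    static List<List<String>> suggestPerWord(String input) {
        List<List<String>> perWord = new ArrayList<>();
        for (String word : input.trim().split("\\s+")) {
            perWord.add(suggest(word));
        }
        return perWord;
    }

    public static void main(String[] args) {
        // Split input: "internet" and "e" are completed independently.
        System.out.println(suggestPerWord("internet e"));
        // Single token: the phrase prefix is completed as a whole.
        System.out.println(suggest("internet e"));
    }
}
```

With the split input, the second list contains anything starting with "e", which is why the combined suggestion can end up matching nothing. With the single token, only phrases that really begin with "internet e" come back.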

Re: suggester issues

aniljayanti
Hi,

I am also facing the same issue while using the suggester (working in C#.NET).
Below is my configuration.

suggest/?q="michael ja"
-----------------------
<fieldType name="edgytext" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

<field name="empname" type="edgytext" indexed="true" stored="true" omitNorms="true" omitTermFreqAndPositions="true" />

<field name="autocomplete_text" type="edgytext" indexed="true" stored="false" multiValued="true" omitNorms="true" omitTermFreqAndPositions="false" />

<copyField source="empname" dest="autocomplete_text"/>

Response :

<?xml version="1.0" encoding="UTF-8" ?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">1</int>
  </lst>
  <result name="response" numFound="0" start="0" />
  <lst name="spellcheck">
    <lst name="suggestions">
      <lst name="michael">
        <int name="numFound">10</int>
        <int name="startOffset">1</int>
        <int name="endOffset">8</int>
        <arr name="suggestion">
          <str>michael "bully" herbig</str>
          <str>michael bolton</str>
          <str>michael bolton: arias</str>
          <str>michael falch</str>
          <str>michael holm</str>
          <str>michael jackson</str>
          <str>michael neale</str>
          <str>michael penn</str>
          <str>michael salgado</str>
          <str>michael w. smith</str>
        </arr>
      </lst>
      <lst name="ja">
        <int name="numFound">10</int>
        <int name="startOffset">9</int>
        <int name="endOffset">11</int>
        <arr name="suggestion">
          <str>ja me tanssimme</str>
          <str>jacob andersen</str>
          <str>jacob haugaard</str>
          <str>jagged edge</str>
          <str>jaguares</str>
          <str>jamiroquai</str>
          <str>jamppa tuominen</str>
          <str>jane olivor</str>
          <str>janis joplin</str>
          <str>janne tulkki</str>
        </arr>
      </lst>
      <str name="collation">"michael "bully" herbig ja me tanssimme"</str>
    </lst>
  </lst>
</response>

Please help,

AnilJayanti
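
The offsets in the response above (1 to 8 for "michael", 9 to 11 for "ja") are consistent with the quoted query string being cut into separate word tokens before lookup. The following is a rough sketch of that effect in plain Java. `OffsetDemo` and its methods are hypothetical, and the `\w+` pattern is only an assumed simplification, not Solr's actual converter logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration; not Solr's converter implementation.
final class OffsetDemo {
    // Returns "text[start,end)" for each run of word characters in the query.
    static List<String> wordTokens(String query) {
        List<String> tokens = new ArrayList<>();
        Matcher m = Pattern.compile("\\w+").matcher(query);
        while (m.find()) {
            tokens.add(m.group() + "[" + m.start() + "," + m.end() + ")");
        }
        return tokens;
    }

    // Returns the whole query as one token, as the custom converter
    // posted earlier in this thread does.
    static List<String> singleToken(String query) {
        return List.of(query + "[0," + query.length() + ")");
    }

    public static void main(String[] args) {
        String q = "\"michael ja\""; // the q parameter, quotes included
        System.out.println(wordTokens(q));
        System.out.println(singleToken(q));
    }
}
```

Note that with a single-token converter the surrounding quotes would become part of the token, so it is probably worth sending the query unquoted in that case.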