Spelling

mgkimsal
I've looked through the archives but don't see any specific issue relating
to my question.

Is there a way to have SOLR return partial matches - words that are one (or
two or X) letters off the matching word?  A search for 'field' would also
return 'feeld' (typo) for example.


--
Michael Kimsal
http://webdevradio.com

Re: Spelling

Erik Hatcher

On Feb 5, 2007, at 3:21 PM, Michael Kimsal wrote:
> I've looked through the archives but don't see any specific issue relating
> to my question.
>
> Is there a way to have SOLR return partial matches - words that are one (or
> two or X) letters off the matching word?  A search for 'field' would also
> return 'feeld' (typo) for example.

Solr supports the Lucene QueryParser syntax (plus some).

        <http://lucene.apache.org/java/docs/queryparsersyntax.html>

Try searching for feeld~, for example.
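
A minimal sketch of sending such a query over HTTP, assuming a Solr instance at
http://localhost:8983/solr/select (the class and method names here are
illustrative only). The point it shows is that the ~ operator has to be
URL-encoded when it is passed in the q parameter of a GET request:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class FuzzyQueryUrl {
    // Assumed base URL; adjust host, port, and path for your installation.
    static final String BASE = "http://localhost:8983/solr/select";

    // Appends the Lucene fuzzy operator (~) to a single term and URL-encodes it.
    static String fuzzyUrl(String term) {
        String q = term + "~";
        return BASE + "?q=" + URLEncoder.encode(q, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Prints the request URL for a fuzzy search on "feeld".
        System.out.println(fuzzyUrl("feeld"));
    }
}
```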

        Erik



Re: Spelling

mgkimsal
Thanks Erik.  That worked, then threw me for another loop, which I sort of
have fixed I think.

I'm using the highlighter functionality, but it doesn't seem to highlight the
'matched' word if it's a partial match, although it does in fact return that
record.  Am I missing something obvious here, or is highlighting of partial
matches not supported?

Thanks again!



--
Michael Kimsal
http://webdevradio.com

Re: Spelling

Karl Wettin

On 6 Feb 2007, at 04:19, Michael Kimsal wrote:

> Thanks Erik.  That worked, then threw me for another loop, which I sort of
> have fixed I think.
>
> I'm using the highlighter functionality, but it doesn't seem to highlight the
> 'matched' word if it's a partial match, although it does in fact return that
> record.  Am I missing something obvious here, or is highlighting of partial
> matches not supported?

You need to rewrite the query. See Query.rewrite.

(I think that's it.)


But fuzzy queries are fairly slow, at least compared to many other query types.
Depending on your server load and corpus size, I would perhaps recommend using
some sort of "did you mean" functionality rather than fuzzy queries.
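
A toy sketch of the "did you mean" idea, assuming a small in-memory list of
known terms: it picks the term with the smallest Levenshtein edit distance to
the input. The class name, word list, and threshold are hypothetical, and this
is not the Lucene or Solr spellchecker, only an illustration of the principle:

```java
import java.util.List;

public class DidYouMean {
    // Classic dynamic-programming Levenshtein distance
    // (insertions, deletions, and substitutions each cost 1).
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest known term within maxEdits, or null if none qualifies.
    static String suggest(String input, List<String> knownTerms, int maxEdits) {
        String best = null;
        int bestDistance = maxEdits + 1;
        for (String term : knownTerms) {
            int d = editDistance(input, term);
            if (d < bestDistance) { bestDistance = d; best = term; }
        }
        return best;
    }

    public static void main(String[] args) {
        // Hypothetical term list; a real one would come from the index.
        List<String> knownTerms = List.of("field", "yield", "shield");
        System.out.println(suggest("feeld", knownTerms, 2));
    }
}
```

The suggestion is computed once per query against a term list, so the search
itself can then run as an exact-term query instead of a fuzzy one.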


--
karl

Re: Spelling

Erik Hatcher
Doesn't the built-in Solr Highlighting feature do the rewrite?  If  
not, it should.  I looked into this once and I believe it does have  
this particular bug, but I also vaguely recall it not being  
straightforward to rewrite the query at that point in the code.

        Erik




Re: Spelling

mgkimsal
In reply to this post by Karl Wettin
That isn't an approach I can use here.  Let me explain.

I work in a call center, and every night I search the customer notes for
specific keywords.

For example, we might need a report of which customers called about "apple",
"banana" or "pear".  I have a script which generates a report for the required
keywords, and the report is mailed to the appropriate staff for review/action.
The highlighting is there to help them quickly locate the problem words.  But
not being able to highlight the misspellings ("bannana", "peaar", etc.) means
that they may overlook those entries when reviewing the email.

When you say rewrite the query, what specifically do you mean?  I'm googling
(directly and on the Solr site) for query.rewrite, but nothing is jumping out
at me as useful or pertinent.  It sounds like you're telling me to do some
manipulation on the query first, but I'm currently just passing queries as part
of the GET string in an HTTP request (this was my main attraction to SOLR in
the first place).  Is there a way to trigger the 'rewrite' functionality via
another GET parameter?

Thanks all!




--
Michael Kimsal
http://webdevradio.com

Re: Spelling

Mike Klaas
On 2/6/07, Michael Kimsal <[hidden email]> wrote:

> When you say rewrite the query, what specifically do you mean?  I'm googling
> (directly and on the Solr site) for query.rewrite, but nothing is jumping out
> at me as useful or pertinent.  It sounds like you're telling me to do some
> manipulation on the query first, but I'm currently just passing queries as
> part of the GET string in an HTTP request (this was my main attraction to
> SOLR in the first place).  Is there a way to trigger the 'rewrite'
> functionality via another GET parameter?

Not currently.  However, you could try applying the following patch (untested):

Index: solr/request/StandardRequestHandler.java
===================================================================
--- solr/request/StandardRequestHandler.java    (revision 503874)
+++ solr/request/StandardRequestHandler.java    (working copy)
@@ -138,9 +138,9 @@
         SolrException.logOnce(SolrCore.log, "Exception during debug", e);
         rsp.add("exception_during_debug", SolrException.toStr(e));
       }
-
+
       NamedList sumData = HighlightingUtils.doHighlighting(
-        results.docList, query, req, new String[]{defaultField});
+        results.docList, query.rewrite(req.getSearcher().getReader()), req, new String[]{defaultField});
       if(sumData != null)
         rsp.add("highlighting", sumData);
   }

-Mike

Re: Spelling

mgkimsal
Hello Solr friends:

Mr. Klaas - I've not tested your patch yet (will try to get to it soon) but
I've found almost the opposite problem now and people are questioning
how/why things are happening as they are.

I'm searching for the word "illegal" and the query results are coming back
with an entry that has "illegible" in it.  "illegible" is highlighted as
well.  I'm strictly searching for "illegal" - no modifiers (well, a + to
indicate I have to have it, but no ~ modifier).

I'm a newb to all this, so please bear with me.  I'm using the standard
'text' field schema definition in the default 'schema.xml' to index this
field data.  Does that account for partial and/or soundalike matches by
default?

Thanks!




--
Michael Kimsal
http://webdevradio.com

Re: Spelling

Mike Klaas
On 2/8/07, Michael Kimsal <[hidden email]> wrote:

> Hello Solr friends:
>
> Mr. Klaas - I've not tested your patch yet (will try to get to it soon) but
> I've found almost the opposite problem now and people are questioning
> how/why things are happening as they are.
>
> I'm searching for the word "illegal" and the query results are coming back
> with an entry that has "illegible" in it.  "illegible" is highlighted as
> well.  I'm strictly searching for "illegal" - no modifiers (well, a + to
> indicate I have to have it, but no ~ modifier).
>
> I'm a newb to all this, so please bear with me.  I'm using the standard
> 'text' field schema definition in the default 'schema.xml' to index this
> field data.  Does that account for partial and/or soundalike matches by
> default?

Yes, the default text field contains EnglishPorterFilterFactory, which
"stems" English words.  This normalizes pluralization and tense, but
can also conflate unrelated words, as you have noted.

The "default" fields in the example schema.xml should be treated as
examples and not as canonical field definitions.  If you decide to use
one or all of them for your own application, it is important to understand
what they are doing (the comments in the file as well as the analysis
screen in the Solr admin UI are good tools for this).

-Mike

Re: Spelling

Yonik Seeley-2
Adding to Mike's comments, for this specific query, one can see that
both words stem to "illeg":
http://localhost:8983/solr/select?q=illegible+illegal&debugQuery=on

You could fix this specific case either by configuring protected
words on the stemmer, or by using the synonym filter and mapping one
of the alternatives to something that won't be stemmed (but the former
is probably the better option).
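
A sketch of what the protected-words option might look like in schema.xml,
assuming the stemmer factory accepts a "protected" attribute naming a word
list file, as in the example schema of this era. The field type name, the
tokenizer, and the file name protwords.txt are assumptions to check against
your own version:

```xml
<!-- Sketch only: analyzer chain for a stemmed text field. -->
<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Words listed in protwords.txt (one per line, e.g. "illegible")
         are passed through unstemmed. -->
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
  </analyzer>
</fieldtype>
```

With "illegible" protected, it would no longer be reduced to "illeg", so a
search for "illegal" should stop matching it once the data is reindexed.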

More generally, some have noted that Lucene (and hence Solr) would
benefit from the option of a "weaker" stemmer.

-Yonik


Re: Spelling

Mike Klaas
On 2/8/07, Yonik Seeley <[hidden email]> wrote:

> You could fix this specific case with either configuring protected
> words on the stemmer, or by using the synonym filter and mapping one
> of the alternatives to something that won't be stemmed (but the former
> is probably a better option).
>
> More generally, some have noted that Lucene (and hence Solr) would
> benefit from the option of a "weaker" stemmer.

Any opinions on commenting out the stemmer in the default text field?
It might be less confusing to have a more intuitive example, while
easily showing the way to the more advanced analysis.

-Mike

Re: Spelling

mgkimsal
>
> Any opinions on commenting out the stemmer in the default text field?
> It might be less confusing to have a more intuitive example, while
> easily showing the way to the more advanced analysis.



I'm in favor of that.  I imagine there are others like me who want to get
started with the defaults first, and having them be more useful for
'average' use cases would be helpful, with comments on how to do advanced
stuff left in.

Thanks!


--
Michael Kimsal
http://webdevradio.com