Problem with fuzzy search and accentuation

Monique Monteiro
Hi all,

I'm having a problem when I search for a word with some non-ASCII
characters in combination with fuzzy search.

For example, if I type 'administração' or 'contratação' (both words end
with 'ção'), the search results are returned correctly.  However, if I type
'administração~', no result is returned.  For other terms, I haven't found
any problem.

My Solr version is 6.6.3.

Does anyone have any idea what may cause this issue?

Thanks in advance.

--
Monique Monteiro
Twitter: http://twitter.com/monilouise

Re: Problem with fuzzy search and accentuation

Erick Erickson
What does adding &debug=query show as the parsed query in the two cases?

My guess is that accent folding is kicking in in one case but not in
the other, but that's a blind guess.



On Fri, Aug 3, 2018 at 11:19 AM, Monique Monteiro
<[hidden email]> wrote:

> For example, if I type 'administração' or 'contratação' (both words end
> with 'ção'), the search results are returned correctly.  However, if I type
> 'administração~', no result is returned.  For other terms, I haven't found
> any problem.

Re: Problem with fuzzy search and accentuation

Monique Monteiro
By adding debug=true, I get the following:


   - administração (correct result):

"debug":{
    "rawquerystring":"administração",
    "querystring":"administração",
    "parsedquery":"text:administr",
    "parsedquery_toString":"text:administr",
    "QParser":"LuceneQParser"}}


   - administração~ (incorrect behaviour, no results):

"debug":{
    "rawquerystring":"administração~",
    "querystring":"administração~",
    "parsedquery":"text:administração~2",
    "parsedquery_toString":"text:administração~2",
    "QParser":"LuceneQParser"}}


   - tribunal (correct result):

"debug":{
    "rawquerystring":"tribunal",
    "querystring":"tribunal",
    "parsedquery":"text:tribunal",
    "parsedquery_toString":"text:tribunal",
    "QParser":"LuceneQParser"}}


   - tribubal~ (correct result, no accents):

"debug":{
    "rawquerystring":"tribubal~",
    "querystring":"tribubal~",
    "parsedquery":"text:tribubal~2",
    "parsedquery_toString":"text:tribubal~2",
    "QParser":"LuceneQParser"}}

On Fri, Aug 3, 2018 at 3:26 PM Erick Erickson <[hidden email]>
wrote:

> What does adding &debug=query show you the parsed query is in the two
> cases?
>
> My guess is that accent folding is kicking in one case but not the
> other, but that's
> a blind guess.


Re: Problem with fuzzy search and accentuation

Erick Erickson
Stemming is getting in the way here. You could probably use copyField
to copy the text into a field that isn't stemmed, and run the fuzzy
search against that field rather than the stemmed one.

Best,
Erick
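
Why stemming gets in the way can be read off the debug output: the plain query is stemmed to 'administr', while the fuzzy query keeps the full term 'administração' and allows at most 2 edits. The sketch below is only an illustration, assuming the index holds the same stem the plain query was reduced to. It uses a plain Levenshtein distance written for this example, which is not Lucene's own fuzzy-matching implementation, to count the edits between each fuzzy term and the term it would need to match.

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


MAX_EDITS = 2  # the "~2" shown in the parsed fuzzy queries

# (fuzzy query term, term assumed to be in the index)
CASES = [
    ("administração", "administr"),  # indexed term assumed to be the stem
    ("tribubal", "tribunal"),        # 'tribunal' is left unchanged by the stemmer
]


def within_fuzzy_range(query_term: str, indexed_term: str) -> bool:
    """True if the indexed term is reachable within MAX_EDITS edits."""
    return levenshtein(query_term, indexed_term) <= MAX_EDITS


for query_term, indexed_term in CASES:
    distance = levenshtein(query_term, indexed_term)
    verdict = "match" if distance <= MAX_EDITS else "no match"
    print(f"{query_term} -> {indexed_term}: {distance} edits ({verdict})")
```

The unstemmed query term is more than two edits away from the stem, so the fuzzy query cannot reach it, whereas 'tribubal' is a single substitution away from 'tribunal'.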

On Fri, Aug 3, 2018 at 11:31 AM, Monique Monteiro
<[hidden email]> wrote:

> By adding debug=true, I get the following:
>
>
>    - administração (correct result):
>
> "debug":{
>     "rawquerystring":"administração",
>     "querystring":"administração",
>     "parsedquery":"text:administr",
>     "parsedquery_toString":"text:administr",
>     "QParser":"LuceneQParser"}}
>
>
>    - administração~ (incorrect behaviour, no results):
>
> "debug":{
>     "rawquerystring":"administração~",
>     "querystring":"administração~",
>     "parsedquery":"text:administração~2",
>     "parsedquery_toString":"text:administração~2",
>     "QParser":"LuceneQParser"}}

Re: Problem with fuzzy search and accentuation

Monique Monteiro
Hi Erick,

Indeed, stemming was the culprit.

Thanks!
Monique Monteiro
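
For anyone who wants to try the copyField suggestion, here is a minimal sketch that builds a Solr Schema API payload for it. Only the source field name 'text' comes from the debug output in this thread. The names 'text_fuzzy' and 'text_unstemmed' and the analyzer chain (standard tokenizer plus lowercasing, no stemmer) are assumptions made for illustration. The Schema API also assumes a managed schema; with a hand-edited schema.xml the equivalent fieldType, field, and copyField entries would be needed instead.

```python
import json

SOURCE_FIELD = "text"          # field name seen in the debug output above
FUZZY_FIELD = "text_fuzzy"     # hypothetical name for the unstemmed copy
FUZZY_TYPE = "text_unstemmed"  # hypothetical field type without a stemmer


def build_schema_commands() -> dict:
    """Build Schema API commands for an unstemmed copy of the source field."""
    return {
        "add-field-type": {
            "name": FUZZY_TYPE,
            "class": "solr.TextField",
            "positionIncrementGap": "100",
            "analyzer": {
                # Assumed chain: tokenize and lowercase only, no stemming.
                "tokenizer": {"class": "solr.StandardTokenizerFactory"},
                "filters": [{"class": "solr.LowerCaseFilterFactory"}],
            },
        },
        "add-field": {
            "name": FUZZY_FIELD,
            "type": FUZZY_TYPE,
            "indexed": True,
            "stored": False,
        },
        "add-copy-field": {"source": SOURCE_FIELD, "dest": FUZZY_FIELD},
    }


# Print the JSON body that would be POSTed to the collection's /schema endpoint.
print(json.dumps(build_schema_commands(), indent=2, ensure_ascii=False))
```

After applying a change like this and reindexing, the fuzzy query would target the copy, for example text_fuzzy:administração~, while ordinary queries keep using the stemmed field.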

On Fri, Aug 3, 2018 at 3:45 PM Erick Erickson <[hidden email]>
wrote:

> Stemming is getting in the way here. You could probably use copyField
> to a field that doesn't stem and fuzzy search against that field
> rather than the stemmed one.
>
> Best,
> Erick

