Fuzzy query on capital letters does not match documents

Fuzzy query on capital letters does not match documents

G.Long
Hi :)

In my Lucene index, there are documents with a title field. The values of
this field are indexed with a whitespace analyzer. When I search for
documents, I create a boolean query which includes fuzzy queries on the
title. The final query looks like:

+tnc_title:portant~0.7 +tnc_title:création~0.7 +tnc_title:mention~0.7
+tnc_title:rugby~0.7 +tnc_title:XV~0.7

One of the documents in the index has all these words in its title but
the query does not return any results. If I remove the +tnc_title:XV~0.7
part, the document is found.

Is there any known issue with upper case letters and fuzzy queries?

Regards,

Gary



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Fuzzy query on capital letters does not match documents

Jack Krupansky-2
Be careful with very short terms and fuzzy query. The rounding when
converting from a fraction to an edit distance can make the match exact
rather than fuzzy.

What terms does your index have? XV, Xv, xV, xv? XV~0.7 may only match XV.
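
The rounding can be worked through by hand. The sketch below is not Lucene's source. It assumes the legacy conversion used for `term~0.7` in Lucene 4.x truncates `(1 - similarity) * termLength` to an integer and caps it at 2 edits. The class and method names (`FuzzyEdits`, `toEdits`) are made up for illustration.

```java
// Sketch of the fraction-to-edit-distance rounding; names are hypothetical.
final class FuzzyEdits {
    // Assumed cap: Lucene's Levenshtein automata support at most 2 edits.
    static final int MAX_EDITS = 2;

    // Converts a legacy similarity in [0, 1) into a whole number of edits.
    static int toEdits(float minSimilarity, int termLength) {
        // The int cast truncates toward zero, so short terms can get 0 edits.
        int edits = (int) ((1.0 - minSimilarity) * termLength);
        return Math.min(edits, MAX_EDITS);
    }

    public static void main(String[] args) {
        // The five terms from the query in this thread (\u00e9 is "é").
        String[] terms = {"portant", "cr\u00e9ation", "mention", "rugby", "XV"};
        for (String term : terms) {
            int length = term.codePointCount(0, term.length());
            System.out.println(term + "~0.7 -> " + toEdits(0.7f, length) + " edit(s)");
        }
    }
}
```

Under these assumptions the longer terms are allowed 1 or 2 edits, while `XV` gets (1 - 0.7) * 2 = 0.6, truncated to 0, so `XV~0.7` behaves like an exact term match.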

-- Jack Krupansky

-----Original Message-----
From: G.Long
Sent: Thursday, February 27, 2014 12:15 PM
To: [hidden email]
Subject: Fuzzy query on capital letters does not match documents



Re: Fuzzy query on capital letters does not match documents

G.Long
Right, my index has the term XV" (with a trailing double quote). I removed
the " character and the query worked.
Thanks for your help :)
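
For readers hitting the same problem, here is a minimal sketch of why the stray quote mattered. It approximates whitespace-only tokenization with a plain `split`, which is an assumption rather than Lucene's `WhitespaceAnalyzer` itself, and the sample title is hypothetical since the real one is not shown in this thread.

```java
// Sketch only: class, method names, and the sample title are hypothetical.
final class QuoteTermDemo {
    // Plain dynamic-programming Levenshtein distance between two strings.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Splits on whitespace only, so punctuation stays attached to the token.
    static String[] whitespaceTokens(String text) {
        return text.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String title = "mention rugby XV\""; // hypothetical title ending in a quote
        String[] tokens = whitespaceTokens(title);
        String indexed = tokens[tokens.length - 1]; // the token XV" with the quote attached
        int allowed = (int) ((1.0 - 0.7f) * "XV".length()); // truncates 0.6 to 0
        int needed = levenshtein("XV", indexed);
        System.out.println("indexed term: " + indexed);
        System.out.println("edits needed: " + needed + ", edits allowed: " + allowed);
    }
}
```

The indexed token is one edit away from `XV`, but the query allows zero edits, so the required clause fails and the whole boolean query returns nothing.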

Gary

On 27/02/2014 18:36, Jack Krupansky wrote:

> Be careful with very short terms and fuzzy query. The rounding when
> converting from a fraction to an edit distance can make the match
> exact rather than fuzzy.
>
> What terms does your index have? XV, Xv, xV, xv? XV~0.7 may only match
> XV.
>
> -- Jack Krupansky