Error tolerant text search with Lucene?

Error tolerant text search with Lucene?

Marjan Celikik
Hi everyone,

I know that there are packages that support the "Did you mean ... ?"
search feature with Lucene, which tries to find the most suitable
correctly spelled query. However, so far I haven't encountered the
opposite search feature: given a correct query, find all documents that
contain misspellings of the query. Are you aware of anything like this
for Lucene?

Thanks!


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Error tolerant text search with Lucene?

Mathieu Lecarme
Marjan Celikik wrote:
> Hi everyone,
>
> I know that there are packages that support the "Did you mean ... ?"
> search feature with Lucene, which tries to find the most suitable
> correctly spelled query. However, so far I haven't encountered the
> opposite search feature: given a correct query, find all documents that
> contain misspellings of the query. Are you aware of anything like this
> for Lucene?
You have to iterate over your query: if a node is a BooleanQuery, keep
it and descend into its clauses; if it's a TermQuery, replace it with a
BooleanQuery that contains all variants of the Term, each added with
Occur.SHOULD.

M.

Re: Error tolerant text search with Lucene?

Marjan Celikik
Mathieu Lecarme wrote:
> You have to iterate over your query: if a node is a BooleanQuery, keep
> it and descend into its clauses; if it's a TermQuery, replace it with a
> BooleanQuery that contains all variants of the Term, each added with
> Occur.SHOULD.
>
> M.
>

Thanks. However, I don't fully understand what you mean by "iterate
over your query". I would like a conceptual answer about how this is
done with Lucene, not a technical one.

Thanks again.

Marjan.


Re: Error tolerant text search with Lucene?

Mathieu Lecarme
Marjan Celikik wrote:

> Mathieu Lecarme wrote:
>> You have to iterate over your query: if a node is a BooleanQuery, keep
>> it and descend into its clauses; if it's a TermQuery, replace it with
>> a BooleanQuery that contains all variants of the Term, each added with
>> Occur.SHOULD.
>>
>> M.
>>
>
> Thanks. However, I don't fully understand what you mean by "iterate
> over your query". I would like a conceptual answer about how this is
> done with Lucene, not a technical one.
Your query is a tree, with BooleanQuery nodes as branches and other
queries as leaves. If you want to transform a query into a "tolerant
query", you have to replace each TermQuery (a leaf) with an "OR" branch
whose leaves are the variant terms.
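
The tree rewrite can be sketched roughly as follows. This is only an
illustration in plain Java: Node, TermNode, BoolNode, TolerantRewriter
and the variants map are hypothetical stand-ins, not Lucene classes. In
Lucene itself the branches would be BooleanQuery objects and the "OR"
clauses would be added with Occur.SHOULD.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for a query tree; these are NOT Lucene classes.
interface Node {}

// A leaf: a single term.
final class TermNode implements Node {
    final String text;
    TermNode(String text) { this.text = text; }
    @Override public String toString() { return text; }
}

// A branch: children combined with "AND" or "OR".
final class BoolNode implements Node {
    final String op;
    final List<Node> children;
    BoolNode(String op, List<Node> children) {
        this.op = op;
        this.children = children;
    }
    @Override public String toString() {
        StringBuilder sb = new StringBuilder("(");
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) sb.append(' ').append(op).append(' ');
            sb.append(children.get(i));
        }
        return sb.append(')').toString();
    }
}

final class TolerantRewriter {
    // Keeps branches (rewriting their children) and replaces each term
    // leaf that has known variants with an "OR" branch over the term
    // and its variants. Assumes the tree only contains the two node
    // types defined above.
    static Node rewrite(Node node, Map<String, List<String>> variants) {
        if (node instanceof BoolNode) {
            BoolNode branch = (BoolNode) node;
            List<Node> rewritten = new ArrayList<>();
            for (Node child : branch.children) {
                rewritten.add(rewrite(child, variants));
            }
            return new BoolNode(branch.op, rewritten);
        }
        TermNode term = (TermNode) node;
        List<String> found =
            variants.getOrDefault(term.text, Collections.<String>emptyList());
        if (found.isEmpty()) {
            return term; // no known variants: keep the leaf as it is
        }
        List<Node> alternatives = new ArrayList<>();
        alternatives.add(term);
        for (String variant : found) {
            alternatives.add(new TermNode(variant));
        }
        return new BoolNode("OR", alternatives);
    }
}
```

With variants {"lucene": ["lucen", "lucine"]}, the query
(lucene AND search) becomes ((lucene OR lucen OR lucine) AND search).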

To find the variants of a term, you take the list of your terms and
apply a filter to it to group them. Common filters for that are
stemming, n-grams + Levenshtein distance, phonetic encoding ...
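
As an illustration of the n-gram part of such a filter, here is a
minimal sketch in plain Java. NGramFilter, its method names, and the
n and minShared parameters are assumptions made for this example; it
scans an in-memory term list instead of using a Lucene index.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene.
final class NGramFilter {
    // Character n-grams of a term, in order of first appearance.
    static Set<String> ngrams(String term, int n) {
        Set<String> grams = new LinkedHashSet<>();
        for (int i = 0; i + n <= term.length(); i++) {
            grams.add(term.substring(i, i + n));
        }
        return grams;
    }

    // Terms that share at least minShared n-grams with the query term.
    static List<String> candidates(String query, List<String> terms,
                                   int n, int minShared) {
        Set<String> queryGrams = ngrams(query, n);
        List<String> result = new ArrayList<>();
        for (String term : terms) {
            Set<String> shared = ngrams(term, n);
            shared.retainAll(queryGrams); // keep only the common n-grams
            if (shared.size() >= minShared) {
                result.add(term);
            }
        }
        return result;
    }
}
```

For example, with bigrams and minShared = 3, the query term "lucene"
selects "lucine" and "lucen" from a term list but not "search".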

M.

Re: Error tolerant text search with Lucene?

Marjan Celikik
Mathieu Lecarme wrote:

>
>> However, I don't fully understand what you mean by "iterate over your
>> query". I would like a conceptual answer about how this is done with
>> Lucene, not a technical one.
> Your query is a tree, with BooleanQuery nodes as branches and other
> queries as leaves. If you want to transform a query into a "tolerant
> query", you have to replace each TermQuery (a leaf) with an "OR" branch
> whose leaves are the variant terms.
>
> To find the variants of a term, you take the list of your terms and
> apply a filter to it to group them. Common filters for that are
> stemming, n-grams + Levenshtein distance, phonetic encoding ...
>
> M.
>
OK, now it's clearer. My final question is when this filter information
is incorporated: at index time or at search time? That is, I want to
know whether the Levenshtein distance is computed at query time or
whether this information is precomputed in the index.

Marjan.


Re: Error tolerant text search with Lucene?

Mathieu Lecarme
Marjan Celikik wrote:

> Mathieu Lecarme wrote:
>>
>>> However, I don't fully understand what you mean by "iterate over
>>> your query". I would like a conceptual answer about how this is done
>>> with Lucene, not a technical one.
>> Your query is a tree, with BooleanQuery nodes as branches and other
>> queries as leaves. If you want to transform a query into a "tolerant
>> query", you have to replace each TermQuery (a leaf) with an "OR"
>> branch whose leaves are the variant terms.
>>
>> To find the variants of a term, you take the list of your terms and
>> apply a filter to it to group them. Common filters for that are
>> stemming, n-grams + Levenshtein distance, phonetic encoding ...
>>
>> M.
>>
> OK, now it's clearer. My final question is when this filter
> information is incorporated: at index time or at search time?
Both. You have two indexes: one for your data and one for your terms.
The second (dictionary, lexicon ...) uses one Document per term, with n
Fields holding information such as n-grams or a phonetic code. When you
search for a near word, you build the same data from the word, build a
query from this data, and sort the results by Levenshtein distance. That
gives you an ordered list of suggestions.
> That is, I want to know whether the Levenshtein distance is computed
> at query time or whether this information is precomputed in the index.
First Lucene selects the candidates; afterwards you pick the best ones
from this list. The Levenshtein distance is only computed for a few
words.
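
The second step, ranking a short candidate list at query time, could
look roughly like this. It is a sketch in plain Java, not Lucene code:
LevenshteinRanker, rank and maxDistance are names assumed for this
example, and the candidate list is assumed to come from the lexicon
index lookup.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical helper, not part of Lucene.
final class LevenshteinRanker {
    // Classic dynamic-programming edit distance (insert, delete,
    // substitute, each with cost 1), keeping only two rows in memory.
    static int distance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(
                    Math.min(curr[j - 1] + 1, prev[j] + 1),
                    prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Drops candidates farther than maxDistance from the query and
    // sorts the rest by distance, closest first. The sort is stable,
    // so ties keep their input order.
    static List<String> rank(String query, List<String> candidates,
                             int maxDistance) {
        List<String> kept = new ArrayList<>();
        for (String candidate : candidates) {
            if (distance(query, candidate) <= maxDistance) {
                kept.add(candidate);
            }
        }
        kept.sort(Comparator.comparingInt(
            (String candidate) -> distance(query, candidate)));
        return kept;
    }
}
```

Because the distance is only computed for the candidates returned by
the index lookup, its cost stays small even for a large lexicon.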

M.
