Inverted letters

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view
|

Inverted letters

Ulrich Vachon-2
Hi all,
 
It's possible to use simplely (without java preprocessing, if possible)
Lucene to find items with this constraints:
 
I have indexed this word : clamoxyle
I want to find it with this queries : claomxyle, clamoxile, camoxyle.
 
It is possible?
 
Thank you,
Ulrich.
Reply | Threaded
Open this post in threaded view
|

Re: Inverted letters

Erick Erickson
You should probably think about synonym analyzers, both at index
time and query time. Because I think you have a problem here....

Let's say you can do what you ask, at query time transform
any of your three options into "clamoxyle". Would it really
be satisfactory to your users to then NOT get any matches on
documents containing "claomxyle" or "clamoxile"? It seems to
me that this would be unsatisfactory.

Erick

On Feb 12, 2008 7:56 AM, Ulrich Vachon <[hidden email]> wrote:

> Hi all,
>
> It's possible to use simplely (without java preprocessing, if possible)
> Lucene to find items with this constraints:
>
> I have indexed this word : clamoxyle
> I want to find it with this queries : claomxyle, clamoxile, camoxyle.
>
> It is possible?
>
> Thank you,
> Ulrich.
>
Reply | Threaded
Open this post in threaded view
|

Re: Inverted letters

Patrek
In reply to this post by Ulrich Vachon-2
Did you take a look at the
org.apache.lucene.analysis.ngram.NGramTokenFilter? Or other ngram
implementation? Works great for us.

Patrick

Ulrich Vachon wrote:

> Hi all,
>  
> It's possible to use simplely (without java preprocessing, if possible)
> Lucene to find items with this constraints:
>  
> I have indexed this word : clamoxyle
> I want to find it with this queries : claomxyle, clamoxile, camoxyle.
>  
> It is possible?
>  
> Thank you,
> Ulrich.
>
>  


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: Inverted letters

Ulrich Vachon-2
In reply to this post by Ulrich Vachon-2
Thank you for your responses.

But i'm very surprised to see the FuzzyQuery used in my junit test work at 100%. I must only to determine if this simple algo work with full data.

<code>
      RAMDirectory directory = new RAMDirectory();
      IndexWriter writer = new IndexWriter(directory, new MyAnalyzer(), true);
      addDoc("aspegic", writer);
      addDoc("clamoxyl", writer);

      writer.optimize();
      writer.close();
      IndexSearcher searcher = new IndexSearcher(directory);

      Query query = new FuzzyQuery(new Term("field", "claomxyle"), 0.6f);
      hits = searcher.search(query);
      assertEquals(1, hits.length());
      query = new FuzzyQuery(new Term("field", "clamoxile"), 0.6f);
      hits = searcher.search(query);
      assertEquals(1, hits.length());
      query = new FuzzyQuery(new Term("field", "camoxyle"), 0.60f);
      hits = searcher.search(query);
      assertEquals(1, hits.length());
      query = new FuzzyQuery(new Term("field", "clamoxil"), 0.60f);
      hits = searcher.search(query);
      assertEquals(1, hits.length());
</code>

Regards,
Ulrich/


-----Message d'origine-----
De : Erick Erickson [mailto:[hidden email]]
Envoyé : mardi 12 février 2008 15:58
À : [hidden email]
Objet : Re: Inverted letters

You should probably think about synonym analyzers, both at index time and query time. Because I think you have a problem here....

Let's say you can do what you ask, at query time transform any of your three options into "clamoxyle". Would it really be satisfactory to your users to then NOT get any matches on documents containing "claomxyle" or "clamoxile"? It seems to me that this would be unsatisfactory.

Erick

On Feb 12, 2008 7:56 AM, Ulrich Vachon <[hidden email]> wrote:

> Hi all,
>
> It's possible to use simplely (without java preprocessing, if
> possible) Lucene to find items with this constraints:
>
> I have indexed this word : clamoxyle
> I want to find it with this queries : claomxyle, clamoxile, camoxyle.
>
> It is possible?
>
> Thank you,
> Ulrich.
>


______________________________________________________________________
Cet e-mail a été scanné par MessageLabs Email Security System.
Pour plus d'informations, visitez http://www.messagelabs.com/email ______________________________________________________________________

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]