Is Fair Similarity working with lucene 2.2 ?


Fabrice Robini
Hi,

I've tried the "fair" similarity described here (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739) with Lucene 2.2, but it does not seem to work.

I've attached the custom "MyFair" similarity to both the IndexWriter and the IndexSearcher.

Do you have any idea?

Thanks a lot,

Fabrice

Re: Is Fair Similarity working with lucene 2.2 ?

Daniel Naber-10
On Monday, 21 January 2008, Fabrice Robini wrote:

> I've tried the "fair" similarity described here
> (http://www.nabble.com/a-%22fair%22-similarity-to5806739.html#a5806739)
> with Lucene 2.2, but it does not seem to work.

What exactly doesn't work? Don't you see any effect? At least the scores
should change if you try with an artificially small document. Maybe you can
strip down your code to a small self-contained example and post it.

Regards
 Daniel

--
http://www.danielnaber.de

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Is Fair Similarity working with lucene 2.2 ?

Fabrice Robini
Yes, I do not see any effect...

Here is my unit test that tests it:

    public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException
    {
        Directory theDirectory = new RAMDirectory();
        Analyzer theAnalyzer = new FrenchAnalyzer();
       
        IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
        theIndexWriter.setSimilarity(new FairSimilarity());
       
        Document doc1 = new Document();
        Field name1 = new Field(" NAME", "SHORT_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10", Field.Store.NO, Field.Index.TOKENIZED);
        doc1.add(name1);
        doc1.add(content1);        
        theIndexWriter.addDocument(doc1);
       
        Document doc2 = new Document();
        Field name2 = new Field(" NAME", "BIG_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content2 = new Field("CONTENT", "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", Field.Store.NO, Field.Index.TOKENIZED);
        doc1.add(name2);
        doc1.add(content2);        
        theIndexWriter.addDocument(doc2);
       
        Searcher searcher = new IndexSearcher(theDirectory);
        searcher.setSimilarity(new FairSimilarity());

        QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

        Hits hits = searcher.search(queryParser.parse("x"));

        assertEquals(2, hits.length());
        assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
        assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
    }

Is there anything wrong?
Thanks a lot,

Fabrice


Re: Is Fair Similarity working with lucene 2.2 ?

Srikant Jakilinki-3
Well, I can't even get past the assertions in this code.

The first assertion fails because I get 0 hits. I am using
SimpleAnalyzer since I do not have a FrenchAnalyzer.

Any thoughts?
Srikant


Re: Is Fair Similarity working with lucene 2.2 ?

Fabrice Robini
Oooops sorry, bad cut/paste...

Here is the right one :-)

    // Indexes a short and a long document in which "x" has the same relative
    // frequency, then checks the order in which a search for "x" returns them.
    public void testFairSimilarity() throws CorruptIndexException, IOException, ParseException
    {
        Directory theDirectory = new RAMDirectory();
        Analyzer theAnalyzer = new StandardAnalyzer();

        // Set the custom similarity on the writer so it is used at index time.
        IndexWriter theIndexWriter = new IndexWriter(theDirectory, theAnalyzer);
        theIndexWriter.setSimilarity(new FairSimilarity());

        // Short document: "x" is 1 of 10 terms.
        Document doc1 = new Document();
        Field name1 = new Field("NAME", "SHORT_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content1 = new Field("CONTENT", "x 2 3 4 5 6 7 8 9 10", Field.Store.NO, Field.Index.TOKENIZED);
        doc1.add(name1);
        doc1.add(content1);
        theIndexWriter.addDocument(doc1);

        // Long document: "x" is 2 of 20 terms.
        Document doc2 = new Document();
        Field name2 = new Field("NAME", "BIG_SUITE", Field.Store.YES, Field.Index.UN_TOKENIZED);
        Field content2 = new Field("CONTENT", "x x 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20", Field.Store.NO, Field.Index.TOKENIZED);
        doc2.add(name2);
        doc2.add(content2);
        theIndexWriter.addDocument(doc2);

        // Close the writer to flush both documents before opening the searcher.
        theIndexWriter.close();

        // Set the same similarity on the searcher so it is used at query time.
        Searcher searcher = new IndexSearcher(theDirectory);
        searcher.setSimilarity(new FairSimilarity());

        QueryParser queryParser = new QueryParser("CONTENT", theAnalyzer);

        Hits hits = searcher.search(queryParser.parse("x"));

        // Expects both documents to match, with the longer one ranked first.
        assertEquals(2, hits.length());
        assertEquals("BIG_SUITE", hits.doc(0).get("NAME"));
        assertEquals("SHORT_SUITE", hits.doc(1).get("NAME"));
    }
   



Re: Is Fair Similarity working with lucene 2.2 ?

Srikant Jakilinki-3
OK, got it to work. Thanks.

In a quick scoring comparison, I got the same score for both hits.
Maybe there is a loss of precision somewhere. Or, when scores are equal,
Lucene is doing something unintended or overlooked and thus putting shorter
documents higher. The experiment is a special case where the TF of the
queried term is equal (for both suites, the TF of x = 10%), which is very
rare. Or maybe the IDF factor is kicking in in some strange way,
although it shouldn't. There are a number of possible reasons, but nothing
is obvious to the naked eye.
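
To make the equal-score case concrete, here is a minimal sketch in plain Java with no Lucene dependency. The FairSimilarity class itself is not shown in this thread, so the two formulas below (tf equal to the raw count, length norm equal to 1 / number of terms) are assumptions for illustration only, and the class name FairScoreSketch is hypothetical. It also ignores IDF and the other scoring factors, on the reasoning that they are the same for both documents when the query is the single term "x".

```java
public class FairScoreSketch {
    // ASSUMPTION: the term-frequency factor is the raw number of occurrences.
    public static double tf(int freq) {
        return freq;
    }

    // ASSUMPTION: the length norm is 1 / (number of terms in the field).
    public static double lengthNorm(int numTerms) {
        return 1.0 / numTerms;
    }

    // Only the part of the score that differs between the two documents.
    public static double score(int freq, int numTerms) {
        return tf(freq) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double shortSuite = score(1, 10); // "x" once among 10 terms
        double bigSuite = score(2, 20);   // "x" twice among 20 terms
        System.out.println("SHORT_SUITE: " + shortSuite);
        System.out.println("BIG_SUITE:   " + bigSuite);
        System.out.println("tie: " + (Math.abs(shortSuite - bigSuite) < 1e-12));
    }
}
```

Under these assumptions the two products are the same, which would match the identical scores reported above; it says nothing about how Lucene stores or rounds the norms internally.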

However, that said, length normalization is not a science but an art, and
the simple scheme we have here in FairSimilarity will not always perform
as expected in real-world scenarios. Maybe I am missing something
or have forgotten my basics, but that is not to say your observation is trivial.

Rather, the contrary. I hope there will be more activity on this topic,
because it is an issue of computing relevance, which is the core of search.

Cheers,
Srikant


Re: Is Fair Similarity working with lucene 2.2 ?

Fabrice Robini
Hi Srikant,

Thank you very much for your reply, it's very interesting.
I have to say I am confused by that now...
I do not know what I can do to make this unit test pass...

I agree with you, it may be an issue of computing relevance.

Fabrice


Re: Is Fair Similarity working with lucene 2.2 ?

Daniel Naber-10
In reply to this post by Fabrice Robini
On Tuesday, 22 January 2008, Fabrice Robini wrote:

> Oooops sorry, bad cut/paste...
>
> Here is the right one :-)

The score is the same, so documents with a lower id (inserted earlier) will
be returned first. So everything looks okay to me, or am I missing
something?
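
The ordering rule described above (higher score first, and on equal scores the lower document id first) can be sketched in plain Java. This is only an illustration of the rule, not Lucene code: the Hit class, the rank method, and the score value 0.1f are hypothetical, and the document ids 0 and 1 simply reflect the insertion order in the test.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TieBreakSketch {
    // Hypothetical stand-in for a search hit; not a Lucene class.
    static final class Hit {
        final int docId;
        final String name;
        final float score;

        Hit(int docId, String name, float score) {
            this.docId = docId;
            this.name = name;
            this.score = score;
        }
    }

    // Sorts by descending score; ties are broken by ascending document id.
    static List<Hit> rank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> {
            int byScore = Float.compare(b.score, a.score);
            return byScore != 0 ? byScore : Integer.compare(a.docId, b.docId);
        });
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> ranked = rank(Arrays.asList(
                new Hit(1, "BIG_SUITE", 0.1f),     // indexed second
                new Hit(0, "SHORT_SUITE", 0.1f))); // indexed first
        for (Hit h : ranked) {
            System.out.println(h.docId + " " + h.name + " " + h.score);
        }
    }
}
```

With equal scores and this rule, SHORT_SUITE (indexed first) comes out ahead of BIG_SUITE, which is consistent with the failing assertion in the test.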

regards
 Daniel

--
http://www.danielnaber.de


Re: Is Fair Similarity working with lucene 2.2 ?

Fabrice Robini
In reply to this post by Fabrice Robini
Is there anything I can do to make my unit test pass?
Or is it impossible?

Thanks a lot,

Fabrice

