Occurrence (freq) and ordering

Philippe Deslauriers (Beetext)
Hi again,

We are upgrading from Lucene 1.3 to 1.9.

We need to order the results by number of occurrences (the score of a
document should be the sum of the occurrences of all query terms).

In Lucene 1.3 we rewrote all the Query classes (BooleanQuery, PhraseQuery,
etc.) to achieve this, but is there an easier way to do it in 1.9?
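To pin down the target behaviour independently of Lucene, here is a minimal plain-Java sketch of the ranking described above. It assumes each document is represented as a hypothetical map from term to occurrence count; the score is the sum of the counts of the query terms, and results are sorted by descending score. The class and method names are illustrative only.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

public class OccurrenceRanking {

    /** Score = total number of occurrences of the query terms in the document. */
    static int occurrenceScore(Map<String, Integer> termCounts, List<String> queryTerms) {
        int score = 0;
        for (String term : queryTerms) {
            score += termCounts.getOrDefault(term, 0);  // absent term contributes 0
        }
        return score;
    }

    /** Returns document ids ordered by descending occurrence score. */
    static List<String> rank(Map<String, Map<String, Integer>> docs, List<String> queryTerms) {
        List<String> ids = new ArrayList<>(docs.keySet());
        ids.sort(Comparator.comparingInt(
                (String id) -> occurrenceScore(docs.get(id), queryTerms)).reversed());
        return ids;
    }

    public static void main(String[] args) {
        // Hypothetical in-memory "index": doc id -> (term -> occurrence count)
        Map<String, Map<String, Integer>> docs = Map.of(
                "doc1", Map.of("lucene", 2, "index", 2),
                "doc2", Map.of("lucene", 1),
                "doc3", Map.of("index", 5, "other", 7));
        List<String> query = List.of("lucene", "index");
        for (String id : rank(docs, query)) {
            System.out.println(id + " -> " + occurrenceScore(docs.get(id), query));
        }
    }
}
```

This is the ordering a custom Similarity has to reproduce inside Lucene's own scoring.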

I am just starting to read about Similarity, weights, etc.

Can someone give me a heads-up?

Thanks!

Philippe Deslauriers

Re: Occurrence (freq) and ordering

Chris Hostetter-3

: Upgrading from lucene 1.3 to 1.9.

: We need to order the result in order of occurrences (score of a doc = sum of
: occurrences of all Query).

: I am just starting to read on Similarity, weights etc.

You are definitely on the right track with Similarity.  What you want is a
Similarity implementation where the values returned by most methods are
either 0 or 1, except for tf(int) and tf(float), which should be
identity functions.
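A self-contained model of that idea, as a sketch rather than real Lucene code: the hypothetical interface ScoringFactors below only mirrors the names of the Similarity factors (tf, idf, coord, lengthNorm, queryNorm), and FreqOnlyFactors returns 1 for everything except tf, which returns the frequency unchanged. The score() method is a simplified combination of those factors for an OR query (boosts assumed to be 1), not Lucene's exact computation. In actual Lucene code you would subclass Similarity (or DefaultSimilarity) and override the corresponding methods; check their exact signatures against the 1.9 javadoc.

```java
public class FreqOnlySimilarityModel {

    /** Hypothetical stand-in that mirrors the factor names of Lucene's Similarity. */
    interface ScoringFactors {
        float tf(float freq);
        float idf(int docFreq, int numDocs);
        float coord(int overlap, int maxOverlap);
        float lengthNorm(String fieldName, int numTokens);
        float queryNorm(float sumOfSquaredWeights);
    }

    /** Every factor is 1 except tf, which returns the raw frequency (identity). */
    static class FreqOnlyFactors implements ScoringFactors {
        public float tf(float freq) { return freq; }
        public float idf(int docFreq, int numDocs) { return 1.0f; }
        public float coord(int overlap, int maxOverlap) { return 1.0f; }
        public float lengthNorm(String fieldName, int numTokens) { return 1.0f; }
        public float queryNorm(float sumOfSquaredWeights) { return 1.0f; }
    }

    /**
     * Simplified scoring of an OR query over several terms in one document:
     * sum of tf * idf * lengthNorm * queryNorm over matching terms, times coord.
     */
    static float score(ScoringFactors s, int[] freqs, int[] docFreqs, int numDocs, int numTokens) {
        float sum = 0f;
        int overlap = 0;
        for (int i = 0; i < freqs.length; i++) {
            if (freqs[i] == 0) continue;  // this query term does not occur in the document
            overlap++;
            // "contents" is a hypothetical field name; 1.0f is a placeholder queryNorm input
            sum += s.tf(freqs[i]) * s.idf(docFreqs[i], numDocs)
                    * s.lengthNorm("contents", numTokens) * s.queryNorm(1.0f);
        }
        return sum * s.coord(overlap, freqs.length);
    }

    public static void main(String[] args) {
        // Two query terms occurring 2 and 5 times in a 40-token field of a 1000-doc index.
        float result = score(new FreqOnlyFactors(), new int[] {2, 5}, new int[] {10, 300}, 1000, 40);
        System.out.println("score = " + result);  // prints "score = 7.0"
    }
}
```

Because every non-tf factor is 1, document frequency, field length and term overlap no longer influence the result; only the raw counts do.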

If you *always* want *every* query to work this way, then you may also
want to look at using the new Field.setOmitNorms(true) option when you
index your documents.  It not only removes the lengthNorm from the scoring
equation, but it can also help to reduce the size of your index (which I
seem to recall you were concerned about in another thread).
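To illustrate why omitting norms matters for occurrence-based ranking, here is a small plain-Java sketch. It assumes DefaultSimilarity-style length normalization of 1/sqrt(numTerms) and ignores the lossy one-byte encoding Lucene applies to stored norms; with norms omitted the factor is simply 1.0. The class and method names are hypothetical.

```java
public class LengthNormEffect {

    /** Length normalization in the style of DefaultSimilarity: 1 / sqrt(numTerms). */
    static float defaultLengthNorm(int numTerms) {
        return (float) (1.0 / Math.sqrt(numTerms));
    }

    /** Contribution of a single term, with or without the length norm. */
    static float termScore(int freq, int numTerms, boolean omitNorms) {
        float norm = omitNorms ? 1.0f : defaultLengthNorm(numTerms);
        return freq * norm;
    }

    public static void main(String[] args) {
        // Same number of occurrences (3) in a 9-term field and in a 900-term field.
        System.out.println("short field, with norms: " + termScore(3, 9, false));
        System.out.println("long field, with norms:  " + termScore(3, 900, false));
        System.out.println("long field, no norms:    " + termScore(3, 900, true));
    }
}
```

With norms, the longer field is penalized even though the term occurs just as often; without them, the contribution is the raw count.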


-Hoss


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

RE: Occurrence (freq) and ordering

Philippe Deslauriers (Beetext)
Thanks for the Field.setOmitNorms(true) tip!

Regarding the Similarity implementation I am trying to write, somehow it
does not work.

Here's what I understand:

The Scorer implementations use the methods defined in Similarity to compute
the score (the formula described in
http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html
is implemented in the scorer).

According to the formula, if all my methods return 1 except for tf(freq),
which simply returns freq, everything should work:

score(q,d) = SUM (t in q) : tf(freq) * 1 * 1 * 1 * 1 * 1 * 1

For example, if a doc contains the word "test" 3 times and the word
"example" once, and the query looks for both words, the score for the doc
should be 4.
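Spelled out for this example, assuming every factor other than tf really evaluates to 1 and tf is the identity function:

```latex
\mathrm{score}(q,d) = \sum_{t \in q} \mathrm{tf}(\mathit{freq}_{t,d}) \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1 \cdot 1
  = \mathrm{tf}(\mathit{freq}_{\text{test},d}) + \mathrm{tf}(\mathit{freq}_{\text{example},d})
  = \mathrm{tf}(3) + \mathrm{tf}(1) = 3 + 1 = 4
```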

But whatever I do, the score is 1.

What am I missing?

Thanks

Phil.

-----Original Message-----
From: Chris Hostetter [mailto:[hidden email]]
Sent: Thursday, April 27, 2006 2:21 PM
To: [hidden email]
Subject: Re: Occurrence (freq) and ordering

RE: Occurrence (freq) and ordering

Chris Hostetter-3

: By example doc contains 3 times the word "test", and 1 time the word
: "example", and the query was looking for both words, the score for the doc
: should be 4.
:
: But whatever I do, score is 1.

1) This is where Searcher.explain really comes in handy... it will help
you see what is going on.

2) Are you using the Hits API?  If so, and the highest score is above 1,
all scores are normalized relative to the highest score (so the top hit
always comes back as 1).  Use the TopDocs search method instead.
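A plain-Java model of the normalization described in point 2 (a sketch of the behaviour, not Lucene's Hits source): if the highest raw score exceeds 1, every score is divided by it, so a document whose raw score is 4 is reported as 1.0. Raw scores, as returned by the TopDocs search method, are left untouched. The class and method names are hypothetical.

```java
import java.util.Arrays;

public class HitsNormalization {

    /** Models Hits: if the top raw score is above 1, divide every score by it. */
    static float[] normalizeLikeHits(float[] rawScoresDescending) {
        float[] out = Arrays.copyOf(rawScoresDescending, rawScoresDescending.length);
        if (out.length > 0 && out[0] > 1.0f) {
            float scoreNorm = 1.0f / out[0];  // out[0] is the highest score
            for (int i = 0; i < out.length; i++) {
                out[i] *= scoreNorm;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        float[] raw = {4.0f, 2.0f, 1.0f};  // e.g. occurrence counts, best first
        System.out.println("raw (TopDocs-style): " + Arrays.toString(raw));  // [4.0, 2.0, 1.0]
        System.out.println("normalized (Hits):   "
                + Arrays.toString(normalizeLikeHits(raw)));                  // [1.0, 0.5, 0.25]
    }
}
```

The relative order survives normalization, but the absolute occurrence counts do not, which is why the top document always shows a score of 1.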



-Hoss