Lucene scoring and random result order

Yanick Gamelin
Hi all,

I have the following problem with Lucene not being deterministic.

I use a MultiSearcher to run a search, and when several hits have the same score, they are returned in a random order.
I wouldn't care much about the order of hits with the same score if I could get them all, because then I could sort them myself.
But if we request a maximum number of results lower than the number of hits with the same score, we only get a subset of those hits, and that subset changes between searches because the order is not guaranteed.
Sometimes the first part of the result list is consistent because the scores differ for those hits, but then we reach a big block of hits with equal scores, and Lucene only takes what it needs to fill the rest of the list. It picks those hits seemingly at random from the big block of equal scores.

As an example, imagine x, y, and z have high scores and all other letters have the same score.
Three consecutive searches will give:
[x,y,z,a,b,c,d,f,g,h,i,j]
[x,y,z,q,w,e,r,t,u,i,o,p]
[x,y,z,m,n,b,v,c,a,s,d,g]

Pretty annoying eh? So, what can I do about that?

RE: Lucene scoring and random result order

Sendros, Jason
You can sort on multiple values. Keep the primary sort as a relevancy
sort, and choose something else to sort on to keep the rest of the
responses fairly static.

http://lucene.apache.org/java/3_3_0/api/core/org/apache/lucene/search/Sort.html

Example:
Sort sortBy = new Sort(new SortField[] {
    SortField.FIELD_SCORE,
    new SortField("POSITION", SortField.INT) });
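
To see why a secondary sort key makes a truncated result list stable, here is a minimal sketch in plain Java that does not use Lucene at all. The Hit class, its position field, and the topN helper are hypothetical names made up for this illustration; position simply stands in for whatever stable value you index in the "POSITION" field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class TieBreakDemo {
    // Hypothetical stand-in for a search hit: a score plus a stable secondary key.
    static final class Hit {
        final String id;
        final float score;
        final int position;

        Hit(String id, float score, int position) {
            this.id = id;
            this.score = score;
            this.position = position;
        }
    }

    // Primary key: score, highest first. Tie-breaker: position, lowest first.
    static final Comparator<Hit> BY_SCORE_THEN_POSITION = (a, b) -> {
        int byScore = Float.compare(b.score, a.score);
        return byScore != 0 ? byScore : Integer.compare(a.position, b.position);
    };

    // Returns the ids of the first n hits under the two-level ordering.
    static List<String> topN(List<Hit> hits, int n) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(BY_SCORE_THEN_POSITION);
        List<String> ids = new ArrayList<>();
        for (Hit h : sorted.subList(0, Math.min(n, sorted.size()))) {
            ids.add(h.id);
        }
        return ids;
    }

    public static void main(String[] args) {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
                new Hit("x", 0.9f, 7), new Hit("y", 0.8f, 8), new Hit("z", 0.7f, 9),
                new Hit("a", 0.1f, 1), new Hit("b", 0.1f, 2),
                new Hit("c", 0.1f, 3), new Hit("d", 0.1f, 4)));
        // Shuffle to imitate hits arriving in an arbitrary order on each search.
        for (int run = 0; run < 3; run++) {
            Collections.shuffle(hits);
            System.out.println(topN(hits, 5));
        }
    }
}
```

Because every tie on score is resolved by position, the cut-off always falls at the same place in the block of equal scores, whatever order the hits arrive in. This only holds if the secondary key is unique, or at least distinct within each group of equal scores.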

-----Original Message-----
From: Yanick Gamelin [mailto:[hidden email]]
Sent: Thursday, August 25, 2011 3:02 PM
To: [hidden email]
Subject: Lucene scoring and random result order
