term OR term OR term OR .... query question

Vladimir Olenin
Hi.
 
I have a question regarding the Lucene scoring algorithm. Suppose I have
the query "a OR b OR c OR d OR e OR f" and two documents: doc1 "a b c d"
and doc2 "d e". Will doc1 score higher than doc2? In other words, does
Lucene take into account the number of terms matched in the document in
the case of an 'or' query?
 
Given that I don't know the algorithms behind Lucene, how does 'or'
query time depend on the number of searched terms? Does it grow
linearly or exponentially? How does 'and' query time depend on the
number of searched terms? (It should decrease, right?)
 
Thanks.
 
Vlad
Re: term OR term OR term OR .... query question

Grant Ingersoll
See below.

Also, there is new Scoring documentation available via the website  
(http://lucene.apache.org/java/docs/scoring.html) that covers scoring  
in some detail.

On Sep 26, 2006, at 5:23 PM, Vladimir Olenin wrote:

> Hi.
>
> I have a question regarding the Lucene scoring algorithm. Suppose I have
> the query "a OR b OR c OR d OR e OR f" and two documents: doc1 "a b c d"
> and doc2 "d e". Will doc1 score higher than doc2? In other words, does
> Lucene take into account the number of terms matched in the document in
> the case of an 'or' query?
>

Yes, it should score higher. See the coord() factor, which is part of
the Similarity.
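
As a rough illustration of why doc1 should come out ahead, here is a toy
model of a coord-style factor, written in plain Python rather than against
the Lucene API. It assumes coord is simply the fraction of query terms a
document matches, which is my understanding of the default behaviour; the
helper names are made up for this sketch, and the real score also involves
other factors such as term frequency, idf, and length normalization.

```python
# Toy model only: not Lucene code. Assumes coord = matched terms / query terms.

def coord(overlap: int, max_overlap: int) -> float:
    """Fraction of the query's terms that a document matches."""
    return overlap / max_overlap


def coord_for(text: str, terms: list[str]) -> float:
    """Count how many query terms occur in the document, then apply coord."""
    doc_terms = set(text.split())
    overlap = sum(1 for term in terms if term in doc_terms)
    return coord(overlap, len(terms))


query_terms = ["a", "b", "c", "d", "e", "f"]
docs = {"doc1": "a b c d", "doc2": "d e"}

# doc1 matches 4 of the 6 query terms, doc2 matches 2 of 6.
coords = {name: coord_for(text, query_terms) for name, text in docs.items()}
print(coords)
```

Under that assumption doc1 gets a coord factor twice as large as doc2's,
which is then multiplied into the rest of the score.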

> Given that I don't know the algorithms behind Lucene, how does 'or'
> query time depend on the number of searched terms? Does it grow
> linearly or exponentially? How does 'and' query time depend on the
> number of searched terms? (It should decrease, right?)
>

I'm not 100% sure about this, but that does make sense, and it should be
pretty simple to test. We are working on some benchmarks, and this may
be a good one to add.
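
Before a real benchmark exists, one way to get a feel for the question is
a toy inverted index that counts how many postings each kind of query
touches. This is a sketch in plain Python, not how Lucene evaluates
queries; the function names and the "visited" counter are invented for
illustration, and real timings would still have to be measured against an
actual index.

```python
# Toy model only: a dict-based inverted index, not Lucene's implementation.

def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of document ids that contain it."""
    index: dict[str, set[str]] = {}
    for doc_id, text in docs.items():
        for term in set(text.split()):
            index.setdefault(term, set()).add(doc_id)
    return index


def or_query(index, terms):
    """Union of all posting lists; every posting of every term is visited."""
    hits, visited = set(), 0
    for term in terms:
        postings = index.get(term, set())
        visited += len(postings)
        hits |= postings
    return hits, visited


def and_query(index, terms):
    """Intersection, starting from the shortest posting list."""
    lists = sorted((index.get(term, set()) for term in terms), key=len)
    if not lists:
        return set(), 0
    hits = set(lists[0])
    visited = len(hits)
    for postings in lists[1:]:
        visited += len(hits)  # one membership check per surviving candidate
        hits = {doc_id for doc_id in hits if doc_id in postings}
    return hits, visited


docs = {"doc1": "a b c d", "doc2": "d e"}
index = build_index(docs)
print(or_query(index, ["a", "b", "c", "d", "e", "f"]))
print(and_query(index, ["d", "e"]))
```

In this model the work for an 'or' query is the sum of the posting list
lengths, so it grows with each added term, while the work for an 'and'
query is bounded by the shortest list times the number of terms. Whether
Lucene behaves the same way is exactly what a benchmark would need to
confirm.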



--------------------------
Grant Ingersoll
Sr. Software Engineer
Center for Natural Language Processing
Syracuse University
335 Hinds Hall
Syracuse, NY 13244
http://www.cnlp.org

Voice: 315-443-5484
Skype: grant_ingersoll
Fax: 315-443-6886



