Is it a lucene bug?

Is it a lucene bug?

Wilson Wu
Hi,
     Recently we got a requirement to sort the hits by both the document
score and updateTime, a document field that records when the document
was last updated. We want newer documents near the front and also
high-scoring documents near the front. In other words, we want to mix
the score and updateTime rather than sort first by one and then by the
other. So I designed a time-based function f(t) to calculate a boost for
each document and write it into the index.
     As a result, I can calculate a value for each document based on its
update time, and that value influences the document score by adjusting
the fieldNorm value. But when I look up the boost through
document.getBoost() on every document in the index, I find the boost
value is 1.0. So I can set a document's boost and the boost does adjust
the final score, but I can't read the boost back from the documents
returned by a search.
    Is it a bug in Lucene? Thanks. I use Lucene version 2.4.1.
    PS: Is there any other way to meet my requirement? I don't think it
is a good idea to adjust a document's final score by writing a document
boost into the index, because I want to offer two interfaces to the
client: one sorts documents only by the plain similarity score, not
adjusted by the boost value f(t); the other sorts by the final score,
which has been adjusted by the boost value f(t). Thanks a lot!

                                               wilson

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Is it a lucene bug?

savvas.andreas
Hi,

I'm not exactly sure I understand the type of sorting you are trying to
achieve.

You have an updateTime field and you mention that "We want the new document
in the front and also want high score document in the front".

My take on this is that you want to sort first by updateTime and then by
score, but you say this is not the case?

Instead of calculating a boost value with f(t), can you not calculate and
index the actual value you need for every document?

Then you can sort first by this value and then by score?



regards,

savvas


RE: Is it a lucene bug?

Uwe Schindler
In reply to this post by Wilson Wu
Read the documentation of the Document class: if you set a boost for a
document, it is used when indexing the fields and multiplied into the boost
of each field. No boost value is stored for the document itself, so you
cannot read it back (only so-called stored fields are retrievable).
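
A toy sketch of why the boost cannot be read back, assuming the index-time
norm is the product of document boost, field boost and a 1/sqrt(numTerms)
length norm, as in Lucene's default similarity. The class and method names
are illustrative only, not Lucene API, and the sketch ignores the lossy
one-byte encoding that Lucene additionally applies to the norm.

```java
// Toy model only: none of these names are Lucene classes or methods.
final class NormDemo {

    /** Length normalization in the style of 1/sqrt(numTerms). */
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    /** Index-time norm: document boost, field boost and length norm folded into one number. */
    static double norm(double docBoost, double fieldBoost, int numTerms) {
        return docBoost * fieldBoost * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        double boosted = norm(2.0, 1.0, 4);   // document boost 2.0, field with four terms
        double unboosted = norm(1.0, 1.0, 1); // document boost 1.0, field with one term
        // Both values are 1.0: the norm alone cannot tell the two documents apart.
        System.out.println(boosted + " " + unboosted);
    }
}
```

Two different boosts end up as the same norm, so the original value is not
recoverable from the index. To read f(t) back at search time, it would have
to be kept separately, for example in a stored field.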

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]


Re: Is it a lucene bug?

Wilson Wu
Hi,
    Thank you very much!
    I have another question. As mentioned in the Document class
documentation, if I set a boost for a document, it is used when indexing
the fields and multiplied into each field. Here is my case: sometimes I
want the boost to be a factor of the score, but sometimes I want to
ignore the boost when scoring the hits. Can Lucene do that? If so, how
should I write the search code in that case?

Re: Is it a lucene bug?

fulin tang
In reply to this post by Uwe Schindler
Maybe you should take a look at the Scorer and Similarity families of
classes. They show how the score is calculated; make some changes to
them and you will get what you want.

We had the same problem and solved it by writing subclasses of
DefaultSimilarity and BooleanScorer.


--
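The beginning of a dream struggles at the edge of the city
The heart's far horizon persists in the instant of each step
My destiny has buried the eternity of loneliness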

Re: Is it a lucene bug?

Wilson Wu
In reply to this post by savvas.andreas
Hi,
      I am afraid I didn't describe the problem clearly enough in my last
mail. Let me describe it again.
For example, suppose there are 5 documents, doc1, doc2, doc3, doc4 and
doc5, in the search hits, and their update times are:
t1 = doc1.updateTime = 2009-01-01 12:45:00
t2 = doc2.updateTime = 2009-01-01 15:30:00
t3 = doc3.updateTime = 2009-01-05 09:45:00
t4 = doc4.updateTime = 2009-08-01 12:45:00
t5 = doc5.updateTime = 2009-11-27 12:45:00
Suppose their relevancy scores are:
score1 = doc1.score = 2.4
score2 = doc2.score = 2.3
score3 = doc3.score = 2.3
score4 = doc4.score = 1.8
score5 = doc5.score = 1.6
If I ignore updateTime and sort by document score (relevancy), the
sequence should be doc1 > doc2 > doc3 > doc4 > doc5, am I right?
But we need to take updateTime into account as a sorting factor. Through
the function f(t), we can calculate a value for each update time.
Suppose the values are:
v1 = f(t1) = 2.00
v2 = f(t2) = 2.01
v3 = f(t3) = 2.1
v4 = f(t4) = 2.5
v5 = f(t5) = 3.5
So the final results are:
r1 = v1 * score1 = 2.00 * 2.4 = 4.8
r2 = v2 * score2 = 2.01 * 2.3 = 4.623
r3 = v3 * score3 = 2.1  * 2.3 = 4.83
r4 = v4 * score4 = 2.5  * 1.8 = 4.5
r5 = v5 * score5 = 3.5  * 1.6 = 5.6
so r5 > r3 > r1 > r2 > r4, and
the sequence should be doc5 > doc3 > doc1 > doc2 > doc4.
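
The worked example above can be checked with a small standalone sketch. It
uses plain Java rather than Lucene, and the RankingDemo and Hit names are
made up for illustration. Keeping the relevancy score and the time factor
separate until ranking time also lets both orderings (pure relevancy and
combined) come from the same data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Illustrative only: Hit is a stand-in for a search hit, not a Lucene class.
final class RankingDemo {

    static final class Hit {
        final String name;
        final double score;      // relevancy score
        final double timeFactor; // v = f(updateTime)

        Hit(String name, double score, double timeFactor) {
            this.name = name;
            this.score = score;
            this.timeFactor = timeFactor;
        }

        /** r = v * score, as in the example. */
        double combined() {
            return timeFactor * score;
        }
    }

    /** The five documents from the example. */
    static List<Hit> sample() {
        return Arrays.asList(
                new Hit("doc1", 2.4, 2.00),
                new Hit("doc2", 2.3, 2.01),
                new Hit("doc3", 2.3, 2.1),
                new Hit("doc4", 1.8, 2.5),
                new Hit("doc5", 1.6, 3.5));
    }

    /** Descending by relevancy score only; ties keep their input order. */
    static List<String> byRelevancy(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, (a, b) -> Double.compare(b.score, a.score));
        return names(sorted);
    }

    /** Descending by the combined value r = v * score. */
    static List<String> byCombined(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<Hit>(hits);
        Collections.sort(sorted, (a, b) -> Double.compare(b.combined(), a.combined()));
        return names(sorted);
    }

    private static List<String> names(List<Hit> hits) {
        List<String> out = new ArrayList<String>();
        for (Hit h : hits) {
            out.add(h.name);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(byRelevancy(sample())); // [doc1, doc2, doc3, doc4, doc5]
        System.out.println(byCombined(sample()));  // [doc5, doc3, doc1, doc2, doc4]
    }
}
```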

    In the above example, we can see that although score1 (= 2.4) >
score2 (= 2.3) = score3 (= 2.3), t2 is almost 3 hours later than t1, and
t3 is almost 4 days later than t1. We consider 3 hours a small
difference and 4 days a much bigger one, so the final result is r3 > r1 >
r2. We can also change how much weight updateTime carries among the
sorting factors by changing the function f(t).
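
The thread does not give the definition of f(t). As a purely hypothetical
illustration of how its shape controls the weight of updateTime, the sketch
below uses f(age) = 1 + weight / (1 + age / scale), with the age measured in
days. The formula, the TimeBoost name and the parameters are all
assumptions, and it does not reproduce the v1..v5 values above.

```java
// Hypothetical f(t): the formula and parameter names are assumptions for illustration.
final class TimeBoost {

    /**
     * Returns a factor between 1 and 1 + weight.
     * A brand-new document (age 0) gets 1 + weight; very old documents approach 1.
     *
     * @param ageDays   days since the document was last updated
     * @param weight    how strongly recency should count; 0 disables the time factor
     * @param scaleDays age at which the recency bonus has dropped to half of weight
     */
    static double factor(double ageDays, double weight, double scaleDays) {
        double age = Math.max(0.0, ageDays); // treat future timestamps as "now"
        return 1.0 + weight / (1.0 + age / scaleDays);
    }

    public static void main(String[] args) {
        System.out.println(factor(0, 2.5, 30));   // newest possible document
        System.out.println(factor(300, 2.5, 30)); // document updated 300 days ago
        System.out.println(factor(300, 0.0, 30)); // weight 0: time is ignored
    }
}
```

With weight set to 0 the factor is always 1, so the combined ranking falls
back to the plain relevancy ranking. That is one possible way to serve both
of the interfaces mentioned in the first mail from the same function.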

   Is my description clear now?






Re: Is it a lucene bug?

Wilson Wu
In reply to this post by fulin tang
Hi,
   Thanks for the inspiration. Which version (Lucene 2.4, 2.9 or
another) are you using in your project? Can you give more details on
your suggestion? Thanks.

                                          Wilson

Re: Is it a lucene bug?

savvas.andreas
Have you considered a custom sort strategy using a ScoreDocComparator?
Inside your implementation you have access to the individual doc scores, and
you could create an array of floats, parallel to your docs, which stores your
r1, r2, r3 etc. values.
Then use this array to implement your int compare(ScoreDoc i, ScoreDoc j)
method.
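
A rough sketch of that idea in plain Java. The Hit class is a stand-in for
Lucene's ScoreDoc and the comparator is an ordinary java.util.Comparator,
not an implementation of ScoreDocComparator; all names here are
hypothetical, and wiring it into a Lucene Sort is left out.

```java
import java.util.Arrays;
import java.util.Comparator;

// Stand-in types only: Hit mimics a (doc id, score) pair, it is not Lucene's ScoreDoc.
final class CombinedSortDemo {

    static final class Hit {
        final int doc;     // document id, used as index into the parallel array
        final float score; // relevancy score

        Hit(int doc, float score) {
            this.doc = doc;
            this.score = score;
        }
    }

    /** Orders hits by descending combined value, looked up by document id. */
    static final class CombinedComparator implements Comparator<Hit> {
        private final float[] combined;

        CombinedComparator(float[] combined) {
            this.combined = combined;
        }

        @Override
        public int compare(Hit i, Hit j) {
            // Arguments swapped so that larger combined values sort first.
            return Float.compare(combined[j.doc], combined[i.doc]);
        }
    }

    /** Builds the parallel array r[doc] = score * timeFactor[doc]. */
    static float[] combine(Hit[] hits, float[] timeFactor) {
        float[] combined = new float[timeFactor.length];
        for (Hit h : hits) {
            combined[h.doc] = h.score * timeFactor[h.doc];
        }
        return combined;
    }

    /** Returns the document ids in combined-value order, best first. */
    static int[] sortedDocs(Hit[] hits, float[] timeFactor) {
        Hit[] copy = hits.clone();
        Arrays.sort(copy, new CombinedComparator(combine(hits, timeFactor)));
        int[] docs = new int[copy.length];
        for (int k = 0; k < copy.length; k++) {
            docs[k] = copy[k].doc;
        }
        return docs;
    }

    public static void main(String[] args) {
        // doc ids 0..4 correspond to doc1..doc5 from the earlier example.
        Hit[] hits = {
            new Hit(0, 2.4f), new Hit(1, 2.3f), new Hit(2, 2.3f),
            new Hit(3, 1.8f), new Hit(4, 1.6f)
        };
        float[] timeFactor = {2.00f, 2.01f, 2.1f, 2.5f, 3.5f};
        System.out.println(Arrays.toString(sortedDocs(hits, timeFactor))); // [4, 2, 0, 1, 3]
    }
}
```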

savvas.
