Favouring recent matches

Favouring recent matches

James Brady-3
Hello all,
In Lucene in Action (replicated here: http://www.theserverside.com/tt/articles/article.tss?l=ILoveLucene),
the theserverside.com team say "The date boost has been really important
for us".

I'm looking for some advice on the best way to actually implement this  
- the only way I can see to do it right now is to set a boost for  
documents at index time that increases linearly over time. However,  
I'm wary of skewing Lucene's scoring in some strange way, or  
interfering with the document boosts I'm setting for other reasons.
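
A minimal sketch of the index-time idea described above, assuming a boost that starts at 1.0 on some reference date and grows by a fixed amount per day. The class name, the `slopePerDay` parameter and the choice to multiply the date boost into the other boost are all assumptions made for illustration; none of this is Lucene API.

```java
/** Hypothetical helper: a date boost that grows linearly with the document date. */
final class LinearIndexTimeBoost {
    private static final double MILLIS_PER_DAY = 24.0 * 60 * 60 * 1000;

    /**
     * Returns 1.0 for documents dated at (or before) the reference date,
     * and 1.0 + slopePerDay * days for documents dated after it.
     */
    static double dateBoost(long docTimeMillis, long referenceMillis, double slopePerDay) {
        double daysSinceReference = (docTimeMillis - referenceMillis) / MILLIS_PER_DAY;
        return 1.0 + slopePerDay * Math.max(0.0, daysSinceReference);
    }

    /** Assumption: the date boost is multiplied into whatever boost is set for other reasons. */
    static double combined(double otherBoost, double dateBoost) {
        return otherBoost * dateBoost;
    }
}
```

With this shape the ratio between two documents' boosts is fixed once they are indexed, and the relative advantage of a newer document over one a fixed number of days older shrinks as the index ages, which may or may not be what is wanted.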

Any suggestions?

Thanks
James

Fwd: Favouring recent matches

James Brady-3
Sorry, I really should have directly explained what I was looking to  
do: theserverside.com give higher scores to documents that were added  
more recently.

I'd like to do the same, without the date boost being too overbearing  
(or unnoticeable...) - some ideas on how to approach this would be  
great.

James

Begin forwarded message:

> From: James Brady <[hidden email]>
> Date: 8 March 2008 19:41:56 PST
> To: [hidden email]
> Subject: Favouring recent matches

Re: Favouring recent matches

Walter Underwood, Netflix
Ultraseek has "recent and relevant" as an option. We used the document age
in days (now - document_date) and took the log of that. You need to adjust
the boost to have the desired amount of influence.
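
One way the log-of-age idea could look, as a sketch only: the formula `1 / (1 + weight * log(1 + ageDays))`, the class name and the `weight` knob are assumptions, not Ultraseek's actual formula. A `weight` of 0 switches the effect off; larger values give recency more influence.

```java
/** Hypothetical helper: turns document age in days into a multiplicative score factor. */
final class RecencyBoost {
    private static final double MILLIS_PER_DAY = 24.0 * 60 * 60 * 1000;

    /**
     * Returns a factor in (0, 1]: 1.0 for a document dated now (or in the future),
     * decreasing with the log of the age in days.
     */
    static double factor(long docTimeMillis, long nowMillis, double weight) {
        double ageDays = Math.max(0.0, (nowMillis - docTimeMillis) / MILLIS_PER_DAY);
        // log1p(x) = log(1 + x), so a brand-new document has log-age 0.
        return 1.0 / (1.0 + weight * Math.log1p(ageDays));
    }

    /** Applies the recency factor to a relevance score computed elsewhere. */
    static double boostedScore(double relevance, long docTimeMillis, long nowMillis, double weight) {
        return relevance * factor(docTimeMillis, nowMillis, weight);
    }
}
```

Because the age goes through a logarithm, the difference between one day and one week matters far more than the difference between one year and two.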

The most conservative approach is to use it as a tiebreaker, so that
you can distinguish between two different "President Bush" stories
that are about different presidents.
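
The tiebreaker variant can be sketched as a sort on relevance first and date second. This is an illustration under assumptions: the `Hit` class and the `scoreStep` bucket width are hypothetical, and scores are treated as "tied" when they round to the same bucket, since exact floating-point ties are rare.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch: date is used only to order hits whose scores are (nearly) equal. */
final class RecencyTiebreak {

    /** Minimal stand-in for a search hit; not a Lucene class. */
    static final class Hit {
        final String id;
        final double score;
        final long dateMillis;

        Hit(String id, double score, long dateMillis) {
            this.id = id;
            this.score = score;
            this.dateMillis = dateMillis;
        }
    }

    /** Sorts by score bucket (highest first), then by date (newest first) within a bucket. */
    static List<Hit> rank(List<Hit> hits, double scoreStep) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(Comparator
                .comparingLong((Hit h) -> Math.round(h.score / scoreStep)).reversed()
                .thenComparing(Comparator.comparingLong((Hit h) -> h.dateMillis).reversed()));
        return sorted;
    }
}
```

Bucketing is a crude choice: two scores just either side of a bucket edge are not treated as tied, so the step size needs tuning against real score distributions.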

wunder

On 3/8/08 8:29 PM, "James Brady" <[hidden email]> wrote:

> Sorry, I really should have directly explained what I was looking to
> do: theserverside.com give higher scores to documents that were added
> more recently.
>
> I'd like to do the same, without the date boost being too overbearing
> (or unnoticeable...) - some ideas on how to approach this would be
> great.
>
> James

Re: Fwd: Favouring recent matches

Mathieu Lecarme
In reply to this post by James Brady-3
1) Periodically recompute the document boost with age (or log(age)) as a
factor. This is likely to be slow.
2) Use your own Similarity implementation: use DefaultSimilarity as the
base, with a dynamic document boost. The map of document id -> age (or
document id -> date) should be cached, using a Map, Ehcache, Whirlycache,
OSCache or a BDB (Berkeley DB) database. Use expiring cache entries, and be
careful: warm-up (i.e. populating the cache) is likely to be slow.
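
A rough sketch of the cached document id -> date map from option 2, using only a plain Map with a time-to-live per entry. Everything here is an assumption for illustration: the class name, the `loader` callback (standing in for however the date is read from the index or a store), the TTL handling and the `1 / (1 + log(1 + ageDays))` boost shape. Hooking the result into a Similarity or a custom query is left out.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntToLongFunction;

/** Hypothetical expiring cache of document id -> document date (epoch millis). */
final class DocDateCache {
    private static final double MILLIS_PER_DAY = 24.0 * 60 * 60 * 1000;

    private static final class Entry {
        final long dateMillis;
        final long loadedAtMillis;

        Entry(long dateMillis, long loadedAtMillis) {
            this.dateMillis = dateMillis;
            this.loadedAtMillis = loadedAtMillis;
        }
    }

    private final Map<Integer, Entry> entries = new ConcurrentHashMap<>();
    private final IntToLongFunction loader; // assumption: looks up the date for a doc id
    private final long ttlMillis;

    DocDateCache(IntToLongFunction loader, long ttlMillis) {
        this.loader = loader;
        this.ttlMillis = ttlMillis;
    }

    /** Returns the cached date, reloading it if the entry is missing or older than the TTL. */
    long dateOf(int docId, long nowMillis) {
        Entry e = entries.get(docId);
        if (e == null || nowMillis - e.loadedAtMillis >= ttlMillis) {
            e = new Entry(loader.applyAsLong(docId), nowMillis);
            entries.put(docId, e);
        }
        return e.dateMillis;
    }

    /** Warm-up: touches every doc id once. This is the potentially slow step. */
    void warmUp(int maxDoc, long nowMillis) {
        for (int docId = 0; docId < maxDoc; docId++) {
            dateOf(docId, nowMillis);
        }
    }

    /** Dynamic boost: the boost set for other reasons, scaled down by log-age. */
    double boost(int docId, double baseBoost, long nowMillis) {
        double ageDays = Math.max(0.0, (nowMillis - dateOf(docId, nowMillis)) / MILLIS_PER_DAY);
        return baseBoost / (1.0 + Math.log1p(ageDays));
    }
}
```

Keeping the date rather than the age in the cache means entries only go stale when a document is re-indexed, since the age can always be recomputed from the current time.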

M.
James Brady wrote:

> Sorry, I really should have directly explained what I was looking to
> do: theserverside.com give higher scores to documents that were added
> more recently.
>
> I'd like to do the same, without the date boost being too overbearing
> (or unnoticeable...) - some ideas on how to approach this would be great.
>
> James