Sorting by calculated custom score at search time


Sorting by calculated custom score at search time

Nick Vincent-2
Hi,

I am trying to find a way to create scores with a custom formula based
on the initial score from Lucene and field values from each document,
e.g. for each document:

 finalScore = searchScore * (popularity) * (userRating)

The customer requires this functionality as I have to replace an
existing system that works like this.  User rating and popularity are
already available and will be stored in Lucene.  I've looked through LIA
and the approaches there don't seem to fit the requirement:

5.1.6 Sorting by multiple fields: only sorts by one field, then the
next, whereas I need to combine the scores
6.1 Using a custom sort method: does not take into account the
document's original score

From an earlier thread discussing a calculated score based on the hit
score and the age of document I gather that TSS regenerate their indexes
to alter the document boost based on date.  I need to be able to sort by
either relevance or "popularity rated relevance" depending on user
input, so I don't think adding a precalculated document boost at index
time is an option.

In the worst case scenario I'll need to iterate through the hits and
then sort them in memory myself, but I'm looking to be indexing around
500,000 documents, and in this particular application there are a lot of
common keywords, so a large number of hits for a basic query is common.
I'm trying to avoid this as it's an untidy solution which is likely to
be (relatively) slow.
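
For reference, the worst-case approach described above can be sketched roughly as follows. This is only an illustration in plain Java, not Lucene code: ScoredHit, PopularityRerank and topN are hypothetical names, and it assumes the search score, popularity and user rating of each hit have already been read out of the index. A bounded heap keeps only the best n hits, so memory stays small even when a query matches a large share of the 500,000 documents, although every hit still has to be visited once.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical holder for one hit; these names are illustrative, not Lucene API.
final class ScoredHit {
    final int docId;
    final float searchScore; // relevance score reported by the search
    final float popularity;  // assumed to be read from a stored field
    final float userRating;  // assumed to be read from a stored field

    ScoredHit(int docId, float searchScore, float popularity, float userRating) {
        this.docId = docId;
        this.searchScore = searchScore;
        this.popularity = popularity;
        this.userRating = userRating;
    }

    // finalScore = searchScore * popularity * userRating
    float finalScore() {
        return searchScore * popularity * userRating;
    }
}

final class PopularityRerank {
    // Orders hits from lowest to highest final score.
    private static final Comparator<ScoredHit> BY_FINAL_SCORE =
            (a, b) -> Float.compare(a.finalScore(), b.finalScore());

    // Returns the n best hits by final score, best first.
    static List<ScoredHit> topN(Iterable<ScoredHit> hits, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap: the weakest of the current top n sits at the head.
        PriorityQueue<ScoredHit> heap = new PriorityQueue<>(n, BY_FINAL_SCORE);
        for (ScoredHit hit : hits) {
            if (heap.size() < n) {
                heap.add(hit);
            } else if (hit.finalScore() > heap.peek().finalScore()) {
                heap.poll(); // drop the weakest hit to make room
                heap.add(hit);
            }
        }
        List<ScoredHit> best = new ArrayList<>(heap);
        best.sort(BY_FINAL_SCORE.reversed());
        return best;
    }
}
```

Calling PopularityRerank.topN(hits, 10) would return the ten hits with the highest combined score. Switching back to plain relevance would simply mean skipping this step and using the search order as returned.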

I notice Erik has commented that "I've not come across a really clean
way to do this sort of age-based boosting other than how TSS does it".
I was wondering if anyone has any experience with dirtier approaches
they could share with me?

Any help is really appreciated,

Thanks,

Nick
 

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: Sorting by calculated custom score at search time

Tim.Wright
Nick Vincent [mailto:[hidden email]] wrote:

[snip]

> From an earlier thread discussing a calculated score based on the hit
> score and the age of document I gather that TSS regenerate their indexes
> to alter the document boost based on date.  I need to be able to sort by
> either relevance or "popularity rated relevance" depending on user
> input, so I don't think adding a precalculated document boost at index
> time is an option.

The easiest way I can think of would be to build two indexes, one with
popularity-boosted documents and one without, and search the one you
want.

Alternatively, if you have to have a single index, you could add each
document twice, once with no boost, and once with "popularity" boost,
and specify a keyword field for each document defining whether it's
boosted or not. Then, when you want to search, restrict results to
documents where the keyword is "boosted" or "notboosted" respectively.

Both fairly low tech solutions, but I'm lazy like that!

Cheers,

Tim.




Re: Sorting by calculated custom score at search time

gekkokid
In reply to this post by Nick Vincent-2
How does TSS boost by date? Does it give a small boost increase, such as
0.1 or 0.2 x (ArticlePublishDate - IndexCreationDate)?


----- Original Message -----
From: "Nick Vincent" <[hidden email]>
To: <[hidden email]>
Sent: Tuesday, January 24, 2006 5:42 PM
Subject: Sorting by calculated custom score at search time

I notice Erik has commented that "I've not come across a really clean
way to do this sort of age-based
boosting other than how TSS does it".  I was wondering if anyone has any
experience with dirtier approaches they could share with me?


Re: Sorting by calculated custom score at search time

Chris Hostetter-3
In reply to this post by Nick Vincent-2


Take a look at the org.apache.lucene.search.function package in SVN.  It
provides an API that allows you to define "function" classes that can
compute a score for each document using whatever means you want.  The
overall FunctionQuery can then be wrapped in a BooleanQuery along with
whatever other search criteria you have.

Five basic functions have been included that can be composed in all sorts of
interesting ways to compute scores based on document values for a
particular field (or the relative ordinal positions in the FieldCache for
that field).
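
To give a rough feel for what "composable functions" means here, the sketch below models the idea in plain Java. It is not the API from the function package: DocValueFn, FunctionScoreSketch, fromArray and times are hypothetical names invented for illustration, and the float arrays stand in for per-document field values that would be cached from the index. How the real code combines a function value with the text score is not shown in this thread, so the multiplication simply follows the formula in the original question.

```java
// Hypothetical interface: NOT the Lucene function API, only the shape of the idea.
interface DocValueFn {
    // Value this function yields for the given document number.
    float valueOf(int docId);

    // Composes two functions into one that multiplies their values per document.
    default DocValueFn times(DocValueFn other) {
        return docId -> valueOf(docId) * other.valueOf(docId);
    }
}

final class FunctionScoreSketch {
    // Wraps an array indexed by document number, standing in for cached field values.
    static DocValueFn fromArray(float[] values) {
        return docId -> values[docId];
    }

    // finalScore = searchScore * boost(doc), where boost = popularity * userRating.
    static float finalScore(float searchScore, DocValueFn boost, int docId) {
        return searchScore * boost.valueOf(docId);
    }
}
```

With popularity and userRating arrays in hand, fromArray(popularity).times(fromArray(userRating)) builds the combined boost once, and finalScore applies it to each hit as it is scored, so nothing has to be re-indexed when the user switches between plain relevance and popularity-rated relevance.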





-Hoss



Re: Sorting by calculated custom score at search time

Yonik Seeley
It's not in subversion yet though ;-)

You have to look here:
http://issues.apache.org/jira/browse/LUCENE-446

I haven't committed it, because we may be able to do better (maybe
removing the difference between Query and ValueSource so you could
freely mix the two and not have to wrap ValueSource in a
FunctionQuery).

-Yonik



Re: Sorting by calculated custom score at search time

Chris Hostetter-3

: It's not in subversion yet though ;-)
:
: You have to look here:
: http://issues.apache.org/jira/browse/LUCENE-446

Whoops ... sorry about that.  I forget how far out on the bleeding edge
the code I'm using is sometimes :)

It definitely works right now, so you should give it a shot -- but you may
not want to invest *too* heavily in integrating it into your code if the
API is still in flux.


-Hoss

