Title Boosting and IDF

Tavi Nathanson
Hey everyone,

I index documents with "title" and "body" fields. The title field often has far fewer terms than the body field, so a given term tends to occur in far fewer titles than bodies. As a result, IDF has a much stronger effect in the title field than in the body field.

I currently have the title field boosted by 4x relative to the body field. While I want matches in the title field to result in higher scores than matches in the body field, I don't believe I want the title to completely trump the body. I've seen this happen when a rare term is present in the title field, and IDF combines with the 4x boost to wreak havoc.
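
A rough sketch of the effect described above, assuming Lucene's classic TF-IDF similarity, where idf = 1 + ln(numDocs / (docFreq + 1)) and the idf enters the score twice (once through the query weight, once through the field weight). The index statistics and the helper names `classic_idf` and `term_weight` are hypothetical, and tf, length normalization, queryNorm and coord are deliberately left out, so this only illustrates how idf and the boost multiply.

```python
import math


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """IDF in the style of Lucene's classic similarity: 1 + ln(numDocs / (docFreq + 1))."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def term_weight(doc_freq: int, num_docs: int, boost: float) -> float:
    """Simplified per-term weight: boost * idf^2 (all other scoring factors omitted)."""
    idf = classic_idf(doc_freq, num_docs)
    return boost * idf * idf


# Hypothetical index statistics, chosen only for illustration.
NUM_DOCS = 1_000_000
TITLE_DOC_FREQ = 50     # the term appears in few titles...
BODY_DOC_FREQ = 5_000   # ...but in many more bodies

title_weight = term_weight(TITLE_DOC_FREQ, NUM_DOCS, boost=4.0)
body_weight = term_weight(BODY_DOC_FREQ, NUM_DOCS, boost=1.0)

print(f"title idf: {classic_idf(TITLE_DOC_FREQ, NUM_DOCS):.2f}")
print(f"body idf:  {classic_idf(BODY_DOC_FREQ, NUM_DOCS):.2f}")
print(f"title/body weight ratio with 4x boost: {title_weight / body_weight:.1f}")
```

Under these assumptions the effective advantage of the title match is noticeably larger than the nominal 4x, because the idf difference is squared and then multiplied by the boost.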

I'd like to get your thoughts on the following:

- Is it standard practice to avoid boosting the title field much, because of the (generally) high IDF of title field terms?
- Are there other strategies for handling the high IDF of a title field?

Thanks!

Re: Title Boosting and IDF

Erick Erickson
Your first and biggest problem will be to define "good"
result ordering. You have some anecdotal statements
that amount to something like "sometimes I don't like
the results". But unless you can quantify this, you'll spend a
LOT of time tweaking the results ordering and then
going back and re-tweaking based on another result....

But to your point: the 4x boost is actually rather high. You
might be able to get better results by boosting by significantly
smaller values, say 1.5 or something.

But under any circumstances, _some_ searches will not be
satisfactory. I guess it's up to you to figure out what counts
as "the best you can do"... Wish I had better answers, but
judgement calls are like that <G>..

Best
Erick

On Tue, Apr 24, 2012 at 5:28 PM, Tavi Nathanson
<[hidden email]> wrote:

> Hey everyone,
>
> I index documents with "title" and "body" fields. The title field often has far fewer
> terms than the body field. IDF, as a result, will have a much stronger effect in
> the title field than in the body field.
>
> I currently have the title field boosted by 4x relative to the body field.
> While I want matches in the title field to result in higher scores than
> matches in the body field, I don't believe I want the title to completely
> trump the body. I've seen this happen when a rare term is present in the
> title field, and IDF combines with the 4x boost to wreak havoc.
>
> I'd like to get your thoughts on the following:
>
> - Is it standard practice to avoid boosting the title field much, because of
> the (generally) high IDF of title field terms?
> - Are there other strategies for handling the high IDF of a title field?
>
> Thanks!
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Title-Boosting-and-IDF-tp3936709p3936709.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Title Boosting and IDF

Tavi Nathanson
Thanks, Erick!

Re: Title Boosting and IDF

Walter Underwood
In reply to this post by Erick Erickson
Interestingly, I worked at two different web search companies with two completely different search engines, and one arrived at an 8X title boost and the other at a 7.5X title boost. So I consider 8X a universal physical constant.

I totally agree about using real user queries and real user clicks to evaluate your configuration.
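
One way to put numbers on "real user queries and real user clicks" is mean reciprocal rank (MRR) of the clicked documents under each candidate configuration. This is only a minimal sketch: the query strings, document ids, click data and the two rankings are all made up for illustration, and MRR is just one of several metrics that could be used here.

```python
from statistics import mean


def reciprocal_rank(ranked_doc_ids, clicked_doc_ids):
    """Return 1/rank of the first clicked document, or 0.0 if none was returned."""
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in clicked_doc_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results_by_query, clicks_by_query):
    """Average the reciprocal rank over all logged queries."""
    if not results_by_query:
        return 0.0
    return mean(
        reciprocal_rank(ranked, clicks_by_query.get(query, set()))
        for query, ranked in results_by_query.items()
    )


# Hypothetical click log: which documents users clicked for each query.
clicks = {"solr idf": {"d2"}, "title boost": {"d7"}}

# Hypothetical rankings produced by two different title boosts.
config_a = {"solr idf": ["d1", "d2", "d3"], "title boost": ["d7", "d8", "d9"]}
config_b = {"solr idf": ["d2", "d1", "d3"], "title boost": ["d8", "d9", "d7"]}

mrr_a = mean_reciprocal_rank(config_a, clicks)
mrr_b = mean_reciprocal_rank(config_b, clicks)
print(f"config A MRR: {mrr_a:.3f}")
print(f"config B MRR: {mrr_b:.3f}")
```

With a metric like this, a change in boost can be judged across the whole query log rather than on one unsatisfying result at a time.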

wunder
Former Infoseek, Inktomi, Verity, Autonomy, Netflix, etc.

On Apr 25, 2012, at 6:00 PM, Erick Erickson wrote:

> Your first and biggest problem will be to define "good"
> result ordering. You have some anecdotal statements
> that amount to something like "sometimes I don't like
> the results". But unless you can quantify this, you'll spend a
> LOT of time tweaking the results ordering and then
> going back and re-tweaking based on another result....
>
> But to your point: the 4x boost is actually rather high. You
> might be able to get better results by boosting by significantly
> smaller values, say 1.5 or something.
>
> But under any circumstances, _some_ searches will not be
> satisfactory. I guess it's up to you to figure out what counts
> as "the best you can do"... Wish I had better answers, but
> judgement calls are like that <G>..
>
> Best
> Erick

--
Walter Underwood
[hidden email]




Re: Title Boosting and IDF

Yonik Seeley
On Wed, Apr 25, 2012 at 9:24 PM, Walter Underwood <[hidden email]> wrote:
> Interestingly, I worked at two different web search companies with two completely different search engines, and one arrived at an 8X title boost and the other at a 7.5X title boost. So I consider 8X a universal physical constant.

Great info!  Do you know if that 8x was after (i.e. already included)
length normalization?
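
To show why the question matters, here is a small sketch assuming the classic Lucene length norm of 1 / sqrt(numTerms). The field lengths and the 8x figure plugged in below are illustrative assumptions only, and the sketch ignores that Lucene stores norms in a lossy, compact form.

```python
import math


def length_norm(num_terms: int) -> float:
    """Classic Lucene-style length normalization: 1 / sqrt(number of terms in the field)."""
    return 1.0 / math.sqrt(num_terms)


# Hypothetical field lengths for a single document.
TITLE_TERMS = 5
BODY_TERMS = 500

# Advantage a title match gets from length normalization alone.
implicit_boost = length_norm(TITLE_TERMS) / length_norm(BODY_TERMS)

# Combined advantage if an explicit 8x boost is applied on top of the norm.
EXPLICIT_BOOST = 8.0
combined = EXPLICIT_BOOST * implicit_boost

print(f"implicit boost from length norm: {implicit_boost:.1f}x")
print(f"combined with explicit {EXPLICIT_BOOST:.0f}x boost: {combined:.1f}x")
```

So an 8x figure means something quite different depending on whether the short-field advantage from length normalization is already folded into it.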

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10