Setting Boost values

KushalDave
Hi,

We are implementing a search engine for a huge dataset (approximately 50
million HTML pages).
We have indexed several fields for each page, such as Title, Body,
Meta text, H1, URL, etc.
Lucene provides the setBoost() function to weight these fields.
What should the boost values for these fields be?
Should they be relative?
Are there any standard values?

We have also computed PageRank for these web pages. What is the best way
to combine the PageRank information with Lucene's document score?

--
Kushal Dave

Re: Setting Boost values

iorixxx

> We have indexed various field related information, such as
> Title, Body, Meta text, H1, URL etc.
> What should be the values for these fields?

A boost value is a multiplication factor in the score calculation: a field's contribution to the score is multiplied by its boost.

> Should they be relative?
Yes.
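
To illustrate why only the relative values matter, here is a minimal sketch. It assumes a simplified model in which a document's score is the sum of boost * raw score over two fields; this is not Lucene's actual scoring formula, and the class and method names are made up for the example.

```java
// Simplified model (an assumption, not Lucene's real formula):
// score = titleBoost * titleRaw + bodyBoost * bodyRaw
class RelativeBoostSketch {

    static double score(double titleBoost, double bodyBoost,
                        double titleRaw, double bodyRaw) {
        return titleBoost * titleRaw + bodyBoost * bodyRaw;
    }

    // True if document A outranks document B under the given boosts.
    // The raw per-field scores of both documents are invented sample values.
    static boolean aBeatsB(double titleBoost, double bodyBoost) {
        double a = score(titleBoost, bodyBoost, 0.5, 0.2); // strong title match
        double b = score(titleBoost, bodyBoost, 0.1, 0.6); // strong body match
        return a > b;
    }

    public static void main(String[] args) {
        // Title weighted 4x relative to body.
        System.out.println(aBeatsB(4.0, 1.0));
        // Same 4:1 ratio, both boosts scaled by 10: the ordering is unchanged.
        System.out.println(aBeatsB(40.0, 10.0));
        // Different ratio (1:4): the ordering flips.
        System.out.println(aBeatsB(1.0, 4.0));
    }
}
```

In this model, multiplying every boost by the same constant scales all scores equally, so the ranking depends only on the ratios between the boosts.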

> Are there any standard values?
No.

"The default value of field boosts, logically, is 1.0. During indexing, a Document can be assigned a boost, too. A Document boost factor implicitly sets the starting field boost of all fields to the specified value. Field-specific boosts are multiplied by the starting value, giving the final value of the field boost factor." [Hatcher's Lucene in Action book]
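
The arithmetic described in that quote can be written out as a small worked example. This is plain Java that only mirrors the multiplication described in the book, not a call into Lucene; the names and the sample boosts 2.0 and 3.0 are assumptions for illustration.

```java
// Worked example of: final field boost = document boost * field boost.
class EffectiveBoostSketch {

    // A field with no explicit boost uses the default of 1.0.
    static final float DEFAULT_BOOST = 1.0f;

    static float effectiveFieldBoost(float documentBoost, float fieldBoost) {
        return documentBoost * fieldBoost;
    }

    public static void main(String[] args) {
        float documentBoost = 2.0f; // hypothetical document-level boost

        // Title carries its own boost of 3.0 on top of the document boost.
        System.out.println("title: " + effectiveFieldBoost(documentBoost, 3.0f));
        // Body has no explicit boost, so it inherits only the document boost.
        System.out.println("body:  " + effectiveFieldBoost(documentBoost, DEFAULT_BOOST));
    }
}
```

With these sample values the title field ends up with a boost of 6.0 and the body field with 2.0.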

> what can be the best way to combine the page rank information with the lucene's document score?

I don't know whether it is the best way, but using PageRank as a document-level boost during indexing is probably the easiest.
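
Raw PageRank values are usually very unevenly distributed, so it may help to dampen them before using them as a boost. The sketch below shows one possible mapping; the log-based formula, the reference value, and all names are my assumptions, not something prescribed by Lucene. The resulting number is what you would pass as the document boost at indexing time.

```java
// One possible (assumed) mapping from PageRank to a document boost.
class PageRankBoostSketch {

    // referencePageRank is a hypothetical normalizer, e.g. the mean
    // PageRank of the collection. A page with PageRank 0 gets boost 1.0,
    // and the boost grows logarithmically from there.
    static float boostFromPageRank(double pageRank, double referencePageRank) {
        if (pageRank < 0.0 || referencePageRank <= 0.0) {
            throw new IllegalArgumentException(
                "pageRank must be >= 0 and referencePageRank must be > 0");
        }
        return (float) (1.0 + Math.log1p(pageRank / referencePageRank));
    }

    public static void main(String[] args) {
        double reference = 1e-6; // invented sample value
        System.out.println(boostFromPageRank(0.0, reference));
        System.out.println(boostFromPageRank(1e-6, reference));
        System.out.println(boostFromPageRank(1e-3, reference));
    }
}
```

The logarithm keeps a page whose PageRank is 1000 times the reference from overwhelming the text match entirely; whether that is the right amount of dampening is something to tune against your own relevance judgments.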

You can probably do it at query time as well, but I don't know how to do that in Lucene.
In Solr, see BoostQParserPlugin: http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html
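
As far as I understand it, the Solr boost parser multiplies the score of the wrapped query by the value of a function. The same idea can be sketched outside of Lucene and Solr as a re-ranking step over already retrieved hits. Everything below (the Hit class, the log-based factor, the sample numbers) is an assumption made for illustration, not a Lucene or Solr API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Query-time combination, sketched as re-ranking in plain Java.
class QueryTimeRerankSketch {

    // Hypothetical container for one search hit.
    static final class Hit {
        final String url;
        final double textScore; // relevance score from the text query
        final double pageRank;  // precomputed, assumed to be >= 0

        Hit(String url, double textScore, double pageRank) {
            this.url = url;
            this.textScore = textScore;
            this.pageRank = pageRank;
        }

        // Multiplicative combination; the log1p dampening is an assumption.
        double combinedScore() {
            return textScore * (1.0 + Math.log1p(pageRank));
        }
    }

    // Returns a new list sorted by combined score, highest first.
    static List<Hit> rerank(List<Hit> hits) {
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort((a, b) -> Double.compare(b.combinedScore(), a.combinedScore()));
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = Arrays.asList(
            new Hit("http://example.com/a", 1.0, 0.0),
            new Hit("http://example.com/b", 0.8, 1.0));
        for (Hit h : rerank(hits)) {
            System.out.println(h.url + " " + h.combinedScore());
        }
    }
}
```

Compared with an index-time boost, a query-time combination lets you change the PageRank weighting without reindexing all the pages.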