Query Boosting


Query Boosting

bourne71
Hi,

I am fairly new to Lucene and have encountered a problem with the search function I am trying to create. When I search for, let's say, "news sharing", the results are returned and displayed.

That works fine until I check the ranking. Some results rank higher even though they match only one of the two keywords. The problem looks like this:

Page 1
news - Total found 23
sharing - Total found 0

Page 2
news - Total found 1
sharing - Total found 21

I understand why Page 1 ranks better: it has more keyword occurrences in total. But this makes the returned results less relevant.

My current query is like the following:
(url:sharing^2.0 content:sharing title:sharing^1.5) (url:news^2.0 content:news title:news^1.5) url:"sharing news"~2147483647^2.0 content:"sharing news"~2147483647 title:"sharing news"~2147483647^1.5

Is there any way I can add an additional query clause that gives an extra boost to results that contain both keywords?

Re: Query Boosting

Simon Willnauer
Hi there,

Well, where to start... I would suggest you first look at the output
of Searcher#explain(Query, int) to see how the score is calculated. You
might start with a simpler query, as the output can be quite cryptic
the first time you see it.
To fully understand what the output means, have a closer look at the
javadoc of the class Similarity
(http://lucene.apache.org/java/2_4_1/api/core/org/apache/lucene/search/Similarity.html),
which explains in detail how the score is calculated.
Once you understand what is going on during the scoring process, I
would suggest you revise your boosting. I don't know whether you have
field boosts set, but as far as I can tell they would make more sense
in your use case.
In general, make sure you understand what the different boosts are used
for. This snippet from the wiki might help you:
<snip>
What is the difference between field (or document) boosting and query boosting?

Index time field boosts (field.setBoost(boost)) are a way to express
things like "this document's title is worth twice as much as the title
of most documents". Query time boosts (query.setBoost(boost)) are a
way to express "I care about matches on this clause of my query twice
as much as I do about matches on other clauses of my query".

Index time field boosts are worthless if you set them on every document.

Index time document boosts (doc.setBoost(float)) are equivalent to
setting a field boost on every field in that document.
</snip> (http://wiki.apache.org/lucene-java/LuceneFAQ#head-246300129b9d3bf73f597facec54ac2ee54e15d7)
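
To see where each kind of boost enters the score, here is the scoring formula as I recall it from the Similarity javadoc linked above. Treat it as a summary, not a substitute for that page. The name boost(t) is shorthand for the query-time boost of the clause containing term t.

```latex
\[
\mathrm{score}(q,d) \;=\; \mathrm{coord}(q,d)\cdot\mathrm{queryNorm}(q)\cdot
\sum_{t \in q}\Big(\mathrm{tf}(t \in d)\cdot\mathrm{idf}(t)^{2}\cdot
\mathrm{boost}(t)\cdot\mathrm{norm}(t,d)\Big)
\]
% Defaults in DefaultSimilarity:
%   tf(t in d)  = sqrt(frequency of t in d)
%   idf(t)      = 1 + ln(numDocs / (docFreq + 1))
%   coord(q,d)  = (matching clauses) / (total clauses)
%   norm(t,d)   = document boost * field boost(s) * lengthNorm(field), stored at index time
```

Query-time boosts appear as boost(t), while index-time field and document boosts are folded into norm(t,d). The coord factor is the part that rewards documents matching more of the query's clauses.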

Hope that helps you get started with scoring etc.

simon


On Tue, Aug 11, 2009 at 10:50 AM, bourne71<[hidden email]> wrote:

> Is there any way I can add an additional query clause that gives an extra
> boost to results that contain both keywords?
> --
> View this message in context: http://www.nabble.com/Query-Boosting-tp24913967p24913967.html
> Sent from the Lucene - Java Users mailing list archive at Nabble.com.

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Query Boosting

bourne71
Thanks. I understand how boosting works. What I need is a boost in the query that increases the score, and therefore the ranking, of a page when all of the query keywords are found in it.

I tried all sorts of combinations and none of them worked. Can anyone offer a suggestion?


Re: Query Boosting

iorixxx

> Thanks. I understand how boosting works. What I need is a boost in the
> query that increases the score, and therefore the ranking, of a page
> when all of the query keywords are found in it.

You can find the answer to your question in the last two messages of this thread:

http://www.nabble.com/Generating-Query-for-Multiple-Clauses-in-a-Single-Field-td24694748.html
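
For readers who cannot reach that thread, here is one common approach. It is not necessarily the one described there. Keep the optional per-term clauses and append one extra boosted group in which every term is required (+). Only documents containing all keywords match that group, so only they receive its score. The sketch below just builds the query string with plain Java. The class name BothTermsBoost, the method names and the boost value 5.0 are made up for illustration. It assumes the terms are plain words with no query-syntax characters, and it leaves out the sloppy phrase clauses from the original query.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds a classic QueryParser string, nothing more.
class BothTermsBoost {

    // "(url:news^2.0 content:news title:news^1.5)": the term in any field.
    public static String anyField(String term, Map<String, Float> fieldBoosts,
                                  boolean withBoosts) {
        List<String> parts = new ArrayList<>();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            String clause = e.getKey() + ":" + term;
            // A boost of 1.0 is the default, so it is left out of the string.
            if (withBoosts && e.getValue() != 1.0f) {
                clause += "^" + e.getValue();
            }
            parts.add(clause);
        }
        return "(" + String.join(" ", parts) + ")";
    }

    // Optional per-term groups, followed by one boosted group that
    // requires every term to be present in at least one field.
    public static String build(List<String> terms, Map<String, Float> fieldBoosts,
                               float allTermsBoost) {
        List<String> optional = new ArrayList<>();
        List<String> required = new ArrayList<>();
        for (String term : terms) {
            optional.add(anyField(term, fieldBoosts, true));
            required.add("+" + anyField(term, fieldBoosts, false));
        }
        return String.join(" ", optional)
                + " (" + String.join(" ", required) + ")^" + allTermsBoost;
    }

    public static void main(String[] args) {
        Map<String, Float> fields = new LinkedHashMap<>();
        fields.put("url", 2.0f);
        fields.put("content", 1.0f);
        fields.put("title", 1.5f);
        System.out.println(build(Arrays.asList("sharing", "news"), fields, 5.0f));
    }
}
```

The resulting string can be passed to the query parser as usual. The size of the extra boost is a tuning choice, and the explain() output for Page 1 and Page 2 is the easiest way to check whether it is large enough.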


     
