Question About Boosting.

shai deljo
How can I boost some tokens over others in the same field (at index
time)? If this is not supported directly, what's the best way around
this problem (what's the hack to solve this :) )?
Thanks,
Shai

Re: Question About Boosting.

Walter Underwood, Netflix
What are you trying to achieve? Let's start with the problem
instead of picking one solution which Solr doesn't support. --wunder

On 3/10/07 5:08 PM, "shai deljo" <[hidden email]> wrote:

> How can I boost some tokens over others in the same field (at index
> time)? If this is not supported directly, what's the best way around
> this problem (what's the hack to solve this :) )?
> Thanks,
> Shai

Re: Question About Boosting.

shai deljo
I have elements within a field that have different importance.
I thought boosting would be an elegant way to take this into account.
Please advise,


On 3/10/07, Walter Underwood <[hidden email]> wrote:

> What are you trying to achieve? Let's start with the problem
> instead of picking one solution which Solr doesn't support. --wunder

Re: Question About Boosting.

Walter Underwood, Netflix
Back up another step. What are the documents and what do you
want to show to the users? Have you tried the default configuration
with real user queries?

After you've tested it with user queries, then look at the
results where the ranking isn't performing well.

Lucene and Solr already automatically boost rare terms over
common terms, using tf.idf weighting.
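
As a rough illustration of why rare terms already outrank common ones, here is a minimal sketch of the term-frequency and inverse-document-frequency factors. It assumes the classic Lucene formulas (tf = sqrt(freq), idf = 1 + ln(numDocs / (docFreq + 1))); the real score has more factors, and the collection size and document frequencies below are made up for the example.

```python
import math


def classic_tf(term_freq: int) -> float:
    """Term-frequency factor: grows with the square root of the count."""
    return math.sqrt(term_freq)


def classic_idf(doc_freq: int, num_docs: int) -> float:
    """Inverse document frequency: terms found in few documents score higher."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


# Hypothetical collection of 1000 documents.
NUM_DOCS = 1000

# Both terms occur once in the document being scored.
rare_weight = classic_tf(1) * classic_idf(doc_freq=9, num_docs=NUM_DOCS)
common_weight = classic_tf(1) * classic_idf(doc_freq=499, num_docs=NUM_DOCS)

print(f"rare term weight:   {rare_weight:.3f}")
print(f"common term weight: {common_weight:.3f}")
```

The rare term ends up with the larger weight without any explicit boost, which is the point of testing the default ranking first.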

I posted more detail on this in my blog last summer:

http://wunderwood.org/most_casual_observer/2006/06/good_to_great_search.html

wunder

On 3/10/07 8:04 PM, "shai deljo" <[hidden email]> wrote:

> I have elements within a field that have different importance.
> I thought boosting would be an elegant way to take this into account.
> Please advise,

Re: Question About Boosting.

shai deljo
Thanks,
The only way I found to do this
(http://www.mail-archive.com/solr-user@.../msg02456.html)
is a hack: repeat the word several times in the field. But
doesn't this screw up the norms?
Also, how do I boost words in a query? E.g. q=key1 key2, where I know
key2 is twice as important as key1 (searching one field).
Thanks,
S.

On 3/11/07, Walter Underwood <[hidden email]> wrote:

> Back up another step. What are the documents and what do you
> want to show to the users? Have you tried the default configuration
> with real user queries?
>
> After you've tested it with user queries, then look at the
> results where the ranking isn't performing well.
>
> Lucene and Solr already automatically boost rare terms over
> common terms, using tf.idf weighting.
>
> I posted more detail on this in my blog last summer:
>
> http://wunderwood.org/most_casual_observer/2006/06/good_to_great_search.html
>
> wunder

Re: Question About Boosting.

Mike Klaas
On 3/11/07, shai deljo <[hidden email]> wrote:
> Thanks,
> The only way I found to do this
> (http://www.mail-archive.com/solr-user@.../msg02456.html)
> is a hack: repeat the word several times in the field. But
> doesn't this screw up the norms?

Yes, it can influence the norms.
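
To make the effect concrete, here is a small sketch, assuming the classic Lucene length norm of 1/sqrt(number of tokens in the field) and tf = sqrt(freq). The sample text is made up, and real norms are also combined with other factors, so treat the numbers as illustrative only.

```python
import math


def length_norm(num_tokens: int) -> float:
    """Classic length norm: the more tokens in the field, the smaller the norm."""
    return 1.0 / math.sqrt(num_tokens)


def classic_tf(term_freq: int) -> float:
    """Classic term-frequency factor."""
    return math.sqrt(term_freq)


# Hypothetical field content, 10 tokens.
original = "solr makes full text search over large document sets easy".split()

# The repetition hack: add "solr" three more times to make it count for more.
boosted = original + ["solr"] * 3

norm_before = length_norm(len(original))
norm_after = length_norm(len(boosted))

# The repeated term gains (higher tf), but every other term in the
# field loses, because the field is now longer and its norm is smaller.
solr_before = classic_tf(original.count("solr")) * norm_before
solr_after = classic_tf(boosted.count("solr")) * norm_after
other_before = classic_tf(original.count("search")) * norm_before
other_after = classic_tf(boosted.count("search")) * norm_after

print(f"solr:   {solr_before:.3f} -> {solr_after:.3f}")
print(f"search: {other_before:.3f} -> {other_after:.3f}")
```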

> Also, how do I boost words in a query? E.g. q=key1 key2, where I know
> key2 is twice as important as key1 (searching one field).

q=key1 key2^2

If the keywords that have more importance are the same for every
document, query-time boosting is by far the preferable route.
You have much more flexibility, and it is no less performant.
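
A minimal sketch of building such a query string in client code, using the `term^boost` syntax shown above. The helper name `boosted_query` is made up for this example; only the standard library is used to URL-encode the `q` parameter.

```python
from urllib.parse import urlencode


def boosted_query(term_boosts: dict) -> str:
    """Join terms into one query string, appending ^boost where it is not 1."""
    parts = []
    for term, boost in term_boosts.items():
        if boost == 1:
            parts.append(term)
        else:
            # :g drops a trailing ".0", so 2.0 is written as "2".
            parts.append(f"{term}^{boost:g}")
    return " ".join(parts)


# key2 is twice as important as key1.
q = boosted_query({"key1": 1, "key2": 2})
query_string = urlencode({"q": q})

print(q)             # key1 key2^2
print(query_string)  # q=key1+key2%5E2
```

Because the boosts live in the query, they can be changed per request without reindexing anything.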

There are some things which are elegantly solved using index-time
boosting, and so it is likely that Lucene will support it one day.

-Mike

Re: Question About Boosting.

Chris Hostetter-3

: I have elements within a field that have different importance.
: I thought boosting would be an elegant way to take this into account.
: Please advise,

typically, if you know when sending the doc to Solr that certain
words/phrases of field A are extremely significant for that document, the
simple approach is to also put those words/phrases in some other field "B"
and at query time search both A and B. Since B tends to have fewer words
anyway, it makes more of an impact on the results, but if you want those
words to be *really* important, boost your queries on B.

The dismax handler makes querying across these multiple fields very easy.
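
A rough sketch of what such a request could look like from client code. The field names `body` (standing in for A) and `keywords` (standing in for B) are made up, and how the dismax handler is selected depends on the Solr version and configuration; `defType=dismax` is assumed here. The `qf` parameter lists the fields to search, each with an optional `^boost`.

```python
from urllib.parse import urlencode


def dismax_params(user_query: str, field_boosts: dict) -> dict:
    """Build request parameters that search several fields with per-field boosts."""
    qf = " ".join(
        field if boost == 1 else f"{field}^{boost:g}"
        for field, boost in field_boosts.items()
    )
    return {
        "defType": "dismax",  # assumption: handler selection varies by setup
        "q": user_query,
        "qf": qf,
    }


# Hypothetical schema: "body" holds the full text (field A),
# "keywords" holds only the significant words/phrases (field B).
params = dismax_params("solar panels", {"body": 1, "keywords": 3})

print(params["qf"])       # body keywords^3
print(urlencode(params))  # ready to append to the select URL
```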



-Hoss

Re: Question About Boosting.

shai deljo
I thought about this option but it doesn't sound scalable. What
happens if I have 100 words with 100 different boost factors?

On 3/12/07, Chris Hostetter <[hidden email]> wrote:

>
> : I have elements within a field that have different importance.
> : I thought boosting would be an elegant way to take this into account.
> : Please advise,
>
> typically, if you know when sending the doc to Solr that certain
> words/phrases of field A are extremely significant for that document, the
> simple approach is to also put those words/phrases in some other field "B"
> and at query time search both A and B. Since B tends to have fewer words
> anyway, it makes more of an impact on the results, but if you want those
> words to be *really* important, boost your queries on B.
>
> The dismax handler makes querying across these multiple fields very easy.
>
>
>
> -Hoss
>
>

Re: Question About Boosting.

Chris Hostetter-3

: I thought about this option but it doesn't sound scalable. What
: happens if I have 100 words with 100 different boost factors?

then you've got a problem :)

typically it's not this severe ... I'll frequently have half a dozen
fields that I divide text up into to boost by different amounts, but I'm
having a hard time understanding why you would need 100 unique boost
factors for 100 unique words ... putting things in buckets tends to be
effective.
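
One possible way to do the bucketing on the indexing side, as a sketch only: the thresholds, the field names (`terms_high`, `terms_medium`, `terms_low`) and the per-term weights are all made up for the example. Each term goes into the first bucket whose threshold it meets, and each bucket is then indexed as its own field with its own query-time boost.

```python
from collections import defaultdict

# (minimum weight, field name), checked from highest to lowest threshold.
BUCKETS = [
    (0.75, "terms_high"),
    (0.40, "terms_medium"),
    (0.00, "terms_low"),
]


def bucket_terms(term_weights: dict) -> dict:
    """Group terms into a few fields instead of giving each its own boost."""
    doc = defaultdict(list)
    for term, weight in term_weights.items():
        for threshold, field in BUCKETS:
            if weight >= threshold:
                doc[field].append(term)
                break
    return dict(doc)


# Hypothetical per-term importance scores for one document.
fields = bucket_terms({"solr": 0.9, "lucene": 0.5, "search": 0.1})
print(fields)
```

At query time the three fields could then be searched together with something like `terms_high^4 terms_medium^2 terms_low`, so a handful of boosts covers any number of terms.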



-Hoss

Re: Question About Boosting.

shai deljo
Buckets it is :)
Thx

On 3/12/07, Chris Hostetter <[hidden email]> wrote:

>
> : I thought about this option but it doesn't sound scalable. What
> : happens if I have 100 words with 100 different boost factors?
>
> then you've got a problem :)
>
> typically it's not this severe ... I'll frequently have half a dozen
> fields that I divide text up into to boost by different amounts, but I'm
> having a hard time understanding why you would need 100 unique boost
> factors for 100 unique words ... putting things in buckets tends to be
> effective.
>
>
>
> -Hoss
>
>