Best way to change weighting based on the presence of a field

Kyle Banerjee
Howdy all,

We are attempting to provide access to about 8 million records of
highly variable quality and length. In a nutshell, we are trying to
find a way to deprioritize "suspect" records without discriminating
against useful records that happen to be short. We do not wish to
eliminate suspect records from the results -- just deprioritize them a
bit.

We have been indexing a field that marks a record as likely to be good
or bad, and I'm trying to figure out the most efficient way to use it
(should I be trying this at all?). As a newbie, my first inclination
was to OR the search terms with the same terms combined with a "good
record marker" with a modest boost.
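For illustration, the OR-with-boost approach described above might look roughly like this as a Lucene query string. This is a minimal sketch, not code from the thread: the marker field `quality`, its value `good`, and the default boost of 2 are hypothetical placeholders.

```python
def build_or_boost_query(terms: str, marker_field: str = "quality",
                         good_value: str = "good", boost: float = 2.0) -> str:
    """Match `terms` as usual, OR match them again together with a
    (hypothetical) good-record marker, giving that second clause a modest
    boost so marked records score a bit higher."""
    base = f"({terms})"
    boosted = f"(({terms}) AND {marker_field}:{good_value})^{boost:g}"
    return f"{base} OR {boosted}"

print(build_or_boost_query("civil war diaries"))
# (civil war diaries) OR ((civil war diaries) AND quality:good)^2
```

The clunkiness is visible here: every search term is written out and evaluated twice, once in each clause.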

However, this method seems really clunky, and I'm wondering if there's
a better way to accomplish what we're trying to do. Thanks,

kyle

Re: Best way to change weighting based on the presence of a field

Mike Klaas
On 5-Oct-07, at 2:06 PM, Kyle Banerjee wrote:

> Howdy all,
>
> We are attempting to provide access to about 8 million records of
> highly variable quality and length. In a nutshell, we are trying to
> find a way to deprioritize "suspect" records without discriminating
> against useful records that happen to be short. We do not wish to
> eliminate suspect records from the results -- just deprioritize them a
> bit.
>
> We have been indexing a field that marks a record as likely to be good
> or bad, and I'm trying to figure out the most efficient way to use it
> (should I be trying this at all?). As a newbie, my first inclination
> was to OR the search terms with the same terms combined with a "good
> record marker" with a modest boost.
>
> However, this method seems really clunky, and I'm wondering if there's
> a better way to accomplish what we're trying to do. Thanks,

If you know at index time that the document is shady, the easiest way  
to de-emphasize it globally is to set the document boost to some  
value other than one.

<doc boost="0.5">...
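A minimal sketch of generating such an update message with Python's standard library. The field names `id` and `title` and the sample values are hypothetical; posting the result to the Solr update handler is not shown.

```python
import xml.etree.ElementTree as ET

def add_doc_xml(fields: dict, boost: float = 1.0) -> str:
    """Serialize one document as a Solr XML <add> message.
    A boost other than 1.0 is written as the index-time document boost."""
    add = ET.Element("add")
    doc = ET.SubElement(add, "doc")
    if boost != 1.0:
        doc.set("boost", f"{boost:g}")
    for name, value in fields.items():
        field = ET.SubElement(doc, "field", name=name)  # hypothetical field names
        field.text = str(value)
    return ET.tostring(add, encoding="unicode")

print(add_doc_xml({"id": "rec-42", "title": "Untitled"}, boost=0.5))
# <add><doc boost="0.5"><field name="id">rec-42</field><field name="title">Untitled</field></doc></add>
```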

cheers,
-Mike

Re: Best way to change weighting based on the presence of a field

Kyle Banerjee
> If you know at index time that the document is shady, the easiest way
> to de-emphasize it globally is to set the document boost to some
> value other than one.
>
> <doc boost="0.5">...

I considered that, but assumed we'd get the values wrong at first and
have to do a lot of tinkering before we got it right. Is there a good
way to do this at query time, or do you really need to do this when
loading? It would be feasible to boost at load time, but recovery
times from bad decisions are longer than I was hoping for.

kyle

Re: Best way to change weighting based on the presence of a field

Mike Klaas
On 5-Oct-07, at 3:01 PM, Kyle Banerjee wrote:

>> If you know at index time that the document is shady, the easiest way
>> to de-emphasize it globally is to set the document boost to some
>> value other than one.
>>
>> <doc boost="0.5">...
>
> I considered that, but assumed we'd get the values wrong at first and
> have to do a lot of tinkering before we got it right. Is there a good
> way to do this at query time, or do you really need to do this when
> loading? It would be feasible to boost at load time, but recovery
> times from bad decisions are longer than I was hoping for.

The other option is to use a function query on the value stored in a  
field (which could represent a range of 'badness').  This can be used  
directly in the dismax handler using the bf (boost function) query  
parameter.
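A rough sketch of the request parameters such a search might carry, assuming a numeric, single-valued field named `quality` (hypothetical) where higher values mean better records. It also assumes a Solr version where `defType=dismax` selects the dismax parser (older setups chose a dismax request handler with `qt` instead); the `qf` fields and the localhost URL are placeholders.

```python
from urllib.parse import urlencode

def dismax_params(user_query: str, quality_field: str = "quality",
                  weight: float = 0.5) -> dict:
    """Parameters for a dismax search whose score gets an additive boost
    from the value of a (hypothetical) quality field."""
    return {
        "q": user_query,
        "defType": "dismax",                  # assumption: parser chosen via defType
        "qf": "title^2 text",                 # hypothetical query fields
        "bf": f"{quality_field}^{weight:g}",  # boost function: field value, weighted
    }

query_string = urlencode(dismax_params("civil war diaries"))
print("http://localhost:8983/solr/select?" + query_string)  # placeholder host
```

Because the weight lives in the request rather than the index, it can be tuned from one query to the next without reloading any records.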

-Mike

Re: Best way to change weighting based on the presence of a field

Yonik Seeley-2
On 10/5/07, Mike Klaas <[hidden email]> wrote:
> The other option is to use a function query on the value stored in a
> field (which could represent a range of 'badness').  This can be used
> directly in the dismax handler using the bf (boost function) query
> parameter.

In the near future, you can do a real query-time boost (score multiplication)
by another field or function
https://issues.apache.org/jira/browse/SOLR-334

And even quickly update all the values of the field being used as the boost:
https://issues.apache.org/jira/browse/SOLR-351
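The practical difference between the additive `bf` approach and score multiplication can be shown with a small arithmetic sketch. The scores and quality values below are invented for illustration only; real Lucene scores are not scaled like this. With an additive boost, the same quality bonus matters a lot or very little depending on the raw score range of a query, whereas multiplying keeps the penalty proportional.

```python
def additive(relevance: float, quality: float, weight: float = 0.5) -> float:
    """bf-style boost: a weighted function value is added to the score."""
    return relevance + weight * quality

def multiplicative(relevance: float, quality: float) -> float:
    """Multiplicative boost: the score is scaled by the quality value."""
    return relevance * quality

def winner(scorer, docs):
    """Return the id of the top-scoring (id, relevance, quality) tuple."""
    return max(docs, key=lambda d: scorer(d[1], d[2]))[0]

# Hypothetical numbers: a suspect record (quality 0.8) with slightly higher
# text relevance than a good record (quality 1.0), at two score scales.
high_scale = [("suspect", 10.0, 0.8), ("good", 9.0, 1.0)]
low_scale = [("suspect", 0.10, 0.8), ("good", 0.09, 1.0)]

for name, docs in [("high", high_scale), ("low", low_scale)]:
    print(name, "additive:", winner(additive, docs),
          "multiplicative:", winner(multiplicative, docs))
# high additive: suspect multiplicative: good
# low additive: good multiplicative: good
```

In the high-scale case the additive bonus is too small to demote the suspect record; in the low-scale case it swamps text relevance entirely. Multiplication demotes the suspect record by the same 20% either way.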

-Yonik

Re: Best way to change weighting based on the presence of a field

Kyle Banerjee
> In the near future, you can do a real query-time boost (score multiplication)
> by another field or function
> https://issues.apache.org/jira/browse/SOLR-334
>
> And even quickly update all the values of the field being used as the boost:
> https://issues.apache.org/jira/browse/SOLR-351

Thanks; all the feedback people are providing is very helpful. For the
short term, it looks like the ticket might be to use a function query on
the value stored in a field that represents the quality of the record.

kyle