Boosting score based off a match in a particular field

Tanya Bompi
Hi,
  I have an index built from a combination of fields (Title, Description,
Phone, Email, etc.). I have indexed all of the individual fields as well as
a combined copy field.
My query input is a combination of all the fields
(Title + Description + Phone + Email).
For some samples, the resulting Solr score still comes out lower even when
the Email/Phone matches. I have tried boosting individual fields, e.g.
Email^2, but then every token in the input query is matched against the
email field, which produces erroneous results.

How can I formulate a query that matches Email against the Email field with
a boost, alongside the combined-field match against the combined field
index?

Thanks,
Tanya

Re: Boosting score based off a match in a particular field

Doug Turnbull
The terminology we use at my company is that you want to *gate* the effect
of a boost so it applies only in very precise scenarios. A lot of this
depends on how your Email and Phone numbers are being tokenized/analyzed
(i.e. what analyzer is on the field type), because you really only want to
boost when you have high-confidence email/phone number matches. You may
actually have more of a matching problem than a relevance problem. You can
debug this in the Solr analysis screen.

Another tool you can use is putting an mm on just the boost query. This
gates that specific boost based on how many query terms match that field.
It's good for doing a kind of poor man's entity recognition (how much does
the query correspond to one kind of entity).

Something like

bq={!edismax mm=80% qf=Email^100 v=$q}

which boosts on Email only when there's a strong match (80% of the query
terms match the email), alongside your main qf with the combined field

qf=text_all
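
To get a feel for how strict an mm=80% gate is, here is a minimal sketch of the arithmetic in Python. It assumes only the rule documented for positive mm percentages (the computed value is rounded down); the function name `required_matches` is hypothetical, and this is not Solr's own implementation. Negative and compound mm specs are left out.

```python
def required_matches(num_terms: int, mm_percent: int) -> int:
    """Number of query terms that must match under a positive percentage mm.

    Applies the documented rule for positive percentages: the computed
    value is rounded down. Negative and compound mm specs are not handled.
    """
    if num_terms < 0 or not 0 <= mm_percent <= 100:
        raise ValueError("expected num_terms >= 0 and 0 <= mm_percent <= 100")
    # Integer arithmetic keeps the rounding-down step exact.
    return (num_terms * mm_percent) // 100


for n in (2, 3, 5, 10):
    print(n, "terms ->", required_matches(n, 80), "must match")
```

Under this rule, a five-term query would need four of its terms to match the Email field before the boost applies.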

There are a lot of strategies, and it usually takes a combination of query
and analysis work (and lots of good test data to prove your approach works).

(Shameless plug: we cover a lot of this in our Solr relevance training,
https://opensourceconnections.com/events/training/)

Hope that helps
-Doug


--
CTO, OpenSource Connections
Author, Relevant Search
http://o19s.com/doug

Re: Boosting score based off a match in a particular field

Tanya Bompi
Hi Doug,
  Thank you for your response. I tried the boost syntax above, but I get the
error below about infinite recursion. I also couldn't figure out from the
documentation page what the 'v' parameter is (
https://lucene.apache.org/solr/guide/7_0/the-extended-dismax-query-parser.html).
I will try the analysis tool as well.

"bq":"{!edismax mm=80% qf=ContactEmail^100 v=$q}"}},
"error":{ "metadata":[ "error-class","org.apache.solr.common.SolrException",
"root-error-class","org.apache.solr.search.SyntaxError"],
"msg":"org.apache.solr.search.SyntaxError:
Infinite Recursion detected parsing query

Thank you,
Tanya


Re: Boosting score based off a match in a particular field

Doug Turnbull
Ah yes, this is a common gotcha. It's because the bq is recursively applied
to itself: the nested edismax query picks up the same bq parameter.

So you have to give that bq its own, empty bq:

bq={!edismax bq='' mm=80% qf=Email^100 v=$q}

v is simply the 'q' for this subquery; by passing v=$q you explicitly set
it to whatever was passed in q.
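
Put together, the request parameters might look like the following Python sketch. It only builds and URL-encodes the parameters and sends nothing to Solr. The field names `text_all` and `Email` come from the examples in this thread; the sample query, the helper name `build_params`, and the use of `defType=edismax` for the main query are assumptions for illustration.

```python
from urllib.parse import urlencode

# Field names taken from the thread's examples; substitute your schema's names.
COMBINED_FIELD = "text_all"
EMAIL_FIELD = "Email"


def build_params(user_query: str, mm: str = "80%", boost: int = 100) -> dict:
    """Build edismax request parameters with a gated boost on the email field."""
    # bq='' gives the nested edismax query an empty boost query of its own,
    # so it does not pick up the outer bq and recurse.
    # v=$q makes the subquery use whatever was passed in the request's q.
    gated_boost = f"{{!edismax bq='' mm={mm} qf={EMAIL_FIELD}^{boost} v=$q}}"
    return {
        "q": user_query,
        "defType": "edismax",  # assumption: the main query also uses edismax
        "qf": COMBINED_FIELD,
        "bq": gated_boost,
    }


# Hypothetical sample input combining a name and an email address.
params = build_params("jane doe jane.doe@example.com")
print(params["bq"])
print(urlencode(params))
```

Building the parameters as a dictionary and encoding them in one step avoids hand-escaping characters such as `{`, `^`, `%` and `$` in the URL.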

Best
-Doug
