How to control ranking based on into which field a hit is found

Steven White
Hi everyone,

I index my data from the DB, each value into its own field.  I then use
copyField to copy the values of all the fields into an _ALL_FIELDS_ field
that I created.  In my edismax handler, I use _ALL_FIELDS_ for “df”.  Here
is what my edismax configuration looks like:



    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="defType">edismax</str>
      <str name="q.alt">*:*</str>
      <str name="q.op">AND</str>
      <str name="fl">score,_UNIQUE_FIELD_</str>
      <str name="df">_ALL_FIELDS_</str>
      <!-- <str name="qf">_ALL_FIELDS_</str> --> <!-- use "qf" or "df" -->

      <str name="lowercaseOperators">false</str>
    </lst>

One of my fields that I index is called ID (every record I index has this
field) and it holds the user-friendly record-ID that uniquely identifies
that record (users are familiar with the value of this ID field).  The
value of this ID field can be in any other record's field too but only 1
record in the whole set has this value in the ID field.  An example of a
value for ID is “MOD00002012A”.

I need to achieve both of these goals (A is a must but B is highly
desired):


A) If a user searches for just "MOD00002012A" (with or without quotes) I
want the record that matches this value in the ID field to be the first
item in the hit result regardless where else this term may also exist in
other records: the record with ID "MOD00002012A" must be hit #1 on the list.

B) If a user searches for "MOD00002012A manual" or "download MOD00002012A
manual" or "factory warranty for MOD00002012A", etc. (without quotes) I
want the record that matches this value in the ID field to be the first
item in the hit result regardless where else this term may also exist in
other records: the record with ID "MOD00002012A" must be hit #1 on the list.

How can I achieve A and B?



Of course, if “MOD00002012A” does not match in the ID field but matches in
another field, then I need the normal search / hit / ranking to happen.


As a side question, what should I be using, "qf" or "df"?  I cannot figure
out the difference between the 2 in Solr's doc.

Thanks

Steven

Re: How to control ranking based on into which field a hit is found

Erick Erickson
Try something like q=whatever OR ID:"whatever"^1000

I’d put the value in quotes for the ID: clause, and do look at what the
parsed query looks like when you specify &debug=query. The reason I
recommend this is you’ll no doubt try something like
q=ID:download MOD00002012A manual
without quotes and be very surprised because it parses as
q=ID:download OR _ALL_FIELDS_:MOD00002012A OR _ALL_FIELDS_:manual
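
One way an application might assemble such a query is sketched below in Python. This is only an illustration under assumptions: the helper names `quote_phrase` and `build_boosted_query` are hypothetical, the field name `ID` and the boost of 1000 come from this thread, and treating every whitespace-separated token as a candidate ID is a guess at how goal B could be approached. It is worth checking the parsed query with debug=query before relying on it.

```python
from urllib.parse import urlencode


def quote_phrase(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_boosted_query(user_query: str, id_field: str = "ID", boost: int = 1000) -> str:
    """OR the user's query with one boosted, quoted ID clause per token.

    Assumption: any token in the query might be a record ID, so each one
    gets its own ID:"token"^boost clause.
    """
    # Drop the user's own quotes before tokenizing so "MOD00002012A"
    # (typed with quotes) still yields the bare token.
    tokens = user_query.replace('"', " ").split()
    if not tokens:
        return user_query
    id_clauses = [f"{id_field}:{quote_phrase(t)}^{boost}" for t in tokens]
    return " OR ".join([f"({user_query})"] + id_clauses)


# Request parameters, with debug=query so the parsed query can be inspected.
params = {"q": build_boosted_query("MOD00002012A manual"), "debug": "query"}
print(urlencode(params))
```

Quoting each ID value keeps the whole value attached to the ID field, which avoids the surprise described above where only the first word is searched in ID.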

Which brings us to the difference between df and qf. df stands for “default field”
and is the field the query is sent against if nothing is specified, i.e. q=blah will
parse as q=default_field:blah, whereas, with edismax, qf (“query fields”) can be
plural, so you get something like qf=field1 field2 (a space-separated list) and the
search is performed on both fields. Note that in the edismax case, df is rarely used
unless you do something like specify “qf=”, i.e. an empty qf list.

Which also brings up whether you really want your _ALL_FIELDS_ field or not.
Depending on how many fields you really have, you can just list them all in
the qf parameter (either on the individual search or in solrconfig.xml) and avoid
the _ALL_FIELDS_ field entirely. You can even boost individual fields differently
by default.
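
As a rough sketch of what listing the fields in qf with different boosts could look like from the application side (Python; the helper `build_qf` and the field names `title` and `description` are made up for illustration, only `ID` comes from this thread):

```python
from typing import Mapping


def build_qf(field_boosts: Mapping[str, float]) -> str:
    """Build a space-separated qf value such as 'ID^1000 title^5 description'.

    A boost of 1 is the default, so it is left off.
    """
    parts = []
    for field, boost in field_boosts.items():
        parts.append(field if boost == 1 else f"{field}^{boost:g}")
    return " ".join(parts)


# Hypothetical per-request parameters; the same qf value could instead be
# placed in the handler defaults in solrconfig.xml.
params = {
    "defType": "edismax",
    "q": "MOD00002012A manual",
    "qf": build_qf({"ID": 1000, "title": 5, "description": 1}),
}
print(params["qf"])
```

With something like this, a hit in ID contributes far more to the score than a hit elsewhere, without needing the _ALL_FIELDS_ copy field.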

And, if you really want _ALL_FIELDS_, you may not need edismax.

Best,
Erick

> On May 25, 2020, at 1:15 PM, Steven White <[hidden email]> wrote:

Re: How to control ranking based on into which field a hit is found

Steven White
Thanks Erick.

OR'ing ID:"MOD00002012A"^1000 with the original query will not always
guarantee that the record with the matching ID will be the #1 hit on the
list, or will it?

Also, why did you boost by a factor of 1000?  I never figured out what the
number means for boosting.  I have seen 10, 100, 500, etc. used and even
fractions such as 0.2.  Can you shed some light on this?

Thanks for explaining the difference between "qf" and "df".

Steven

On Mon, May 25, 2020 at 1:33 PM Erick Erickson <[hidden email]>
wrote:


Re: How to control ranking based on into which field a hit is found

Erick Erickson
If you boost it high enough it should, but you’re right, it’s not guaranteed.
The number is “whatever works”; it’s just a number the score is multiplied
by.
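
To make the "just a multiplier" point concrete, here is a deliberately simplified model in Python. It assumes the final score of a document is the score of the main query plus the boosted score of the ID clause (roughly how OR'ed clauses combine); the function name and all the numbers are invented for illustration and are not real Solr scores.

```python
def total_score(main_score: float, id_clause_score: float, boost: float) -> float:
    """Simplified model: main query score plus the boosted ID clause score."""
    return main_score + boost * id_clause_score


# Hypothetical documents: A matches in the ID field, B does not but
# scores very well on the rest of the query.
doc_b = total_score(main_score=40.0, id_clause_score=0.0, boost=10)

# A small boost is not enough for A to outrank B ...
doc_a_small = total_score(main_score=2.0, id_clause_score=1.5, boost=10)
# ... while a large one is, for these particular numbers.
doc_a_large = total_score(main_score=2.0, id_clause_score=1.5, boost=1000)

print(doc_a_small, doc_b, doc_a_large)
```

In this model the boost only has to be large relative to the best score any other document can reach, which is why there is no single correct value and no hard guarantee.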

Another approach, cheap and guaranteed to work, would be to have your app
do a real-time get (RTG) on the ID in parallel with the main query. Real-time
get is very, very cheap so the cost is minimal, especially if you fire the
two queries off in parallel. Then have the app put the response from
the RTG first if there is one. You’d have to be a bit careful not to repeat
the doc if it happened to come back in the main query. Come to think of it,
that would probably be cheaper overall than boosting.
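
A minimal sketch of that application-side merge, in Python, might look like the following. The names `merge_rtg_first` and `search_with_rtg` are hypothetical, and the two lookups are passed in as plain callables so no particular Solr client is assumed. It also assumes the ID users type is the value the real-time get handler looks up (its uniqueKey); if the uniqueKey is a different field, the lookup would need adjusting.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

Doc = Dict[str, object]


def merge_rtg_first(rtg_doc: Optional[Doc], main_docs: List[Doc], key: str = "ID") -> List[Doc]:
    """Put the real-time-get document first and drop its duplicate, if any."""
    if rtg_doc is None:
        # No record has this ID: fall back to the normal ranking.
        return list(main_docs)
    rest = [d for d in main_docs if d.get(key) != rtg_doc.get(key)]
    return [rtg_doc] + rest


def search_with_rtg(
    query: str,
    candidate_id: str,
    fetch_by_id: Callable[[str], Optional[Doc]],
    run_main_query: Callable[[str], List[Doc]],
    key: str = "ID",
) -> List[Doc]:
    """Run the ID lookup and the main query in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        rtg_future = pool.submit(fetch_by_id, candidate_id)
        main_future = pool.submit(run_main_query, query)
        return merge_rtg_first(rtg_future.result(), main_future.result(), key)
```

Here `fetch_by_id` would wrap the real-time get request and return the document or None, and `run_main_query` would wrap the normal edismax search. How the candidate ID is picked out of a multi-word query is left to the application.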

Best,
Erick

> On May 25, 2020, at 3:38 PM, Steven White <[hidden email]> wrote: