boost parameter produces garbage hits

Webster Homer
Hi,

I am trying to understand how the boost (and bq) parameters are supposed to work.
My application searches our product schema and returns the best matches. To enable an exactish match on product name we created fields that are minimally tokenized (keyword tokenizer/lowercase). Now I want the search to boost results that match on those fields. I thought that either the boost or bq parameter would work. I found very few good examples of the boost parameter used on a query. A lot of permutations resulted in errors such as this:
org.apache.solr.search.SyntaxError: Infinite Recursion detected parsing query 'ethyl alcohol'

I am using Solr 7.2 and the eDismax query parser.
I have gotten boost to work, sort of, but it changes the query results in a bad way. I'm sure that I'm doing something wrong. Here is an example of my boost parameter:
boost=product(query({!edismax qf="search_en_p_pri_name_min search_en_root_name_min" v=$q boost=}, 0),10000)

When I search for "ethyl alcohol", products named "ethyl alcohol" come first, which is what I want. We have a range of ethyl alcohol products. Normally I expect to see "ethyl alcohol, pure" and "ethyl alcohol, denatured" after the initial "ethyl alcohol", and I do see this without the boost. With the boost I get "ethyl alcohol" with a score of 3.87201088E8. The second hit is "Brilliant Cresyl blue" with a score of 0. All subsequent hits also have a score of 0.

Why are there any matches returned with a score of 0? Why are these hits with a 0 score being returned at all, especially when more relevant matches are not being returned? I suspect that there is something wrong with my boost function, but it looks right. However, if I instead submit the function shown above as a bf parameter, I get a syntax error:
bf=product(query({!edismax qf="search_en_p_pri_name_min search_en_root_name_min" v=$q bf=}),10000)
org.apache.solr.search.SyntaxError: Expected identifier at pos 23 str='product(query({!edismax'"

From the documentation, I expected that the bf and boost parameters differed only in how the result is boosted, with boost being multiplicative and bf being additive, but I cannot find an equivalent that actually works with the bf parameter.

The bq parameter doesn't throw an error, but it doesn't seem to have any effect on how the results are ordered.

What am I doing wrong? Why does the boost parameter return garbage hits with 0 score? What would work as a bf parameter function?

RE: boost parameter produces garbage hits

Webster Homer
I looked at boost a bit more. The number of results remains the same whether the boost parameter is present or not. When it is present and the boost query matches a hit in the result, it does what I expect. When the boost query does not match a hit, that hit ends up in the result with a relevancy of 0, which is completely unexpected.
It does appear that bq does what I want, but the behavior of boost seems like a bug. We use boost elsewhere and it works as we want, although that use case does not involve the query function.

-----Original Message-----
From: Webster Homer <[hidden email]>
Sent: Thursday, April 18, 2019 12:16 PM
To: [hidden email]
Subject: boost parameter produces garbage hits

Re: boost parameter produces garbage hits

Walter Underwood
For your application, I would probably do everything with the qf and pf parameters. Your minimally tokenized fields are better evidence for relevance, so weight them higher. Something like this, with phrase matches counting twice as much as word matches:

      <str name="qf">text_minimal^2 text_stem</str>
      <str name="pf">text_minimal^4 text_stem^2</str>
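
For illustration, the same weights could also be sent as request parameters instead of being configured as handler defaults. This is only a minimal Python sketch: the field names come from the example above, while the host, the collection name (`products`), and the `/select` handler path are hypothetical placeholders.

```python
from urllib.parse import urlencode

# Hypothetical Solr endpoint; substitute your own host and collection.
SOLR_SELECT_URL = "http://localhost:8983/solr/products/select"


def build_edismax_url(user_query: str) -> str:
    """Build a select URL that weights the minimally tokenized field higher."""
    params = {
        "defType": "edismax",
        "q": user_query,
        # Word matches: the minimal field counts twice as much as the stemmed one.
        "qf": "text_minimal^2 text_stem",
        # Phrase matches count twice as much as the corresponding word matches.
        "pf": "text_minimal^4 text_stem^2",
    }
    # urlencode percent-encodes '^' and turns spaces into '+'.
    return SOLR_SELECT_URL + "?" + urlencode(params)


url = build_edismax_url("ethyl alcohol")
print(url)
```

Printing the URL makes it easy to paste into a browser and compare the ordering with and without the weights.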

I most often use boost for popularity, almost always with this formula:

       <str name="boost">sum(log(sum(popularity,1)),1)</str>

If there is a chance that popularity might be negative, do this:

       <str name="boost">sum(log(sum(max(popularity,0),1)),1)</str>
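
To get a feel for how gently this formula scales, here is a small Python sketch of the same arithmetic. It assumes Solr's `log()` function is base 10; the function name `popularity_boost` and the sample popularity values are made up for illustration.

```python
import math


def popularity_boost(popularity: float) -> float:
    """Mirror sum(log(sum(max(popularity,0),1)),1), with log taken as base 10."""
    clamped = max(popularity, 0)        # max(popularity,0): guard against negatives
    return math.log10(clamped + 1) + 1  # sum(log(sum(...,1)),1)


# Sample popularity values, chosen only to show the shape of the curve.
for p in (-5, 0, 9, 99, 999):
    print(p, popularity_boost(p))
```

The multiplier never drops below 1, so a document with no popularity keeps its original score, and each tenfold increase in popularity adds only about 1 to the multiplier.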

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Apr 18, 2019, at 12:55 PM, Webster Homer <[hidden email]> wrote:
>
> Looked at boost a bit more. The # of results remains the same whether the boost parameter is present or not. If it is present the behavior seems to be that if it matches a hit in the result, it does what I expect, however if it does not match the hit, what ends up in the result is completely unexpected with 0 relevancy.
> It does appear that bq does what I want, but the behavior of boost seems like a bug. We use boost elsewhere and it works as we want, that use case does not involve using the query function though.

RE: boost parameter produces garbage hits

Baloo
In reply to this post by Webster Homer
To answer your question "Why does the boost parameter return garbage hits
with 0 score?":

The syntax of Solr's query function is query(subquery, default). It returns
the score for the given subquery, or the default value for documents that do
not match the subquery. In your case, for documents that do not match the
subquery, the function becomes product(0,10000), which returns 0. Since the
output of the boost parameter is multiplied with the main query score, your
main score also becomes zero.

This means that if a document does not match the boost query, its original
score is wiped out as well. You can experiment with a default value other
than zero.
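
The arithmetic can be sketched in a few lines of Python. This is not Solr code, only a model of the multiplication described above; the function names and all scores are made up, and `None` stands for "the subquery did not match".

```python
def solr_query_fn(subquery_score, default):
    """Mimic query(subquery, default): the subquery score, or default on no match."""
    return default if subquery_score is None else subquery_score


def boosted_score(main_score, subquery_score, default, factor=10000):
    """Multiplicative boost: main_score * product(query(subquery, default), factor)."""
    return main_score * (solr_query_fn(subquery_score, default) * factor)


# Made-up scores: one document matches the exact-name subquery, one does not.
exact_match = {"main": 12.0, "sub": 3.0}
partial_match = {"main": 9.0, "sub": None}

for default in (0, 1):
    print(
        "default =", default,
        "exact:", boosted_score(exact_match["main"], exact_match["sub"], default),
        "partial:", boosted_score(partial_match["main"], partial_match["sub"], default),
    )
```

With a default of 0 the non-matching document ends up with a score of 0, whatever its main score was. With a default of 1, every non-matching document is scaled by the same factor, so those documents keep their relative order. One caveat of this model: a matching document whose subquery score is below 1 would then receive a smaller multiplier than a non-matching one.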



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html