phrase boosting by edismax

Szűcs Roland
Hi all,

Context:
I use Solr 8.4.1. I have a small database of books, around 3500 documents.
I realized that I cannot search on a copyField of all fields (author,
title, publisher, description) because description has a different analysis
chain than the others (it has stemming and stop word removal, the
others do not).

That is why I run the same search expression against multiple fields. If I run
a query for "József Attila", I get the following response (I echoed all the
parameters for clarity):
{status=0,QTime=52,params={facet.field=[category,
format],qt=/query,debug=true,spellcheck.dictionary=[default,
wordbreak],echoParams=all,indent=true,fl=title,author,publisher,description,category,price,stock,created,format,imageUrl,rows=40,version=2,q=title:"{"q":"József
Attila"}" category:"{"q":"József Attila"}" publisher:"{"q":"József
Attila"}" description:"{"q":"József Attila"}" author:"{"q":"József
Attila"}",defType=edismax,qf=title^10 author^7 category^5 publisher^3
description,spellcheck=on,pf=title^30 author^14 category^10 publisher^6
description^2,facet.mincount=1,facet=true,wt=javabin}}
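For comparison, here is a minimal sketch of how the request parameters could be assembled so that q carries only the text typed by the user, while the field list and boosts stay in qf and pf. The echoed q above contains the wrapper {"q":"..."} inside every field clause, so the sketch assumes that wrapper is not intended. The helper name build_edismax_params is hypothetical; defType, q, qf, pf, ps and debug are the parameter names already used in this post, and the boosts are copied from the echoed parameters.

```python
from urllib.parse import urlencode, parse_qs


def build_edismax_params(user_text, phrase_slop=None):
    """Build edismax request parameters (hypothetical helper, not a Solr API)."""
    params = {
        "defType": "edismax",
        # Only the raw user text goes into q; no field prefixes, no JSON wrapper.
        "q": user_text,
        # Per-field boosts for term matches, copied from the post.
        "qf": "title^10 author^7 category^5 publisher^3 description",
        # Per-field boosts for whole-phrase matches, copied from the post.
        "pf": "title^30 author^14 category^10 publisher^6 description^2",
        "debug": "true",
    }
    if phrase_slop is not None:
        # ps is only sent when the caller sets it explicitly.
        params["ps"] = str(phrase_slop)
    return params


params = build_edismax_params("József Attila")
query_string = urlencode(params)  # UTF-8 percent-encoding by default
print(query_string)
```

The resulting string can be appended to the select URL of the collection. Whether this produces the ranking wanted here still has to be verified against the debug output.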

When I checked in SolrJ debug mode how Solr translated the query, it
was not what I expected based on the documentation (
https://lucene.apache.org/solr/guide/8_4/the-extended-dismax-query-parser.html#using-slop
):
rawquerystring=title:"{"q":"József Attila"}" category:"{"q":"József
Attila"}" publisher:"{"q":"József Attila"}" description:"{"q":"József
Attila"}" author:"{"q":"József Attila"}",

querystring=title:"{"q":"József Attila"}" category:"{"q":"József Attila"}"
publisher:"{"q":"József Attila"}" description:"{"q":"József Attila"}"
author:"{"q":"József Attila"}",

parsedquery=+(DisjunctionMaxQuery(((publisher:q)^3.0 | description:q |
(title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0))
DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0)) category:{
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))
DisjunctionMaxQuery(((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0)) DisjunctionMaxQuery(((category::)^5.0))
DisjunctionMaxQuery(((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0))
DisjunctionMaxQuery(((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0))
DisjunctionMaxQuery(((category:})^5.0))) DisjunctionMaxQuery(((title:"q
józsef attila q józsef attila q józsef attila q józsef attila q józsef
attila")^30.0 | (author:"q józsef attila q józsef attila q józsef attila q
józsef attila q józsef attila")^14.0 | (publisher:"q józsef attila q józsef
attila q józsef attila q józsef attila q józsef attila")^6.0 |
(description:"q józsef attila q józsef attila q józsef attila q józsef
attila q józsef attila")^2.0)),parsedquery_toString=+(((publisher:q)^3.0 |
description:q | (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0)
((category::)^5.0) ((publisher:józsef)^3.0 | description:józsef |
(title:józsef)^10.0 | (category:József)^5.0 | (author:józsef)^7.0)
((publisher:attila)^3.0 | description:attila | (title:attila)^10.0 |
(category:Attila)^5.0 | (author:attila)^7.0) ((category:})^5.0) category:{
((publisher:q)^3.0 | description:q | (title:q)^10.0 | (category:q)^5.0 |
(author:q)^7.0) ((category::)^5.0) ((publisher:józsef)^3.0 |
description:józsef | (title:józsef)^10.0 | (category:József)^5.0 |
(author:józsef)^7.0) ((publisher:attila)^3.0 | description:attila |
(title:attila)^10.0 | (category:Attila)^5.0 | (author:attila)^7.0)
((category:})^5.0) ((publisher:q)^3.0 | description:q | (title:q)^10.0 |
(category:q)^5.0 | (author:q)^7.0) ((category::)^5.0)
((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 |
(category:József)^5.0 | (author:józsef)^7.0) ((publisher:attila)^3.0 |
description:attila | (title:attila)^10.0 | (category:Attila)^5.0 |
(author:attila)^7.0) ((category:})^5.0) ((publisher:q)^3.0 | description:q
| (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0) ((category::)^5.0)
((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 |
(category:József)^5.0 | (author:józsef)^7.0) ((publisher:attila)^3.0 |
description:attila | (title:attila)^10.0 | (category:Attila)^5.0 |
(author:attila)^7.0) ((category:})^5.0) ((publisher:q)^3.0 | description:q
| (title:q)^10.0 | (category:q)^5.0 | (author:q)^7.0) ((category::)^5.0)
((publisher:józsef)^3.0 | description:józsef | (title:józsef)^10.0 |
(category:József)^5.0 | (author:józsef)^7.0) ((publisher:attila)^3.0 |
description:attila | (title:attila)^10.0 | (category:Attila)^5.0 |
(author:attila)^7.0) ((category:})^5.0)) ((title:"q józsef attila q józsef
attila q józsef attila q józsef attila q józsef attila")^30.0 |* (author:"q
józsef attila q józsef attila q józsef attila q józsef attila q józsef
attila")^14.0 | (publisher:"q józsef attila q józsef attila q józsef attila
q józsef attila q józsef attila")^6.0 | (description:"q józsef attila q
józsef attila q józsef attila q józsef attila q józsef attila")^2.0)*

The part marked with asterisks looks odd. I have the following questions:
1. I did not find in the documentation the default value of the ps
parameter for the dismax/edismax query parsers. What is it?
2. Is this the "Solr way" to search in multiple fields when the copyField
approach cannot be applied because of the different analysis chains?
3. Why does the marked part not look like this:  (author:"józsef
attila")^14.0 | (publisher:"józsef attila")^6.0 | (description:"józsef
attila")^2.0)
4. How can I check the effective boosting impact of the second phrase query
generated by Solr if I cannot find it in the QueryResponse._debugMap?
Maybe that is because the odd phrase was not matched at all?
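
A rough illustration related to question 3: the toy function below is not Solr's parser. It only pairs up double quotes in the raw query string from the debug output and collects the lowercased terms that end up outside any quoted section, on the assumption that the phrase boost is built from those unquoted terms. The function name bare_terms and the quote-pairing rule are assumptions made for this sketch.

```python
def bare_terms(raw_query):
    """Return lowercased tokens that sit outside double-quoted sections.

    Toy approximation only: quotes are paired left to right, and a token
    ending in ':' is treated as a field prefix for the next quoted section.
    """
    terms = []
    for index, segment in enumerate(raw_query.split('"')):
        if index % 2 == 1:
            continue  # odd segments are inside a pair of quotes
        for token in segment.split():
            if token.endswith(":"):
                continue  # field prefix such as title: or author:
            terms.append(token.lower())
    return terms


# Rebuild the raw query string shown in the debug output.
fields = ["title", "category", "publisher", "description", "author"]
wrapped = '{"q":"József Attila"}'
raw_query = " ".join(f'{field}:"{wrapped}"' for field in fields)

phrase = " ".join(bare_terms(raw_query))
print(phrase)
```

Under these assumptions the inner quotes of the wrapper close each field phrase early, so q, József and Attila are left as bare terms once per field clause, which matches the five repetitions in the phrase part of the parsed query.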

If you are aware of any good tutorial, or if you can help me with this use
case, I would appreciate it very much.

Thanks in advance,

Roland