How to only highlight terms that caused the document to match


Bjarke Buur Mortensen
Hi list,

I'm having difficulty getting the Solr highlighter to highlight only the
terms that actually caused the match. Let me explain:

Given a query "john OR (peter AND mary)"
and two documents:
"john is awesome and so is peter"
"peter is awesome and so is mary",

Solr will highlight "peter" and "mary" in the second document, which is
expected.
However, it will also highlight both "john" and "peter" in the first
document, even though "peter" should only count when "mary" is also present.
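
To make the expectation concrete, here is a minimal Python sketch of the behaviour I have in mind. It is not Solr or Lucene code: the Term/And/Or classes and the matching_terms function are hypothetical names made up for this illustration, and documents are reduced to plain whitespace-separated tokens.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass
class Term:
    text: str


@dataclass
class And:
    clauses: List["Node"]


@dataclass
class Or:
    clauses: List["Node"]


Node = Union[Term, And, Or]


def matching_terms(node: Node, doc_tokens: Set[str]) -> Set[str]:
    """Return the terms that contributed to a match; empty set means no match."""
    if isinstance(node, Term):
        return {node.text} if node.text in doc_tokens else set()
    parts = [matching_terms(clause, doc_tokens) for clause in node.clauses]
    if isinstance(node, And):
        # An AND clause only contributes terms if every sub-clause matched.
        return set().union(*parts) if all(parts) else set()
    # An OR clause contributes the terms of whichever sub-clauses matched.
    return set().union(*parts)


# The query and documents from the example above.
QUERY = Or([Term("john"), And([Term("peter"), Term("mary")])])
DOC1 = "john is awesome and so is peter"
DOC2 = "peter is awesome and so is mary"

print(matching_terms(QUERY, set(DOC1.split())))
print(matching_terms(QUERY, set(DOC2.split())))
```

With this logic only "john" would be highlighted in the first document, because the AND clause fails there and so contributes nothing.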

Is there any way to improve this?

If I add debugQuery, the explain block clearly shows that the first
document matched because of john alone, giving it a score of 1, whereas the
second matched because both peter and mary are present, giving it a
score of 2.

So the information is available, but the highlighter does not use it.
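
As a rough illustration of how that information could be pulled out on the client side, here is a small Python sketch that extracts the matched clauses from an explain string. It assumes the simple constant-score format shown in the output further down ("1.0 = field:term*"); explain output for other query types looks different and would need a different pattern. The function name matched_clauses is made up for this example.

```python
import re
from typing import List

# Matches leaf lines such as "  1.0 = field:term*" but not "3.0 = sum of:".
CLAUSE = re.compile(r"^\s*[\d.]+ = (\S+:\S+)\s*$")


def matched_clauses(explain: str) -> List[str]:
    """Return the field:term clauses listed as leaves in an explain string."""
    clauses = []
    for line in explain.splitlines():
        match = CLAUSE.match(line)
        if match:
            clauses.append(match.group(1))
    return clauses


# Explain strings copied from the debug section of the output below.
EXPLAIN_DOC1 = (
    "\n3.0 = sum of:\n"
    "  1.0 = content_and_cpv_descriptions_da:plejehjem*\n"
    "  2.0 = sum of:\n"
    "    1.0 = content_and_cpv_descriptions_da:plejecentre*\n"
    "    1.0 = content_and_cpv_descriptions_da:boliger*\n"
)
EXPLAIN_DOC2 = (
    "\n1.0 = sum of:\n"
    "  1.0 = content_and_cpv_descriptions_da:plejehjem*\n"
)

print(matched_clauses(EXPLAIN_DOC1))
print(matched_clauses(EXPLAIN_DOC2))
```

Parsing explain text like this is fragile and only meant to show that the matching clauses are recoverable; it is not a proposal for how the highlighter should work internally.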

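Below, I have included real-world Solr output to show what I mean. In it,
document TED-259531-2018 matches only on plejehjem* (score 1.0), yet
"plejecentre" is highlighted as well.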
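
For anyone who wants to reproduce it, the request can be rebuilt from the parameters echoed in the responseHeader. A minimal Python sketch follows; the host and collection name are placeholders I made up, while the parameters are copied from the output. It only builds the URL and does not send anything.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint: replace host and collection with your own.
SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"

# Parameters as echoed in the responseHeader of the output.
PARAMS = {
    "q": "plejehjem*  OR (plejecentre* AND boliger*)",
    "defType": "lucene",
    "fq": "doc_id:(0273-000545 OR 259531-2018)",
    "fl": "doc_id,score",
    "hl": "on",
    "hl.method": "unified",
    "hl.snippets": "2",
    "debugQuery": "on",
}

# urlencode takes care of escaping spaces, parentheses and wildcards.
url = SOLR_SELECT + "?" + urlencode(PARAMS)
print(url)
```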

Thanks,
Bjarke


-----------------------------------

{
  "responseHeader":{
    "status":0,
    "QTime":12,
    "params":{
      "hl.snippets":"2",
      "q":"plejehjem*  OR (plejecentre* AND boliger*)",
      "defType":"lucene",
      "hl":"on",
      "fl":"doc_id,score",
      "fq":"doc_id:(0273-000545 OR 259531-2018)",
      "hl.method":"unified",
      "debugQuery":"on"}},
  "response":{"numFound":2,"start":0,"maxScore":3.0,"docs":[
      {
        "doc_id":"0273-000545",
        "score":3.0},
      {
        "doc_id":"259531-2018",
        "score":1.0}]
  },
  "highlighting":{
    "udbuddk-0273-000545":{
      "content_and_cpv_descriptions_da":["Beskrivelse\n-----------\n\nKonkurrenceudsættelsen omfatter drift af følgende 2 <em>plejecentre</em>: \n· Sandgårdsparken, Kjellerup, 40 <em>boliger</em>  \n· Solgården, Sjørslev, 22 <em>boliger</em>  \nBeslutningen om at udsætte driften af <em>plejecentre</em> for konkurrence er aftalt i den politiske budgetaftale for 2015, der blev indgået i august 2014 mellem alle byrådets partier undtagen Dansk Folkeparti og Enhedslisten. \n”Ældre- og Handicapudvalget igangsætter en proces for konkurrenceudsættelse af drift af ca. 72 <em>plejehjemspladser</em>. ",
        "85144100 Sygepleje på <em>plejehjem</em>"]},
    "TED-259531-2018":{
      "content_and_cpv_descriptions_da":["Morsø Kommune 41333014 Jernbanevej 7 Nykøbing M 7900 Birgitte Lund +45 99707017 [hidden email] https://permalink.mercell.com/87422227.aspx http://www.morsoe.dk/ https://permalink.mercell.com/87422227.aspx Mercell Danmark A/S Østre Stationsvej 33, Vestfløjen Odense C 5000 [hidden email] https://permalink.mercell.com/87422227.aspx https://permalink.mercell.com/87422227.aspx Vikarydelser på ældreområdet 773-2018-5278 Udbuddet omfatter hjemmeplejen og <em>plejecentre</em> i Morsø Kommune. ",
        "85144100 Sygepleje på <em>plejehjem</em>"]}},
  "debug":{
    "rawquerystring":"plejehjem*  OR (plejecentre* AND boliger*)",
    "querystring":"plejehjem*  OR (plejecentre* AND boliger*)",
    "parsedquery":"content_and_cpv_descriptions_da:plejehjem* (+content_and_cpv_descriptions_da:plejecentre* +content_and_cpv_descriptions_da:boliger*)",
    "parsedquery_toString":"content_and_cpv_descriptions_da:plejehjem* (+content_and_cpv_descriptions_da:plejecentre* +content_and_cpv_descriptions_da:boliger*)",
    "explain":{
      "udbuddk-0273-000545":"\n3.0 = sum of:\n  1.0 = content_and_cpv_descriptions_da:plejehjem*\n  2.0 = sum of:\n    1.0 = content_and_cpv_descriptions_da:plejecentre*\n    1.0 = content_and_cpv_descriptions_da:boliger*\n",
      "TED-259531-2018":"\n1.0 = sum of:\n  1.0 = content_and_cpv_descriptions_da:plejehjem*\n"},
    "QParser":"LuceneQParser",
    "filter_queries":["doc_id:(0273-000545 OR 259531-2018)"],
    "parsed_filter_queries":["doc_id:0273-000545 doc_id:259531-2018"],
    "timing":{
      "time":12.0,
      "prepare":{
        "time":0.0,
        "query":{
          "time":0.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":0.0}},
      "process":{
        "time":11.0,
        "query":{
          "time":1.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":9.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":0.0}},
      "loadFieldValues":{
        "time":0.0}}}}