Unified highlighter


Julien Massiera
Hi Solr community,

I would like some help with a strange behavior I am seeing with the
unified highlighter.

Here is my highlighter configuration:

<str name="hl">on</str>
<str name="hl.method">unified</str>
<str name="hl.defaultSummary">false</str>
<str name="hl.tag.pre">&lt;span class="em"&gt;</str>
<str name="hl.tag.post">&lt;/span&gt;</str>
<str name="hl.fl">content_fr content_en exactContent</str>
<str name="hl.requireFieldMatch">true</str>
<str name="hl.bs.type">CHARACTER</str>
<str name="hl.encoder">html</str>
<str name="hl.fragsize">200</str>
<str name="hl.maxAnalyzedChars">51200</str>


I indexed some HTML documents from the www.datafari.com website.

The problem is that for some documents (not all), the snippets do not
include enough "context" around the matched search terms.

For example, when searching for "France labs", here is the highlighting
returned for one document:

"content_en":["<span class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]

Now, if I run the same query with hl.bs.type set to SENTENCE instead of
CHARACTER, I get the following highlighting for the same document:

"content_en":["Trusted&#32;by&#32;About&#32;Contact&#32;Home&#32;Migrating&#32;GSA&#32;&#169;&#32;2018&#32;Datafari&#32;by&#32;<span class=\"em\">France</span>&#32;<span class=\"em\">Labs</span>"]
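
Because hl.encoder is set to html, spaces come back as &#32; and the
snippets are hard to compare by eye. A minimal Python sketch for reading
them as plain text follows; the helper name readable is hypothetical,
and it assumes the only markup in a snippet is the span configured in
hl.tag.pre and hl.tag.post.

```python
import html
import re

def readable(snippet):
    """Strip the highlight <span> tags and decode HTML entities."""
    # Assumption: the only tags present are the configured highlight spans.
    no_tags = re.sub(r"</?span[^>]*>", "", snippet)
    return html.unescape(no_tags)

character_snippet = '<span class="em">France</span>&#32;<span class="em">Labs</span>'
sentence_snippet = (
    "Trusted&#32;by&#32;About&#32;Contact&#32;Home&#32;Migrating&#32;GSA"
    "&#32;&#169;&#32;2018&#32;Datafari&#32;by&#32;"
    '<span class="em">France</span>&#32;<span class="em">Labs</span>'
)

print(readable(character_snippet))
print(readable(sentence_snippet))
```

Decoded this way, the CHARACTER snippet contains only the two matched
words, while the SENTENCE snippet includes the text that precedes them.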

This is much better, but I strongly prefer the WORD or CHARACTER types
because the snippets can be too long with the SENTENCE or LINE types,
depending on the indexed documents.

I tried changing hl.bs.type to WORD, and also increasing hl.fragsize up
to 1000, but with any hl.bs.type other than SENTENCE or LINE, the
highlighting is limited to the matched words only, which is not enough
for what I need.
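
To make the comparison reproducible, here is a small Python sketch that
builds one request URL per hl.bs.type value, overriding the configured
defaults through request parameters. The host, port and core name in
SOLR_SELECT are assumptions, as is the helper name highlight_url; the
parameter names and values are the ones from the configuration above.
It only works as an override if these parameters are declared as
defaults rather than invariants in the request handler.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/mycore/select"

# Highlighting parameters taken from the configuration in this post.
BASE_PARAMS = {
    "q": "France labs",
    "hl": "on",
    "hl.method": "unified",
    "hl.fl": "content_fr content_en exactContent",
    "hl.requireFieldMatch": "true",
    "hl.encoder": "html",
    "wt": "json",
}

def highlight_url(bs_type, fragsize=200):
    """Return a select URL that overrides hl.bs.type and hl.fragsize."""
    params = dict(BASE_PARAMS)
    params["hl.bs.type"] = bs_type
    params["hl.fragsize"] = str(fragsize)
    return SOLR_SELECT + "?" + urlencode(params)

# One URL per break iterator type mentioned in this post.
for bs_type in ("CHARACTER", "WORD", "SENTENCE", "LINE"):
    print(highlight_url(bs_type))

# The larger fragment size that was also tried.
print(highlight_url("WORD", fragsize=1000))
```

Fetching each URL and comparing the "highlighting" section of the
responses for the same document shows which settings return surrounding
text and which return only the matched words.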

Is there something I am missing in the configuration? For information,
I am using Solr 6.6.4.

Thanks for your help.

Julien