Suggester highlighter offsets inaccurate

Timothy Hill
Hello,

I am using Solr 6.6's Suggester functionality to power an autosuggest
widget that returns lists of people's names.

One requirement that we have is that the suggester be
punctuation-insensitive. For example, entering:

'Dr Joh' should return the suggestion 'Dr. John', even though the user
omitted the period after 'Dr'.

'Hank Williams Jr' should provide the suggestion 'Hank Williams, Jr.'
despite the omission of both the comma and the period.

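This works, but the punctuation stripping appears to cause the highlighting
offsets to be miscalculated: we get '<b>Dr Jo</b>hn' for the first query and
'<b>Hank Williams, J</b>r.' for the second. In both cases the highlight ends
one character early, which matches the one punctuation mark (the period in
'Dr.', the comma in 'Williams,') that falls inside the typed prefix.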
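The split points are consistent with a highlighter that wraps the first N characters of the stored text, where N is the length of the *analyzed* prefix ('dr joh' is 6 characters, 'hank williams jr' is 16). The minimal Java sketch below tests that hypothesis. It places a naive version next to an offset-aware alternative that skips characters the char filter removes. Both methods and the simplified punctuation set are hypothetical illustrations, not Solr's API. The naive version reproduces the second output exactly, and for the first it reproduces the same Jo|hn split point.

```java
public class SuggestHighlightSketch {
    // Hypothetical, simplified stand-in for the characters removed by the
    // schema's PatternReplaceCharFilterFactory (not an exact copy of it).
    private static final String STRIPPED = "!'#%()*+,-./:;=>?@^{|}~";

    // Hypothesis: wrap the first prefix.length() characters of the original
    // (surface) text, ignoring any characters that analysis removed.
    static String naiveHighlight(String surface, String analyzedPrefix) {
        int n = Math.min(analyzedPrefix.length(), surface.length());
        return "<b>" + surface.substring(0, n) + "</b>" + surface.substring(n);
    }

    // Offset-aware alternative: walk the surface text and count only the
    // characters that survive the char filter, stopping once the whole
    // analyzed prefix is covered. Assumes the prefix already matched.
    static String offsetAwareHighlight(String surface, String analyzedPrefix) {
        int kept = 0;
        int end = 0;
        while (end < surface.length() && kept < analyzedPrefix.length()) {
            if (STRIPPED.indexOf(surface.charAt(end)) < 0) {
                kept++; // this character is still present after analysis
            }
            end++;
        }
        return "<b>" + surface.substring(0, end) + "</b>" + surface.substring(end);
    }

    public static void main(String[] args) {
        System.out.println(naiveHighlight("Dr. John", "dr joh"));                           // <b>Dr. Jo</b>hn
        System.out.println(naiveHighlight("Hank Williams, Jr.", "hank williams jr"));       // <b>Hank Williams, J</b>r.
        System.out.println(offsetAwareHighlight("Dr. John", "dr joh"));                     // <b>Dr. Joh</b>n
        System.out.println(offsetAwareHighlight("Hank Williams, Jr.", "hank williams jr")); // <b>Hank Williams, Jr</b>.
    }
}
```

The offset-aware variant only handles characters that are deleted outright. A char filter that replaces one character with several, such as some accent mappings, would need real offset correction instead of this simple count.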

Here are the relevant parts of the solrconfig.xml and schema.xml
configurations:

<!-- solrconfig.xml -->
<searchComponent class="solr.SuggestComponent" name="suggestEntity">
  <lst name="suggester">
    <str name="name">suggestEntity</str>
    <str name="lookupImpl">AnalyzingInfixLookupFactory</str>
    <str name="dictionaryImpl">DocumentDictionaryFactory</str>
    <str name="field">skos_prefLabel</str>
    <str name="weightField">derived_score</str>
    <str name="payloadField">payload</str>
    <str name="suggestAnalyzerFieldType">suggestType</str>
    <str name="minPrefixChars">2</str>
    <str name="buildOnStartup">false</str>
    <str name="buildOnCommit">false</str>
    <str name="buildOnOptimize">true</str>
    <str name="contextField">suggest_filters</str>
  </lst>
</searchComponent>
<requestHandler class="org.apache.solr.handler.component.SearchHandler"
    startup="lazy" name="/suggestEntity">
  <lst name="defaults">
    <str name="suggest">true</str>
    <str name="suggest.highlight">true</str>
    <str name="suggest.count">10</str>
    <str name="suggest.dictionary">suggestEntity</str>
  </lst>
  <arr name="components">
    <str>suggestEntity</str>
  </arr>
</requestHandler>
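The handler supplies suggest, suggest.highlight, suggest.count and suggest.dictionary as defaults, so a client only needs to send suggest.q. The small Java sketch below shows how a client could build that request. The host, port and core name (`localhost:8983`, `mycore`) are hypothetical placeholders. `URLEncoder.encode(String, Charset)` requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class SuggestRequestSketch {
    // Hypothetical base URL: host, port and core name are placeholders.
    static final String BASE = "http://localhost:8983/solr/mycore";

    // Only suggest.q is sent; the remaining suggest.* parameters come from
    // the /suggestEntity handler's defaults.
    static String suggestUrl(String userInput) {
        return BASE + "/suggestEntity?suggest.q="
                + URLEncoder.encode(userInput, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Form encoding turns spaces into '+' and ',' into %2C.
        System.out.println(suggestUrl("Dr Joh"));
        System.out.println(suggestUrl("Hank Williams, Jr."));
    }
}
```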

<!-- schema.xml -->
<fieldType name="suggestType" class="solr.TextField"
    positionIncrementGap="100" termVectors="true" termPositions="true"
    termOffsets="true" storeOffsetsWithPositions="true">
  <analyzer>
    <charFilter class="solr.PatternReplaceCharFilterFactory"
        pattern="[!'#%'()*+,-./:;=>?@[/]^{|}~]" replacement=""/>
    <charFilter class="solr.MappingCharFilterFactory"
        mapping="accent-map.txt"/>
    <tokenizer class="solr.PatternTokenizerFactory" pattern="_"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
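The Java sketch below roughly approximates this analysis chain to show why 'Dr Joh' can prefix-match 'Dr. John'. Punctuation is stripped, the "_" tokenizer leaves each name as a single token, and lowercasing does the rest. The accent mapping is omitted and the punctuation class is a hypothetical simplification. Note also that in Java regex syntax the `[/]` inside the schema's pattern is read as a nested character class, so `[` and `]` themselves are probably not stripped.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

public class SuggestAnalysisSketch {
    // Hypothetical approximation of the schema's punctuation class.
    // Underscore is deliberately absent so the "_" tokenizer still sees it.
    private static final String STRIPPED = "[!'#%()*+,\\-./:;=>?@^{|}~]";

    static List<String> analyze(String text) {
        String filtered = text.replaceAll(STRIPPED, "");   // char filter (accent mapping omitted)
        return Arrays.stream(filtered.split("_"))           // PatternTokenizer splits on "_"
                .filter(t -> !t.isEmpty())                   // drop empty pieces
                .map(t -> t.toLowerCase(Locale.ROOT))        // LowerCaseFilter
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(analyze("Dr. John"));            // [dr john]
        System.out.println(analyze("Hank Williams, Jr."));  // [hank williams jr]
        // The analyzed query is a prefix of the analyzed suggestion:
        System.out.println(analyze("Dr. John").get(0).startsWith(analyze("Dr Joh").get(0))); // true
    }
}
```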

As the schema.xml excerpt shows, I've tried enabling term vectors with
positions and offsets (plus storeOffsetsWithPositions), but the Suggester
highlighter doesn't seem to use them.

Does anyone know what I'm doing wrong here? Or is this a bug in the
highlighter?

Thanks,

Tim Hill