Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

Richard Walker
I'm trying out the advice in the user guide
( https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations )
for using the unified highlighter.

I saw the note:
"This is definitely the fastest option for highlighting
wildcard queries on large text fields."

and decided to try this, namely:

* "set storeOffsetsWithPositions to true"
* "set termVectors to true but no other term vector
  related options on the field being highlighted"

I've set these options on two fields, but I now get an
exception during highlighting of the results of a phrase query.
(I'm not even testing with wildcards yet.)

Here's an extract of the schema before making the change:

  <field name="title" type="string" indexed="true" stored="true"/>
  <field name="fulltext" type="text_en_splitting" multiValued="true" indexed="true" stored="false"/>
  <copyField source="*" dest="fulltext"/>
  <field name="concept" type="string" multiValued="true" indexed="true" stored="true"/>
  <field name="concept_search" type="text_en_splitting" multiValued="true" indexed="true" stored="true"/>
  <copyField source="concept" dest="concept_search"/>

And here are the only two lines I changed:

  <field name="concept" type="string" termVectors="true" multiValued="true" storeOffsetsWithPositions="true" indexed="true" stored="true"/>
  <field name="concept_search" type="text_en_splitting" termVectors="true" multiValued="true" storeOffsetsWithPositions="true" indexed="true" stored="true"/>

Here's a sample minimal query that worked perfectly before making the change:

defType=edismax
q="space administration"
fl=id,title
qf=fulltext concept_search
hl=true
hl.method=unified
hl.fl=*
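
For anyone wanting to replay this from a script, the parameters above could be assembled into a single request URL roughly as follows. This is only a sketch: the host, port, and core name (`mycore`) are placeholders and not taken from the setup described here, and the helper name `build_highlight_query` is made up for illustration.

```python
from urllib.parse import urlencode

# Assumption: a local Solr with a core called "mycore"; substitute your own.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_highlight_query(qf_fields):
    """Return a select URL carrying the query parameters listed above.

    qf_fields is the list of field names to place in the qf parameter.
    """
    params = [
        ("defType", "edismax"),
        ("q", '"space administration"'),  # quotes make it a phrase query
        ("fl", "id,title"),
        ("qf", " ".join(qf_fields)),
        ("hl", "true"),
        ("hl.method", "unified"),
        ("hl.fl", "*"),
    ]
    # urlencode percent-encodes the quotes and turns spaces into "+".
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_highlight_query(["fulltext", "concept_search"]))
```

Passing `["concept_search"]` instead gives the reduced query with only that field in qf.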

After making the change to the schema, I now get this exception in the Solr log:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field "fulltext" was indexed without position data; cannot run PhraseQuery (phrase=fulltext:"space administr")
        at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
        at org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
        at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
        at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
        at org.apache.lucene.search.DisjunctionMaxQuery$DisjunctionMaxWeight.matches(DisjunctionMaxQuery.java:125)
        at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
        at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
        at org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
        at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
        at org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
etc.

The response includes search results, but no highlighting information.

Of interest is that the exception is against the field "fulltext",
whose definition I _didn't_ change.

If I remove the "fulltext" field from qf, so that the query is now this:

defType=edismax
q="space administration"
fl=id,title
qf=concept_search
hl=true
hl.method=unified
hl.fl=*

the log now has this exception:

o.a.s.s.HttpSolrCall null:java.lang.IllegalStateException: field "concept_search" was indexed without position data; cannot run PhraseQuery (phrase=concept_search:"space administr")
        at org.apache.lucene.search.PhraseQuery$1.getPhraseMatcher(PhraseQuery.java:446)
        at org.apache.lucene.search.PhraseWeight.lambda$matches$0(PhraseWeight.java:89)
        at org.apache.lucene.search.MatchesUtils.forField(MatchesUtils.java:101)
        at org.apache.lucene.search.PhraseWeight.matches(PhraseWeight.java:88)
        at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumsWeightMatcher(FieldOffsetStrategy.java:138)
        at org.apache.lucene.search.uhighlight.FieldOffsetStrategy.createOffsetsEnumFromReader(FieldOffsetStrategy.java:74)
        at org.apache.lucene.search.uhighlight.TermVectorOffsetStrategy.getOffsetsEnum(TermVectorOffsetStrategy.java:49)
        at org.apache.lucene.search.uhighlight.FieldHighlighter.highlightFieldForDoc(FieldHighlighter.java:76)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFieldsAsObjects(UnifiedHighlighter.java:639)
        at org.apache.lucene.search.uhighlight.UnifiedHighlighter.highlightFields(UnifiedHighlighter.java:508)
        at org.apache.solr.highlight.UnifiedSolrHighlighter.doHighlighting(UnifiedSolrHighlighter.java:149)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:171)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:298)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2566)
etc.

i.e., I now get an error about the field that I _did_ change.

(I'm using Solr 8.1.1.)


Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

Richard Walker
On 22 Jul 2019, at 11:32 am, Richard Walker <[hidden email]> wrote:
> I'm trying out the advice in the user guide
> ( https://lucene.apache.org/solr/guide/8_1/highlighting.html#schema-options-and-performance-considerations )
> for using the unified highlighter.
>
> ...
> * "set storeOffsetsWithPositions to true"
> * "set termVectors to true but no other term vector
>  related options on the field being highlighted"
...

I completely forgot to mention that I also tried _just_:

> * "set storeOffsetsWithPositions to true"

i.e., without _also_ setting termVectors, and this _doesn't_
give the exception.

So the exception appears to be caused by the _combination_ of:
* unified highlighter
* storeOffsetsWithPositions
* termVectors

Re: Unified highlighter with storeOffsetsWithPositions and termVectors giving an exception

david.w.smiley@gmail.com
FWIW I tried this on the techproducts schema with a modification to the
name field, but did not see the issue.

I suspect you did not re-index after making these schema changes.  If you
did, then also check that the collection (or core) truly started fresh
(never had any previous schema).  If you tried it one way and then merely
deleted/replaced the documents after changing the schema, some internal
metadata in the underlying index data tends to persist, and I suspect some
of the options flipped here might stay sticky.
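
One way to look for such leftover metadata might be Solr's Luke request handler (`/admin/luke`), which reports per-field flags. The sketch below only builds the request URL and compares two flag strings; the base URL and core name are placeholders, both function names are made up, and the response shape it expects (`fields` -> field name -> `schema` / `index`) is an assumption that should be checked against real output before relying on it.

```python
from urllib.parse import urlencode

# Assumption: a local Solr instance; substitute your own base URL.
SOLR_BASE_URL = "http://localhost:8983/solr"


def luke_url(core, field):
    """Build a Luke handler URL asking about a single field."""
    params = [("fl", field), ("numTerms", "0"), ("wt", "json")]
    return f"{SOLR_BASE_URL}/{core}/admin/luke?" + urlencode(params)


def fields_with_differing_flags(luke_response):
    """Return sorted names of fields whose schema and index flags differ.

    luke_response is the parsed JSON body as a dict. Fields missing either
    flag string are skipped rather than reported.
    """
    differing = []
    for name, info in luke_response.get("fields", {}).items():
        schema_flags = info.get("schema")
        index_flags = info.get("index")
        if schema_flags and index_flags and schema_flags != index_flags:
            differing.append(name)
    return sorted(differing)


if __name__ == "__main__":
    print(luke_url("mycore", "concept_search"))
```

A difference between the two flag strings is only a hint that the index was written under an earlier field definition; it is not proof either way.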

If that really isn't it, then please tell me exactly how to reproduce this
from what Solr ships with, such as the techproducts example schema and
dataset.

~ David

