Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms


Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

howed

Hi all,

 

We are using Solr for indexing address data and one of the fields that we have contains the locality (e.g. suburb, town) with synonyms for the surrounding localities.  This has to handle multi-word synonyms as the original locality may have one word but the surrounding locality may contain two words.  We have found that when we have a large number of surrounding localities, the highlighting breaks with the exception:

 

"error":{

    "metadata":[

      "error-class","org.apache.solr.common.SolrException",

     "root-error-class","org.apache.lucene.search.highlight.InvalidTokenOffsetsException"],

    "msg":"org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token wail exceeds length of provided text sized 258",

    "trace":"org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token wail exceeds length of provided text sized 258\n\tat org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:648)\n\tat org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingOfField(DefaultSolrHighlighter.java:480)\n\tat

 

This only started happening after we realised that the default length on a Solr text field is 256 characters and not everything was being indexed, so we increased the length using maxTokenLength on the StandardTokenizerFactory.  Prior to this, only a limited number of surrounding localities were being processed, but highlighting was working with no errors.  Once we increased the length so that all of the surrounding localities were loaded, this error started happening when running our automated test suite, without us making any other changes.

 

These surrounding localities are stored in a database, so we have written our own token filter to build the synonyms.  When we build the index, the localities are encoded in a single token that looks like:

 

lcx__balmoral__cannum__clear_lake__lower_norton

 

so the StandardTokenizer keeps this as a single token.  Our filter looks for tokens that start with “lcx__” and creates synonyms from the data that follows the prefix.  For the example above, we end up with tokens being output as follows:

 

Position 1                  Position 2
balmoral                    lake
cannum (SYNONYM)            norton (SYNONYM)
clear (SYNONYM)
clearlake (SYNONYM)
lower (SYNONYM)
lowernorton (SYNONYM)

 

As you can see, we also combine the two words of a multi-word locality into a single word as an additional synonym.  I have attached the full output from the Solr analyser for this example below this email, and a much-simplified sketch of the idea behind our filter is included after the field definitions below.  The definition of the field type for this field is:

 

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field-type" : {
      "name":"localitySynonymType2",
      "class":"solr.TextField",
      "indexAnalyzer": {
        "tokenizer":{
            "class":"solr.StandardTokenizerFactory",
            "maxTokenLength": 4000
          },
          "filters": [
              {
                "class":"solr.LowerCaseFilterFactory"
              },
              {
                "class":"au.com.auspost.postal.ame.solr.LocalityTokenFilterFactory"
              }
          ]
      },
      "queryAnalyzer":{
          "tokenizer":{
            "class":"solr.StandardTokenizerFactory"
          },
          "filters": [
              {
                "class":"solr.LowerCaseFilterFactory"
              }
          ]
      }
    }
  }' http://localhost:8983/solr/address/schema

 

  echo "Creating surroundingLocalityNamesSynonym field"

    curl -X POST -H 'Content-type:application/json' --data-binary '{

      "add-field":{

        "name":"surroundingLocalityNamesSynonym",

        "type":"localitySynonymType2",

        "stored":true,

        "indexed":true

      }

    }' http://localhost:8983/solr/address/schema
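
In case it helps to see roughly what the filter does, below is a much-simplified sketch of the idea behind LocalityTokenFilterFactory.  It is illustrative only: our real filter looks the surrounding localities up rather than splitting the token, and it also spreads multi-word localities across two positions, but the attribute handling is the relevant part.

  import java.io.IOException;
  import java.util.ArrayDeque;
  import java.util.Deque;

  import org.apache.lucene.analysis.TokenFilter;
  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
  import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
  import org.apache.lucene.util.AttributeSource;

  /**
   * Simplified sketch: turns a token like "lcx__balmoral__cannum__clear_lake"
   * into "balmoral" plus stacked synonyms ("cannum", "clearlake", ...) at the
   * same position.  Because each synonym restores the state of the original
   * token, they all share its start/end offsets.
   */
  public final class LocalitySynonymSketchFilter extends TokenFilter {

    private static final String PREFIX = "lcx__";

    private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
    private final PositionIncrementAttribute posIncAtt = addAttribute(PositionIncrementAttribute.class);

    private final Deque<String> pendingSynonyms = new ArrayDeque<>();
    private AttributeSource.State savedState;

    public LocalitySynonymSketchFilter(TokenStream input) {
      super(input);
    }

    @Override
    public boolean incrementToken() throws IOException {
      if (!pendingSynonyms.isEmpty()) {
        // Emit a synonym stacked on the previous token: same offsets, posInc 0.
        restoreState(savedState);
        termAtt.setEmpty().append(pendingSynonyms.remove());
        posIncAtt.setPositionIncrement(0);
        return true;
      }
      if (!input.incrementToken()) {
        return false;
      }
      String term = termAtt.toString();
      if (!term.startsWith(PREFIX)) {
        return true;                      // ordinary tokens pass through unchanged
      }
      String[] localities = term.substring(PREFIX.length()).split("__");
      savedState = captureState();        // remember the original token's offsets etc.
      termAtt.setEmpty().append(localities[0]);
      for (int i = 1; i < localities.length; i++) {
        pendingSynonyms.add(localities[i].replace("_", ""));  // "clear_lake" -> "clearlake"
      }
      return true;
    }

    @Override
    public void reset() throws IOException {
      super.reset();
      pendingSynonyms.clear();
      savedState = null;
    }
  }

The detail that may matter for the highlighting question is that every stacked term keeps the offsets of the original lcx__ token, which matches the analysis output below where every row has start 0 and end 47.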

 

The term that is mentioned in the error message above is “wail”, which is the 48th locality in the list.  On another test it is the 64th locality in the list, so I think it has something to do with the length of the synonyms (as evidenced by the fact that if we remove the maxTokenLength from the StandardTokenizerFactory for these fields, everything goes back to working).  It also appears to work without problems for addresses where the locality list is short.

 

I’m not sure where the length of 258 is coming from in the error message, as it doesn’t match up with anything that I can see.

 

I have attached the full analysis for one of the data values that is causing the problem.  In building our token filter, I have tried to follow what the standard Solr synonym filter produces, but I may have missed something.

 

Does anybody have any ideas about what might be causing this?

 

Thanks,

 

David

 

 

Field Value (Index)
lcx__balmoral__cannum__clear_lake__lower_norton

text         raw_bytes                           start  end  positionLength  type        termFrequency  position  keyword
balmoral     [62 61 6c 6d 6f 72 61 6c]           0      47   1               <ALPHANUM>  1              1         false
cannum       [63 61 6e 6e 75 6d]                 0      47   1               SYNONYM     1              1         false
clear        [63 6c 65 61 72]                    0      47   1               SYNONYM     1              1         false
clearlake    [63 6c 65 61 72 6c 61 6b 65]        0      47   1               SYNONYM     1              1         false
lower        [6c 6f 77 65 72]                    0      47   1               SYNONYM     1              1         false
lowernorton  [6c 6f 77 65 72 6e 6f 72 74 6f 6e]  0      47   1               SYNONYM     1              1         false
lake         [6c 61 6b 65]                       0      47   1               <ALPHANUM>  1              2         false
norton       [6e 6f 72 74 6f 6e]                 0      47   1               SYNONYM     1              2         false

 


Attachment: SolrAnalysis.xlsx (20K)

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

howed

To try something else, I added a FlattenGraphFilterFactory after my custom token filter, even though I didn’t think I needed one:

 

  curl -X POST -H 'Content-type:application/json' --data-binary '{
    "add-field-type" : {
      "name":"localitySynonymType2",
      "class":"solr.TextField",
      "indexAnalyzer": {
        "tokenizer":{
            "class":"solr.StandardTokenizerFactory",
            "maxTokenLength": 4000
          },
          "filters": [
              {
                "class":"solr.LowerCaseFilterFactory"
              },
              {
                "class":"au.com.auspost.postal.ame.solr.LocalityTokenFilterFactory"
              },
              {
                "class":"solr.FlattenGraphFilterFactory"
              }
          ]
      },
      "queryAnalyzer":{
          "tokenizer":{
            "class":"solr.StandardTokenizerFactory"
          },
          "filters": [
              {
                "class":"solr.LowerCaseFilterFactory"
              }
          ]
      }
    }
  }' http://localhost:8983/solr/address/schema

 

This appears to make no difference when looking at the results in the analyser (see attached), but for the same Solr query as before the error message has now changed to:

 

2018-03-07 21:36:25.122 ERROR (qtp257895351-19) [   x:address] o.a.s.s.HttpSolrCall null:java.lang.IndexOutOfBoundsException: Index: 14, Size: 14
        at java.util.ArrayList.rangeCheck(ArrayList.java:657)
        at java.util.ArrayList.get(ArrayList.java:433)
        at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:204)
        at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
        at org.apache.solr.highlight.TokenOrderingFilter.incrementToken(DefaultSolrHighlighter.java:813)
        at org.apache.lucene.search.highlight.OffsetLimitTokenFilter.incrementToken(OffsetLimitTokenFilter.java:42)
        at org.apache.lucene.analysis.CachingTokenFilter.fillCache(CachingTokenFilter.java:91)
        at org.apache.lucene.analysis.CachingTokenFilter.incrementToken(CachingTokenFilter.java:70)
        at org.apache.lucene.index.memory.MemoryIndex.storeTerms(MemoryIndex.java:583)
        at org.apache.lucene.index.memory.MemoryIndex.addField(MemoryIndex.java:474)
        at org.apache.lucene.index.memory.MemoryIndex.addField(MemoryIndex.java:452)
        at org.apache.lucene.index.memory.MemoryIndex.addField(MemoryIndex.java:431)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getLeafContext(WeightedSpanTermExtractor.java:402)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extractWeightedTerms(WeightedSpanTermExtractor.java:361)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:140)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:107)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:154)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.extract(WeightedSpanTermExtractor.java:111)
        at org.apache.lucene.search.highlight.WeightedSpanTermExtractor.getWeightedSpanTerms(WeightedSpanTermExtractor.java:510)
        at org.apache.lucene.search.highlight.QueryScorer.initExtractor(QueryScorer.java:218)
        at org.apache.lucene.search.highlight.QueryScorer.init(QueryScorer.java:186)
        at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:201)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingByHighlighter(DefaultSolrHighlighter.java:636)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlightingOfField(DefaultSolrHighlighter.java:480)
        at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:442)
        at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:176)
        at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:295)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:177)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2503)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:710)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:516)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:382)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:326)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1751)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:582)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1180)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:512)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1112)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:534)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:320)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:251)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:283)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:108)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.executeProduceConsume(ExecuteProduceConsume.java:303)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceConsume(ExecuteProduceConsume.java:148)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:136)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:671)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:589)
        at java.lang.Thread.run(Thread.java:748)

 

Not sure if this helps or not.  I don’t understand why it is doing anything with the FlattenGraphFilter at query time, when it was specified as an index-time filter only.

 

Regards,

 

David

 


Attachment: SolrAnalysisFlattenGraph.xlsx (27K)

Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

Rick Leir-2
In reply to this post by howed
David
When you have "lcx__balmoral__cannum__clear_lake__lower_norton" in a field, would you search for *cannum*? That might not perform well.
Why not have a multivalued field for this information?

It could be that you have a good reason for this, and I just do not understand.
Cheers -- Rick
--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

howed

Hi Rick,

Thanks for your response.  The reason that we do it like this is that the localities are also part of another indexed field that contains the entire address.  We actually do the search over that field, and we only use the highlighting on the problematic field so that we can tell which parts of the address we matched.  We never search for wildcards like "*cannum*".

As an example, we might have an address that we index which is "19 some st cannum vic 3456".  When we index the address, we actually index the text "19 some st lcx__balmoral__cannum__clear_lake__lower_norton vic 3456" into a Solr field that has our custom synonym filter.  This causes the synonyms for the locality "cannum" to be generated, so if we search for "19 some st balmoral" we will still get a match on the locality component of the address.  Using this method, searching for addresses works fine.

We have a requirement, once we have a match, to know which part of the address we matched, which is where the highlighting comes in.  By loading just the locality part of the address into a separate field and applying the same synonym filter, we can see through the highlighting whether we got a hit on the locality (sketched below).  We do the same with the other components of the address (the number, the street name, the street type, the post code, etc.) so that we can return to the caller which bits of their input matched the address we are returning.

I could load them as a multi-valued field for just the highlighting, but that would mean extracting them in a different format from the one I am using for the whole address, which I would like to avoid if possible.  We are loading these addresses from a database table using the data import handler.
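
For concreteness, this is a rough SolrJ sketch of the kind of check we do on the highlighting response.  It is illustrative only: the client setup, the query string and the single field shown are simplified, and our real code covers all of the address component fields.

  import java.util.List;
  import java.util.Map;

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.impl.HttpSolrClient;
  import org.apache.solr.client.solrj.response.QueryResponse;

  public class MatchedComponentsSketch {
    public static void main(String[] args) throws Exception {
      try (HttpSolrClient solr =
               new HttpSolrClient.Builder("http://localhost:8983/solr/address").build()) {
        SolrQuery query = new SolrQuery("19 some st balmoral");
        query.setHighlight(true);
        // One highlight field per address component; only the locality one is shown here.
        query.addHighlightField("surroundingLocalityNamesSynonym");

        QueryResponse response = solr.query(query);

        // getHighlighting(): docId -> (field -> highlighted snippets)
        Map<String, Map<String, List<String>>> highlighting = response.getHighlighting();
        highlighting.forEach((docId, fields) -> {
          boolean localityMatched = fields.containsKey("surroundingLocalityNamesSynonym")
              && !fields.get("surroundingLocalityNamesSynonym").isEmpty();
          System.out.println(docId + " locality matched: " + localityMatched);
        });
      }
    }
  }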

Regards,

David


RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

Rick Leir-2
David
Yes, highlighting is tricky, especially with synonyms. Sorry, I would need to see a bit more of your config before saying more about it.
Thanks -- Rick
--
Sorry for being brief. Alternate email is rickleir at yahoo dot com

RE: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

howed
Finally got back to looking at this, and found that the solution was to switch to the unified highlighter
(https://lucene.apache.org/solr/guide/7_2/highlighting.html#choosing-a-highlighter), which doesn't seem
to have the same problem with my complex synonyms.  This required some tweaking of the highlighting
parameters and my code, as it doesn't highlight exactly the same as the default highlighter, but all is
working now.
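
For anyone who hits the same problem, the switch itself is just a highlighting request parameter.  A
minimal SolrJ sketch (hl.method comes from the Solr Reference Guide; the query and field are just
examples):

  import org.apache.solr.client.solrj.SolrQuery;

  public final class UnifiedHighlightQuery {

    // Illustrative only: builds the same highlight query as before, but asks
    // Solr to use the unified highlighter instead of the default one.
    public static SolrQuery build(String userInput) {
      SolrQuery query = new SolrQuery(userInput);
      query.setHighlight(true);
      query.addHighlightField("surroundingLocalityNamesSynonym");
      query.setParam("hl.method", "unified");  // selects the UnifiedHighlighter
      return query;
    }
  }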

Thanks again for the assistance.

David



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Reply | Threaded
Open this post in threaded view
|

Re: Highlighter throwing InvalidTokenOffsetsException for field with large number of synonyms

david.w.smiley@gmail.com
Yay!  I'm glad the UnifiedHighlighter is serving you well.  I was about to
suggest it.  If you think the fragmentation/snippeting could be improved in
a general way then post a JIRA for consideration.  Note: identical results
with the original Highlighter is a non-goal.

--
Lucene/Solr Search Committer, Consultant, Developer, Author, Speaker
LinkedIn: http://linkedin.com/in/davidwsmiley | Book:
http://www.solrenterprisesearchserver.com