[jira] [Resolved] (LUCENE-8933) JapaneseTokenizer creates Token objects with corrupt offsets


Shalin Shekhar Mangar (Jira)

     [ https://issues.apache.org/jira/browse/LUCENE-8933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tomoko Uchida resolved LUCENE-8933.
-----------------------------------
       Resolution: Fixed
         Assignee: Tomoko Uchida
    Fix Version/s: 8.3
                   master (9.0)

I merged the PRs, one for master and one for 8.x.

> JapaneseTokenizer creates Token objects with corrupt offsets
> ------------------------------------------------------------
>
>                 Key: LUCENE-8933
>                 URL: https://issues.apache.org/jira/browse/LUCENE-8933
>             Project: Lucene - Core
>          Issue Type: Bug
>            Reporter: Adrien Grand
>            Assignee: Tomoko Uchida
>            Priority: Minor
>             Fix For: master (9.0), 8.3
>
>          Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> An Elasticsearch user reported the following stack trace when parsing synonyms. It looks like this can only occur if the offset of a {{org.apache.lucene.analysis.ja.Token}} is outside the expected range.
>  
> {noformat}
> Caused by: java.lang.ArrayIndexOutOfBoundsException
>     at org.apache.lucene.analysis.tokenattributes.CharTermAttributeImpl.copyBuffer(CharTermAttributeImpl.java:44) ~[lucene-core-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:20]
>     at org.apache.lucene.analysis.ja.JapaneseTokenizer.incrementToken(JapaneseTokenizer.java:486) ~[?:?]
>     at org.apache.lucene.analysis.synonym.SynonymMap$Parser.analyze(SynonymMap.java:318) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
>     at org.elasticsearch.index.analysis.ESSolrSynonymParser.analyze(ESSolrSynonymParser.java:57) ~[elasticsearch-6.6.1.jar:6.6.1]
>     at org.apache.lucene.analysis.synonym.SolrSynonymParser.addInternal(SolrSynonymParser.java:114) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
>     at org.apache.lucene.analysis.synonym.SolrSynonymParser.parse(SolrSynonymParser.java:70) ~[lucene-analyzers-common-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:44:48]
>     at org.elasticsearch.index.analysis.SynonymTokenFilterFactory.buildSynonyms(SynonymTokenFilterFactory.java:154) ~[elasticsearch-6.6.1.jar:6.6.1]
>     ... 24 more
> {noformat}
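
For readers following the trace, here is a minimal sketch of how an out-of-range offset could surface as this exception. It assumes that the copyBuffer frame boils down to a System.arraycopy call out of the supplied buffer; that is an assumption, not something the report confirms. The class OffsetCheckSketch and its methods copyRange and inRange are hypothetical names for illustration only and are not part of Lucene.

```java
public class OffsetCheckSketch {

    // Hypothetical stand-in for a copyBuffer-style call: copies `length`
    // chars starting at `offset` out of `buffer` into a fresh array.
    public static char[] copyRange(char[] buffer, int offset, int length) {
        char[] dest = new char[Math.max(length, 0)];
        // System.arraycopy rejects ranges that fall outside `buffer`.
        System.arraycopy(buffer, offset, dest, 0, length);
        return dest;
    }

    // Explicit bounds check that a caller could run before copying.
    public static boolean inRange(char[] buffer, int offset, int length) {
        return offset >= 0 && length >= 0 && offset <= buffer.length - length;
    }

    public static void main(String[] args) {
        char[] buffer = "abcdefg".toCharArray();

        // A token whose offset and length fit inside the buffer copies cleanly.
        System.out.println(new String(copyRange(buffer, 2, 3)));

        // A token whose offset points past the valid range fails the check
        // and makes the copy throw.
        int badOffset = 6;
        int badLength = 3;
        System.out.println("in range: " + inRange(buffer, badOffset, badLength));
        try {
            copyRange(buffer, badOffset, badLength);
        } catch (IndexOutOfBoundsException e) {
            System.out.println("copy failed: " + e.getClass().getName());
        }
    }
}
```

The sketch only shows the failure mode; it does not reproduce the tokenizer bug itself or the fix that was merged.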



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
