Random possible analysis error due to IndexOutOfBoundsException caused by FlattenGraphFilter


Contact IntraCherche
Hi,

We are using Solr 7.4, and with one of our customers we are running into
the same problem as the one described on StackOverflow (no solution was
provided there):
https://stackoverflow.com/questions/52783491/solr-indexing-error-possible-analysis-error

On our side, the following is happening: while indexing documents with
Solr 7.4, this exception appears at random. It is very annoying because
we cannot reproduce it on our development machine and we do not have
access to the customer's data. So far it has happened with PDF, XLS and
EML documents, but the document involved is never the same one. The
issue appears to stem from the FlattenGraphFilter class, specifically
from the call restoreState(inputNode.tokens.get(inputNode.nextOut)).
The complete stack trace shows:

2018-12-20 22:09:04.140 ERROR (qtp1990098664-22) [   x:EM] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 0acb5bf3-4caa-4f31-ab8f-76a8ea7ce782 to the index; possible analysis error.
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:246)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:67)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:950)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1168)
    at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:633)
    at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:475)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:75)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:118)
    at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:55)
    at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:92)
    at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:98)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:188)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:144)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:311)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:130)
    at org.apache.solr.common.util.JavaBinCodec.readObject(JavaBinCodec.java:276)
    at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:256)
    at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:178)
    at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:195)
    at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:109)
    at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:55)
    at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:68)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:2539)
    at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:709)
    at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:515)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
    at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
    at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
    at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
    at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
    at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1253)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
    at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
    at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
    at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
    at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1155)
    at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
    at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
    at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
    at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
    at org.eclipse.jetty.server.Server.handle(Server.java:531)
    at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
    at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
    at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
    at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
    at org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
    at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
    at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
    at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:760)
    at org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:678)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1
    at java.util.ArrayList.rangeCheck(ArrayList.java:657)
    at java.util.ArrayList.get(ArrayList.java:433)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.releaseBufferedToken(FlattenGraphFilter.java:204)
    at org.apache.lucene.analysis.core.FlattenGraphFilter.incrementToken(FlattenGraphFilter.java:258)
    at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:738)
    at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:428)
    at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:392)
    at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:251)
    at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:494)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1602)
    at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1594)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocument(DirectUpdateHandler2.java:982)
    at org.apache.solr.update.DirectUpdateHandler2.updateDocOrDocValues(DirectUpdateHandler2.java:971)
    at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:348)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:284)
    at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:234)
    ... 76 more
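
Since we cannot see the customer's data, one thing we could ask them to run is a small log scan. The following is only a minimal sketch, assuming the log lines look like the trace above; the function name `failing_documents` and the two regular expressions are our own illustration and not part of Solr. It groups the ids of the failing documents by the first "Caused by:" line that follows each error.

```python
import re
from collections import defaultdict

# Assumption: the SolrException message has the wording seen in the trace above.
DOC_ID_RE = re.compile(r"Exception writing document id (\S+) to")
# Assumption: the root cause is reported on a "Caused by:" line.
CAUSE_RE = re.compile(r"Caused by: (.+)")


def failing_documents(log_lines):
    """Group failing document ids by the first cause reported after each error."""
    by_cause = defaultdict(list)
    pending_id = None
    for line in log_lines:
        match = DOC_ID_RE.search(line)
        if match:
            # Remember the id until its "Caused by:" line shows up.
            pending_id = match.group(1)
            continue
        match = CAUSE_RE.search(line)
        if match and pending_id is not None:
            by_cause[match.group(1).strip()].append(pending_id)
            pending_id = None  # later nested causes are ignored
    return dict(by_cause)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical path): python scan_log.py /var/solr/logs/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
        for cause, ids in failing_documents(handle).items():
            print(f"{cause}: {len(ids)} document(s)")
            for doc_id in ids:
                print(f"  {doc_id}")
```

With the ids in hand, the customer could re-submit just those documents, which might tell us whether the failure depends on the content or on timing.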

contractions_fr.txt and stopwords_fr.txt are the stock ones;
synonyms_fr.txt contains:
voiture,renault,peugeot

The schema used in managed-schema is:

    <dynamicField name="*_txt_fra" type="text_fr" indexed="true" stored="false"/>
    <fieldType name="text_fr" class="solr.TextField" positionIncrementGap="100">
        <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <!-- removes l', etc -->
            <filter class="solr.ElisionFilterFactory" ignoreCase="true" articles="lang/contractions_fr.txt"/>
            <!-- Separates on hyphen numbers, ... -->
            <filter class="solr.WordDelimiterGraphFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_fr.txt" format="snowball"/>
            <filter class="solr.FrenchLightStemFilterFactory"/>

            <filter class="solr.FlattenGraphFilterFactory"/>
        </analyzer>
        <analyzer type="query">
            <!-- query analyzer omitted for clarity's sake -->
        </analyzer>
    </fieldType>
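
To replay a suspect piece of text through the text_fr chain without indexing it, we are considering Solr's field analysis handler. This is a rough sketch only: the helper name `field_analysis_url`, the default base URL, the core name EM (taken from the `x:EM` tag in the log) and the sample text are our assumptions, and we have not confirmed that this handler triggers the same exception as indexing does.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, text):
    """Build a URL for Solr's /analysis/field handler for one field type."""
    params = urlencode({
        "analysis.fieldtype": field_type,   # analyze with this fieldType
        "analysis.fieldvalue": text,        # text run through the index analyzer
        "wt": "json",
    })
    return f"{base_url.rstrip('/')}/{core}/analysis/field?{params}"


if __name__ == "__main__":
    # Hypothetical sample text; real input would come from the customer's documents.
    print(field_analysis_url("http://localhost:8983/solr", "EM", "text_fr", "l'avion A-320"))
```

The printed URL can be opened with any HTTP client; the response lists the tokens produced at each stage of the analyzer, which might help narrow down the kind of input that upsets FlattenGraphFilter.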

Unlike in the StackOverflow question, we use only a single
FlattenGraphFilter, at the end of the analyzer, as recommended by
Michael McCandless in
http://lucene.472066.n3.nabble.com/SynonymFilterFactory-deprecated-td4360455.html :
"You need only one FlattenGraphFilter at the end of your analysis chain."

So where should we look to prevent this exception from happening?

Thank you very much in advance,

Best
Remi