Exception while integrating openNLP with Solr


Exception while integrating openNLP with Solr

aruninfo100
Hi,

I am trying to integrate openNLP with Solr.

The fieldtype is :

    <fieldType name="open_nlp" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-sent.bin" tokenizerModel="opennlp/en-token.bin"/>
        <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
        <filter class="solr.OpenNLPLemmatizerFilterFactory" dictionary="opennlp/en-lemmatizer.txt"/>
      </analyzer>
    </fieldType>

en-lemmatizer.txt is close to 5 MB in size.
I am using the lemmatizer dictionary from the link below:

https://raw.githubusercontent.com/richardwilly98/elasticsearch-opennlp-auto-tagging/master/src/main/resources/models/en-lemmatizer.dict
field schema:

<field name="opennlp_text" type="open_nlp" indexed="true" stored="true"/>

When I try to index I get the following error:

error: Error from server at http://localhost:8983/solr/star: Exception writing document id 578df0de-6adc-4ca2-9d5d-23c5c088f83a to the index; possible analysis error.

solr.log:


2017-03-22 00:03:42.477 INFO  (qtp1389647288-14) [   x:star] o.a.s.u.p.LogUpdateProcessorFactory [star]  webapp=/solr path=/update params={wt=javabin&version=2}{} 0 116
2017-03-22 00:03:42.478 ERROR (qtp1389647288-14) [   x:star] o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Exception writing document id 303e190b-b02c-4927-9669-733e76164f61 to the index; possible analysis error.
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:181)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:68)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.AddSchemaFieldsUpdateProcessorFactory$AddSchemaFieldsUpdateProcessor.processAdd(AddSchemaFieldsUpdateProcessorFactory.java:335)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldNameMutatingUpdateProcessorFactory$1.processAdd(FieldNameMutatingUpdateProcessorFactory.java:74)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.FieldMutatingUpdateProcessor.processAdd(FieldMutatingUpdateProcessor.java:117)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:939)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:1094)
        at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:720)
        at org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
        at org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
        at org.apache.solr.update.processor.AbstractDefaultValueUpdateProcessorFactory$DefaultValueUpdateProcessor.processAdd(AbstractDefaultValueUpdateProcessorFactory.java:93)
        at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:274)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
        at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:239)
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:157)
        at org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
        at org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
        at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
        at org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
        at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:518)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
        at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
        at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ArrayIndexOutOfBoundsException: 1
        at opennlp.tools.lemmatizer.SimpleLemmatizer.<init>(SimpleLemmatizer.java:46)
        at org.apache.lucene.analysis.opennlp.tools.NLPLemmatizerOp.<init>(NLPLemmatizerOp.java:28)
        at org.apache.lucene.analysis.opennlp.tools.OpenNLPOpsFactory.getLemmatizer(OpenNLPOpsFactory.java:129)
        at org.apache.lucene.analysis.opennlp.OpenNLPLemmatizerFilterFactory.create(OpenNLPLemmatizerFilterFactory.java:62)
        at org.apache.lucene.analysis.opennlp.OpenNLPLemmatizerFilterFactory.create(OpenNLPLemmatizerFilterFactory.java:46)
        at org.apache.solr.analysis.TokenizerChain.createComponents(TokenizerChain.java:91)
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
        at org.apache.lucene.analysis.AnalyzerWrapper.createComponents(AnalyzerWrapper.java:101)
        at org.apache.lucene.analysis.Analyzer.tokenStream(Analyzer.java:176)
        at org.apache.lucene.document.Field.tokenStream(Field.java:570)
        at org.apache.lucene.index.DefaultIndexingChain$PerField.invert(DefaultIndexingChain.java:708)
        at org.apache.lucene.index.DefaultIndexingChain.processField(DefaultIndexingChain.java:417)
        at org.apache.lucene.index.DefaultIndexingChain.processDocument(DefaultIndexingChain.java:373)
        at org.apache.lucene.index.DocumentsWriterPerThread.updateDocument(DocumentsWriterPerThread.java:232)
        at org.apache.lucene.index.DocumentsWriter.updateDocument(DocumentsWriter.java:449)
        at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1492)
        at org.apache.solr.update.DirectUpdateHandler2.doNormalUpdate(DirectUpdateHandler2.java:282)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc0(DirectUpdateHandler2.java:214)
        at org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:169)
        ... 64 more

Thanks  and Regards,
Arun

RE: Exception while integrating openNLP with Solr

Markus Jelsma-2
Hello - there is an underlying ArrayIndexOutOfBoundsException causing you trouble:

        at java.lang.Thread.run(Thread.java:745)
*Caused by: java.lang.ArrayIndexOutOfBoundsException: 1*
        at opennlp.tools.lemmatizer.SimpleLemmatizer.<init>(SimpleLemmatizer.java:46)
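An ArrayIndexOutOfBoundsException in SimpleLemmatizer's constructor typically means some dictionary line did not split into the expected number of columns. As a rough, hedged check (the exact separator and column count SimpleLemmatizer expects depend on your OpenNLP version; tab-separated word/POS/lemma lines are an assumption here), you could scan the file for short lines before handing it to Solr:

```python
# Sketch: find lines in a lemmatizer dictionary that would break a parser
# expecting at least two tab-separated columns per line.
# Assumption: the format is "word<TAB>postag<TAB>lemma"; check your
# OpenNLP version's SimpleLemmatizer source for the actual separator.

def find_bad_lines(path, min_columns=2, sep="\t"):
    """Return (line_number, line) pairs with fewer than min_columns fields."""
    bad = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            line = line.rstrip("\n")
            if line and len(line.split(sep)) < min_columns:
                bad.append((lineno, line))
    return bad
```

Any line this reports is a candidate for the parsing failure (for example, a line using spaces instead of tabs, or a stray header line).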

Regards,
Markus
 
-----Original message-----

> From:aruninfo100 <[hidden email]>
> Sent: Wednesday 22nd March 2017 1:33
> To: [hidden email]
> Subject: Exception while integrating openNLP with Solr

RE: Exception while integrating openNLP with Solr

aruninfo100
Hi,

I was able to resolve the issue. But when I run the indexing process it takes too long to index bigger documents, and sometimes I get a Java heap memory exception.
How can I improve performance while using dictionary lemmatizers?

Thanks and Regards,
Arun

RE: Exception while integrating openNLP with Solr

Markus Jelsma-2
In reply to this post by aruninfo100
Hello - you need to increase the heap to work around the out of memory exception. There is not much you can do to increase the indexing speed when using OpenNLP.
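As a sketch (not from the original reply; the 2g figure and install paths are assumptions to adapt), the heap can be raised either on the command line or persistently:

```shell
# Start (or restart) Solr with a larger heap; 2g is an arbitrary example.
bin/solr restart -m 2g

# Or set it persistently in bin/solr.in.sh (Linux/macOS installs):
# SOLR_HEAP="2g"
```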

Regards,
Markus
 
-----Original message-----

> From:aruninfo100 <[hidden email]>
> Sent: Wednesday 22nd March 2017 12:27
> To: [hidden email]
> Subject: RE: Exception while integrating openNLP with Solr

RE: Exception while integrating openNLP with Solr

aruninfo100
Hi

I am really finding it difficult to index documents using the openNLP lemmatizer. The indexing takes too much time (including the commit). Is there a way to optimize or improve the performance?
It would also be helpful to know about other openNLP lemmatizer implementations that perform well.

Thanks,
Arun

RE: Exception while integrating openNLP with Solr

Markus Jelsma-2
In reply to this post by aruninfo100
Hi - We are not having large issues using OpenNLP for POS-tagging in Lucene. But you mention commits; committing with or without POS payloads is hardly any different, so commits should be unaffected. Maybe you have another issue? Perhaps use a sampler to pinpoint the problem.
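One low-tech way to sample, sketched here with assumed process discovery (the pgrep pattern depends on how your Solr was launched): take a few thread dumps during a slow indexing run and look for frames that keep recurring.

```shell
# Grab five thread dumps of the running Solr JVM, 5 seconds apart,
# then count recurring OpenNLP frames across the dumps.
SOLR_PID=$(pgrep -f start.jar | head -1)
for i in 1 2 3 4 5; do
  jstack "$SOLR_PID" > "dump-$i.txt"
  sleep 5
done
grep -h "opennlp" dump-*.txt | sort | uniq -c | sort -rn | head
```

If most samples land in the same analysis-chain frame, that is the hot spot; if they land elsewhere (merging, commits, GC), the lemmatizer may not be the culprit.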

Markus

 
 
-----Original message-----

> From:aruninfo100 <[hidden email]>
> Sent: Wednesday 22nd March 2017 18:30
> To: [hidden email]
> Subject: RE: Exception while integrating openNLP with Solr

RE: Exception while integrating openNLP with Solr

aruninfo100
Hi,
Thanks for the reply.

Kindly find the field type schema I am using:

 <field name="opennlp_text" type="open_nlp" indexed="true" stored="true"/>
<copyField source="content" dest="opennlp_text"/>

Should the opennlp_text field be indexed="true"?

    <fieldType name="open_nlp" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.OpenNLPTokenizerFactory" sentenceModel="opennlp/en-sent.bin" tokenizerModel="opennlp/en-token.bin"/>
        <filter class="solr.OpenNLPFilterFactory" posTaggerModel="opennlp/en-pos-maxent.bin"/>
        <filter class="solr.OpenNLPLemmatizerFilterFactory" dictionary="opennlp/en-lemmatizer.txt"/>
      </analyzer>
    </fieldType>

Here the en-lemmatizer.txt is 7 MB in size. Without lemmatization the whole indexing process usually takes 2-3 minutes on average, but here it took more than 2 hours to complete. Is this related to the lemmatizer file? Even after completion of the index, when I used the Solr admin UI to analyse the field, I was unable to get any desired output.
Could you please guide me.

Thanks,
Arun

RE: Exception while integrating openNLP with Solr

Markus Jelsma-2
In reply to this post by aruninfo100
Hi - We don't use that OpenNLP patch, nor that kind of lemmatizer. We just rely on POS-tagging via a CharFilter with custom-trained maxent models, and it is fast enough.

So, do you really need the analyzer that is giving you a hard time? I don't know what that lemmatizer does, but you can get a really fine search engine with POS-tagging alone.

My question now is: why do you need that patch? What do you intend to do with it? Maybe you can get what you need with something simpler.

Regards,
Markus
 
-----Original message-----

> From:aruninfo100 <[hidden email]>
> Sent: Wednesday 22nd March 2017 19:15
> To: [hidden email]
> Subject: RE: Exception while integrating openNLP with Solr

RE: Exception while integrating openNLP with Solr

aruninfo100
Hi,

I applied LUCENE-2899.patch, which provides openNLP capabilities to Solr. One such feature is lemmatization, which helps to match the root word, but integrating it was too time-consuming (indexing). It also provides POS tagging, sentence detection, and named entity recognition. As you said, here too the models have to be trained for better performance.

I am also trying to use POS-tagging:

<filter class="solr.OpenNLPFilterFactory"  posTaggerModel="opennlp/en-pos-maxent.bin"/> 

I tried analyzing the output of this filter in the Solr admin UI and I could see the tagging.
I haven't trained the en-pos-maxent.bin model as of now.

It would be helpful if you could provide details on the following, so I can build on the knowledge you have shared:

1. How good the training data should be, and things to watch out for.
2. The training tool you have used. openNLP provides a command line interface for training, and also APIs.
3. The schema structure to follow.
4. The query structure.
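On point 2, a minimal sketch of command-line training with OpenNLP (file names here are assumptions; POSTaggerTrainer expects training data as one sentence per line, tokens in word_TAG form):

```shell
# Train a custom maxent POS model from annotated data (hypothetical files).
opennlp POSTaggerTrainer -model en-pos-custom.bin \
  -lang en -data en-pos.train -encoding UTF-8
```

The resulting en-pos-custom.bin would then replace en-pos-maxent.bin in the posTaggerModel attribute of the filter.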

Thanks once again for spending time on my queries :)

Thanks and Regards,
Arun

RE: Exception while integrating openNLP with Solr

pressione03
In reply to this post by aruninfo100
How'd you figure it out? I have the same problem and I can't find a solution.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html