Integrate UIMA and DIH

paulparsons
Hi,

I am trying to integrate UIMA and Solr. I'm following the guide here: https://cwiki.apache.org/confluence/display/solr/UIMA+Integration


I'm also already using DIH to import from XML files.

Here is what I've added to solrconfig.xml:

libraries:

  <lib dir="../../../contrib/uima/lib" />
  <lib dir="../../../contrib/uima/lucene-libs" />
  <lib dir="../../../dist/" regex="solr-uima-\d.*\.jar" />


UpdateRequestProcessorChain:

    <updateRequestProcessorChain name="uima">
      <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory">
        <lst name="uimaConfig">
          <str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>
          <bool name="ignoreErrors">false</bool>
          <lst name="analyzeFields">
            <bool name="merge">false</bool>
            <arr name="fields">
               <str>text_en</str>
            </arr>
          </lst>
          <lst name="fieldMappings">
            <lst name="type">
              <str name="name">org.apache.uima.SentenceAnnotation</str>
              <lst name="mapping">
                <str name="feature">coveredText</str>
                <str name="field">sentence</str>
              </lst>
            </lst>
          </lst>
        </lst>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory" />
      <processor class="solr.RunUpdateProcessorFactory" />
    </updateRequestProcessorChain>


and the DIH:

  <requestHandler name="/medline_ingest" class="org.apache.solr.handler.dataimport.DataImportHandler">
    <lst name="defaults">
      <str name="config">./medline_DIHconfig.xml</str>
      <str name="update.processor">uima</str>
    </lst>
  </requestHandler>


and I've added this field to the schema:

<field name="sentence" type="text_en" indexed="true" stored="true" multiValued="true" required="false" />


The DIH import itself runs fine and reports no errors, but the sentence field is never populated. I'm not sure what I'm missing here. I've been searching all over but can't find any useful information.

Thanks.

 
Re: Integrate UIMA and DIH

paulparsons
I added default="true" to my updateRequestProcessorChain:

<updateRequestProcessorChain name="uima" default="true">
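A possible reason the chain was not being picked up before this change: in Solr 4.x the request parameter that selects a chain is, as far as I know, update.chain, and the older update.processor name is no longer honored. Assuming that is the cause, an alternative to default="true" would be to name the chain in the DIH handler defaults. This is an untested sketch that reuses the handler from my first post, with only the parameter name changed:

```xml
<requestHandler name="/medline_ingest" class="org.apache.solr.handler.dataimport.DataImportHandler">
  <lst name="defaults">
    <str name="config">./medline_DIHconfig.xml</str>
    <!-- Assumption: update.chain is the parameter name this Solr version expects;
         my original config used update.processor here. -->
    <str name="update.chain">uima</str>
  </lst>
</requestHandler>
```

With this form only the DIH handler would use the uima chain, while other update handlers keep the normal default chain.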

Now I'm getting errors when running the DIH:


ERROR org.apache.solr.core.SolrCore  – org.apache.solr.common.SolrException: org.apache.uima.resource.ResourceInitializationException
        at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:64)
        at org.apache.solr.update.processor.UpdateRequestProcessorChain.createProcessor(UpdateRequestProcessorChain.java:204)
        at org.apache.solr.handler.dataimport.DataImportHandler.handleRequestBody(DataImportHandler.java:178)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1962)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:777)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:418)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:207)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:247)
        at org.eclipse.jetty.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:210)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:255)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:154)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
        at org.eclipse.jetty.server.Server.handle(Server.java:368)
        at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:489)
        at org.eclipse.jetty.server.BlockingHttpConnection.handleRequest(BlockingHttpConnection.java:53)
        at org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:953)
        at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:1014)
        at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:861)
        at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:240)
        at org.eclipse.jetty.server.BlockingHttpConnection.handle(BlockingHttpConnection.java:72)
        at org.eclipse.jetty.server.bio.SocketConnector$ConnectorEndPoint.run(SocketConnector.java:264)
        at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
        at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
        at java.lang.Thread.run(Thread.java:745)
Caused by: org.apache.uima.resource.ResourceInitializationException
        at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:58)
        at org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory.getInstance(UIMAUpdateRequestProcessorFactory.java:61)
        ... 35 more
Caused by: java.lang.NullPointerException
        at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:118)
        at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getInputSource(BasicAEProvider.java:84)
        at org.apache.lucene.analysis.uima.ae.BasicAEProvider.getAE(BasicAEProvider.java:50)
        ... 36 more



I've looked at the source code the stack trace points to, but can't determine what the problem is. I've also noticed from other posts that people have hit a similar ResourceInitializationException in the past, but there doesn't seem to be any general solution.

Re: Integrate UIMA and DIH

paulparsons
I forgot to mention in the previous post that I changed the analysis engine from

<str name="analysisEngine">/org/apache/uima/desc/OverridingParamsExtServicesAE.xml</str>

to

<str name="analysisEngine">/org/apache/uima/desc/AggregateSentenceAE</str>

In doing so, I forgot the '.xml' extension, which is what was causing the error. It would be helpful if the error messages were a little more descriptive!
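
For reference, this is the corrected line. Judging only from the stack trace, the descriptor path seems to be looked up as a resource, and a path that doesn't match surfaces as the NullPointerException inside XMLInputSource rather than as a "file not found" message, so the value has to match the descriptor file name exactly:

```xml
<!-- Same uimaConfig entry as before; only the descriptor path changes.
     The '.xml' extension is required for the descriptor to be found. -->
<str name="analysisEngine">/org/apache/uima/desc/AggregateSentenceAE.xml</str>
```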