[jira] Commented: (SOLR-2129) Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12968818#action_12968818 ]

Justinas Jaronis commented on SOLR-2129:
----------------------------------------

I tried your latest patch; however, after compiling, the resources (./contrib/uima/src/resources/*) are not included in the compiled project, so posting fails:

java.lang.RuntimeException: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:81)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1359)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:337)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:240)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
    at org.mortbay.jetty.Server.handle(Server.java:326)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
    at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.uima.resource.ResourceInitializationException
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:85)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processText(UIMAUpdateRequestProcessor.java:115)
    at org.apache.solr.uima.processor.UIMAUpdateRequestProcessor.processAdd(UIMAUpdateRequestProcessor.java:68)
    ... 24 more
Caused by: java.lang.NullPointerException
    at org.apache.uima.util.XMLInputSource.<init>(XMLInputSource.java:114)
    at org.apache.solr.uima.processor.ae.OverridingParamsAEProvider.getAE(OverridingParamsAEProvider.java:64)
    ... 26 more

This happens when OverridingParamsAEProvider tries to read /org/apache/uima/desc/OverridingParamsExtServicesAE.xml. Where should this file (and its fellow XML descriptors) be located?
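The innermost cause is a NullPointerException thrown from the XMLInputSource constructor. That fits a null URL being handed to it, for example because a classpath lookup for the descriptor returned null after the resources directory was left out of the build output. The following is a minimal sketch that assumes the descriptor is resolved with Class.getResource. The patch's actual lookup code is not shown in this thread, and the DescriptorLookup class and requireResource helper are hypothetical. The sketch fails early with a clearer message instead of an NPE:

```java
import java.net.URL;

public class DescriptorLookup {

    // Hypothetical helper: resolves a classpath resource and fails with a
    // descriptive message instead of letting a null URL reach a constructor
    // further down the call chain.
    static URL requireResource(String path) {
        URL url = DescriptorLookup.class.getResource(path);
        if (url == null) {
            throw new IllegalStateException(
                "Descriptor " + path + " not found on the classpath; "
                + "check that the resources directory is copied into the build output");
        }
        return url;
    }

    public static void main(String[] args) {
        String path = "/org/apache/uima/desc/OverridingParamsExtServicesAE.xml";
        try {
            System.out.println("Found: " + requireResource(path));
        } catch (IllegalStateException e) {
            // Reached when the descriptor was not packaged with the compiled classes.
            System.out.println(e.getMessage());
        }
    }
}
```

If the message appears, the descriptor XMLs are simply not on the runtime classpath. Copying ./contrib/uima/src/resources/* next to the compiled classes (or into the jar) should make the lookup succeed.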


Thanks for the effort. Great project!

> Provide a Solr module for dynamic metadata extraction/indexing with Apache UIMA
> -------------------------------------------------------------------------------
>
>                 Key: SOLR-2129
>                 URL: https://issues.apache.org/jira/browse/SOLR-2129
>             Project: Solr
>          Issue Type: New Feature
>            Reporter: Tommaso Teofili
>            Assignee: Robert Muir
>         Attachments: lib-jars.zip, SOLR-2129-asf-headers.patch, SOLR-2129-version2.patch, SOLR-2129.patch
>
>
> Provide components to enable Apache UIMA automatic metadata extraction to be exploited when indexing documents.
> The purpose of this is to get unstructured information "inside" a document and create structured metadata (as fields) to enrich each document.
> Basically this can be done with a custom UpdateRequestProcessor which triggers UIMA while indexing documents.
> The basic UIMA implementation of UpdateRequestProcessor extracts sentences (with a tokenizer and a hidden Markov model tagger), named entities, language, suggested category, keywords and concepts (exploiting external services from OpenCalais and AlchemyAPI). Such an implementation can easily be extended by adding or selecting different UIMA analysis engines, either taken from UIMA repositories on the web or created from scratch.
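The description says the integration works through a custom UpdateRequestProcessor that triggers UIMA while documents are indexed. The plain-Java sketch below only illustrates that enrichment pattern: copy the incoming document, run each analysis step over its text, and add the extracted values as new fields. It does not use the Solr or UIMA APIs. The Extractor interface, the enrich method and the naive sentence splitter are hypothetical stand-ins for an UpdateRequestProcessor and UIMA analysis engines.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class MetadataEnrichmentSketch {

    // Hypothetical stand-in for an analysis engine: text in, extracted fields out.
    interface Extractor {
        Map<String, List<String>> extract(String text);
    }

    // Hypothetical enrichment step: copy the document, run each extractor over
    // the text field, and append the extracted values as extra fields.
    static Map<String, List<String>> enrich(Map<String, List<String>> doc,
                                            String textField,
                                            List<Extractor> extractors) {
        Map<String, List<String>> enriched = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : doc.entrySet()) {
            // Copy each value list so the input document is never mutated.
            enriched.put(e.getKey(), new ArrayList<>(e.getValue()));
        }
        List<String> values = doc.get(textField);
        if (values == null) {
            return enriched; // nothing to analyze
        }
        String text = String.join(" ", values);
        for (Extractor extractor : extractors) {
            for (Map.Entry<String, List<String>> field : extractor.extract(text).entrySet()) {
                enriched.computeIfAbsent(field.getKey(), k -> new ArrayList<>())
                        .addAll(field.getValue());
            }
        }
        return enriched;
    }

    // Naive sentence splitter (split after . ! or ?) standing in for a real UIMA annotator.
    static final Extractor SENTENCES = text -> {
        Map<String, List<String>> out = new LinkedHashMap<>();
        out.put("sentence", Arrays.asList(text.trim().split("(?<=[.!?])\\s+")));
        return out;
    };

    public static void main(String[] args) {
        Map<String, List<String>> doc = new LinkedHashMap<>();
        doc.put("id", List.of("1"));
        doc.put("text", List.of("Solr indexes documents. UIMA extracts metadata."));
        System.out.println(enrich(doc, "text", List.of(SENTENCES)));
    }
}
```

Adding another analysis step means adding another Extractor to the list. This mirrors, loosely, how the module is described as extensible by selecting different analysis engines.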

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

