[jira] [Commented] (TIKA-3093) Enable tika-server to forward parse results to another endpoint

Clark Perkins (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17093461#comment-17093461 ]

Tim Allison commented on TIKA-3093:

I wonder if it would be simpler if we offered four forwarding options: Solr, Elasticsearch, FileSystem and custom.  We could load the "custom" from SPI...users could drop their jar in the tika-server.jar directory.
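To make the "custom" option concrete, here is a minimal sketch of how SPI loading could look with java.util.ServiceLoader. The ResultForwarder interface, its methods, and the "stdout" stand-in are hypothetical names invented for illustration. None of them exist in Tika.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.ServiceLoader;

public class ForwarderSketch {

    /** Hypothetical SPI for illustration; not an existing Tika interface. */
    public interface ResultForwarder {
        /** Name a user would select in configuration, e.g. "solr" or "custom". */
        String name();

        /** Receives the /rmeta JSON for one parsed file. */
        void forward(String rmetaJson) throws Exception;
    }

    /** Stand-in for a built-in option; prints instead of writing anywhere. */
    public static class StdoutForwarder implements ResultForwarder {
        @Override
        public String name() {
            return "stdout";
        }

        @Override
        public void forward(String rmetaJson) {
            System.out.println(rmetaJson);
        }
    }

    /** Registers the built-in forwarder, then any implementations found via SPI. */
    public static Map<String, ResultForwarder> loadForwarders() {
        Map<String, ResultForwarder> byName = new LinkedHashMap<>();
        ResultForwarder builtIn = new StdoutForwarder();
        byName.put(builtIn.name(), builtIn);
        // ServiceLoader picks up implementations listed in a jar's
        // META-INF/services/ForwarderSketch$ResultForwarder file on the classpath.
        for (ResultForwarder custom : ServiceLoader.load(ResultForwarder.class)) {
            byName.put(custom.name(), custom);
        }
        return byName;
    }

    public static void main(String[] args) throws Exception {
        Map<String, ResultForwarder> forwarders = loadForwarders();
        forwarders.get("stdout").forward("[{\"Content-Type\":\"text/plain\"}]");
    }
}
```

With no provider jar on the classpath, only the built-in entry is registered. A dropped-in jar would only need the provider-configuration file and an implementation with a public no-argument constructor.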

Under this proposal, we would not use the Solr/ES clients, we'd do our own mapping, which should be fairly straightforward.
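As a rough sketch of what forwarding without the Solr/ES clients might involve, the requests can be built with the JDK's own java.net.http API (Java 11+). The class and method names, the base URLs, and the collection/index names below are assumptions for illustration. The JSON is assumed to be already mapped into the shape the target expects, and the document id is assumed to be URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

public class ForwardRequestSketch {

    /** Builds a POST to Solr's /update/json/docs handler for one collection. */
    public static HttpRequest solrRequest(String solrBaseUrl, String collection, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(solrBaseUrl + "/" + collection + "/update/json/docs"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    /** Builds a PUT to Elasticsearch's /<index>/_doc/<_id> endpoint. */
    public static HttpRequest elasticsearchRequest(String esBaseUrl, String index,
                                                   String id, String json) {
        return HttpRequest.newBuilder()
                .uri(URI.create(esBaseUrl + "/" + index + "/_doc/" + id))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(json, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Example hosts and names only; nothing is sent over the network here.
        HttpRequest solr = solrRequest("http://localhost:8983/solr", "docs", "{\"id\":\"1\"}");
        HttpRequest es = elasticsearchRequest("http://localhost:9200", "docs", "1", "{}");
        System.out.println(solr.method() + " " + solr.uri());
        System.out.println(es.method() + " " + es.uri());
    }
}
```

Actually sending a request would be a call such as HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). Error handling, retries, and authentication are left out of this sketch.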

I'm hesitant to add implementation/tool-specific forwarding options (e.g. Solr and Elasticsearch), but I don't want to have everyone rolling their own.  The problem here, obviously, will be keeping pace with different versions of Solr/ES.  My sense is that the APIs for adding docs haven't changed much in these two projects.

There are several things that I don't like about this, and I'm open to -1 and better options.  I'd rather not be stuck here: https://xkcd.com/974/

> Enable tika-server to forward parse results to another endpoint
> ---------------------------------------------------------------
>                 Key: TIKA-3093
>                 URL: https://issues.apache.org/jira/browse/TIKA-3093
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: test_recursive_embedded.docx.json
> bq. I see the "send the results to a remote network service" thing as probably being separate from the Content Handler.
> The above is from [~nick] on TIKA-2972.
> It would be useful to allow users to forward the results of parsing to another endpoint.  For example, a user could specify a Solr URL/update/json/docs handler or an Elasticsearch /<index>/_doc/<_id> endpoint.
> We may want to allow users to do custom mapping before redirecting to another URL, whitelisting/blacklisting of metadata keys, etc.
> I'd propose using /rmeta as the basis for this.
> cc [~ehatcher] and [~dadoonet].
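
The whitelisting/blacklisting of metadata keys mentioned in the issue description could be sketched as below. The class and method names are hypothetical, the keys in main are sample values only, and metadata is simplified to one String value per key, whereas real parse results can be multi-valued.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class MetadataKeyFilterSketch {

    /**
     * Keeps an entry if its key is on the allow list (an empty allow list
     * means "allow everything") and is not on the deny list.
     */
    public static Map<String, String> filter(Map<String, String> metadata,
                                             Set<String> allow,
                                             Set<String> deny) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            String key = entry.getKey();
            boolean allowed = allow.isEmpty() || allow.contains(key);
            if (allowed && !deny.contains(key)) {
                kept.put(key, entry.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("Content-Type", "text/plain");
        metadata.put("X-Parsed-By", "some.parser.ClassName");
        metadata.put("X-TIKA:content", "hello");

        // Drop one key, keep everything else.
        System.out.println(filter(metadata, Set.of(), Set.of("X-Parsed-By")));
    }
}
```

The deny list wins when a key appears on both lists. That is one possible design choice, not something the issue specifies.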

This message was sent by Atlassian Jira