[jira] [Comment Edited] (TIKA-3093) Enable tika-server to forward parse results to another endpoint


Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091797#comment-17091797 ]

Tim Allison edited comment on TIKA-3093 at 4/24/20, 5:50 PM:
-------------------------------------------------------------

The big remaining question is how we allow users to map from /rmeta's format:

{noformat}
[
  { ...main doc },
  { embedded doc1},
  { embedded doc2}
]
{noformat}

to, say, Solr's nested-document JSON format:
https://lucene.apache.org/solr/guide/6_6/uploading-data-with-index-handlers.html#UploadingDatawithIndexHandlers-JSONExamples
{noformat}
[
  {
    "id": "1",
    "title": "Solr adds block join support",
    "content_type": "parentDocument",
    "_childDocuments_": [
      {
        "id": "2",
        "comments": "SolrCloud supports it too!"
      }
    ]
  }
]
{noformat}
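A minimal Python sketch of one possible mapping. It assumes the first /rmeta element is the container document and turns every later element into a Solr child document. The `id` values are a hypothetical scheme (parent id, then `-1`, `-2`, ...) used only when a document has no `id` key of its own. This is an illustration, not existing tika-server code.

```python
import json
from typing import Any

# Assumed key for the unique id; real /rmeta output may not contain one,
# so a hypothetical id is generated when it is missing.
ID_KEY = "id"


def rmeta_to_solr(rmeta: list[dict[str, Any]], base_id: str) -> list[dict[str, Any]]:
    """Map an /rmeta-style list (main doc first, then embedded docs)
    to a single Solr parent document carrying _childDocuments_."""
    if not rmeta:
        return []
    main, *embedded = rmeta
    parent = dict(main)
    parent.setdefault(ID_KEY, base_id)
    children = []
    for i, doc in enumerate(embedded, start=1):
        child = dict(doc)
        # Hypothetical id scheme: parent id plus the embedded doc's position.
        child.setdefault(ID_KEY, f"{base_id}-{i}")
        children.append(child)
    if children:
        parent["_childDocuments_"] = children
    return [parent]


if __name__ == "__main__":
    rmeta = [
        {"title": "Solr adds block join support"},
        {"comments": "SolrCloud supports it too!"},
    ]
    print(json.dumps(rmeta_to_solr(rmeta, "1"), indent=2))
```

Because /rmeta returns a flat list, this sketch attaches every embedded document directly to the parent. Rebuilding deeper nesting would need something like the embedded resource path Tika records for each embedded document (for example `X-TIKA:embedded_resource_path`), which is not handled here.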

> Enable tika-server to forward parse results to another endpoint
> ---------------------------------------------------------------
>
>                 Key: TIKA-3093
>                 URL: https://issues.apache.org/jira/browse/TIKA-3093
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: test_recursive_embedded.docx.json
>
>
> bq. I see the "send the results to a remote network service" thing as probably being separate from the Content Handler.
> The above is from [~nick] on TIKA-2972.
> It would be useful to allow users to forward the results of parsing to another endpoint.  For example, a user could specify a Solr /update/json/docs handler URL or an Elasticsearch /<index>/_doc/<_id> endpoint.
> We may want to allow users to apply custom mappings, whitelist/blacklist metadata keys, etc., before forwarding to another URL.
> I'd propose using /rmeta as the basis for this.
> cc [~ehatcher] and [~dadoonet].
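A rough Python sketch of the whitelist/blacklist and forwarding steps, assuming the forwarded payload is simply a JSON array of (filtered) /rmeta documents. The Solr core name `mycore` and the example metadata keys are hypothetical, and the request is only built, not sent; this is not a description of how tika-server would actually implement forwarding.

```python
import json
import urllib.request
from typing import Any, Iterable, Optional


def filter_metadata(doc: dict[str, Any],
                    allow: Optional[Iterable[str]] = None,
                    deny: Iterable[str] = ()) -> dict[str, Any]:
    """Keep only whitelisted keys (if a whitelist is given), then drop blacklisted keys."""
    allowed = set(allow) if allow is not None else None
    denied = set(deny)
    return {k: v for k, v in doc.items()
            if (allowed is None or k in allowed) and k not in denied}


def build_forward_request(url: str, docs: list[dict[str, Any]]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST of the documents to the target endpoint."""
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": "application/json"})


if __name__ == "__main__":
    # Example keys are assumptions for illustration.
    rmeta = [{"X-TIKA:content": "hello", "Content-Type": "text/plain",
              "X-TIKA:parse_time_millis": "12"}]
    docs = [filter_metadata(d, deny=["X-TIKA:parse_time_millis"]) for d in rmeta]
    # Hypothetical Solr core "mycore"; an Elasticsearch /<index>/_doc/<_id> URL would also work here.
    req = build_forward_request("http://localhost:8983/solr/mycore/update/json/docs", docs)
    # urllib.request.urlopen(req) would actually send it.
    print(req.get_method(), req.full_url)
```

Applying the whitelist before the blacklist means a key listed in both is dropped, which seems the safer default when users configure both.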



--
This message was sent by Atlassian Jira
(v8.3.4#803005)