[jira] [Commented] (TIKA-2966) Create a tika-eval SAXHandler

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-2966?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16953639#comment-16953639 ]

Tim Allison commented on TIKA-2966:

I'd want this to work in streaming mode, handling text as it comes in via {{characters()}}, but tokenization is critical and we can't guarantee that parsers will call {{characters()}} on logical chunks (a single word may be split across two calls).
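
To illustrate the chunking concern, here is a minimal sketch, not the actual tika-eval code. It assumes a handler that buffers everything passed to {{characters()}} and only tokenizes in {{endDocument()}}. The class name, the {{hypothetical:numTokens}} key, the plain {{Map}} standing in for a Tika metadata object, and the whitespace tokenizer are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch: buffer all text, compute stats once at endDocument().
class BufferingTextStatsHandler extends DefaultHandler {
    private final StringBuilder buffer = new StringBuilder();
    // Stand-in for a metadata object; the key below is made up.
    private final Map<String, String> stats = new LinkedHashMap<>();

    @Override
    public void characters(char[] ch, int start, int length) {
        // Parsers may split text anywhere, so only accumulate here.
        buffer.append(ch, start, length);
    }

    @Override
    public void endDocument() {
        // Tokenize the complete text, so chunk boundaries cannot split a token.
        stats.put("hypothetical:numTokens",
                Integer.toString(countTokens(buffer.toString())));
    }

    Map<String, String> getStats() {
        return stats;
    }

    // Naive whitespace tokenizer, used only to make the point.
    static int countTokens(String text) {
        String trimmed = text.trim();
        return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
    }

    public static void main(String[] args) {
        // Simulate a parser that breaks a word across two characters() calls.
        char[] first = "tokeni".toCharArray();
        char[] second = "zation is critical".toCharArray();

        // Counting per chunk treats "tokeni" and "zation" as separate tokens.
        int perChunk = countTokens(new String(first)) + countTokens(new String(second));
        System.out.println("per-chunk tokens: " + perChunk); // prints 4

        BufferingTextStatsHandler handler = new BufferingTextStatsHandler();
        handler.characters(first, 0, first.length);
        handler.characters(second, 0, second.length);
        handler.endDocument();
        System.out.println("buffered tokens: "
                + handler.getStats().get("hypothetical:numTokens")); // prints 3
    }
}
```

The trade-off is that buffering holds the whole document's text in memory, which gives up the streaming behaviour I'd ideally want.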

> Create a tika-eval SAXHandler
> -----------------------------
>                 Key: TIKA-2966
>                 URL: https://issues.apache.org/jira/browse/TIKA-2966
>             Project: Tika
>          Issue Type: Improvement
>            Reporter: Tim Allison
>            Priority: Major
> One of the improvements coming in 1.23 is the decoupling of the text stats calculator from the tika-eval app.  To make this even easier to use, let's add a handler that will calculate the text stats on .endDocument() and record those stats in a metadata object.

This message was sent by Atlassian Jira