[jira] [Commented] (TIKA-1332) Create tika-eval module

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15870748#comment-15870748 ]

Hudson commented on TIKA-1332:

SUCCESS: Integrated in Jenkins build Tika-trunk #1201 (See [https://builds.apache.org/job/Tika-trunk/1201/])
TIKA-1332 -- fix analyzer chain for common tokens, clean up UTF-8 (tallison: rev a2d214c71602f4f5a84635adc38c43182a39a390)
* (edit) tika-eval/src/main/java/org/apache/tika/eval/tokens/AnalyzerManager.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/tokens/TokenIntPair.java
* (edit) tika-eval/src/main/resources/lucene-analyzers.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java

> Create tika-eval module
> -----------------------
>                 Key: TIKA-1332
>                 URL: https://issues.apache.org/jira/browse/TIKA-1332
>             Project: Tika
>          Issue Type: Sub-task
>          Components: cli, general, server
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 2.0, 1.15
>         Attachments: comparison_reports.xml
> For this issue, we can start with code to gather statistics on each run (number of exceptions per file type, most common exceptions per file type, number of metadata items, total text extracted, etc.). We should also be able to compare one run against another. Going forward, there's plenty of room to improve.
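
As a rough illustration of the per-run statistics and run-to-run comparison described in the issue, here is a minimal sketch. It is not the tika-eval code: the class RunStats, the FileResult holder, and both method names are hypothetical, and it assumes each run can be reduced to a list of (file type, exception name) pairs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch only; none of these names come from tika-eval.
class RunStats {

    // One entry per processed file. exception is null when extraction succeeded.
    static final class FileResult {
        final String fileType;
        final String exception;

        FileResult(String fileType, String exception) {
            this.fileType = fileType;
            this.exception = exception;
        }
    }

    // Counts how many files of each type threw an exception in a single run.
    static Map<String, Integer> exceptionsPerType(List<FileResult> run) {
        Map<String, Integer> counts = new TreeMap<>();
        for (FileResult r : run) {
            if (r.exception != null) {
                counts.merge(r.fileType, 1, Integer::sum);
            }
        }
        return counts;
    }

    // For every file type seen in either run, returns (count in b) - (count in a).
    // A positive value means run b had more exceptions for that type.
    static Map<String, Integer> compare(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> delta = new TreeMap<>();
        for (String type : a.keySet()) {
            delta.put(type, b.getOrDefault(type, 0) - a.get(type));
        }
        for (String type : b.keySet()) {
            delta.putIfAbsent(type, b.get(type) - a.getOrDefault(type, 0));
        }
        return delta;
    }
}
```

The sorted TreeMap is only there to make the output stable when printed; the other statistics mentioned in the issue (metadata item counts, total extracted text) could be accumulated in the same loop.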

This message was sent by Atlassian JIRA