[jira] [Commented] (TIKA-1332) Create tika-eval module

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-1332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15871033#comment-15871033 ]

Hudson commented on TIKA-1332:
------------------------------

SUCCESS: Integrated in Jenkins build tika-2.x #218 (See [https://builds.apache.org/job/tika-2.x/218/])
TIKA-1332 -- add English Spanish common tokens;  fix logging (tallison: rev 81150859bdb25fe7faec575f5b916c8efad963cb)
* (edit) tika-eval/src/main/resources/tika-eval-comparison-config.xml
* (delete) tika-eval/src/test/resources/commontokens/zh-tw
* (add) tika-eval/src/test/resources/common_tokens/zh-cn
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (add) tika-eval/src/test/resources/common_tokens/zh-tw
* (add) tika-eval/src/test/resources/common_tokens/en
* (add) tika-eval/src/test/resources/common_tokens/es
* (add) tika-eval/src/main/resources/log4j.properties
* (delete) tika-eval/src/test/resources/commontokens/zh-cn
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/SingleFileConsumerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (edit) tika-eval/src/test/resources/single-file-profiler-crawl-extract-config.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java
* (delete) tika-eval/src/test/resources/commontokens/es
* (add) tika-eval/src/main/resources/common_tokens/es
* (edit) tika-eval/src/main/resources/tika-eval-profiler-config.xml
* (add) tika-eval/src/main/resources/common_tokens/en
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/TikaEvalCLITest.java
* (delete) tika-eval/src/test/resources/log4j_process.properties
* (edit) tika-eval/src/test/resources/single-file-profiler-crawl-input-config.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/EvalConsumersBuilder.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/TikaEvalCLI.java
* (delete) tika-eval/src/test/resources/commontokens/en
* (delete) tika-eval/src/test/resources/log4j.properties


> Create tika-eval module
> -----------------------
>
>                 Key: TIKA-1332
>                 URL: https://issues.apache.org/jira/browse/TIKA-1332
>             Project: Tika
>          Issue Type: Sub-task
>          Components: cli, general, server
>            Reporter: Tim Allison
>            Assignee: Tim Allison
>             Fix For: 2.0, 1.15
>
>         Attachments: comparison_reports.xml
>
>
> For this issue, we can start with code to gather statistics on each run (# of exceptions per file type, most common exceptions per file type, number of metadata items, total text extracted, etc).  We should also be able to compare one run against another.  Going forward, there's plenty of room to improve.
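
As a rough illustration of the statistics described above, here is a minimal Java sketch that counts exceptions per file type for one run and then compares two runs. It is not taken from tika-eval: the RunStats and Result names, the fields, and the in-memory list of results are all assumptions made for the example, and the real module may be structured quite differently.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch; none of these names come from the tika-eval source.
class RunStats {

    // One file's outcome in an extraction run (assumed shape).
    static final class Result {
        final String fileType;
        final String exception; // null when extraction succeeded

        Result(String fileType, String exception) {
            this.fileType = fileType;
            this.exception = exception;
        }
    }

    // Count how many files of each type threw an exception in a run.
    static Map<String, Integer> exceptionsPerType(List<Result> run) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Result r : run) {
            if (r.exception != null) {
                counts.merge(r.fileType, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Compare two runs: per file type, exceptions in run B minus exceptions in run A.
    static Map<String, Integer> diff(Map<String, Integer> a, Map<String, Integer> b) {
        Map<String, Integer> delta = new TreeMap<>();
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            delta.put(e.getKey(), b.getOrDefault(e.getKey(), 0) - e.getValue());
        }
        for (Map.Entry<String, Integer> e : b.entrySet()) {
            // File types that only failed in run B.
            delta.putIfAbsent(e.getKey(), e.getValue());
        }
        return delta;
    }
}
```

A positive value in the diff means a file type failed more often in the second run, which is the kind of regression a run-to-run comparison would be expected to surface.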



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)