[jira] [Resolved] (NUTCH-2603) Bring back legacy pre-Tika parsers and use them as back up parsers

David Pilato (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-2603.
------------------------------------
    Resolution: Won't Fix

> Bring back legacy pre-Tika parsers and use them as back up parsers
> ------------------------------------------------------------------
>
>                 Key: NUTCH-2603
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2603
>             Project: Nutch
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 1.15
>            Reporter: Arkadi Kosmynin
>            Priority: Major
>         Attachments: public_docs.txt
>
>
> There are cases where legacy parsers successfully parse documents on which Tika fails. I am attaching a list of examples of such documents. Nutch allows more than one parser to be applied to a document, in sequence, until the document has been parsed successfully. Thus, the old parsers can be combined with Tika to achieve a better parsing success rate, at least until Tika is perfect.
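
The fallback behaviour described in the issue (try parsers in sequence until one succeeds) can be illustrated with a minimal, language-agnostic sketch. This is not Nutch's or Tika's API: parse_with_fallback, tika_like and legacy_like are hypothetical names invented for this example, and the "parsers" are trivial stand-ins.

```python
from typing import Callable, Optional, Sequence

# Hypothetical parser type (an assumption for this sketch): takes raw bytes,
# returns extracted text, and raises an exception when it cannot parse.
Parser = Callable[[bytes], str]


def parse_with_fallback(content: bytes, parsers: Sequence[Parser]) -> Optional[str]:
    """Try each parser in order; return the first successful result, or None."""
    for parser in parsers:
        try:
            return parser(content)
        except Exception:
            # This parser failed; move on to the next one in the chain.
            continue
    return None


def tika_like(content: bytes) -> str:
    # Stand-in for the primary parser: rejects input it does not recognise.
    if not content.startswith(b"%PDF"):
        raise ValueError("unsupported format")
    return "parsed by primary"


def legacy_like(content: bytes) -> str:
    # Stand-in for a legacy back-up parser that accepts any byte sequence.
    return content.decode("latin-1")
```

With the chain [tika_like, legacy_like], a document the primary stand-in rejects is still handled by the back-up one; the order of the list decides which parser gets the first attempt.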



--
This message was sent by Atlassian Jira
(v8.3.4#803005)