[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.16

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16200495#comment-16200495 ]

Markus Jelsma commented on NUTCH-2439:
--------------------------------------

Ah, I removed slf4j-api from plugin.xml and it works. But warnings are logged:
{code}
fetching: https://www.sitesearch.io/
robots.txt whitelist not configured.
Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: JBIG2ImageReader not loaded. jbig2 files will be ignored
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
TIFFImageWriter not loaded. tiff files will not be processed
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.
J2KImageReader not loaded. JPEG2000 files will not be processed.
See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
for optional dependencies.

Oct 11, 2017 5:50:50 PM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
WARNING: org.xerial's sqlite-jdbc is not loaded.
Please provide the jar on your classpath to parse sqlite files.
See tika-parsers/pom.xml for the correct version.
parsing: https://www.sitesearch.io/
{code}

> Upgrade to Apache Tika 1.16
> ---------------------------
>
>                 Key: NUTCH-2439
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2439
>             Project: Nutch
>          Issue Type: Improvement
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>             Fix For: 1.14
>
>         Attachments: NUTCH-2439.patch
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)