[jira] [Created] (TIKA-2643) Tika call hangs when processes a pdf on Cloudera Hadoop
feng ye created TIKA-2643:
Summary: Tika call hangs when processes a pdf on Cloudera Hadoop
URL: https://issues.apache.org/jira/browse/TIKA-2643
Project: Tika
Issue Type: Bug
Affects Versions: 1.17
Environment: Cloudera Hadoop 5.8
Reporter: feng ye
Fix For: 1.17
Attachments: hang-stdout.txt, hang.zip, testJournalParser.pdf
Tika.parseToString(InputStream) hangs when called within a MapReduce job to process a PDF file on Cloudera Hadoop 5.8 (also observed on 5.4). Other PDF files are processed without problems on the same cluster. I am attaching the file, along with the syslog and stdout logs. Interestingly, the same file is processed fine on a Hortonworks cluster.
This issue blocks us from making our Tika-based feature available on Cloudera clusters, a major flavor of Hadoop, so your timely attention would be very much appreciated.
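For illustration only, here is a minimal sketch of how a caller might guard the parseToString call described above with a timeout, so that a hanging parse is detected instead of stalling the MapReduce task. It is not part of the attached job. The class name ParseWithTimeout, the helper runWithTimeout, and the timeout values are hypothetical, and the Tika call appears only in a comment so that the sketch runs with the JDK alone.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ParseWithTimeout {

    // Hypothetical helper: runs the task on a worker thread and gives up
    // after timeoutMillis. Returns null when the task does not finish in time.
    static String runWithTimeout(Callable<String> task, long timeoutMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "parse-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<String> future = executor.submit(task);
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Interrupts the worker; a task that ignores interrupts keeps
            // running in the background until the JVM exits.
            future.cancel(true);
            return null;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // In the real job the task would presumably look like:
        //   () -> new Tika().parseToString(inputStream)
        // Two stand-in tasks simulate a normal parse and a hang.
        String ok = runWithTimeout(() -> "extracted text", 1000);
        String hung = runWithTimeout(() -> {
            Thread.sleep(60_000); // simulated hang
            return "never returned";
        }, 200);
        System.out.println("normal parse: " + ok);
        System.out.println("hung parse:   " + hung);
    }
}
```

A guard like this only keeps the caller from blocking indefinitely. It does not explain why the parse hangs on one cluster and not the other, so a thread dump of the stuck task would still be needed to locate the cause.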