[jira] [Commented] (TIKA-3128) MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1

Nicholas DiPiazza (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-3128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17213255#comment-17213255 ]

Tim Allison commented on TIKA-3128:
-----------------------------------

Right. That upgrade was made to address numerous vulnerabilities. I'll take a look and see whether there is a workaround.
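Until a workaround lands, a caller that batch-processes files could at least recognize this particular failure and log or skip the file instead of aborting the whole run. The trace quoted below shows Tika wrapping mp4parser's `RuntimeException` in a `TikaException`, so this check walks the cause chain looking for that message. It is a minimal sketch, not part of Tika's API: `ZeroSizeBoxCheck` and `isZeroSizeBoxFailure` are hypothetical names, and matching on the exception message text (copied from the trace) is brittle across library versions.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

/** Hypothetical helper: detects mp4parser's zero-size-box failure anywhere in a cause chain. */
public class ZeroSizeBoxCheck {

    // Prefix of the message seen in the TIKA-3128 stack trace; breaks if mp4parser rewords it.
    static final String ZERO_SIZE_MESSAGE = "box size of zero means 'till end of file";

    /** Returns true if t, or any exception in its cause chain, is the zero-size-box RuntimeException. */
    public static boolean isZeroSizeBoxFailure(Throwable t) {
        // Identity set guards against (rare) cyclic cause chains.
        Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
        for (Throwable cur = t; cur != null && seen.add(cur); cur = cur.getCause()) {
            String msg = cur.getMessage();
            if (cur instanceof RuntimeException && msg != null && msg.startsWith(ZERO_SIZE_MESSAGE)) {
                return true;
            }
        }
        return false;
    }
}
```

A caller would catch `TikaException` around its `parse(...)` call and pass the exception to `isZeroSizeBoxFailure`; the sketch only depends on `Throwable`, so it compiles without Tika on the classpath.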

> MOV file produces RuntimeException with 1.24.1, used to work with earlier version 1.19.1
> ----------------------------------------------------------------------------------------
>
>                 Key: TIKA-3128
>                 URL: https://issues.apache.org/jira/browse/TIKA-3128
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.24.1
>            Reporter: Sameer Apte
>            Priority: Major
>         Attachments: HDSIT_157516.mov
>
>
> Attached _mov_ file produces _RuntimeException_ when parsed with *tika v1.24.1*
> The same _mov_ file can be parsed without any issues with *tika v1.19.1*
> *Tika 1.19.1 standalone app _SUCCESSFUL_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.19.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:25:00 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Content-Length: 51066400
> Content-Type: application/mp4
> Creation-Date: 2015-05-18T16:23:25Z
> Last-Modified: 2015-05-18T16:31:09Z
> Last-Save-Date: 2015-05-18T16:31:09Z
> X-Parsed-By: org.apache.tika.parser.DefaultParser
> X-Parsed-By: org.apache.tika.parser.mp4.MP4Parser
> date: 2015-05-18T16:31:09Z
> dcterms:created: 2015-05-18T16:23:25Z
> dcterms:modified: 2015-05-18T16:31:09Z
> meta:creation-date: 2015-05-18T16:23:25Z
> meta:save-date: 2015-05-18T16:31:09Z
> modified: 2015-05-18T16:31:09Z
> resourceName: HDSIT_157516.mov
> tiff:ImageLength: 1080
> tiff:ImageWidth: 1920
> xmpDM:audioSampleRate: 30000
> xmpDM:duration: 125.99
>  {code}
> *Tika 1.24.1 standalone app _RUNTIMEEXCEPTION_ run*
> {code:java}
> [sapte@sapte-dt tikatest]$ java -jar tika-app-1.24.1.jar -m HDSIT_157516.mov
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: J2KImageReader not loaded. JPEG2000 files will not be processed.
> See https://pdfbox.apache.org/2.0/dependencies.html#jai-image-io
> for optional dependencies.
> Jun 18, 2020 11:24:50 AM org.apache.tika.config.InitializableProblemHandler$3 handleInitializableProblem
> WARNING: org.xerial's sqlite-jdbc is not loaded.
> Please provide the jar on your classpath to parse sqlite files.
> See tika-parsers/pom.xml for the correct version.
> Exception in thread "main" org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.mp4.MP4Parser@23348b5d
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:293)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:143)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:209)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:496)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:149)
> Caused by: java.lang.RuntimeException: box size of zero means 'till end of file. That is not yet supported
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:90)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.boxes.sampleentry.VisualSampleEntry.parse(VisualSampleEntry.java:195)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.boxes.iso14496.part12.SampleDescriptionBox.parse(SampleDescriptionBox.java:91)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.support.AbstractContainerBox.parse(AbstractContainerBox.java:76)
> at org.mp4parser.AbstractBoxParser.parseBox(AbstractBoxParser.java:115)
> at org.mp4parser.BasicContainer.initContainer(BasicContainer.java:107)
> at org.mp4parser.IsoFile.<init>(IsoFile.java:58)
> at org.mp4parser.IsoFile.<init>(IsoFile.java:45)
> at org.apache.tika.parser.mp4.MP4Parser.parse(MP4Parser.java:130)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
> ... 5 more
> {code}
> Commit _8e2eb05292bc35503a3d82a908c426854e23ac83_ in v1.24.1, which switched the MP4 parser library from _googlecode_ to _tallison_, appears to be directly responsible for the change in behavior.
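The `RuntimeException` at the bottom of the 1.24.1 trace comes from how ISO base media file format (ISO/IEC 14496-12) box headers encode their length: a 32-bit unsigned size followed by a four-character type, where a size of 1 means a 64-bit size follows the type and a size of 0 means the box extends to the end of the file. The frames show mp4parser meeting the zero size while reading the children of a `VisualSampleEntry` inside the `SampleDescriptionBox` (`stsd`), not at the top level of the file. The minimal Java sketch below is not Tika's or mp4parser's code; it only illustrates resolving size 0 to the end of the byte range being walked instead of throwing. Treating that as "end of the enclosing container" for nested boxes is an assumption, and `BoxHeaders`, `BoxHeader`, and `listBoxes` are hypothetical names.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Sketch: walk sibling ISO BMFF / QuickTime box headers, resolving size == 0 instead of rejecting it. */
public class BoxHeaders {

    /** A box's four-character type and its total size in bytes (header included). */
    public record BoxHeader(String type, long size) {}

    /** Lists the sibling boxes in data; size 0 is resolved to "the rest of data" (an assumption for nested boxes). */
    public static List<BoxHeader> listBoxes(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default, as BMFF fields are
        List<BoxHeader> boxes = new ArrayList<>();
        // Fewer than 8 trailing bytes cannot hold a box header, so they are ignored.
        while (buf.remaining() >= 8) {
            int start = buf.position();
            long remaining = buf.remaining(); // bytes from this box's start to the end of data
            long size = Integer.toUnsignedLong(buf.getInt()); // 32-bit unsigned size field
            byte[] fourcc = new byte[4];
            buf.get(fourcc);
            String type = new String(fourcc, StandardCharsets.ISO_8859_1);
            int headerLength = 8;
            if (size == 1) {
                // A 64-bit "largesize" follows the type field.
                if (buf.remaining() < 8) {
                    throw new IllegalArgumentException("truncated largesize in box " + type);
                }
                size = buf.getLong();
                headerLength = 16;
            } else if (size == 0) {
                // The case mp4parser rejects in the trace: the box runs to the end of the enclosing space.
                size = remaining;
            }
            if (size < headerLength || size > remaining) {
                throw new IllegalArgumentException("bad size " + size + " for box " + type + " at offset " + start);
            }
            boxes.add(new BoxHeader(type, size));
            buf.position(start + (int) size); // skip the payload to reach the next sibling
        }
        return boxes;
    }

    /** Prints the top-level boxes of a file; reads the whole file into memory, so it is for illustration only. */
    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java BoxHeaders.java <file>");
            return;
        }
        for (BoxHeader h : listBoxes(Files.readAllBytes(Path.of(args[0])))) {
            System.out.println(h.type() + " " + h.size());
        }
    }
}
```

Running `java BoxHeaders.java HDSIT_157516.mov` (Java 16 or later, for the record) lists only the top-level atoms; reaching the nested case from the trace would mean calling `listBoxes` again on a child box's payload bytes, for example via `Arrays.copyOfRange`.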



--
This message was sent by Atlassian Jira
(v8.3.4#803005)