[jira] Created: (TIKA-346) ZipParser throws "invalid compression method" error for some archives


JIRA jira@apache.org
ZipParser throws "invalid compression method" error for some archives
---------------------------------------------------------------------

                 Key: TIKA-346
                 URL: https://issues.apache.org/jira/browse/TIKA-346
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.5
         Environment: Windows XP, JVM 1.6.16
            Reporter: Robert Trickey
         Attachments: moby.zip

This could be a bug in the underlying Apache Commons Compress code. Parsing the attached file to extract its text content throws an exception with the following stack trace:

org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pkg.ZipParser@1b963c4
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:122)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at my.code.wherever.....
Caused by: java.lang.IllegalArgumentException: invalid compression method
        at java.util.zip.ZipEntry.setMethod(ZipEntry.java:209)
        at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextZipEntry(ZipArchiveInputStream.java:146)
        at org.apache.commons.compress.archivers.zip.ZipArchiveInputStream.getNextEntry(ZipArchiveInputStream.java:188)
        at org.apache.tika.parser.pkg.PackageParser.parseArchive(PackageParser.java:66)
        at org.apache.tika.parser.pkg.ZipParser.parse(ZipParser.java:49)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        ... 25 more

I extracted the contents of the zip and ran the AutoDetectParser against each extracted file without problems, so the failure is specific to the zip archive itself.

The attached zip is from Project Gutenberg and hence public domain.
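
The innermost frame of the trace is java.util.zip.ZipEntry.setMethod, which accepts only the STORED (0) and DEFLATED (8) method codes and rejects anything else with exactly this message. The following is a minimal sketch of that behaviour using only the JDK. The class name SetMethodDemo and the helper accepts are hypothetical, and method code 6 is only an example of a code the JDK rejects; this report does not state which method moby.zip actually uses.

```java
import java.util.zip.ZipEntry;

final class SetMethodDemo {

    /** Returns true if java.util.zip.ZipEntry accepts the given compression method code. */
    static boolean accepts(int method) {
        try {
            new ZipEntry("example.txt").setMethod(method);
            return true;
        } catch (IllegalArgumentException e) {
            // Same exception type and message as the "Caused by" line in the trace
            System.out.println("method " + method + " rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("STORED accepted: " + accepts(ZipEntry.STORED));
        System.out.println("DEFLATED accepted: " + accepts(ZipEntry.DEFLATED));
        // 6 is an arbitrary example of a code other than STORED/DEFLATED
        System.out.println("method 6 accepted: " + accepts(6));
    }
}
```

Any code path that copies an archive's declared method into a java.util.zip.ZipEntry without checking it first will hit this exception for such entries.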

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (TIKA-346) ZipParser throws "invalid compression method" error for some archives

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Trickey updated TIKA-346:
--------------------------------

    Attachment: moby.zip


[jira] Updated: (TIKA-346) ZipParser throws "invalid compression method" error for some archives

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-346:
-------------------------------

    Attachment: TIKA-346.patch

The attached patch fixes this problem. It depends on recent Commons Compress changes related to COMPRESS-93, so we can apply it once Commons Compress 1.1 is available.
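
To check whether a given archive is likely to trigger this error, one can read the compression method straight from the zip's first local file header. The sketch below is not the contents of TIKA-346.patch; it is a JDK-only diagnostic under stated assumptions. The class ZipMethodProbe and its helpers are hypothetical names, and the code inspects only the first entry and assumes the data begins with a local file header (signature 0x04034b50, with the two-byte little-endian method field at offset 8).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

final class ZipMethodProbe {

    private static final int LOCAL_HEADER_SIGNATURE = 0x04034b50;

    /**
     * Returns the compression method declared in the first local file header,
     * or -1 if the data does not start with such a header.
     */
    static int firstEntryMethod(byte[] data) {
        if (data.length < 10) {
            return -1;
        }
        int signature = (data[0] & 0xff)
                | ((data[1] & 0xff) << 8)
                | ((data[2] & 0xff) << 16)
                | ((data[3] & 0xff) << 24);
        if (signature != LOCAL_HEADER_SIGNATURE) {
            return -1;
        }
        // Bytes 4-5: version needed, 6-7: flags, 8-9: compression method
        return (data[8] & 0xff) | ((data[9] & 0xff) << 8);
    }

    /** True if java.util.zip.ZipEntry.setMethod would accept this method code. */
    static boolean supportedByJdkZipEntry(int method) {
        return method == ZipEntry.STORED || method == ZipEntry.DEFLATED;
    }

    /** Builds a small in-memory zip (default DEFLATED method) for demonstration. */
    static byte[] sampleZip() {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
                zip.putNextEntry(new ZipEntry("hello.txt"));
                zip.write("hello".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // Pass a path (for example the attached archive) or fall back to the sample
        byte[] data = args.length > 0
                ? Files.readAllBytes(Paths.get(args[0]))
                : sampleZip();
        int method = firstEntryMethod(data);
        System.out.println("first entry method: " + method
                + ", accepted by ZipEntry.setMethod: " + supportedByJdkZipEntry(method));
    }
}
```

Running it against an affected archive should report a method code other than 0 or 8 for at least one entry, although a full check would need to walk every entry rather than just the first.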
