[jira] Created: (TIKA-370) Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox

JIRA jira@apache.org
Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox
--------------------------------------------------------------------------

                 Key: TIKA-370
                 URL: https://issues.apache.org/jira/browse/TIKA-370
             Project: Tika
          Issue Type: Bug
    Affects Versions: 0.6
            Reporter: Ken Krugler
            Assignee: Ken Krugler


While processing a bunch of PDFs off the web, I ran into a NoClassDefFoundError (caused by a ClassNotFoundException) thrown inside of PDFBox:

java.lang.NoClassDefFoundError: org/bouncycastle/jce/provider/BouncyCastleProvider
        at org.apache.pdfbox.pdmodel.PDDocument.openProtection(PDDocument.java:1092)
        at org.apache.pdfbox.pdmodel.PDDocument.decrypt(PDDocument.java:573)
        at org.apache.pdfbox.util.PDFTextStripper.writeText(PDFTextStripper.java:235)
        at org.apache.pdfbox.util.PDFTextStripper.getText(PDFTextStripper.java:180)
        at org.apache.tika.parser.pdf.PDF2XHTML.process(PDF2XHTML.java:56)
        at org.apache.tika.parser.pdf.PDFParser.parse(PDFParser.java:69)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:120)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:101)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:114)
        at bixo.parser.SimpleParser.parse(SimpleParser.java:153)
Caused by: java.lang.ClassNotFoundException: org.bouncycastle.jce.provider.BouncyCastleProvider
        at java.net.URLClassLoader$1.run(URLClassLoader.java:200)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:188)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:303)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:248)

I believe the issue is that the PDFBox pom.xml declares the dependency on the missing BouncyCastleProvider jar as "optional".

   <dependency>
     <groupId>bouncycastle</groupId>
     <artifactId>bcprov-jdk14</artifactId>
     <version>136</version>
     <optional>true</optional>
   </dependency>

As explained in the Maven documentation, this means that Tika needs to explicitly include the jar:

http://maven.apache.org/guides/introduction/introduction-to-optional-and-excludes-dependencies.html
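A minimal sketch of what such an explicit declaration could look like in the consuming pom.xml, assuming the same coordinates that the PDFBox pom.xml uses above. Leaving out the <optional> element is what lets Maven resolve the jar for the declaring project and for anything that depends on it; whether the other optional PDFBox dependencies need the same treatment is not settled here.

```xml
<!-- Sketch only: coordinates copied from the PDFBox pom.xml fragment quoted above. -->
<dependency>
  <groupId>bouncycastle</groupId>
  <artifactId>bcprov-jdk14</artifactId>
  <version>136</version>
  <!-- No <optional>true</optional> here, so the jar ends up on the classpath. -->
</dependency>
```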

I see a few other optional dependencies in the PDFBox pom.xml, but perhaps the only one that's really critical is the above.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-370) Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804704#action_12804704 ]

Ken Krugler commented on TIKA-370:
----------------------------------

On the list, Jukka said:

{quote}
Yep. I think the <optional/> setting was added in PDFBox 0.8.0 and
we lost those dependencies when upgrading from 0.7.3. No problem
adding them back in.
{quote}



[jira] Issue Comment Edited: (TIKA-370) Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12804704#action_12804704 ]

Ken Krugler edited comment on TIKA-370 at 1/25/10 8:53 PM:
-----------------------------------------------------------

On the list, Jukka said:

Yep. I think the <optional/> setting was added in PDFBox 0.8.0 and
we lost those dependencies when upgrading from 0.7.3. No problem
adding them back in.



      was (Author: kkrugler):
    On the list, Jukka said:

{quote}
Yep. I think the <optional/> setting was added in PDFBox 0.8.0 and
we lost those dependencies when upgrading from 0.7.3. No problem
adding them back in.
{quote}

 


[jira] Resolved: (TIKA-370) Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-370?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-370.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
         Assignee: Jukka Zitting  (was: Ken Krugler)

Fixed in revision 910924.


[jira] Commented: (TIKA-370) Tika pom.xml is missing dependencies on bouncycastle jars needed by PDFBox

JIRA jira@apache.org
In reply to this post by JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12849725#action_12849725 ]

Kenny Neal commented on TIKA-370:
---------------------------------

In the meantime, before 0.7 is released, the Bouncy Castle jars can be downloaded from:
http://www.bouncycastle.org/latest_releases.html
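
For projects that pull Tika 0.6 in through Maven, an alternative to fetching the jar from that page might be to declare it next to the Tika dependency in the project's own pom.xml. The fragment below is a sketch under assumptions: the tika-parsers coordinates are assumed rather than taken from this thread, and the Bouncy Castle coordinates are the ones quoted in the issue description.

```xml
<dependencies>
  <!-- Assumed coordinates for the Tika parsers module; adjust to what the project already uses. -->
  <dependency>
    <groupId>org.apache.tika</groupId>
    <artifactId>tika-parsers</artifactId>
    <version>0.6</version>
  </dependency>
  <!-- Declared directly because PDFBox marks it optional and Tika 0.6 does not re-declare it. -->
  <dependency>
    <groupId>bouncycastle</groupId>
    <artifactId>bcprov-jdk14</artifactId>
    <version>136</version>
  </dependency>
</dependencies>
```

Once the project moves to a Tika release that contains the fix, the second dependency should no longer be needed.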
