[jira] Created: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents


Tim Allison (Jira)
PDFParser causes Problems when using encrypted PDF documents
------------------------------------------------------------

                 Key: TIKA-223
                 URL: https://issues.apache.org/jira/browse/TIKA-223
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 0.3
         Environment: Java 1.5.x on MAC, WIN, LIN
            Reporter: Joachim Zittmayr
             Fix For: 0.4


The PDFParser.parse() method already decrypts the document in order to read its metadata and then passes it to PDF2XHTML.process(), which in turn calls the inherited getText(). getText() calls writeText(), which tries to decrypt the PDDocument a second time; this fails because the document is already decrypted. A possible fix is to override writeText() so that it omits the document.isEncrypted() check.
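
The call sequence described above can be sketched with a small toy model. This is only an illustration: ToyDocument, BaseStripper and PatchedStripper are hypothetical stand-ins, not PDFBox or Tika classes, and the sketch assumes that the "encrypted" flag stays true after decryption and that a second decrypt attempt throws.

```java
// Toy model of the double-decrypt problem. All classes here are hypothetical
// stand-ins for illustration; none of them is a real PDFBox or Tika class.
class DoubleDecryptDemo {
    public static void main(String[] args) {
        ToyDocument doc = new ToyDocument();
        doc.decrypt(""); // the parser decrypts once to read the metadata

        try {
            new BaseStripper().getText(doc); // inherited path decrypts again
        } catch (IllegalStateException e) {
            System.out.println("base stripper failed: " + e.getMessage());
        }

        // The overriding stripper skips the second decrypt and succeeds.
        System.out.println("patched stripper: " + new PatchedStripper().getText(doc));
    }
}

class ToyDocument {
    private boolean decrypted = false;

    // Assumption: the flag describes the file itself, so it remains true
    // even after decrypt() has been called.
    boolean isEncrypted() {
        return true;
    }

    void decrypt(String password) {
        if (decrypted) {
            throw new IllegalStateException("document is already decrypted");
        }
        decrypted = true;
    }

    String text() {
        if (!decrypted) {
            throw new IllegalStateException("document is still encrypted");
        }
        return "hello";
    }
}

class BaseStripper {
    String getText(ToyDocument doc) {
        return writeText(doc);
    }

    // Mirrors the reported behaviour: decrypt whenever the flag is set.
    protected String writeText(ToyDocument doc) {
        if (doc.isEncrypted()) {
            doc.decrypt("");
        }
        return doc.text();
    }
}

class PatchedStripper extends BaseStripper {
    // Proposed fix: override writeText() without the isEncrypted() check,
    // relying on the caller having decrypted the document already.
    @Override
    protected String writeText(ToyDocument doc) {
        return doc.text();
    }
}
```

The override shifts responsibility for decryption entirely to the caller, so in this sketch PatchedStripper fails on a document that was never decrypted.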

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712259#action_12712259 ]

Jukka Zitting commented on TIKA-223:
------------------------------------

Sounds reasonable. Do you have a patch for this change?

> PDFParser causes Problems when using encrypted PDF documents
> ------------------------------------------------------------
>
>                 Key: TIKA-223
>                 URL: https://issues.apache.org/jira/browse/TIKA-223
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.3
>         Environment: Java 1.5.x on MAC, WIN, LIN
>            Reporter: Joachim Zittmayr
>             Fix For: 0.4
>
>   Original Estimate: 2h
>  Remaining Estimate: 2h
>
> The PDFParser.parse() method decrypts the document for the metadata already and then passes it over to PDF2XHTML.process(), which in turn calls the inherited getText(). This calls writeText(), which tries to decrypt the PDDocument again, but this will fail as it is already decrypted. The solution would be to override  writeText(), without the document.isEncrypted check.


[jira] Commented: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12730186#action_12730186 ]

Chris A. Mattmann commented on TIKA-223:
----------------------------------------

Hi All:

Is there a patch for this issue that includes, e.g., a unit test for verification? I'm trying to get the 0.4 RC together, and this is one of only two remaining open issues. Please let me know. I'll use the same approach as for the other open issue: if I don't hear back from anyone in the next 48 hours, I'll assume it's OK to push this to 0.5. If I do hear back and there is significant support for pushing this to 0.5, I'll do so sooner. If not, can we get a patch together ASAP? I'd like to cut an RC this week and call for a vote.

My vote is -1 that this is a blocker for 0.4 and +1 to move this to 0.5.

Cheers,
Chris



[jira] Updated: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting updated TIKA-223:
-------------------------------

    Fix Version/s:     (was: 0.4)

Unscheduling.


[jira] Commented: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12742806#action_12742806 ]

Joachim Zittmayr commented on TIKA-223:
---------------------------------------

Sorry, guys, for not having responded to this issue. I recently downloaded the fresh 0.4 release, which still has this bug.
Could you tell me how you want this patch sent, filed, or committed? I am an absolute beginner when it comes to filing bugs against open-source software projects.


[jira] Resolved: (TIKA-223) PDFParser causes Problems when using encrypted PDF documents

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-223.
--------------------------------

    Resolution: Duplicate
      Assignee: Jukka Zitting

Resolving as a duplicate of TIKA-267, which I fixed in revision 806888.
