PDFParser fails to decrypt metadata (patch included)

Ingo Feltes
Hi folks,

The fix for TIKA-267 seems to work fine for the content, but Tika still
fails to decrypt the metadata of those PDFs: the metadata fields come back
still encrypted. If you switch the order so that the text is processed
before the metadata is extracted, the metadata is decrypted correctly.

Cheers,

Ingo

Index: PDFParser.java
===================================================================
--- PDFParser.java (revision 812208)
+++ PDFParser.java (working copy)
@@ -63,9 +63,9 @@
                     // Ignore
                 }
             }
+            PDF2XHTML.process(pdfDocument, handler, metadata);
+            extractMetadata(pdfDocument, metadata);
             metadata.set(Metadata.CONTENT_TYPE, "application/pdf");
-            extractMetadata(pdfDocument, metadata);
-            PDF2XHTML.process(pdfDocument, handler, metadata);
         } finally {
             pdfDocument.close();
         }
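
The ordering effect the patch relies on can be illustrated with a small toy model. This is only a sketch under a stated assumption: that the text pass makes the document readable as a side effect, so anything read before it is still scrambled. ToyPdfDocument, OrderingDemo and their methods are hypothetical names made up for illustration. They are not part of the Tika or PDFBox API, and the string reversal merely stands in for "still encrypted".

```java
// Toy model only: ToyPdfDocument is hypothetical, NOT the PDFBox or Tika API.
// Assumption: fields become readable only after the text pass has run.
final class ToyPdfDocument {
    private final String title;
    private boolean decrypted = false;

    ToyPdfDocument(String title) {
        this.title = title;
    }

    // Stand-in for PDF2XHTML.process(...): the text pass decrypts as a side effect.
    String processText() {
        decrypted = true;
        return "body text";
    }

    // Stand-in for extractMetadata(...): returns a scrambled (reversed) value
    // as long as the document has not been decrypted yet.
    String readTitle() {
        return decrypted ? title : new StringBuilder(title).reverse().toString();
    }
}

final class OrderingDemo {
    // Order before the patch: metadata first, then text.
    static String metadataFirst(ToyPdfDocument doc) {
        String seenTitle = doc.readTitle();
        doc.processText();
        return seenTitle;
    }

    // Order after the patch: text first, then metadata.
    static String textFirst(ToyPdfDocument doc) {
        doc.processText();
        return doc.readTitle();
    }

    public static void main(String[] args) {
        System.out.println(metadataFirst(new ToyPdfDocument("Report"))); // prints "tropeR"
        System.out.println(textFirst(new ToyPdfDocument("Report")));     // prints "Report"
    }
}
```

In this model the only difference between the two methods is the call order, which mirrors what the patch changes in PDFParser.java.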