Problems parsing PDFs


Edward Quick


I keep getting the following errors when parsing PDFs:

Error parsing:$FILE/Three+wishes.pdf: failed(2,0): Can't be handled as pdf document. java.lang.ClassCastException: org.pdfbox.pdmodel.encryption.PDEncryptionDictionary

fetch of$FILE/BAUWS.pdf failed with: java.lang.NoClassDefFoundError: javax/media/jai/PlanarImage
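The NoClassDefFoundError means the JVM could not load the Java Advanced Imaging (JAI) class `javax.media.jai.PlanarImage` at runtime. One quick way to confirm whether it is visible is a minimal sketch like the one below. It assumes you compile and run it with the same classpath Nutch uses; the class name `JaiCheck` is just an illustrative name, not part of Nutch or PDFBox.

```java
// Minimal classpath check (a sketch): run it with the same classpath as Nutch.
public class JaiCheck {
    // Returns true if the named class can be located and loaded by this class loader.
    static boolean isOnClasspath(String className) {
        try {
            // 'false' = don't run static initializers, only check that the class loads
            Class.forName(className, false, JaiCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError also covers NoClassDefFoundError
            return false;
        }
    }

    public static void main(String[] args) {
        String name = "javax.media.jai.PlanarImage";
        System.out.println(name + (isOnClasspath(name) ? " found" : " NOT found"));
    }
}
```

If it prints "NOT found", the JAI jars are not on that classpath.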

I have applied the patch mentioned here=>
but it didn't stop the ClassCastExceptions for all of the files.

Currently there are about 243 PDFs on our intranet that I can't get Nutch to parse :-(


