indexing url without parsed content


Edward Quick

I have a PDF document that Nutch can't parse (despite the fact that I applied the patch in - see below):

Error parsing:$FILE/Lost+at+sea.pdf: failed(2,0): Can't be handled as pdf document. java.lang.ClassCastException: org.pdfbox.pdmodel.encryption.PDEncryptionDictionary

Can I manually add this URL to the index with a title, even though its content can't be parsed?
