I quite often run into the "org.pdfbox.cos.COSArray cannot be cast to
org.pdfbox.cos.COSDictionary" exception when parsing certain PDFs
with Tika. I noticed that it's been fixed in the trunk of PDFBox (0.8.0).
Unfortunately, this version of PDFBox is not a drop-in replacement: the
code was reorganized and now lives under the org.apache.pdfbox package
instead of org.pdfbox.
Is there a timeline for upgrading to PDFBox 0.8.0? Perhaps the upgrade
could be done in a branch that could be merged once 0.8.0 is released?
If it's a simple matter of replacing "org.pdfbox" with
"org.apache.pdfbox" I could volunteer for that, but if the upgrade is
more complicated it may very well be beyond my meager Java skills.
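
To make the "simple matter" case concrete, here is a rough sketch of the mechanical rename in Java. It assumes the only difference is the package prefix, which may not hold if class names or method signatures also changed in 0.8.0. The class name PackageRename and the method rewrite are hypothetical and are not part of Tika or PDFBox.

```java
import java.util.regex.Pattern;

// Hypothetical helper: rewrites old PDFBox package references in a piece of
// Java source text. It only handles the package prefix, nothing else.
final class PackageRename {

    // Match "org.pdfbox" only when it starts a qualified name (not preceded
    // by a word character or a dot) and is not the prefix of a longer word.
    private static final Pattern OLD_PACKAGE =
            Pattern.compile("(?<![\\w.])org\\.pdfbox(?!\\w)");

    private PackageRename() {
    }

    // Returns the source text with "org.pdfbox" replaced by
    // "org.apache.pdfbox". Text already using the new package is unchanged.
    static String rewrite(String source) {
        return OLD_PACKAGE.matcher(source).replaceAll("org.apache.pdfbox");
    }
}
```

Running rewrite over each source file that imports PDFBox would cover the imports and any fully qualified names. Anything that still fails to compile afterwards would point to changes beyond the package move.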