Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
Just want to send in a note of much appreciation for the work you've done (and the other Tika contributors, POI, PDF, Lucene, the list goes on). Work is underway on a project which feeds off the Tika parser as one of its content providers. Although Tika is still in a pre-1.0 stage, it is providing enough content to allow us to avoid delays and keep momentum. Thanks for that!
What I am hoping to contribute as we continue are examples of files that aren't parsing quite correctly, have the wrong encoding set, etc. This project is running against English and Thai data, and will be moving into Japanese and Chinese sometime next year. So, maybe we will have access to a wider range of Asian-language files than you might have currently.
I wish that we had the technical skill to contribute patches, but if there's anything that can be passed along to you to help with test / dev, I'd be happy to do so.
Thanks again, and please know that your efforts are being put to good use.