Hi Jukka,
> Once TIKA-43 is committed (I'm giving it a day or two for reviews and
> comments) there are still two Parser related changes that I'd like to
> do before I think we're ready to do the first 0.1 release.
+1, agreed. So far we've worked through 30 JIRA issues (great job, guys!),
and I think the library is reaching stability and is primed for an official
release.
I'll put my name out there as someone available to be the release master
when the time comes. I've done it on Nutch before and wouldn't mind doing it
for Tika. Just let me know if you guys agree.
>
> First, I'd like to replace the current Iterable<Content> construct
> with a Metadata object that allows metadata to be passed in and out of
> the parser. Also, this Metadata object should be decoupled from parser
> configuration.
I completely agree. I'd like to help with this issue, as the Metadata
framework is very near and dear to my heart. What does the interface you're
proposing look like again? Something like:
    String parse(InputStream stream, Metadata metadata)
        throws IOException, TikaException;
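To make sure we're talking about the same thing, here's a minimal Java sketch of that shape, where the caller fills in a Metadata object, passes it to parse(), and reads back whatever the parser added. Everything here is a hypothetical illustration, not Tika's actual code: the Metadata class, the TikaException stand-in, PlainTextParser, and the property names "resourceName" and "Content-Type" are all assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical metadata container: multi-valued string properties that
// the caller can set before parsing and read back afterwards.
class Metadata {
    private final Map<String, List<String>> values = new LinkedHashMap<>();

    // Appends a value, keeping any existing values for the same name.
    void add(String name, String value) {
        values.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
    }

    // Replaces all existing values for the name with a single value.
    void set(String name, String value) {
        List<String> list = new ArrayList<>();
        list.add(value);
        values.put(name, list);
    }

    // Returns the first value for the name, or null if none is set.
    String get(String name) {
        List<String> list = values.get(name);
        return (list == null || list.isEmpty()) ? null : list.get(0);
    }

    List<String> getValues(String name) {
        return Collections.unmodifiableList(
            values.getOrDefault(name, Collections.emptyList()));
    }
}

// Placeholder standing in for the project's own TikaException.
class TikaException extends Exception {
    TikaException(String message) { super(message); }
}

// The interface shape from the message: metadata goes in and comes back
// out, and the extracted text is returned as a String.
interface Parser {
    String parse(InputStream stream, Metadata metadata)
        throws IOException, TikaException;
}

// Toy parser for UTF-8 plain text, only here to exercise the interface.
class PlainTextParser implements Parser {
    @Override
    public String parse(InputStream stream, Metadata metadata)
            throws IOException, TikaException {
        // The caller owns the stream, so it is read but not closed here.
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            text.append(buf, 0, n);
        }
        // Output direction: the parser records what it found out.
        metadata.set("Content-Type", "text/plain");
        return text.toString();
    }
}

class MetadataDemo {
    public static void main(String[] args) throws Exception {
        Metadata metadata = new Metadata();
        metadata.set("resourceName", "notes.txt"); // input direction
        InputStream in = new ByteArrayInputStream(
            "hello tika".getBytes(StandardCharsets.UTF_8));
        String text = new PlainTextParser().parse(in, metadata);
        System.out.println(text);                          // hello tika
        System.out.println(metadata.get("Content-Type"));  // text/plain
    }
}
```

The nice property of this shape is that the same object carries caller hints in and parser findings out, while parser configuration stays somewhere else entirely.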
>
> Second, instead of returning the text content of a document as a
> String, I'd like the parsers to generate SAX events with the text
> content passed as characters() events.
Then, would the next evolutionary step be something like this?
    SAXEvent parse(InputStream stream, Metadata metadata)
        throws IOException, TikaException;
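One note on that: SAX itself has no SAXEvent type. In the SAX API the parser pushes events into an org.xml.sax.ContentHandler supplied by the caller, so a parse method would more likely take a handler argument and return nothing. Here's a minimal sketch under that assumption; SaxPlainTextParser and TextCollector are hypothetical names, and Metadata and TikaException are left out to keep it self-contained.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical SAX-based parser: instead of returning the text as a
// String, it pushes events into a caller-supplied ContentHandler.
class SaxPlainTextParser {
    void parse(InputStream stream, ContentHandler handler)
            throws IOException, SAXException {
        Reader reader = new InputStreamReader(stream, StandardCharsets.UTF_8);
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            // Text is delivered in chunks as characters() events.
            handler.characters(buf, 0, n);
        }
        handler.endElement("", "p", "p");
        handler.endDocument();
    }
}

// Handler that simply accumulates all characters() events.
class TextCollector extends DefaultHandler {
    private final StringBuilder text = new StringBuilder();

    @Override
    public void characters(char[] ch, int start, int length) {
        text.append(ch, start, length);
    }

    String getText() { return text.toString(); }
}

class SaxDemo {
    public static void main(String[] args) throws Exception {
        TextCollector collector = new TextCollector();
        InputStream in = new ByteArrayInputStream(
            "hello tika".getBytes(StandardCharsets.UTF_8));
        new SaxPlainTextParser().parse(in, collector);
        System.out.println(collector.getText()); // hello tika
    }
}
```

With the push model the caller decides what to do with the text (collect it, index it, or serialize it as XHTML), and the parser never has to hold a large document's full text in memory.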
>
> Unless anyone objects (feel free to do so if you have better design
> ideas!), I'll follow up with new patches for these two issues in the
> next week or two. Once these changes are done, I think we're good to
> go for the first Tika release. Such a timing would also be perfect for
> the upcoming ApacheCon US conference. :-)
Totally agree! Great job so far; I'm really starting to like this new
Parser interface...
Cheers,
Chris
>
> BR,
>
> Jukka Zitting
______________________________________________
Chris Mattmann, Ph.D.
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory, Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.