Hi Rida,
I agree totally! You should take a look at the MarkupLanguageParserProposal
(within Nutch, http://wiki.apache.org/nutch/MarkupLanguageParserProposal) and
the work done in Frutch
(http://www.krugle.com/kse/files?query=frutch%20parse%20out) on the ParseXml
plugin.
I'd love to chat with you more about this. Let me know what you think.
Thanks,
Chris
On 10/10/07 9:28 AM, "Rida Benjelloun" <[hidden email]> wrote:
> Hi,
> Do you think that we should have an XmlOutputter that saves the extracted
> content and metadata in an XML file? This would simplify integration with
> other technologies, such as Solr.
> The XmlOutputter would process a File (a single file, or a directory
> recursively) or a URL. It would use XSLT as a filter to mask or display the
> elements needed, and take an output encoding:
> Example:
> TikaXmlOutputter txo = new TikaXmlOutputter();
> txo.output(File|URL input, File xmlOutput, File xsltFilter, String encoding);
>
> Regards.
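
To make the proposal above a bit more concrete, here is a minimal sketch in Java that uses only the JDK's JAXP classes. Everything named in it is an assumption for illustration and not an existing Tika API: the class name XmlOutputterSketch, the XML layout (document / metadata / meta / content), and the method signature. The extracted content and metadata are passed in directly rather than produced by a parser, and the XSLT filter is an identity transform that masks one metadata element.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Source;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical sketch of the proposed outputter; not part of Tika.
final class XmlOutputterSketch {

    // Builds <document><metadata><meta name="..">..</meta></metadata><content>..</content></document>.
    // The element names are assumptions made for this example.
    static Document toDom(String content, Map<String, String> metadata)
            throws ParserConfigurationException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("document");
        doc.appendChild(root);
        Element metaParent = doc.createElement("metadata");
        root.appendChild(metaParent);
        for (Map.Entry<String, String> entry : metadata.entrySet()) {
            Element meta = doc.createElement("meta");
            meta.setAttribute("name", entry.getKey());
            meta.setTextContent(entry.getValue());
            metaParent.appendChild(meta);
        }
        Element contentElement = doc.createElement("content");
        contentElement.setTextContent(content);
        root.appendChild(contentElement);
        return doc;
    }

    // Serializes content and metadata as XML, optionally filtered through an
    // XSLT stylesheet, using the requested output encoding.
    static void output(String content, Map<String, String> metadata,
                       Source xsltFilter, String encoding, OutputStream out) {
        try {
            TransformerFactory factory = TransformerFactory.newInstance();
            // With no filter, newTransformer() gives a plain identity serializer.
            Transformer transformer = (xsltFilter == null)
                    ? factory.newTransformer()
                    : factory.newTransformer(xsltFilter);
            transformer.setOutputProperty(OutputKeys.ENCODING, encoding);
            transformer.transform(new DOMSource(toDom(content, metadata)), new StreamResult(out));
        } catch (ParserConfigurationException | TransformerException e) {
            throw new IllegalStateException("XML output failed", e);
        }
    }

    // Small demonstration: keep everything except the "author" metadata entry.
    static String demo() {
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("title", "Report");
        metadata.put("author", "Jane");
        String xslt =
            "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
          + "<xsl:template match=\"@*|node()\">"
          + "<xsl:copy><xsl:apply-templates select=\"@*|node()\"/></xsl:copy>"
          + "</xsl:template>"
          + "<xsl:template match=\"meta[@name='author']\"/>"
          + "</xsl:stylesheet>";
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        output("Hello, Tika", metadata, new StreamSource(new StringReader(xslt)), "UTF-8", bytes);
        return new String(bytes.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

To write to a file as in the proposed signature, the same method could be given a java.io.FileOutputStream for the output and a StreamSource wrapping the XSLT file. Writing to a byte stream rather than a character writer lets the transformer apply the requested encoding itself.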
______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory Pasadena, CA
Office: 171-266B Mailstop: 171-246
_______________________________________________________
Disclaimer: The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.