Which parsers support title properties?

Keith R. Bennett
All -

Can anyone tell me which parsers support extracting titles and which do not?  Here is the list of parsers:

HTML
Excel
PowerPoint
Word
OpenOffice
PDF
RTF
TXT (obviously not)
XML

...and do they all work with Content's text/XPath/regex select strings?

Wouldn't we need to change the design so that we have one strategy per parser, rather than one per lookup type (text, XPath, regex)?

- Keith
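Keith's suggestion of one strategy per parser, rather than one per lookup type, could look roughly like the sketch below. It is a hypothetical illustration only, not Tika's actual API: the names Format, TitleStrategy, HTML_TITLE, NO_TITLE and titleOf are invented for this example, and the regex-based HTML strategy is a deliberate simplification (a real parser would read the title from the parsed document or its metadata).

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleStrategies {

    // Hypothetical enum covering the parsers listed in the thread.
    public enum Format { HTML, EXCEL, POWERPOINT, WORD, OPENOFFICE, PDF, RTF, TXT, XML }

    // Hypothetical per-parser strategy: each format decides how to find its own title,
    // instead of one shared rule per lookup type (text, XPath, regex).
    public interface TitleStrategy {
        Optional<String> extractTitle(String document);
    }

    // Simplified HTML strategy: case-insensitive, multi-line match on <title>...</title>.
    public static final TitleStrategy HTML_TITLE = doc -> {
        Matcher m = Pattern.compile("(?is)<title>(.*?)</title>").matcher(doc);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    };

    // Formats with no title information simply report "no title".
    public static final TitleStrategy NO_TITLE = doc -> Optional.empty();

    static Map<Format, TitleStrategy> registry() {
        Map<Format, TitleStrategy> strategies = new EnumMap<>(Format.class);
        strategies.put(Format.HTML, HTML_TITLE);
        strategies.put(Format.TXT, NO_TITLE);
        // Other formats (Excel, Word, PDF, ...) would register their own
        // strategies here, e.g. one that reads a document metadata property.
        return strategies;
    }

    // Looks up the strategy for a format; unregistered formats fall back to NO_TITLE.
    public static Optional<String> titleOf(Format format, String document) {
        return registry().getOrDefault(format, NO_TITLE).extractTitle(document);
    }

    public static void main(String[] args) {
        System.out.println(titleOf(Format.HTML, "<html><head><title> Report </title></head></html>"));
        System.out.println(titleOf(Format.TXT, "plain text"));
    }
}
```

Running main prints Optional[Report] followed by Optional.empty. The point of the design is that supporting a new format means registering one strategy, and formats without titles (such as TXT) get an explicit no-title strategy rather than a lookup-type rule that does not apply to them.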

Re: Which parsers support title properties?

Rida Benjelloun
Hi Keith,
All parsers support title extraction except RTF and TXT.
Regards

On 10/5/07, kbennett <[hidden email]> wrote:

> --
> View this message in context:
> http://www.nabble.com/Which-parsers-support-title-properties--tf4577427.html#a13066766
> Sent from the Apache Tika - Development mailing list archive at Nabble.com.
>
>


--
---------------------------------------------------------
Rida Benjelloun
Doculibre inc.
[hidden email]
[hidden email]
Cel: 418-262-3222
Tel: 418-353-3390
Site Web : http://www.doculibre.com
---------------------------------------------------------