On Mon, Jun 4, 2012 at 2:21 PM, andrewtr <[hidden email]> wrote:
> When I parse a PDF or Word document using AutoDetectParser, the <li> and
> <ul> tags are converted to <p> tags. I need the exact HTML content that is
> in the PDF or Word document.
<li> and <ul> tags in PDF or Word? I assume you mean the native
list formatting of those document types.
The Tika parsers for PDF and Office documents could/should
automatically map such formatting to equivalent XHTML constructs, but
I don't think they currently do. You'll need to look into the source
code to see how to make that happen.
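Before digging into the parser source, it may help to confirm which
XHTML elements actually come out of a parse. Below is a minimal sketch
of a SAX handler that tallies element names. ElementTally and tally()
are hypothetical names made up for illustration, and the demo feeds it
a hand-written XHTML string through the JDK's own SAX parser rather
than through Tika. Since Tika's parse method takes an
org.xml.sax.ContentHandler, the same handler should be usable as the
handler argument to AutoDetectParser.parse(stream, handler, metadata,
context), though I have not tried that here.

```java
import java.io.StringReader;
import java.util.Map;
import java.util.TreeMap;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper (not part of Tika): counts how often each element
// name appears in a stream of SAX events.
class ElementTally extends DefaultHandler {
    private final Map<String, Integer> counts = new TreeMap<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        // localName is empty when the event source is not namespace-aware.
        String name = localName.isEmpty() ? qName : localName;
        counts.merge(name, 1, Integer::sum);
    }

    Map<String, Integer> counts() {
        return counts;
    }

    // Demo entry point: parses an XHTML string with the JDK SAX parser.
    static Map<String, Integer> tally(String xhtml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        ElementTally handler = new ElementTally();
        factory.newSAXParser().parse(new InputSource(new StringReader(xhtml)), handler);
        return handler.counts();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for parser output where list items arrive as paragraphs.
        String sample = "<html><body><p>one</p><p>two</p></body></html>";
        System.out.println(tally(sample));
    }
}
```

If the tally for a document with lists shows only p and no ul or li,
the list structure is being dropped inside the parser, and that is the
place in the source to look at.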