[jira] Created: (TIKA-128) HTML parser should produce XHTML SAX events


JIRA jira@apache.org
HTML parser should produce XHTML SAX events
-------------------------------------------

                 Key: TIKA-128
                 URL: https://issues.apache.org/jira/browse/TIKA-128
             Project: Tika
          Issue Type: Improvement
          Components: parser
            Reporter: Jukka Zitting


The current HTML parser just sanitizes the input HTML and passes it forward with no structural changes.

Unfortunately, this is incompatible with the other Tika parsers, which produce XHTML output, so IMHO the HTML parser should output XHTML as well.
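
For illustration only, here is a minimal sketch of what "producing XHTML SAX events" could look like. It uses only the standard JDK SAX and JAXP classes and is not the actual Tika implementation. The class name XhtmlSaxSketch and the helpers emit and render are hypothetical; the sketch assumes a parser that already has a title and some body text in hand and reports them to a ContentHandler as elements in the XHTML namespace.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;

public class XhtmlSaxSketch {
    // The standard XHTML namespace URI.
    static final String XHTML = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: reports a minimal XHTML document to the handler
    // as SAX events (html > head > title, html > body > p).
    static void emit(ContentHandler handler, String title, String text)
            throws SAXException {
        AttributesImpl noAttrs = new AttributesImpl();
        handler.startDocument();
        handler.startPrefixMapping("", XHTML);
        handler.startElement(XHTML, "html", "html", noAttrs);

        handler.startElement(XHTML, "head", "head", noAttrs);
        handler.startElement(XHTML, "title", "title", noAttrs);
        handler.characters(title.toCharArray(), 0, title.length());
        handler.endElement(XHTML, "title", "title");
        handler.endElement(XHTML, "head", "head");

        handler.startElement(XHTML, "body", "body", noAttrs);
        handler.startElement(XHTML, "p", "p", noAttrs);
        handler.characters(text.toCharArray(), 0, text.length());
        handler.endElement(XHTML, "p", "p");
        handler.endElement(XHTML, "body", "body");

        handler.endElement(XHTML, "html", "html");
        handler.endPrefixMapping("");
        handler.endDocument();
    }

    // Hypothetical helper: serializes the events to a string so the
    // resulting XHTML can be inspected.
    static String render(String title, String text) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler serializer = factory.newTransformerHandler();
        serializer.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        serializer.getTransformer()
                .setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        serializer.setResult(new StreamResult(out));
        emit(serializer, title, text);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(render("Example", "Hello"));
    }
}
```

Any consumer written against ContentHandler could then treat the HTML parser's output the same way as that of the other parsers, which is the compatibility this issue is asking for.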

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TIKA-128) HTML parser should produce XHTML SAX events

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-128.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.2-incubating
         Assignee: Jukka Zitting

Resolved in revision 638657.

> HTML parser should produce XHTML SAX events
> -------------------------------------------
>
>                 Key: TIKA-128
>                 URL: https://issues.apache.org/jira/browse/TIKA-128
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>            Reporter: Jukka Zitting
>            Assignee: Jukka Zitting
>             Fix For: 0.2-incubating
>
>
> The current HTML parser just sanitizes the input HTML and passes it forward with no structural changes.
> Unfortunately, this is incompatible with the other Tika parsers, which produce XHTML output, so IMHO the HTML parser should output XHTML as well.
