[jira] Updated: (TIKA-608) IOException from tagsoup

Hudson (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-608?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Hetzner updated TIKA-608:
------------------------------

    Attachment: test.html

Causes an IOException from tagsoup.

> IOException from tagsoup
> ------------------------
>
>                 Key: TIKA-608
>                 URL: https://issues.apache.org/jira/browse/TIKA-608
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 0.9
>            Reporter: Erik Hetzner
>            Priority: Minor
>         Attachments: test.html
>
>
> Attached HTML file causes an IOException from tagsoup.
> (Changing CR to LF fixes the problem.)
> Exception in thread "main" org.apache.tika.exception.TikaException: TIKA-198: Illegal IOException from org.apache.tika.parser.html.HtmlParser@22b6d6ab
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:203)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
> at org.apache.tika.cli.TikaCLI$OutputType.process(TikaCLI.java:107)
> at org.apache.tika.cli.TikaCLI.process(TikaCLI.java:288)
> at org.apache.tika.cli.TikaCLI.main(TikaCLI.java:94)
> Caused by: java.io.IOException: Pushback buffer overflow
> at java.io.PushbackReader.unread(PushbackReader.java:138)
> at org.ccil.cowan.tagsoup.HTMLScanner.unread(HTMLScanner.java:274)
> at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:487)
> at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
> at org.apache.tika.parser.html.HtmlParser.parse(HtmlParser.java:198)
> at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:197)
> ... 5 more
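
For readers unfamiliar with the root cause in the trace above: java.io.PushbackReader throws an IOException with the message "Pushback buffer overflow" when more characters are unread than its pushback buffer can hold. The sketch below only illustrates that JDK behaviour in isolation. It does not reproduce tagsoup's HTMLScanner logic or the attached test.html; the class name, the helper method, and the buffer sizes are hypothetical and chosen for illustration.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;

public class PushbackOverflowDemo {

    // Hypothetical helper: reads two characters, then tries to push both back.
    // Returns "ok" on success, or the exception message if unread() fails.
    static String unreadTwice(int bufferSize) throws IOException {
        try (PushbackReader reader =
                 new PushbackReader(new StringReader("a\rb"), bufferSize)) {
            int first = reader.read();
            int second = reader.read();
            try {
                reader.unread(second); // fits as long as the buffer holds at least 1 char
                reader.unread(first);  // overflows when the buffer holds only 1 char
                return "ok";
            } catch (IOException e) {
                return e.getMessage();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(unreadTwice(1)); // one-character pushback buffer
        System.out.println(unreadTwice(2)); // two-character pushback buffer
    }
}
```

With a one-character buffer the second unread() fails, while a two-character buffer accepts both. Whether the scanner's buffer size or its handling of CR is the actual trigger in this issue is not established by the trace alone.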

--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira