[jira] [Commented] (TIKA-2001) Parsing XML outputs empty string

Chris Mattmann (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-2001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17065870#comment-17065870 ]

Stefan commented on TIKA-2001:
------------------------------

I just tested this with Tika 1.19 and 1.24 and got the same result.

For us, the second version would also make the most sense. Attribute names do carry meaning.

Since XML files are human-readable, I would argue for handling them as closely to plain text files as possible.
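
To illustrate the point about attributes: the sample file in the report keeps all of its data in attributes and has no element text, which appears to be why the --text output is empty. Below is a minimal sketch of what "treat it like text" could look like, using only the JDK's SAX parser. It is not Tika's implementation. The class name AttributeTextSketch, the helper extract, and the name=value output format are assumptions made for illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class AttributeTextSketch {

    // Hypothetical helper: flattens attribute name=value pairs and element
    // text into plain text, one item per line.
    static String extract(String xml) throws Exception {
        StringBuilder out = new StringBuilder();

        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // Emit every attribute, so attribute-only documents still yield text.
                for (int i = 0; i < atts.getLength(); i++) {
                    out.append(atts.getQName(i))
                       .append('=')
                       .append(atts.getValue(i))
                       .append('\n');
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // Emit element text as well, skipping whitespace-only runs.
                String text = new String(ch, start, length).trim();
                if (!text.isEmpty()) {
                    out.append(text).append('\n');
                }
            }
        };

        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Shortened, attribute-only input modelled on the file in the report.
        String xml = "<spocosy version=\"1.0\"><lineup id=\"0\" del=\"no\"/></spocosy>";
        System.out.print(extract(xml));
    }
}
```

With output like this, both attribute names and values end up in the extracted text, so a full-text index would still find something for files that contain no element text at all.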

> Parsing XML outputs empty string
> --------------------------------
>
>                 Key: TIKA-2001
>                 URL: https://issues.apache.org/jira/browse/TIKA-2001
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.11, 1.12, 1.13
>            Reporter: George L. Yermulnik
>            Priority: Minor
>
> Can't get Tika to parse my XML files:
> {code}
> root@spring:/tmp# java -version
> java version "1.8.0_91"
> Java(TM) SE Runtime Environment (build 1.8.0_91-b14)
> Java HotSpot(TM) 64-Bit Server VM (build 25.91-b14, mixed mode)
> root@spring:/tmp# cat /tmp/xml/5751061032fbd-7148.xml
> <?xml version="1.0" encoding="UTF-8"?>
> <spocosy version="1.0"><subscription-update subscriptionid="0" requestid="0" last_push="2016-06-03 06:21:34" current_push="2016-06-03 06:21:37" exec="0.002"><lineup id="0" event_participantsFK="0" participantFK="0" lineup_typeFK="0" shirt_number="0" pos="0" enet_pos="0" n="0" ut="2016-06-03 06:21:37" del="no"/></subscription-update></spocosy>
> root@spring:/tmp# for i in 3 2 1; do
>     echo -n "tika-app-1.1${i}.jar: "
>     java -jar tika-app-1.1${i}.jar --text /tmp/xml/5751061032fbd-7148.xml
> done
> tika-app-1.13.jar:
> tika-app-1.12.jar:
> tika-app-1.11.jar:
> root@spring:/tmp#
> {code}
> Appreciate any help. Thanx.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)