[jira] [Commented] (TIKA-2636) ENVI Header metadata fields can span more than one line


Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-2636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16450467#comment-16450467 ]

ASF GitHub Bot commented on TIKA-2636:
--------------------------------------

lewismc opened a new pull request #234: TIKA-2636 ENVI Header metadata fields can span more than one line
URL: https://github.com/apache/tika/pull/234
 
 
   This pull request addresses https://issues.apache.org/jira/browse/TIKA-2636 and also provides augmented unit tests and a new test resource.
   Additionally, this PR addresses [invocation of the EnviHeaderParser via the tika-server](https://s.apache.org/Is7G) by adding the Parser discovery to src/main/resources/META-INF.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> ENVI Header metadata fields can span more than one line
> -------------------------------------------------------
>
>                 Key: TIKA-2636
>                 URL: https://issues.apache.org/jira/browse/TIKA-2636
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.17
>            Reporter: Lewis John McGibbney
>            Assignee: Lewis John McGibbney
>            Priority: Major
>             Fix For: 2.0.0
>
>         Attachments: ang20150420t182050_corr_v1e_img.hdr
>
>
> [~tpalsulich] was correct when [he stated|https://issues.apache.org/jira/browse/TIKA-1357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14046140#comment-14046140] "...See below for how to read and output line by line (copy & paste between the xml start/end in EnviHeaderParser). I have a hunch this isn't really what we want -- what if a metadata field has a newline in it? What if the line is too long to fit into a string? On the other hand, with nice input, it's much nicer output."
> As it turns out, ENVI header metadata fields can span more than one line. An example follows:
> {code}
> 1.    ENVI
> 2.    description = {
> 3.      Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed
> 4.      Jun 10 04:48:52 2015]}
> 5.    samples = 739
> 6.    lines = 14674
> 7.    bands = 432
> 8.    header offset = 0
> 9.    file type = ENVI Standard
> 10.    data type = 4
> 11.    interleave = bil
> 12.    sensor type = Unknown
> 13.    byte order = 0
> 14.    map info = { UTM , 1.000 , 1.000 , 724522.127 , 4074620.759 , 1.1000000000e+00 , 1.1000000000e+00 , 12 , North , WGS-84 , units=Meters , rotation=75.00000000 }
> 15.    wavelength units = Nanometers
> ...
> {code}
> The case in question is a metadata field whose value is enclosed in curly brackets. In the example above, L2-L4 show a value spread over three lines, and L14 shows a value contained within a single line.
> Handling this requires a patch to the [EnviHeaderParser|https://github.com/apache/tika/blob/9130bbc1fa6d69419b2ad294917260d6b1cced08/tika-parsers/src/main/java/org/apache/tika/parser/envi/EnviHeaderParser.java].
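
The multi-line case described in the issue could be handled roughly as in the sketch below. This is a minimal illustration only, not the code from the pull request or the actual EnviHeaderParser: the class name EnviHeaderSketch and the method parse are hypothetical, and it assumes that the numbers in the listing above are display-only, that a braced value ends on the first line containing '}', and that braces do not nest.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch; not the actual EnviHeaderParser from Tika. */
public class EnviHeaderSketch {

    /** Reads "key = value" pairs, joining brace-delimited values that span several lines. */
    public static Map<String, String> parse(String header) throws IOException {
        Map<String, String> fields = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(new StringReader(header))) {
            String line;
            while ((line = reader.readLine()) != null) {
                int eq = line.indexOf('=');
                if (eq < 0) {
                    continue; // e.g. the leading "ENVI" line or a blank line
                }
                String key = line.substring(0, eq).trim();
                StringBuilder value = new StringBuilder(line.substring(eq + 1).trim());
                // Assumption: a value that opens with '{' continues until a line containing '}'.
                if (value.length() > 0 && value.charAt(0) == '{') {
                    while (value.indexOf("}") < 0 && (line = reader.readLine()) != null) {
                        value.append(' ').append(line.trim());
                    }
                }
                fields.put(key, value.toString()); // braces are kept in the stored value
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String header = String.join("\n",
            "ENVI",
            "description = {",
            "  Georeferenced Image built from input GLT. [Wed Jun 10 04:37:54 2015] [Wed",
            "  Jun 10 04:48:52 2015]}",
            "samples = 739");
        parse(header).forEach((k, v) -> System.out.println(k + " -> " + v));
    }
}
```

Splitting on the first '=' only keeps single-line braced values such as the map info field intact, even though they contain further '=' characters (units=Meters, rotation=75.00000000).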



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)