[jira] Created: (TIKA-330) Better HWP (Hangul Word Processor) detection pattern



ASF GitHub Bot (Jira)
Better HWP (Hangul Word Processor) detection pattern
----------------------------------------------------

                 Key: TIKA-330
                 URL: https://issues.apache.org/jira/browse/TIKA-330
             Project: Tika
          Issue Type: Improvement
          Components: mime
            Reporter: Jukka Zitting
            Assignee: Jukka Zitting
            Priority: Minor


The current magic byte pattern we have for the HWP (Hangul Word Processor, application/x-hwp) file format also matches our test-outlook.msg test file, so that file is misdetected as HWP. I looked for a better detection pattern and found one in OpenOffice.org.

The hwpfilter/source/hwpfile.cpp file suggests that all HWP files start with the signature string "HWP Document File V", so I'll change the detection pattern accordingly.
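
As a rough illustration of the check described above, here is a minimal sketch that tests whether a file header begins with the signature string. It is not Tika's actual detection code: the class name HwpSignature and its matches method are hypothetical, and the sketch assumes the signature is stored as plain ASCII bytes at offset 0.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika.
final class HwpSignature {

    // Signature string quoted in the issue. Assumption: it is encoded
    // as ASCII and sits at the very start of the file.
    static final byte[] MAGIC =
            "HWP Document File V".getBytes(StandardCharsets.US_ASCII);

    private HwpSignature() {
    }

    // Returns true if the given header bytes start with the signature.
    static boolean matches(byte[] header) {
        if (header == null || header.length < MAGIC.length) {
            return false;
        }
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

Because the comparison covers the full 19-byte string rather than a few leading bytes, an unrelated file is unlikely to match by accident, which is the point of the change.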

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TIKA-330) Better HWP (Hangul Word Processor) detection pattern

ASF GitHub Bot (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-330.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.6

Fixed in revision 883306.

