[jira] [Commented] (TIKA-2224) OneNote formats support - Mime Magic and Parser

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992897#comment-16992897 ]

Tim Allison commented on TIKA-2224:
-----------------------------------

[~ndipiazza_gmail], let me know what you think.  If you have any other PRs you'd like to merge, please aim them at TIKA-2224.

I'm still working to dig up other OneNote files.

Given that we're just at the beginning of development for the next release, I _think_ this is ready to go.

Fellow devs, if you'd like to take a look at the TIKA-2224 branch, please do!

> OneNote formats support - Mime Magic and Parser
> -----------------------------------------------
>
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers, we don't have any magic for the OneNote formats. Several years ago we dug out the file format specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't have the volunteer energy to implement a parser. However, armed with those specs, we should be able to come up with some mime magic for detection.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)