[jira] [Commented] (TIKA-2224) OneNote formats support - Mime Magic and Parser

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16994865#comment-16994865 ]

ASF GitHub Bot commented on TIKA-2224:
--------------------------------------

nddipiazza commented on pull request #303: TIKA-2224 OneNote parser support
URL: https://github.com/apache/tika/pull/303
 
 
   # OneNote parser
   
   This PR adds parsing support for the `.one` file format,
   registered as `application/onenote; format=one`.
   
   Embedded documents are supported as well.
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> OneNote formats support - Mime Magic and Parser
> -----------------------------------------------
>
>                 Key: TIKA-2224
>                 URL: https://issues.apache.org/jira/browse/TIKA-2224
>             Project: Tika
>          Issue Type: Improvement
>          Components: mime
>    Affects Versions: 1.14
>            Reporter: Nick Burch
>            Priority: Major
>         Attachments: Sample1.json, Sample1.one, note-ssn-test-mmmm.one
>
>
> As raised at http://stackoverflow.com/questions/41272195/onenote-support-for-apache-tika-parsers, we don't have any magic for the OneNote formats. Several years ago we dug out the file format specs (see http://lucene.472066.n3.nabble.com/Tika-OneNote-Support-td4020393.html), but didn't have the volunteer energy to implement a parser. However, armed with those specs, we should be able to come up with some mime magic for detection.
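The issue asks for mime magic. The following is a minimal sketch of the kind of check that could work, assuming the header layout from Microsoft's [MS-ONESTORE] specification. In that layout a revision store file begins with a 16-byte `guidFileType`, and for `.one` section files this value is {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3}, stored with its first three fields little-endian. The class `OneNoteMagic` and its methods are hypothetical illustrations. They are not Tika's actual detector.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper: checks the leading guidFileType of a OneNote revision store file.
final class OneNoteMagic {
    // {7B5C52E4-D88C-4DA7-AEB1-5378D02996D3} as laid out on disk (assumed per MS-ONESTORE):
    // Data1, Data2, Data3 little-endian, then Data4 as-is.
    private static final byte[] ONE_FILE_TYPE = {
        (byte) 0xE4, (byte) 0x52, (byte) 0x5C, (byte) 0x7B,
        (byte) 0x8C, (byte) 0xD8, (byte) 0xA7, (byte) 0x4D,
        (byte) 0xAE, (byte) 0xB1, (byte) 0x53, (byte) 0x78,
        (byte) 0xD0, (byte) 0x29, (byte) 0x96, (byte) 0xD3
    };

    // True if the first 16 bytes match the assumed .one file-type GUID.
    static boolean looksLikeOneSection(byte[] header) {
        return header.length >= ONE_FILE_TYPE.length
            && Arrays.equals(Arrays.copyOf(header, ONE_FILE_TYPE.length), ONE_FILE_TYPE);
    }

    // Reads at most 16 bytes from the stream and applies the same check.
    static boolean looksLikeOneSection(InputStream in) throws IOException {
        return looksLikeOneSection(in.readNBytes(ONE_FILE_TYPE.length));
    }

    public static void main(String[] args) throws IOException {
        for (String arg : args) {
            try (InputStream in = Files.newInputStream(Path.of(arg))) {
                String type = looksLikeOneSection(in) ? "application/onenote; format=one" : "unknown";
                System.out.println(arg + ": " + type);
            }
        }
    }
}
```

Run against a file such as `Sample1.one` from the attachments, this should report the OneNote type if the GUID assumption holds. A prefix match on a fixed-offset GUID is the usual shape of a magic rule. A real detector would also need a separate rule for `.onetoc2` table-of-contents files, which use a different `guidFileType`.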



--
This message was sent by Atlassian Jira
(v8.3.4#803005)