[jira] [Commented] (TIKA-3034) Detector always returns text/plain when scanning Mathematica files

ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-3034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035473#comment-17035473 ]

Hudson commented on TIKA-3034:

SUCCESS: Integrated in Jenkins build tika-branch-1x #303 (See [https://builds.apache.org/job/tika-branch-1x/303/])
TIKA-3034 Mathematica files don't have a unique magic, but try to detect (tallison: [https://github.com/apache/tika/commit/5b88431a5135c9cf1ab00624a26d3f0fdec36a2b])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

> Detector always returns text/plain when scanning Mathematica files
> ------------------------------------------------------------------
>                 Key: TIKA-3034
>                 URL: https://issues.apache.org/jira/browse/TIKA-3034
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.23
>            Reporter: Tung Nguyen
>            Priority: Blocker
>              Labels: math
>             Fix For: 1.23
> We are using Tika to implement our MIME type detection module. The library does not appear to detect Mathematica files, although the documentation lists the format as supported [1]. The Tika detector always returns `text/plain` instead of `application/mathematica`, which is the type given in the documentation and in the unit tests [2].
> By contrast, the Python script below returns the correct MIME type for every Mathematica file we downloaded from the Wolfram Library Archive [3].
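> {code:python}
> #!/usr/bin/python3
> import mimetypes
> import sys
> # Path of the file to check, taken from the command line.
> test_file = sys.argv[1]
> # guess_type() returns a (type, encoding) tuple; print only the type.
> print(mimetypes.MimeTypes().guess_type(test_file)[0])
> {code}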
> We therefore suspect a bug in the way the Tika detector guesses MIME types for Mathematica files.
> There is also an existing ticket that asks for a Mathematica file detector: https://issues.apache.org/jira/browse/TIKA-1520
> References:
>  [1] [https://tika.apache.org/1.23/formats.html]
>  [2] [https://github.com/apache/tika/blob/master/tika-core/src/test/java/org/apache/tika/TikaDetectionTest.java#L64]
>  [3] [https://library.wolfram.com/infocenter/Courseware/4706/]
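
The two results in the report may differ because the checks look at different things. Python's mimetypes module consults only the file name, while a content-based check has to find something recognisable in the bytes, which is hard when, as the commit message says, the format has no unique magic. The following is a minimal sketch of that difference, not Tika's implementation. The function names, the 512-byte window and the marker strings are assumptions made for illustration, and the ".nb" mapping is registered explicitly so the result does not depend on the system's mime.types files.

```python
import mimetypes

# Assumption: notebook files often begin with a "(*" comment that carries a
# Content-type line. These markers are illustrative only; they are not the
# patterns used in tika-mimetypes.xml.
HYPOTHETICAL_MARKERS = (
    b"Content-type: application/mathematica",
    b"Content-type: application/vnd.wolfram.mathematica",
)


def guess_by_name(filename):
    """Name-based guess: only the extension is consulted, never the bytes."""
    types = mimetypes.MimeTypes()
    # Register the mapping ourselves so the answer is the same on every system.
    types.add_type("application/mathematica", ".nb")
    return types.guess_type(filename)[0]


def guess_by_content(head):
    """Content-based guess over the leading bytes (hypothetical heuristic)."""
    window = head[:512]
    starts_with_comment = window.lstrip().startswith(b"(*")
    has_marker = any(marker in window for marker in HYPOTHETICAL_MARKERS)
    if starts_with_comment and has_marker:
        return "application/mathematica"
    # Without a recognisable marker the bytes look like ordinary text.
    return "text/plain"
```

With these definitions, guess_by_name("demo.nb") reports application/mathematica whatever the file contains, and guess_by_name("demo") returns None because there is no extension to look up. guess_by_content only reports application/mathematica when one of the assumed markers is present, and falls back to text/plain otherwise.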

This message was sent by Atlassian Jira