[jira] [Commented] (TIKA-3023) Text files starting with MOVI are detected as X-SGI-Movie


ASF GitHub Bot (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-3023?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17035474#comment-17035474 ]

Hudson commented on TIKA-3023:
------------------------------

SUCCESS: Integrated in Jenkins build tika-branch-1x #303 (See [https://builds.apache.org/job/tika-branch-1x/303/])
TIKA-3023 Make the SGI Movie mime magic more specific to avoid false (tallison: [https://github.com/apache/tika/commit/28aecab3e8267747136bf6c4cf7377054330789b])
* (edit) tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml


> Text files starting with MOVI are detected as X-SGI-Movie
> ---------------------------------------------------------
>
>                 Key: TIKA-3023
>                 URL: https://issues.apache.org/jira/browse/TIKA-3023
>             Project: Tika
>          Issue Type: Bug
>    Affects Versions: 1.23
>         Environment: Issue recreated on
> Windows 10 Professional 64bit running the runnable Jar
> Ubuntu 16.04.6 LTS running Tika-Python
>            Reporter: Steve
>            Priority: Minor
>             Fix For: 1.24
>
>         Attachments: capitalmovie.txt
>
>
> If a plaintext file starts with "MOVI", Tika labels it as an SGI Movie.
> The ASCII bytes for MOVI are 4D 4F 56 49 in hex, which is the same as the header for the SGI Movie file format
> [https://reposcope.com/mimetype/video/x-sgi-movie]
>
> This SGI format isn't supported, so any information from a text file that starts like this would be lost. I've attached a simple file that should reproduce the problem.
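
The collision described in the report can be sketched in a few lines. This is only an illustration under stated assumptions: `naive_detect` and `SGI_MOVIE_MAGIC` are hypothetical names, the detector is a bare four-byte prefix check rather than Tika's actual detection code, and the sample text is made up (it is not the attached capitalmovie.txt).

```python
# Hypothetical illustration of a four-byte magic collision; not Tika's code.

# The four header bytes quoted in the report, written as hex.
SGI_MOVIE_MAGIC = bytes.fromhex("4D4F5649")


def naive_detect(data: bytes) -> str:
    """Classify data using only a four-byte prefix match (assumed detector)."""
    if data.startswith(SGI_MOVIE_MAGIC):
        return "video/x-sgi-movie"
    return "text/plain"


# Made-up plain-text content that happens to begin with "MOVI".
text_file = "MOVIES I WATCHED THIS YEAR\n".encode("ascii")

print(SGI_MOVIE_MAGIC == b"MOVI")                 # True: the magic is just ASCII "MOVI"
print(naive_detect(text_file))                    # video/x-sgi-movie (false positive)
print(naive_detect(b"Movies in lower case\n"))    # text/plain
```

A magic that short and made entirely of printable ASCII will match ordinary text, which is why a more specific pattern is needed to avoid the false positive.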



--
This message was sent by Atlassian Jira
(v8.3.4#803005)