[jira] [Created] (TIKA-3023) Text files starting with MOVI are detected as X-SGI-Movie

Sebastian Nagel (Jira)
Steve created TIKA-3023:
---------------------------

             Summary: Text files starting with MOVI are detected as X-SGI-Movie
                 Key: TIKA-3023
                 URL: https://issues.apache.org/jira/browse/TIKA-3023
             Project: Tika
          Issue Type: Bug
    Affects Versions: 1.23
         Environment: Issue recreated on

Windows 10 Professional 64bit running the runnable Jar

Ubuntu 16.04.6 LTS running Tika-Python
            Reporter: Steve
         Attachments: capitalmovie.txt

If a plain-text file starts with "MOVI", Tika labels it as an SGI Movie.

In ASCII, "MOVI" is the byte sequence 4D 4F 56 49, which is the same as the magic header of the SGI Movie file format:

[https://reposcope.com/mimetype/video/x-sgi-movie]
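To make the collision concrete, here is a minimal Java sketch (not Tika's actual detector) that compares the first four bytes of an ASCII string against 4D 4F 56 49. The class name `MoviMagicCheck` and the sample text are hypothetical and stand in for the attached file. Tika's real detection goes through its MIME-type registry, which this sketch does not reproduce.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration: shows that ASCII "MOVI" equals the SGI Movie magic bytes.
public class MoviMagicCheck {
    // Magic bytes for video/x-sgi-movie, as described above: ASCII "MOVI".
    static final byte[] SGI_MOVIE_MAGIC = {0x4D, 0x4F, 0x56, 0x49};

    // Returns true if the data begins with the SGI Movie magic bytes.
    static boolean startsWithSgiMovieMagic(byte[] data) {
        if (data.length < SGI_MOVIE_MAGIC.length) {
            return false;
        }
        byte[] prefix = Arrays.copyOf(data, SGI_MOVIE_MAGIC.length);
        return Arrays.equals(prefix, SGI_MOVIE_MAGIC);
    }

    // Formats the first n bytes as space-separated uppercase hex.
    static String hexPrefix(byte[] data, int n) {
        StringBuilder hex = new StringBuilder();
        for (int i = 0; i < Math.min(n, data.length); i++) {
            if (i > 0) {
                hex.append(' ');
            }
            hex.append(String.format("%02X", data[i]));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Hypothetical plain-text content whose first word starts with capital "MOVI".
        byte[] text = "MOVIE NIGHT: bring snacks\n".getBytes(StandardCharsets.US_ASCII);
        System.out.println(hexPrefix(text, 4));
        System.out.println(startsWithSgiMovieMagic(text));
    }
}
```

Running it prints `4D 4F 56 49` followed by `true`. Because the comparison is on raw bytes, text starting with lowercase "movi" (6D 6F 76 69) would not match this particular check.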

Tika does not support parsing this SGI format, so any content in a text file that starts this way is lost. I've attached a simple file that should reproduce the problem.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)