[jira] Created: (TIKA-336) More issues with RDF mime detection

[jira] Created: (TIKA-336) More issues with RDF mime detection

JIRA jira@apache.org
More issues with RDF mime detection
-----------------------------------

                 Key: TIKA-336
                 URL: https://issues.apache.org/jira/browse/TIKA-336
             Project: Tika
          Issue Type: Bug
          Components: mime
    Affects Versions: 0.5
         Environment: several user environments as well as validated in Mattmann's environment.
            Reporter: Chris A. Mattmann
            Assignee: Chris A. Mattmann
             Fix For: 0.6


See TIKA-309 for related discussion, but there seem to be further errors in RDF mime detection, for example on the OWL file located here:

http://www.w3.org/2002/07/owl#

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Resolved: (TIKA-336) More issues with RDF mime detection

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/TIKA-336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Chris A. Mattmann resolved TIKA-336.
------------------------------------

    Resolution: Fixed

- fixed in r884340

Yuan-Fang, please test out the latest Tika trunk. I've:

* updated the test-difficult-rdf2.xml file to remove the <?xml header
* updated tika-mimetypes.xml to detect files that start with <!-- as XML files (as a default first magic check). This forces xmlRoot detection to occur, which is where the specific XML subclass is detected (which is what we want); there, application/rdf+xml is properly detected. Before, since there was no magic header for <!--, the initial magic check returned null and the MimeTypes detector ended up returning text/plain.
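
To make the before/after behaviour concrete, here is a rough, self-contained sketch of the two-stage idea (magic check first, then root element check). It is not Tika's actual code: the class MagicThenRootSketch, its methods, and the commentMagicEnabled flag are hypothetical names made up for illustration, the root check matches the literal prefixed name rdf:RDF instead of resolving the namespace, and the text/plain fallback is simplified.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not Tika's implementation. */
final class MagicThenRootSketch {

    // First element start tag. A name must begin with a letter or underscore,
    // so "<?xml", "<!--" and "<!DOCTYPE" are never mistaken for the root.
    private static final Pattern ROOT =
            Pattern.compile("<([A-Za-z_][\\w.-]*(?::[\\w.-]+)?)");

    /** Stage 1: magic prefix check. Returns null when no magic matches. */
    static String detectByMagic(String head, boolean commentMagicEnabled) {
        if (head.startsWith("<?xml")) {
            return "application/xml";
        }
        // Assumption: this flag stands in for the new "<!--" magic entry.
        if (commentMagicEnabled && head.startsWith("<!--")) {
            return "application/xml";
        }
        return null;
    }

    /** Stage 2 helper: name of the first element, ignoring comments. */
    static String rootName(String head) {
        String withoutComments = head.replaceAll("(?s)<!--.*?-->", "");
        Matcher m = ROOT.matcher(withoutComments);
        return m.find() ? m.group(1) : null;
    }

    static String detect(byte[] data, boolean commentMagicEnabled) {
        String head = new String(data, StandardCharsets.UTF_8);
        String type = detectByMagic(head, commentMagicEnabled);
        if (type == null) {
            // No magic hit: the root element is never inspected (the old bug).
            return "text/plain";
        }
        // Magic said "some XML": refine using the root element.
        return "rdf:RDF".equals(rootName(head)) ? "application/rdf+xml" : type;
    }
}
```

With a document that opens with a comment followed by an rdf:RDF root element, detect(data, false) falls through to text/plain, while detect(data, true) reaches the root check and reports application/rdf+xml.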

In the future we may want to:

* make xmlRoot extraction occur on text/plain documents
* move the text/plain check to the beginning of the o.a.tika.mime.MimeTypes#getMimeType(byte[] data) function
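
As a rough illustration of the first idea only (root extraction on text/plain results), something along these lines might work. Everything here is hypothetical: TextPlainRootFallbackSketch and refine are invented names, not part of Tika, and the root check again matches the literal name rdf:RDF rather than a namespace. Reordering the text/plain check inside getMimeType is not covered.

```java
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical illustration only; this is not Tika's implementation. */
final class TextPlainRootFallbackSketch {

    // First element start tag; skips "<?xml", "<!--" and "<!DOCTYPE".
    private static final Pattern ROOT =
            Pattern.compile("<([A-Za-z_][\\w.-]*(?::[\\w.-]+)?)");

    /**
     * Takes the type an earlier detection step produced and, only when that
     * type is text/plain, tries a root element check before giving up.
     */
    static String refine(String detected, byte[] data) {
        if (!"text/plain".equals(detected)) {
            return detected; // a more specific answer is left alone
        }
        String head = new String(data, StandardCharsets.UTF_8)
                .replaceAll("(?s)<!--.*?-->", ""); // ignore leading comments
        Matcher m = ROOT.matcher(head);
        if (m.find() && "rdf:RDF".equals(m.group(1))) {
            return "application/rdf+xml";
        }
        return detected;
    }
}
```

The point of the design is that a missing magic entry would no longer block the root element check; ordinary text without an rdf:RDF root still comes back as text/plain.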

[jira] Commented: (TIKA-336) More issues with RDF mime detection

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12786755#action_12786755 ]

Yuan-Fang Li commented on TIKA-336:
-----------------------------------

Hi Chris, I just did an update and can confirm that the bug has been plugged. Thanks for the fix.

Cheers
Yuan-Fang
