[jira] [Commented] (TIKA-2648) mime detection based on resource name detects resources as "text/x-php" instead of "text/html"

classic Classic list List threaded Threaded
1 message Options
Reply | Threaded
Open this post in threaded view
|

[jira] [Commented] (TIKA-2648) mime detection based on resource name detects resources as "text/x-php" instead of "text/html"

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-2648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16541312#comment-16541312 ]

ASF GitHub Bot commented on TIKA-2648:
--------------------------------------

GerardBouchar commented on issue #236: TIKA-2648 : detect interpreted server-side scripting languages
URL: https://github.com/apache/tika/pull/236#issuecomment-404431036
 
 
   @tballison anything new on this ?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> mime detection based on resource name detects resources as "text/x-php" instead of "text/html"
> -----------------------------------------------------------------------------------------------
>
>                 Key: TIKA-2648
>                 URL: https://issues.apache.org/jira/browse/TIKA-2648
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Gerard Bouchar
>            Priority: Major
>
> When using tika to detect a mime type given only an URL containing ".php" and a content-type hint of "text/html", it guesses "text/x-php", whereas one could expect "text/html".
> {code}
> TikaConfig tika = new TikaConfig();
> Metadata metadata = new Metadata();
> String url = "https://www.facebook.com/home.php";
> metadata.set(Metadata.RESOURCE_NAME_KEY, url);
> metadata.set(Metadata.CONTENT_TYPE, "text/html");
> MediaType type = tika.getDetector().detect(null, metadata);
> System.out.println(url + " is of type " + type.toString());
> // Prints https://www.facebook.com/home.php is of type text/x-php
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)