[jira] Commented: (TIKA-447) Container aware mimetype detection

Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12967036#action_12967036 ]

Jukka Zitting commented on TIKA-447:
------------------------------------

I refactored the code a bit in revision 1042476 to make it easier to compose with other kinds of detectors. Most notably I removed the ContainerDetector interface and made the POIFSContainerDetector and ZipContainerDetector classes directly implement the Detector interface.
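
For readers who want to see why a single shared interface makes composition easier, here is a minimal, self-contained sketch. It does not use Tika's actual classes: `SimpleDetector`, `ZipMagicDetector`, `Ole2MagicDetector` and `FirstMatchDetector` are hypothetical stand-ins, the returned type strings are illustrative, and the detectors only look at magic bytes rather than opening the container. The point is only that once every detector shares one interface, a composite can chain them without knowing which kind it holds.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified detector interface (NOT Tika's real API).
interface SimpleDetector {
    String FALLBACK = "application/octet-stream";

    // The stream must support mark/reset; it is left at its original position.
    String detect(InputStream input) throws IOException;

    // Peeks at the first bytes of the stream and compares them with a magic prefix.
    static boolean startsWith(InputStream input, byte[] magic) throws IOException {
        input.mark(magic.length);
        try {
            byte[] head = new byte[magic.length];
            int n = 0;
            while (n < head.length) {
                int r = input.read(head, n, head.length - n);
                if (r < 0) {
                    break; // stream shorter than the magic prefix
                }
                n += r;
            }
            return n == magic.length && Arrays.equals(head, magic);
        } finally {
            input.reset();
        }
    }
}

// Illustrative detector for the ZIP local file header signature "PK\003\004".
class ZipMagicDetector implements SimpleDetector {
    private static final byte[] MAGIC = {'P', 'K', 3, 4};

    public String detect(InputStream input) throws IOException {
        return SimpleDetector.startsWith(input, MAGIC) ? "application/zip" : FALLBACK;
    }
}

// Illustrative detector for the OLE2 compound document signature.
class Ole2MagicDetector implements SimpleDetector {
    private static final byte[] MAGIC = {
        (byte) 0xD0, (byte) 0xCF, (byte) 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, (byte) 0x1A, (byte) 0xE1
    };

    public String detect(InputStream input) throws IOException {
        // The type string here is an illustrative label for generic OLE2 data.
        return SimpleDetector.startsWith(input, MAGIC) ? "application/x-tika-msoffice" : FALLBACK;
    }
}

// Composite: returns the first non-fallback answer from the wrapped detectors.
class FirstMatchDetector implements SimpleDetector {
    private final List<SimpleDetector> detectors;

    FirstMatchDetector(SimpleDetector... detectors) {
        this.detectors = Arrays.asList(detectors);
    }

    public String detect(InputStream input) throws IOException {
        for (SimpleDetector detector : detectors) {
            String type = detector.detect(input);
            if (!FALLBACK.equals(type)) {
                return type;
            }
        }
        return FALLBACK;
    }
}

class DetectorDemo {
    public static void main(String[] args) throws IOException {
        SimpleDetector detector =
                new FirstMatchDetector(new ZipMagicDetector(), new Ole2MagicDetector());
        byte[] zipLike = {'P', 'K', 3, 4, 0, 0};
        System.out.println(detector.detect(new ByteArrayInputStream(zipLike)));
    }
}
```

Because the composite is itself a `SimpleDetector`, it can be nested inside another composite or placed in front of a plain magic-byte detector, which is the kind of composition a separate container-only interface would get in the way of.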

> Container aware mimetype detection
> ----------------------------------
>
>                 Key: TIKA-447
>                 URL: https://issues.apache.org/jira/browse/TIKA-447
>             Project: Tika
>          Issue Type: New Feature
>          Components: mime
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>         Attachments: TIKA-447-TikaInputStream.patch, TikaContainerDetection.patch
>
>
> As discussed on the dev list, Tika should ideally have a configurable way to process container-based formats (e.g. zip files and OLE2 files) when trying to detect the correct MIME type for a document.
> This needs to be configurable, because some people won't want Tika to do all the work of parsing the whole file when they're not interested in knowing exactly what's in it.
> Once we have gone to the trouble of opening and parsing the container file, we should try to keep the open container around to speed up parsing of the contents.
