MimeTypes.java final?


Ryan McKinley
Hello-

I'm in the process of ditching a custom format registry and trying to
replace it with tika MimeTypes.  For the most part things are going
pretty well.

The key things I am stuck with:
1. As is, MimeTypes#forName(String name) will get or create the
MimeType.  There is no way to ask if the MimeTypes registry already
knows about the type.

2. No way to show magic or rootXML in my UI since they are private,
final and don't have getters:
    private List<Magic> magics = null;
    private List<RootXML> rootXML = null;
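To make point 1 concrete, here is a minimal sketch of the difference between a get-or-create call and a read-only lookup. SimpleTypeRegistry and its methods are hypothetical names for illustration only. This is not Tika's MimeTypes class, and it stores plain strings instead of MimeType objects.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical registry used only to illustrate the two lookup styles;
// it is NOT Tika's MimeTypes and stores plain strings, not MimeType objects.
class SimpleTypeRegistry {
    private final Map<String, String> types = new HashMap<>();

    // Normalise names so "Image/BMP" and "image/bmp" refer to the same entry.
    private static String key(String name) {
        return name.trim().toLowerCase(Locale.ROOT);
    }

    // Get-or-create, like the forName behaviour described above:
    // an unknown name is registered as a side effect of asking for it.
    String forName(String name) {
        return types.computeIfAbsent(key(name), k -> k);
    }

    // Read-only lookup: answers "is this type known?" without registering anything.
    Optional<String> lookup(String name) {
        return Optional.ofNullable(types.get(key(name)));
    }

    boolean isKnown(String name) {
        return types.containsKey(key(name));
    }
}
```

With this split, lookup("text/x-unknown") leaves the registry unchanged, whereas forName("text/x-unknown") adds an entry.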

What is the general philosophy in Tika on this?  Should I submit a
patch adding (read-only) getters for these things?  Or a patch removing
final and/or making the variables protected so a subclass can do what
works in my case?
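The read-only getter option asked about above could look roughly like the following sketch. TypeDefinition and its members are hypothetical. The magic patterns are modelled as strings rather than Tika's Magic objects so the example stays self-contained. The getter hands out an unmodifiable view, so a UI can display the values without being able to change the registry.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical class; names are illustrative and do not match Tika's MimeType.
// Magic entries are plain strings here instead of Magic objects.
final class TypeDefinition {
    private final String name;
    private final List<String> magicPatterns = new ArrayList<>();

    TypeDefinition(String name) {
        this.name = name;
    }

    // Mutation stays package-private, e.g. for whatever loads the registry.
    void addMagicPattern(String pattern) {
        magicPatterns.add(pattern);
    }

    String getName() {
        return name;
    }

    // Read-only view: callers can display the patterns but any attempt to
    // modify the returned list throws UnsupportedOperationException.
    List<String> getMagicPatterns() {
        return Collections.unmodifiableList(magicPatterns);
    }
}
```

Compared with making the fields protected, this keeps the internal list private and only widens the read side of the interface.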

In a similar question, is there interest in adding other metadata to
the core MimeType class, like URLs to documentation, or the UTI
(http://en.wikipedia.org/wiki/Uniform_Type_Identifier), or a friendly
display name?

thanks
ryan

Re: MimeTypes.java final?

Nick Burch-2
On Mon, 29 Oct 2012, Ryan McKinley wrote:
> The key things I am stuck with:
> 1. As is, MimeTypes#forName(String name) will get or create the
> MimeType.  There is no way to ask if the MimeTypes registry already
> knows about the type.

I think the idea is that you use the underlying MediaTypeRegistry if you
want to have more control over this.

> 2. No way to show magic or rootXML in my UI since they are private,
> final and don't have getters:
>    private List<Magic> magics = null;
>    private List<RootXML> rootXML = null;

Could you maybe explain why you need these?


> In a similar question, is there interest in adding other metadata to
> the core MimeType class, like URLs to documentation, or the UTI
> (http://en.wikipedia.org/wiki/Uniform_Type_Identifier), or a friendly
> display name?

There might be. We already have things like comments, so these might be a
good addition.

Could you perhaps propose what the XML would look like for a few common
types with this extra info in it, so we can get a better idea of what info
you'd be adding?

Nick

Re: MimeTypes.java final?

Ryan McKinley
On Mon, Oct 29, 2012 at 12:42 PM, Nick Burch <[hidden email]> wrote:

> On Mon, 29 Oct 2012, Ryan McKinley wrote:
>>
>> The key things I am stuck with:
>> 1. As is, MimeTypes#forName(String name) will get or create the
>> MimeType.  There is no way to ask if the MimeTypes registry already
>> knows about the type.
>
>
> I think the idea is that you use the underlying MediaTypeRegistry if you
> want to have more control over this
>

With MediaTypeRegistry, I can get a list of all the known types and
build a parallel map.
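The parallel-map approach might look roughly like this sketch: index every canonical type and alias once, then answer "is this known?" without creating anything. KnownTypeIndex is a hypothetical helper, and its input map (canonical type name to alias names) is an assumed stand-in. In Tika one would presumably fill it from MediaTypeRegistry#getTypes() and MediaTypeRegistry#getAliases(MediaType), converting each MediaType to its string form; that wiring is an assumption and is not shown.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical helper: a read-only index of every known name (canonical
// types plus their aliases). The input map is a stand-in for data that
// would come from a type registry.
final class KnownTypeIndex {
    private final Map<String, String> nameToCanonical;

    KnownTypeIndex(Map<String, Set<String>> canonicalToAliases) {
        Map<String, String> index = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : canonicalToAliases.entrySet()) {
            String canonical = e.getKey().toLowerCase(Locale.ROOT);
            index.put(canonical, canonical);
            for (String alias : e.getValue()) {
                // Each alias points back at its canonical type.
                index.put(alias.toLowerCase(Locale.ROOT), canonical);
            }
        }
        this.nameToCanonical = Collections.unmodifiableMap(index);
    }

    boolean isKnown(String name) {
        return nameToCanonical.containsKey(name.toLowerCase(Locale.ROOT));
    }

    // Returns the canonical type for a name or alias, or null if unknown.
    String canonicalOf(String name) {
        return nameToCanonical.get(name.toLowerCase(Locale.ROOT));
    }
}
```

Using the BMP entry from this thread, an index built from image/x-ms-bmp with alias image/bmp reports image/bmp as known and maps it to image/x-ms-bmp. The catch is that the index is a snapshot, so it has to be rebuilt if the registry changes.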

Since


>
>> 2. No way to show magic or rootXML in my UI since they are private,
>> final and don't have getters:
>>    private List<Magic> magics = null;
>>    private List<RootXML> rootXML = null;
>
>
> Could you maybe explain why you need these?
>

I want to display it in our UI.  Our management UI shows information
about supported formats, and I want to expose as much information as
possible about how/why things match.  We don't want people to need to
open the .xml file to see these values, and I would rather not have to
parse them independently if that can be avoided.


>
>
>> In a similar question, is there interest in adding other metadata to
>> the core MimeType class, like URLs to documentation, or the UTI
>> (http://en.wikipedia.org/wiki/Uniform_Type_Identifier), or a friendly
>> display name?
>
>
> There might be. We already have things like comments, so these might be a
> good addition
>
> Could you perhaps propose what the XML would look like for a few common
> types with this extra info in it, so we can get a better idea of what info
> you'd be adding?
>

Following the existing format for comments, what about something like:

<_url>http://...</_url>
and
<_uti>...</_uti>

For BMP, this could be:

  <mime-type type="image/x-ms-bmp">
    <alias type="image/bmp"/>
    <acronym>BMP</acronym>
    <_comment>Windows bitmap</_comment>
    <_url>http://en.wikipedia.org/wiki/BMP_file_format</_url>
    <_uti>com.microsoft.bmp</_uti>
    <magic priority="50">
      ....


With URLs, it should likely support multiple entries since there are
undoubtedly formats with multiple good reference links.
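To check that the proposed elements are easy to consume, including several _url entries per type, here is a sketch that reads them with the JDK's DOM parser. _url and _uti are only a proposal in this thread, not an existing part of the schema. ProposedTypeInfo is a hypothetical name. The sketch assumes a single, well-formed mime-type element (the BMP excerpt above elides its magic section) and ignores the enclosing root element of the real tika-mimetypes.xml.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Hypothetical reader for the *proposed* _url/_uti elements of one <mime-type>.
final class ProposedTypeInfo {
    final String type;
    final String uti;         // null if no <_uti> element is present
    final List<String> urls;  // zero or more reference links, in document order

    private ProposedTypeInfo(String type, String uti, List<String> urls) {
        this.type = type;
        this.uti = uti;
        this.urls = Collections.unmodifiableList(new ArrayList<>(urls));
    }

    static ProposedTypeInfo parse(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Element root = doc.getDocumentElement(); // the <mime-type> element

            // Collect every <_url>, since a format may have several good references.
            List<String> urls = new ArrayList<>();
            NodeList urlNodes = root.getElementsByTagName("_url");
            for (int i = 0; i < urlNodes.getLength(); i++) {
                urls.add(urlNodes.item(i).getTextContent().trim());
            }

            // At most one UTI is expected; take the first if present.
            NodeList utiNodes = root.getElementsByTagName("_uti");
            String uti = utiNodes.getLength() > 0
                    ? utiNodes.item(0).getTextContent().trim()
                    : null;

            return new ProposedTypeInfo(root.getAttribute("type"), uti, urls);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Malformed mime-type XML", e);
        }
    }
}
```

Applied to the BMP entry above (closed with a matching end tag), this would yield type image/x-ms-bmp, UTI com.microsoft.bmp and one URL. Repeating the element is enough to support multiple links, with no extra nesting.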

My motivation here is also a debug/management UI, but this seems
like a reasonable way to help document the formats described in
tika-mimetypes.xml.


thanks
ryan

Re: MimeTypes.java final?

Ryan McKinley
>>
>> I think the idea is that you use the underlying MediaTypeRegistry if you
>> want to have more control over this
>>
>
> With MediaTypeRegistry, I can get a list of all the known types and
> build a parallel map.
>
> Since
>

oops -- hit send too soon!

I'll poke around and see what I can do.

thanks
ryan

Re: MimeTypes.java final?

Mattmann, Chris A (3010)
In reply to this post by Ryan McKinley
Hi Ryan,

I think #1 has been suggested before, in a thread called "Appending
MIME Types": http://s.apache.org/TVe

As for #2, I think that's the type of information we're trying to hide through
the class interface.

I like the idea of adding more URL information and URI stuff to the MIME
registry, though.

Cheers,
Chris

On Oct 29, 2012, at 11:44 AM, Ryan McKinley wrote:

> Hello-
>
> I'm in the process of ditching a custom format registry and trying to
> replace it with tika MimeTypes.  For the most part things are going
> pretty well.
>
> The key things I am stuck with:
> 1. As is, MimeTypes#forName(String name) will get or create the
> MimeType.  There is no way to ask if the MimeTypes registry already
> knows about the type.
>
> 2. No way to show magic or rootXML in my UI since they are private,
> final and don't have getters:
>  private List<Magic> magics = null;
>  private List<RootXML> rootXML = null;
>
> What is the general philosophy in Tika on this?  Should I submit a
> patch adding (read only) getters for these things?  a patch removing
> final and or making the variables protected so a subclass can do what
> works in my case?
>
> In a similar question, is there interest in adding other metadata to
> the core MimeType class, like URLs to documentation, or the UTI
> (http://en.wikipedia.org/wiki/Uniform_Type_Identifier), or a friendly
> display name?
>
> thanks
> ryan


Re: MimeTypes.java final?

Ryan McKinley
On Mon, Oct 29, 2012 at 2:03 PM, Mattmann, Chris A (388J)
<[hidden email]> wrote:
> Hi Ryan,
>
> I think #1 has been suggested before, in a thread called "Appending
> MIME Types": http://s.apache.org/TVe

ah yes -- this is actually a similar use case to what I want to do!


>
> As for #2, I think that's the type of information we're trying to hide through
> the class interface.

As I dig around the Magic and RootXML stuff... I see why it should be
hidden.  This makes sense.

I'll find some other way to give good debug feedback.

>
> I like the adding more URL information and URI stuff to the MIME registry
> though too.
>

I went ahead and added:
https://issues.apache.org/jira/browse/TIKA-1012
https://issues.apache.org/jira/browse/TIKA-1013


thanks
ryan

Re: MimeTypes.java final?

Mattmann, Chris A (3010)
Thanks Ryan, you the man. Appreciate it. I will take a look
at the issues and try to help shepherd them in!

Cheers,
Chris

On Oct 29, 2012, at 6:52 PM, Ryan McKinley wrote:

> On Mon, Oct 29, 2012 at 2:03 PM, Mattmann, Chris A (388J)
> <[hidden email]> wrote:
>> Hi Ryan,
>>
>> I think #1 has been suggested before, in a thread called "Appending
>> MIME Types": http://s.apache.org/TVe
>
> ah yes -- this is actually a similar use case to what I want to do!
>
>
>>
>> As for #2, I think that's the type of information we're trying to hide through
>> the class interface.
>
> As I dig around the Magic and RootXML stuff... I see why it should be
> hidden.  This makes sense.
>
> I'll find some other way to give good debug feedback
>
>>
>> I like the adding more URL information and URI stuff to the MIME registry
>> though too.
>>
>
> I went ahead and added:
> https://issues.apache.org/jira/browse/TIKA-1012
> https://issues.apache.org/jira/browse/TIKA-1013
>
>
> thanks
> ryan