Metadata design

Metadata design

Jukka Zitting
Hi,

Last weekend I spent some time thinking about our Metadata class and
more generally the handling of metadata in Tika. To summarize my
thoughts, here's a list of things I think are important for metadata
handling in Tika.

0) Metadata in Tika is always about the document being parsed.

1) Metadata should consist of a modifiable set of keys mapped to values.

2) Metadata keys should be designed to avoid collisions or misspellings.

3) It should be possible to store non-String metadata, like Locale
settings, Date instances, thumbnail images, etc.

4) We should document and enforce a standard set of metadata keys,
based on Dublin Core and other standards where possible.

5) It should be easy to extend the set of metadata keys to include
custom metadata.

6) All metadata keys (both standard and custom) should be clearly
documented with the expected value type and recommended usage.

7) No two distinct metadata keys should be used for the same metadata semantics.

8) The Metadata class should have convenience methods for accessing
the most commonly used metadata.

The current Metadata class fails somewhat with 2 (there's even a
SpellCheckedMetadata class) and doesn't support 3 or 8. The constants
in o.a.tika.metadata interfaces go some way towards 4 and 6, but not
as far as they could. And we don't do that well on 7.

So in general I think there's much that we could improve on. To
resolve most of the issues I'd like to modify metadata handling as
follows:

a) Allow both metadata keys and values to be arbitrary Objects.

b) Instead of String constants as metadata keys, use constant object
instances like DublinCore.TITLE = new DublinCore("title"). These
objects should have good hashCode(), equals(), and toString()
implementations.

c) Use Date instances for date metadata, URI instances for URIs, etc.
All value objects should preferably have good toString()
implementations.

d) Use the Dublin Core "identifier" property instead of the current
RESOURCE_NAME_KEY, and the "format" property instead of CONTENT_TYPE.

e) Add utility methods like set/getIdentifier(), set/getFormat(),
set/getTitle(), etc. to the Metadata class for accessing the key
Dublin Core metadata.
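
A minimal sketch of what (a) to (c) could look like, offered only as an illustration. The `DublinCore` class below is a hypothetical key type in the spirit of `new DublinCore("title")`, not the existing o.a.tika.metadata interface, and `KeyObjectDemo` and the example URI are made up for the demonstration.

```java
import java.net.URI;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical key type: constant instances instead of String constants.
final class DublinCore {
    static final String NAMESPACE = "http://purl.org/dc/elements/1.1/";

    static final DublinCore TITLE = new DublinCore("title");
    static final DublinCore DATE = new DublinCore("date");
    static final DublinCore IDENTIFIER = new DublinCore("identifier");

    private final String name;

    private DublinCore(String name) {
        this.name = name;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof DublinCore && name.equals(((DublinCore) other).name);
    }

    @Override
    public int hashCode() {
        return Objects.hash(NAMESPACE, name);
    }

    @Override
    public String toString() {
        return NAMESPACE + name;
    }
}

class KeyObjectDemo {
    // Keys and values are arbitrary Objects, as in proposal (a).
    static Map<Object, Object> sample() {
        Map<Object, Object> metadata = new HashMap<>();
        metadata.put(DublinCore.TITLE, "Metadata design");
        metadata.put(DublinCore.DATE, new Date(0L));
        metadata.put(DublinCore.IDENTIFIER, URI.create("http://example.com/doc.pdf"));
        return metadata;
    }

    public static void main(String[] args) {
        sample().forEach((key, value) -> System.out.println(key + " = " + value));
    }
}
```

Because the key is an object rather than a bare string, a plain `"title"` string does not match `DublinCore.TITLE`, which is the property that the collision argument relies on.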

WDYT?

BR,

Jukka Zitting

Re: Metadata design

Jérôme Charron-2
Hi Jukka,

Very good summary of what a metadata system must be.
From my point of view, +1 for all your proposals.
Just one point I don't understand: why would you like to use objects for
metadata keys? What are the benefits?

BR

Jérôme


On Mon, Mar 10, 2008 at 9:54 AM, Jukka Zitting <[hidden email]>
wrote:

> Hi,
>
> Last weekend I spent some time thinking about our Metadata class and
> more generally the handling of metadata in Tika. To summarize my
> thoughts, here's a list of things I think are important for metadata
> handling in Tika.
>
> 0) Metadata in Tika is always about the document being parsed.
>
> 1) Metadata should consist of a modifiable set of keys mapped to values.
>
> 2) Metadata keys should be designed to avoid collisions or misspellings.
>
> 3) It should be possible to store non-String metadata, like Locale
> settings, Date instances, thumbnail images, etc.
>
> 4) We should document and enforce a standard set of metadata keys,
> based on Dublin Core and other standards where possible.
>
> 5) It should be easy to extend the set of metadata keys to include
> custom metadata.
>
> 6) All metadata keys (both standard and custom) should be clearly
> documented with the expected value type and recommended usage.
>
> 7) No two distinct metadata keys should be used for the same metadata
> semantics.
>
> 8) The Metadata class should have convenience methods for accessing
> the most commonly used metadata.
>
> The current Metadata class fails somewhat with 2 (there's even a
> SpellCheckedMetadata class) and doesn't support 3 or 8. The constants
> in o.a.tika.metadata interfaces go some way towards 4 and 6, but not
> as far as they could. And we don't do that well on 7.
>
> So in general I think there's much that we could improve on. To
> resolve most of the issues I'd like to modify metadata handling as
> follows:
>
> a) Allow both metadata keys and values to be arbitrary Objects.
>
> b) Instead of String constants as metadata keys, use constant object
> instances like DublinCore.TITLE = new DublinCore("title"). These
> objects should have good hashCode(), equals(), and toString()
> implementations.
>
> c) Use Date instances for date metadata, URI instances for URIs, etc.
> All value objects should preferably have good toString()
> implementations.
>
> d) Use the Dublin Core "identifier" property instead of the current
> RESOURCE_NAME_KEY, and the "format" property instead of CONTENT_TYPE.
>
> e) Add utility methods like set/getIdentifier(), set/getFormat(),
> set/getTitle(), etc. to the Metadata class for accessing the key
> Dublin Core metadata.
>
> WDYT?
>
> BR,
>
> Jukka Zitting
>



--
Jérôme Charron
Directeur Technique @ WebPulse
Tel: +33673716743 - [hidden email]
http://blog.shopreflex.com/
http://www.shopreflex.com/
http://www.staragora.com/

Re: Metadata design

Jeremias Maerki-2
In reply to this post by Jukka Zitting
*g* Sounds a lot like what I built in XML Graphics Commons with the XMP
support:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/examples/java/xmp/MetadataFromScratch.java?view=markup
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/examples/java/xmp/ParseMetadata.java?view=markup
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/xmp/XMPProperty.java?view=markup

...and what Adobe's XMP package provides. ;-)

Contrary to my earlier proposal, I haven't switched to Adobe's package,
yet, as I was nearly finished in XG Commons to support all I needed and
I didn't feel enough support for a joint solution. Switching to Adobe's
package would have been more work. Still, XG Commons' XMP support is not
100% complete, yet, but it's almost there. I'm still prepared to put
some work into a common solution if there's enough interest.

But if you prefer a local solution, I'll shut up.

On 10.03.2008 09:54:10 Jukka Zitting wrote:

> Hi,
>
> Last weekend I spent some time thinking about our Metadata class and
> more generally the handling of metadata in Tika. To summarize my
> thoughts, here's a list of things I think are important for metadata
> handling in Tika.
>
> 0) Metadata in Tika is always about the document being parsed.
>
> 1) Metadata should consist of a modifiable set of keys mapped to values.
>
> 2) Metadata keys should be designed to avoid collisions or misspellings.
>
> 3) It should be possible to store non-String metadata, like Locale
> settings, Date instances, thumbnail images, etc.
>
> 4) We should document and enforce a standard set of metadata keys,
> based on Dublin Core and other standards where possible.
>
> 5) It should be easy to extend the set of metadata keys to include
> custom metadata.
>
> 6) All metadata keys (both standard and custom) should be clearly
> documented with the expected value type and recommended usage.
>
> 7) No two distinct metadata keys should be used for the same metadata semantics.
>
> 8) The Metadata class should have convenience methods for accessing
> the most commonly used metadata.
>
> The current Metadata class fails somewhat with 2 (there's even a
> SpellCheckedMetadata class) and doesn't support 3 or 8. The constants
> in o.a.tika.metadata interfaces go some way towards 4 and 6, but not
> as far as they could. And we don't do that well on 7.
>
> So in general I think there's much that we could improve on. To
> resolve most of the issues I'd like to modify metadata handling as
> follows:
>
> a) Allow both metadata keys and values to be arbitrary Objects.
>
> b) Instead of String constants as metadata keys, use constant object
> instances like DublinCore.TITLE = new DublinCore("title"). These
> objects should have good hashCode(), equals(), and toString()
> implementations.
>
> c) Use Date instances for date metadata, URI instances for URIs, etc.
> All value objects should preferably have good toString()
> implementations.
>
> d) Use the Dublin Core "identifier" property instead of the current
> RESOURCE_NAME_KEY, and the "format" property instead of CONTENT_TYPE.
>
> e) Add utility methods like set/getIdentifier(), set/getFormat(),
> set/getTitle(), etc. to the Metadata class for accessing the key
> Dublin Core metadata.
>
> WDYT?
>
> BR,
>
> Jukka Zitting




Jeremias Maerki


Re: Metadata design

Jukka Zitting
In reply to this post by Jérôme Charron-2
Hi,

On Mon, Mar 10, 2008 at 11:08 AM, Jérôme Charron
<[hidden email]> wrote:
>  Just one point I don't understand : why would you like to use objects for
>  metadata keys ? what are the benefits?

Using objects instead of strings nicely solves the namespacing issue
(TIKA-61). If the keys were just strings we'd need to use some
namespacing mechanism to ensure that two custom parser implementations
won't choose to use the same metadata key for different purposes.
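
A small sketch of the collision being described, under assumed names: `MusicKey`, `FilmKey`, and the "rating" property are invented for the example and do not come from any real parser. Enums are used here only because they are one convenient way to get constant key objects with built-in `equals()` and `hashCode()`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical keys owned by two unrelated custom parsers.
enum MusicKey { RATING }

enum FilmKey { RATING }

class NamespaceDemo {
    // With String keys, the second parser silently overwrites the first.
    static int stringKeyCount() {
        Map<String, Object> metadata = new HashMap<>();
        metadata.put("rating", 5);        // music parser: number of stars
        metadata.put("rating", "PG-13");  // film parser: age classification
        return metadata.size();
    }

    // With object keys, the key's type acts as the namespace.
    static int objectKeyCount() {
        Map<Object, Object> metadata = new HashMap<>();
        metadata.put(MusicKey.RATING, 5);
        metadata.put(FilmKey.RATING, "PG-13");
        return metadata.size();
    }

    public static void main(String[] args) {
        System.out.println("String keys kept: " + stringKeyCount());
        System.out.println("Object keys kept: " + objectKeyCount());
    }
}
```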

BR,

Jukka Zitting

Re: Metadata design

Jukka Zitting
In reply to this post by Jeremias Maerki-2
Hi,

On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
<[hidden email]> wrote:
> *g* Sounds a lot like what I built in XML Graphics Commons with the XMP
>  support:

XMP is a valid option. I briefly looked at the Adobe XMP library and
JempBox as options, but I'm a bit worried about the complexity of the
API and the fact that there is little guidance on what metadata
properties to use for which purposes.

I agree that using a standard metadata representation is very useful,
but is it worth the extra complexity? At least we should find a way to
cover requirements 4, 6, and 8 on top of XMP.

BR,

Jukka Zitting

Re: Metadata design

Jeremias Maerki-2
On 10.03.2008 11:03:07 Jukka Zitting wrote:

> Hi,
>
> On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
> <[hidden email]> wrote:
> > *g* Sounds a lot like what I built in XML Graphics Commons with the XMP
> >  support:
>
> XMP is a valid option. I briefly looked at the Adobe XMP library and
> JempBox as options, but I'm a bit worried about the complexity of the
> API and the fact that there is little guidance on what metadata
> properties to use for which purposes.

Take a look at the XMP specification [2]. It contains documentation for
a number of metadata schemas.

[1] http://www.adobe.com/products/xmp/index.html
[2] http://www.adobe.com/devnet/xmp/pdfs/xmp_specification.pdf

Of course, some properties might be missing which Tika might need. But
they can be defined by Tika in your own schema and you can provide your
own adapter class for easy, type-safe access.

> I agree that using a standard metadata representation is very useful,
> but is it worth the extra complexity? At least we should find a way to
> cover requirements 4, 6, and 8 on top of XMP.

That's why I added the link to:
http://svn.apache.org/viewvc/xmlgraphics/commons/trunk/examples/java/xmp/MetadataFromScratch.java?view=markup
See also:
http://svn.apache.org/repos/asf/xmlgraphics/commons/trunk/src/java/org/apache/xmlgraphics/xmp/schemas/

You can see how easy it is to access the individual values (type-safe) while
still offering generic access to the properties. The documentation (your
no 6) can be done through Javadocs on the adapter classes and, if
necessary, a separate XML containing the Schema from which you can
generate tables as found in the XMP specification. The PDF/A standard
even contains a schema expressed in XMP for describing XMP
schemas (not that this is very legible; something simpler is probably
better).

I'm pretty sure that things such as thumbnails can also be mapped. When
serialized to an XMP packet, a thumbnail would simply be converted into an
RFC 2397 data URL.
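
For illustration, the data URL conversion can be sketched with the JDK alone. This is not code from XML Graphics Commons or Adobe's package; `DataUrlDemo` and `toDataUrl` are hypothetical names, and the byte array stands in for real thumbnail data.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class DataUrlDemo {
    // Builds an RFC 2397 "data:" URL with a base64-encoded payload.
    static String toDataUrl(String mediaType, byte[] data) {
        return "data:" + mediaType + ";base64,"
                + Base64.getEncoder().encodeToString(data);
    }

    public static void main(String[] args) {
        // Placeholder bytes; a real caller would pass the thumbnail image data.
        byte[] thumbnail = "not a real image".getBytes(StandardCharsets.US_ASCII);
        System.out.println(toDataUrl("image/png", thumbnail));
    }
}
```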


HTH
Jeremias Maerki


Re: Metadata design

Jukka Zitting
Hi,

On Mon, Mar 10, 2008 at 12:18 PM, Jeremias Maerki
<[hidden email]> wrote:
>  Of course, some properties might be missing which Tika might need. But
>  they can be defined by Tika in your own schema [...]

Instead of missing some properties, I'm more concerned about the fact
that the wide variety of XMP properties will put off a number of
potential users and that we have little chance of enforcing
consistency across parser implementations.

> [...] you can provide your  own adapter class for easy, type-safe access.

That sounds like a good approach. We could make the Tika Metadata
class be an XMPMeta adapter that provides simple getters and setters
for the most commonly used types of metadata. Parsers and clients that
produce or need more detailed metadata can directly access the
underlying XMP model, while others can ignore the underlying
complexity.

BR,

Jukka Zitting

Re: Metadata design

chrismattmann
In reply to this post by Jukka Zitting
Hi Jukka,

Very nice set of requirements. I'd like to chime in on each below:

>
> 0) Metadata in Tika is always about the document being parsed.

+1

>
> 1) Metadata should consist of a modifiable set of keys mapped to values.

Let's be concrete here:

I think that Metadata should be of the form:

[key]=>1...n [value]

Where [key]'s are modifiable (as are [value]'s)

This is what you are expressing, correct?

>
> 2) Metadata keys should be designed to avoid collisions or misspellings.

+1 in all cases that we are in control of, with the caveat that sometimes we
aren't in control of the keys being used, especially in situations where we
have automated Metadata retrieval. For instance, in Nutch, we use the
Metadata object to store "content-type", which is returned from a web
server, processing the met keys in an automated fashion. Well, some servers
return "Content-type", some return "content-type", or "content type", well,
... you get the picture. So we still need that support, at least to support
that type of use case.

>
> 3) It should be possible to store non-String metadata, like Locale
> settings, Date instances, thumbnail images, etc.

I somewhat have to agree with Jerome here -- what is the value of storing
non-String metadata [values]? To me, this goes against standards like Dublin
Core, or ISO 11179, which explicitly define metadata values to be of String
form.


>
> 4) We should document and enforce a standard set of metadata keys,
> based on Dublin Core and other standards where possible.

+1, with the caveat, that we need to allow folks to define their own keys as
well.

>
> 5) It should be easy to extend the set of metadata keys to include
> custom metadata.

+1, ah perfect, you covered my caveat from #4 above here. Great.

>
> 6) All metadata keys (both standard and custom) should be clearly
> documented with the expected value type and recommended usage.

+1

>
> 7) No two distinct metadata keys should be used for the same metadata
> semantics.

Could you elaborate on this with an explicit example?

>
> 8) The Metadata class should have convenience methods for accessing
> the most commonly used metadata.

I'm not so sure that these types of methods should be part of a canonical
Metadata class. To me, it's like building the data model into the software,
which is typically a bad practice. Having them kept as separate, independent
entities, allows both data and software models to evolve independently, and
promotes loose coupling between them, making the software less fragile to
changes in the underlying data model (what if Dublin Core changes in 5
years, and does away with certain fields, while adding others)?

I would support this idea, however, as a set of higher level, convenience
Metadata decorator classes (e.g., like SpellCheckedMetadata).

>
> The current Metadata class fails somewhat with 2 (there's even a
> SpellCheckedMetadata class) and doesn't support 3 or 8. The constants
> in o.a.tika.metadata interfaces go some way towards 4 and 6, but not
> as far as they could. And we don't do that well on 7.
>
> So in general I think there's much that we could improve on. To
> resolve most of the issues I'd like to modify metadata handling as
> follows:
>
> a) Allow both metadata keys and values to be arbitrary Objects.

I'd still like to know first what value is added by having Metadata values be
non-Strings.

>
> b) Instead of String constants as metadata keys, use constant object
> instances like DublinCore.TITLE = new DublinCore("title"). These
> objects should have good hashCode(), equals(), and toString()
> implementations.

Having metadata keys be non-Strings may lead to some boundary case
situations, and requires developers to understand how to create good
hashCode() methods, and equals() methods, which have been shown in practice
to be things that developers are not really good at.

In addition, doing things this way arguably makes support for #5 above a bit
more difficult...

>
> c) Use Date instances for date metadata, URI instances for URIs, etc.
> All value objects should preferably have good toString()
> implementations.
>
> d) Use the Dublin Core "identifier" property instead of the current
> RESOURCE_NAME_KEY, and the "format" property instead of CONTENT_TYPE.

+1

>
> e) Add utility methods like set/getIdentifier(), set/getFormat(),
> set/getTitle(), etc. to the Metadata class for accessing the key
> Dublin Core metadata.

As a decorator, I totally support this. As extensions to the canonical
Metadata class, I think they are a bit heavy-weight, and tightly coupled to
the underlying data model.


In general, I like your proposal Jukka -- I'd just like to hear some more
rationale for things like using non-String met keys, and for not having some
of the data model specific items (e.g., DublinCore) be placed in decorators
rather than the canonical Metadata class.

My 2 cents,
 Chris


>
> WDYT?
>
> BR,
>
> Jukka Zitting

______________________________________________
Chris Mattmann, Ph.D.
[hidden email]
Cognizant Development Engineer
Early Detection Research Network Project
_________________________________________________
Jet Propulsion Laboratory            Pasadena, CA
Office: 171-266B                     Mailstop:  171-246
_______________________________________________________

Disclaimer:  The opinions presented within are my own and do not reflect
those of either NASA, JPL, or the California Institute of Technology.



Re: Metadata design

Jukka Zitting
Hi,

On Mon, Mar 10, 2008 at 3:55 PM, Chris Mattmann
<[hidden email]> wrote:

>  > 1) Metadata should consist of a modifiable set of keys mapped to values.
>
>  Let's be concrete here:
>
>  I think that Metadata should be of the form:
>
>  [key]=>1...n [value]
>
>  Where [key]'s are modifiable (as are [value]'s)
>
>  This is what you are expressing, correct?

Yes, though I'm not sure where the 1...n requirement for metadata
values comes from. In my proposal I'd handle such needs more
generally by allowing structured metadata values, not just strings.

>  > 2) Metadata keys should be designed to avoid collisions or misspellings.
>
>  +1 in all cases that we are in control of, with the caveat that sometimes we
>  aren't in control of the keys being used, especially in situations where we
>  have automated Metadata retrieval.

Metadata is not very useful if it can't be reliably and automatically
processed, so I'd rather avoid situations where there's a chance for
confusion.

Also, it's IMHO better to resolve any ambiguities before putting
things into the Metadata instance instead of guessing later on what
the metadata you have really means.

In the Nutch case you mentioned, I would ask Nutch to understand the
difference between the various forms of HTTP headers and to normalize
that metadata before feeding it to Tika. After all, there's nothing
HTTP-specific in Tika, whereas Nutch knows much more about the
relevant details and actual reality out there.
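
As a rough sketch of that normalization step on the caller's side: the class below is hypothetical (it is neither Nutch nor Tika code), and the small table of canonical header names is only an assumed sample.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

class HeaderNormalizer {
    // Canonical spellings, looked up by a lower-cased, punctuation-free form.
    private static final Map<String, String> CANONICAL = new HashMap<>();
    static {
        CANONICAL.put("contenttype", "Content-Type");
        CANONICAL.put("contentlength", "Content-Length");
        CANONICAL.put("lastmodified", "Last-Modified");
    }

    // "Content-type", "content type" and "CONTENT-TYPE" all squash to "contenttype".
    static String squash(String name) {
        return name.toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Returns the canonical spelling if known, otherwise the trimmed input.
    static String normalize(String rawName) {
        return CANONICAL.getOrDefault(squash(rawName), rawName.trim());
    }

    public static void main(String[] args) {
        for (String raw : new String[] {"Content-type", "content-type", "content type"}) {
            System.out.println(raw + " -> " + normalize(raw));
        }
    }
}
```

Doing this before the values reach the Metadata instance means that later consumers only ever see one spelling per header.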

>  > 3) It should be possible to store non-String metadata, like Locale
>  > settings, Date instances, thumbnail images, etc.
>
>  I somewhat have to agree with Jerome here -- what is the value of storing
>  non-String metadata [values]? To me, this goes against standards like Dublin
>  Core, or ISO 11179, which explicitly define metadata values to be of String
>  form.

Again, the more reliably the metadata can be automatically processed,
the better. It's of course possible to serialize all sorts of metadata
to strings, but each such case introduces an inherently unreliable
parsing operation. Since Tika doesn't need to worry about serializing
the metadata, we should IMHO opt for structured data types instead of
strings where appropriate.

>  > 7) No two distinct metadata keys should be used for the same metadata
>  > semantics.
>
>  Could you elaborate on this with an explicit example?

For example, we currently have both DublinCore.FORMAT and
HttpHeaders.CONTENT_TYPE whose semantics are largely overlapping. Each
such case adds ambiguity and makes automatic metadata processing
harder.

>  > 8) The Metadata class should have convenience methods for accessing
>  > the most commonly used metadata.
>
>  I'm not so sure that these types of methods should be part of a canonical
>  Metadata class. To me, it's like building the data model into the software,
>  which is typically a bad practice.

Good point, having such methods on a decorator layer makes sense.
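
A minimal sketch of such a decorator layer, with assumed names throughout: `GenericMetadata` stands in for the canonical container and `DublinCoreMetadata` for the convenience layer; neither is an existing Tika class, and the string keys are placeholders.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for the canonical container: generic key/value access only.
class GenericMetadata {
    private final Map<String, Object> values = new HashMap<>();

    Object get(String key) {
        return values.get(key);
    }

    void set(String key, Object value) {
        values.put(key, value);
    }
}

// Convenience layer: knows the Dublin Core names, stores nothing itself.
class DublinCoreMetadata {
    private final GenericMetadata delegate;

    DublinCoreMetadata(GenericMetadata delegate) {
        this.delegate = delegate;
    }

    String getTitle() {
        return (String) delegate.get("title");
    }

    void setTitle(String title) {
        delegate.set("title", title);
    }

    String getFormat() {
        return (String) delegate.get("format");
    }

    void setFormat(String format) {
        delegate.set("format", format);
    }
}

class DecoratorDemo {
    public static void main(String[] args) {
        GenericMetadata metadata = new GenericMetadata();
        DublinCoreMetadata dc = new DublinCoreMetadata(metadata);
        dc.setTitle("Metadata design");
        dc.setFormat("text/plain");
        System.out.println(metadata.get("title") + " (" + dc.getFormat() + ")");
    }
}
```

If the data model changes, only the convenience class needs to change; the generic container and its clients are untouched.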

BR,

Jukka Zitting

Re: Metadata design

Jérôme Charron-2
> >  I think that Metadata should be of the form:
> >
> >  [key]=>1...n [value]
> >
> >  Where [key]'s are modifiable (as are [value]'s)
> >
> >  This is what you are expressing, correct?
>
> Yes, though I'm not sure where the 1...n requirement for metadata
> values comes from.

It comes from Nutch, and more generally from HTTP, where a header (a piece of
metadata) can be multivalued.


> In my proposal I'd handle such needs more
> generally by allowing structured metadata values, not just strings.

If I understand, instead of storing many values for a specified key, I will
store a List of values?

> In the Nutch case you mentioned, I would ask Nutch to understand the
> difference between the various forms of HTTP headers and to normalize
> that metadata before feeding it to Tika. After all, there's nothing
> HTTP-specific in Tika, whereas Nutch knows much more about the
> relevant details and actual reality out there.

+1


> parsing operation. Since Tika doesn't need to worry about serializing
> the metadata, we should IMHO opt for structured data types instead of
> strings where appropriate.

+1, but it adds a level of complexity in metadata handling for Tika users:
knowing the type associated with a specific piece of metadata, no?
(I agree that this is more or less already the case with serialized date or URL
values.)


>
> >  > 7) No two distinct metadata keys should be used for the same metadata
> >  > semantics.
> >
> >  Could you elaborate on this with an explicit example?
>
> For example, we currently have both DublinCore.FORMAT and
> HttpHeaders.CONTENT_TYPE whose semantics are largely overlapping. Each
> such case adds ambiguity and makes automatic metadata processing
> harder.

-1
HttpHeaders.CONTENT_TYPE and DublinCore.FORMAT have the same semantics, but
they don't come from the same level of information: HTTP is a low-level
source and Dublin Core is a high-level one => a Tika client should have access
to both pieces of information and then decide which is the more reliable in
its case.

Best Regards

Jérôme



Re: Metadata design

Jukka Zitting
Hi,

On Mon, Mar 10, 2008 at 8:21 PM, Jérôme Charron
<[hidden email]> wrote:
>  > In my proposal I'd handle such needs more
>  > generally by allowing structured metadata values, not just strings.
>
>  If I understand, instead of storing many values for a specified key, I will
>  store a List of values?

Yes.

>  > parsing operation. Since Tika doesn't need to worry about serializing
>  > the metadata, we should IMHO opt for structured data types instead of
>  > strings where appropriate.
>
>  +1 but it adds a level of complexity in metadata handling for tika users :
>  knowing the type associated to a specific metadata, no?
>  (I agree that it is more or less the case with date or url values
>  serialized)

Yes, but you can't do anything (I'm assuming automated processing
here) with a piece of metadata unless you know what type it is. And as
long as the structured values have good toString() implementations,
they will still be useful also for manual processing.

IMHO it's much better to know that this piece of metadata is a Date
than that it's a String that (hopefully) matches one of the ISO 8601
or other well known date patterns.
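
To illustrate the difference, here is a small sketch using the JDK's `java.time` classes. `DateValueDemo` and `tryParse` are hypothetical names, and the two sample strings are only examples of what a parser might hand over.

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;
import java.util.Date;

class DateValueDemo {
    // Returns the parsed instant, or null if the text is not an ISO 8601 instant.
    static Instant tryParse(String text) {
        try {
            return Instant.parse(text);
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // A typed value needs no guessing by the consumer.
        Date typed = Date.from(Instant.parse("2008-03-10T09:54:10Z"));
        System.out.println("typed value: " + typed.toInstant());

        // String values only work if producer and consumer agree on the pattern.
        System.out.println(tryParse("2008-03-10T09:54:10Z"));
        System.out.println(tryParse("10.03.2008 09:54:10"));
    }
}
```

The second string is a perfectly reasonable date for a human reader, but the consumer here cannot use it without guessing at further patterns.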

>  > For example, we currently have both DublinCore.FORMAT and
>  > HttpHeaders.CONTENT_TYPE whose semantics are largely overlapping. Each
>  > such case adds ambiguity and makes automatic metadata processing
>  > harder.
>
>  -1
>  HttpHeaders.CONTENT_TYPE and DublinCore.FORMAT have the same semantic but
>  they  doesn't come from the same  level of information : HTTP is a low level
>  of information and Dublin is a high level => Tika client should have access
>  to both information and then guess what is the more reliable information in
>  their case.

That's true if you think of the Metadata just as a container for
information from various sources.

My motivation for requirement 7 was more about Tika as a whole and
especially the Tika parsers. The parsers should always try to produce
as accurate and consistent document metadata as possible. IMHO it
would be a major problem for one parser to report the document type as
CONTENT_TYPE and another as FORMAT. We should pick one and only one
metadata key as the canonical place for document type information
reported by Tika.

If Tika receives a document of type X with HTTP Content-Type set to A
and Dublin Core dc:format set to B, then the metadata output to a
client should be X. It would be nice if the output also contained A
and B as auxiliary information, but we should make it very clear that
the metadata key used for X in this case is the one and only key that
a client should look at to find the most accurate information that
Tika has about the document.
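
The X/A/B case could be sketched roughly as follows. All key names here are placeholders chosen for the example, not proposed constants, and `CanonicalTypeDemo` is a hypothetical class.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CanonicalTypeDemo {
    // Hypothetical key names, for illustration only.
    static final String FORMAT = "format";                        // canonical: what Tika detected
    static final String HTTP_CONTENT_TYPE = "http:Content-Type";  // auxiliary: what the server said
    static final String DECLARED_FORMAT = "declared:dc:format";   // auxiliary: what the document said

    // The detected type always goes under the one canonical key;
    // the claims from other sources are kept separately, if present.
    static Map<String, String> report(String detected, String httpContentType, String dcFormat) {
        Map<String, String> output = new LinkedHashMap<>();
        output.put(FORMAT, detected);
        if (httpContentType != null) {
            output.put(HTTP_CONTENT_TYPE, httpContentType);
        }
        if (dcFormat != null) {
            output.put(DECLARED_FORMAT, dcFormat);
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(report("application/pdf", "text/html", "application/msword"));
    }
}
```

A client that only cares about the best available answer reads `FORMAT` and nothing else.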

BR,

Jukka Zitting

Re: Metadata design

Jukka Zitting
In reply to this post by Jeremias Maerki-2
Hi,

On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
<[hidden email]> wrote:
>  ...and what Adobe's XMP package provides. ;-)

Do you know if there's an official pre-compiled xmpcore.jar file
available somewhere? I couldn't find it on the central Maven
repository, and I'd prefer to request the Maven people to upload an
official binary instead of something I built myself from Adobe's XMP
SDK.

BR,

Jukka Zitting

Re: Metadata design

Jeremias Maerki-2
No, I don't think so. The XMP SDK can be downloaded here:
http://www.adobe.com/devnet/xmp/
But the distribution only contains sources, no binaries.

On 13.03.2008 23:50:57 Jukka Zitting wrote:

> Hi,
>
> On Mon, Mar 10, 2008 at 11:33 AM, Jeremias Maerki
> <[hidden email]> wrote:
> >  ...and what Adobe's XMP package provides. ;-)
>
> Do you know if there's an official pre-compiled xmpcore.jar file
> available somewhere? I couldn't find it on the central Maven
> repository, and I'd prefer to request the Maven people to upload an
> official binary instead of something I built myself from Adobe's XMP
> SDK.
>
> BR,
>
> Jukka Zitting




Jeremias Maerki


Re: Metadata design

Jérôme Charron-2
Hi Jukka,

Did you take a look at the Java Metadata Interface?
There are some very good implementations, such as the NetBeans Metadata Repository.
It could be a good starting point for metadata management.

I notice that in most metadata management frameworks, the metadata key is
always a String (I still think introducing Object keys will be really
confusing and complicated for most Tika clients).

Best Regards

Jérôme