[jira] [Commented] (TIKA-955) Unable to extract "Track Changes" metadata from a microsoft word document

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16088751#comment-16088751 ]

Bruno P. Kinoshita commented on TIKA-955:

>Is there interest in implementing this?

I don't have a specific use case for it right now, but it sounds like it could be useful both for someone with a real use case and for a quick analysis of the changes made to a document.

>Does anyone know of a standard which shows where/how this data is stored?

The W3C PROV ontology, part of the PROV family of specifications: https://www.w3.org/TR/prov-overview/

Provenance is information about entities, activities, and people involved in producing a piece of data or thing, which can be used to form assessments about its quality, reliability or trustworthiness. The PROV Family of Documents defines a model, corresponding serializations and other supporting definitions to enable the inter-operable interchange of provenance information in heterogeneous environments such as the Web. This document provides an overview of this family of documents.

There are libraries for it (e.g. https://github.com/lucmoreau/ProvToolbox), and even a tool in the Apache Incubator that uses it (https://github.com/taverna/taverna-prov).

Whenever I need to keep track of changes to entities in a system, I either use a simple audit table in the system's data store, when the requirements are simple enough, or adopt the provenance ontology.

The provenance ontology could also work for tracking changes in Microsoft Word documents.
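A minimal sketch of what that mapping could look like, assuming each tracked change has already been pulled out of the document as a record with an id, a kind, an author and a timestamp. The `Revision` class and `to_prov_n` function are hypothetical illustrations, not part of Tika or ProvToolbox. Each change becomes a PROV activity, associated with its author as an agent, that used the previous version of the document and generated the next one. The result is written out as PROV-N text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Revision:
    """One tracked change (hypothetical record shape, not a Tika type)."""
    rev_id: str  # e.g. the id Word assigns to the change
    kind: str    # "insert" or "delete"
    author: str  # who made the change
    date: str    # ISO 8601 timestamp, written unquoted as in PROV-N examples


def to_prov_n(doc_name: str, revisions: List[Revision]) -> str:
    """Render a chain of tracked changes as a PROV-N document.

    Version n of the document is an entity generated by revision n's activity,
    which used version n-1 and was associated with the revision's author.
    Author names become QNames by replacing spaces with underscores, so names
    containing other special characters would need proper escaping.
    """
    lines = ["document", "  prefix ex <http://example.org/>"]
    previous = f"ex:{doc_name}_v0"
    lines.append(f"  entity({previous})")
    declared_agents = set()
    for n, rev in enumerate(revisions, start=1):
        version = f"ex:{doc_name}_v{n}"
        activity = f"ex:rev{rev.rev_id}"
        agent = "ex:" + rev.author.replace(" ", "_")
        if agent not in declared_agents:  # declare each author only once
            lines.append(f"  agent({agent})")
            declared_agents.add(agent)
        lines += [
            f"  entity({version})",
            f'  activity({activity}, {rev.date}, {rev.date}, [ex:kind="{rev.kind}"])',
            f"  used({activity}, {previous}, -)",
            f"  wasGeneratedBy({version}, {activity}, -)",
            f"  wasAssociatedWith({activity}, {agent}, -)",
            f"  wasDerivedFrom({version}, {previous})",
        ]
        previous = version
    lines.append("endDocument")
    return "\n".join(lines)
```

For example, `to_prov_n("report", [Revision("1", "insert", "Jane Doe", "2017-07-15T10:00:00Z")])` yields a two-version chain linked by `wasDerivedFrom`. Because the output is plain PROV-N text, it could be handed to a PROV library such as ProvToolbox for validation or conversion to other serializations.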

> Unable to extract "Track Changes" metadata from a microsoft word document
> -------------------------------------------------------------------------
>                 Key: TIKA-955
>                 URL: https://issues.apache.org/jira/browse/TIKA-955
>             Project: Tika
>          Issue Type: Bug
>          Components: metadata
>    Affects Versions: 0.9
>         Environment: OS: Windows 7
>            Reporter: Priya Kujur
>            Priority: Minor
> A Microsoft Word document has a feature to track and review changes. How can the Tika jar help me identify such changes?
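On the reporter's question itself, here is one possible starting point that does not depend on Tika. In a .docx file, Word records tracked insertions and deletions in word/document.xml as `w:ins` and `w:del` elements with `w:id`, `w:author` and `w:date` attributes. Deleted text is stored in `w:delText` instead of `w:t`. The `tracked_changes` function below is a hypothetical helper built only on Python's standard library. It is not a Tika API, and it does not handle legacy binary .doc files.

```python
import zipfile
import xml.etree.ElementTree as ET
from typing import IO, Dict, List, Union

# Main WordprocessingML namespace used by word/document.xml in .docx files.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

# Revision element -> (kind, element that holds its text).
REVISION_TAGS = {
    W + "ins": ("insert", W + "t"),
    W + "del": ("delete", W + "delText"),
}


def tracked_changes(docx: Union[str, IO[bytes]]) -> List[Dict[str, str]]:
    """Return w:ins / w:del revisions from the main document part, in document order.

    `docx` is a path or binary file object (anything zipfile.ZipFile accepts).
    Formatting changes, moves, headers/footers and comments are ignored, and
    revision marks on paragraph marks show up with empty text.
    """
    with zipfile.ZipFile(docx) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    changes = []
    for el in root.iter():
        if el.tag not in REVISION_TAGS:
            continue
        kind, text_tag = REVISION_TAGS[el.tag]
        changes.append({
            "kind": kind,
            "id": el.get(W + "id", ""),
            "author": el.get(W + "author", ""),
            "date": el.get(W + "date", ""),
            # Concatenate the text of every run inside the revision.
            "text": "".join(t.text or "" for t in el.iter(text_tag)),
        })
    return changes
```

Usage would be along the lines of `for change in tracked_changes("report.docx"): print(change["kind"], change["author"], change["text"])`, where "report.docx" is a placeholder file name.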

This message was sent by Atlassian JIRA