[jira] [Created] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)
Erik Peterson created TIKA-918:
----------------------------------

             Summary: iWork Charts not being parsed in all products (Pages, Numbers, Keynote)
                 Key: TIKA-918
                 URL: https://issues.apache.org/jira/browse/TIKA-918
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
         Environment: Windows 7, 64 bit
            Reporter: Erik Peterson
            Priority: Minor


Chart titles, axis labels, and other textual information are all being ignored by the Tika parser.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

       

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404532#comment-13404532 ]

Jukka Zitting commented on TIKA-918:
------------------------------------

Do you have a test case that illustrates this problem?
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439044#comment-13439044 ]

Erik Peterson commented on TIKA-918:
------------------------------------

I can attach a sample document illustrating the issue.
               


[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Peterson updated TIKA-918:
-------------------------------

    Attachment: testNumbersTemplateCharts.numbers

Numbers file with embedded charts. Nothing about the charts is being parsed at this time.
               


[jira] [Assigned] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned TIKA-918:
---------------------------------------

    Assignee: Michael McCandless
   


[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-918:
------------------------------------

    Attachment: TIKA-918.patch

Patch that extracts chart titles from Numbers documents. Each chart comes out like this:
{noformat}
<div class="chart"><h1>Chart Title</h1></div>
{noformat}
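
For anyone consuming this output downstream, here is a minimal sketch of how chart titles might be pulled back out of the XHTML shown above. It is not part of the patch and does not call Tika at all: it assumes the parser output has already been captured as a well-formed XHTML string, and the class name `ChartTitles` and method `extract` are hypothetical names chosen for illustration. Only the JDK's built-in DOM parser is used.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of Tika or of TIKA-918.patch.
class ChartTitles {

    /** Returns the text of every h1 nested inside a div with class="chart". */
    static List<String> extract(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            List<String> titles = new ArrayList<>();
            NodeList divs = doc.getElementsByTagName("div");
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                // Only chart containers are of interest; skip every other div.
                if (!"chart".equals(div.getAttribute("class"))) {
                    continue;
                }
                NodeList headings = div.getElementsByTagName("h1");
                for (int j = 0; j < headings.getLength(); j++) {
                    titles.add(headings.item(j).getTextContent().trim());
                }
            }
            return titles;
        } catch (Exception e) {
            // Malformed input or parser setup failure.
            throw new IllegalArgumentException("Could not parse XHTML", e);
        }
    }

    public static void main(String[] args) {
        String sample = "<html><body>"
                + "<div class=\"chart\"><h1>Chart Title</h1></div>"
                + "</body></html>";
        System.out.println(extract(sample)); // prints [Chart Title]
    }
}
```

The sketch matches on the `class="chart"` attribute and the `h1` element exactly as they appear in the sample output above. If the markup emitted for charts changes, the two string literals would need to change with it.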
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453082#comment-13453082 ]

Michael McCandless commented on TIKA-918:
-----------------------------------------

I committed the fix for Numbers docs. Erik, do you have Keynote and Pages examples?
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456021#comment-13456021 ]

Erik Peterson commented on TIKA-918:
------------------------------------

I do not have a Pages example, but I do have a Keynote example. However, it is over the file upload size limit (> 10 MB).
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456382#comment-13456382 ]

Michael McCandless commented on TIKA-918:
-----------------------------------------

Maybe try to whittle down the Keynote example?  Ideally it'd be a minimal test case showing the issue.
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480662#comment-13480662 ]

Erik Peterson commented on TIKA-918:
------------------------------------

I'd have to get access to another Mac system; I've lost touch with the contact we were initially working with. If it's critical, I can email the file to someone with Mac access who can trim it down.
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480684#comment-13480684 ]

Dave Meikle commented on TIKA-918:
----------------------------------

Erik - feel free to fire it to me and I can slim the file down.
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496357#comment-13496357 ]

Erik Peterson commented on TIKA-918:
------------------------------------

Sent.
               


[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Nick Burch (Jira)

    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503502#comment-13503502 ]

Dave Meikle commented on TIKA-918:
----------------------------------

Erik - I am afraid I have not received it. What email address did you send it to?
               
