[jira] [Commented] (TIKA-2265) Problem with footnotes/endnotes in Tika.parseToString with MS Word (.docx) files


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-2265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15868041#comment-15868041 ]

Tim Allison commented on TIKA-2265:
-----------------------------------

Right, it turns out that footnote/endnote numbers are calculated dynamically. We should at least do a better job of consecutively numbering the footnotes/endnotes starting from 1 and not relying on the "id". :)

We will never(?) be able to figure out where a page ends, so we won't be able to implement:
{noformat}
<w:footnotePr>
   <w:numRestart w:val="eachPage" />
</w:footnotePr>
{noformat}
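
To illustrate the consecutive numbering suggested above, here is a minimal sketch, not Tika's actual code. The class name {{FootnoteRenumberer}} and its method are hypothetical, and it assumes only that the parser sees each footnote/endnote "id" as an opaque string, in document order, when it meets the reference in the body text:

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical helper: assigns display numbers 1, 2, 3, ... to notes in the
 * order their references are first seen, ignoring the numeric value of the
 * raw "id" attribute found in the file.
 */
final class FootnoteRenumberer {

    // raw id from the document -> consecutive display number
    private final Map<String, Integer> displayNumbers = new LinkedHashMap<>();

    /**
     * Returns the display number for a raw id, allocating the next
     * consecutive number the first time that id is seen.
     */
    int displayNumberFor(String rawId) {
        Integer existing = displayNumbers.get(rawId);
        if (existing != null) {
            return existing;
        }
        int next = displayNumbers.size() + 1; // numbering starts at 1
        displayNumbers.put(rawId, next);
        return next;
    }
}
```

The same lookup would be used when emitting the note body, so that the reference and the body text agree. Footnotes and endnotes would presumably each need their own instance, since they are numbered separately.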


> Problem with footnotes/endnotes in Tika.parseToString with MS Word (.docx) files
> --------------------------------------------------------------------------------
>
>                 Key: TIKA-2265
>                 URL: https://issues.apache.org/jira/browse/TIKA-2265
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.14
>         Environment: N/A
>            Reporter: Mike Rodent
>            Assignee: Tim Allison
>            Priority: Minor
>              Labels: newbie
>         Attachments: test.docx, test shorter.docx
>
>
> It seems that a footnote numbered "1" in the real document is output by Tika.parseToString() as "2" in the footnote reference, and "2" in the corresponding footnote body text; real footnote "2" becomes "3", "3" becomes "4", and so on.  I have not yet looked at the source code, but I can't imagine it would be difficult to correct this.



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)