[jira] [Commented] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/TIKA-2735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16646969#comment-16646969 ]

Hudson commented on TIKA-2735:
------------------------------

FAILURE: Integrated in Jenkins build tika-branch-1x #113 (See [https://builds.apache.org/job/tika-branch-1x/113/])
TIKA-2735 -- allow user to avoid extracting "master" sections and notes (tallison: [https://github.com/apache/tika/commit/307a8bd592d6e25419bbad19aac47cc7de201c4d])
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/OOXMLParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/PowerPointParserTest.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParserConfig.java
* (edit) tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/SXSLFPowerPointExtractorDecorator.java
* (edit) tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ooxml/SXSLFExtractorTest.java


> notes and footer contents are duplicated in extracting text from power point slides
> -----------------------------------------------------------------------------------
>
>                 Key: TIKA-2735
>                 URL: https://issues.apache.org/jira/browse/TIKA-2735
>             Project: Tika
>          Issue Type: Bug
>          Components: handler
>    Affects Versions: 1.18
>            Reporter: feng ye
>            Priority: Major
>         Attachments: Oneslide.ppt, pptTextResults.txt
>
>
> Notes and footer contents are duplicated at the end when extracting text from PPT slides (like the one in the attachment). Both the input file and the text results are attached.
> Is there a configuration option that can be used to suppress this kind of duplication?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Commented] (TIKA-2735) notes and footer contents are duplicated in extracting text from power point slides

Anubha Balani
unsubscribe

On Thu, Oct 11, 2018 at 12:49 PM Hudson (JIRA) <[hidden email]> wrote:



--
Warm Regards
Anubha Balani