Branch_1x build broke?

Branch_1x build broke?

Chris Mattmann
Tim,

 

Are you seeing this?

 

Results :

Failed tests:
  PDFParserTest.testEmbeddedDocsWithOCROnly:1250->TikaTest.assertContains:103 pdf_haystack not found in:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2013-05-23T18:30:00Z" />
<meta name="cp:revision" content="1" />
<meta name="extended-properties:AppVersion" content="14.0000" />
<meta name="meta:paragraph-count" content="1" />
<meta name="meta:word-count" content="16" />
<meta name="extended-properties:Company" content="" />
<meta name="Word-Count" content="16" />
<meta name="dcterms:created" content="2013-05-23T18:30:00Z" />
<meta name="meta:line-count" content="1" />
<meta name="Last-Modified" content="2013-05-23T18:30:00Z" />
<meta name="dcterms:modified" content="2013-05-23T18:30:00Z" />
<meta name="Last-Save-Date" content="2013-05-23T18:30:00Z" />
<meta name="meta:character-count" content="96" />
<meta name="Template" content="Normal.dotm" />
<meta name="Line-Count" content="1" />
<meta name="Paragraph-Count" content="1" />
<meta name="meta:save-date" content="2013-05-23T18:30:00Z" />
<meta name="meta:character-count-with-spaces" content="111" />
<meta name="Application-Name" content="Microsoft Office Word" />
<meta name="modified" content="2013-05-23T18:30:00Z" />
<meta name="Content-Type" content="application/vnd.openxmlformats-officedocument.wordprocessingml.document" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.microsoft.ooxml.OOXMLParser" />
<meta name="meta:creation-date" content="2013-05-23T18:30:00Z" />
<meta name="extended-properties:Application" content="Microsoft Office Word" />
<meta name="Creation-Date" content="2013-05-23T18:30:00Z" />
<meta name="xmpTPg:NPages" content="1" />
<meta name="Character-Count-With-Spaces" content="111" />
<meta name="Character Count" content="96" />
<meta name="Page-Count" content="1" />
<meta name="Revision-Number" content="1" />
<meta name="Application-Version" content="14.0000" />
<meta name="extended-properties:Template" content="Normal.dotm" />
<meta name="publisher" content="" />
<meta name="meta:page-count" content="1" />
<meta name="dc:publisher" content="" />
<title></title>
</head>
<body><p class="header" />
<p class="header" />
<p class="header" />
<p>Outer_haystack</p>
<p>Outer_haystack</p>
<p><div class="embedded" id="rId8" />
</p>
<p>Outer_haystack</p>
<p />
<p>Outer_haystack</p>
<p />
<p>Outer_haystack</p>
<p><a name="_GoBack" /></p>
<p class="footer" />
<p class="footer" />
<p class="footer" />
<p>attached.pdf</p>
<div class="page"><div class="ocr">dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'

</div>
</div>
<p class="header" />

<p class="header" />

<p class="header" />

<p>Haystack</p>

<p>Needle</p>

<p>Haystack</p>

<p><a name="_GoBack" /></p>

<p class="footer" />

<p class="footer" />

<p class="footer" />

<div source="attachment" class="embedded" id="Test.docx" />
</body></html>

Tests run: 1009, Failures: 1, Errors: 0, Skipped: 30

 

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................. SUCCESS [  1.565 s]
[INFO] Apache Tika core ................................... SUCCESS [ 32.977 s]
[INFO] Apache Tika parsers ................................ FAILURE [05:52 min]
[INFO] Apache Tika XMP .................................... SKIPPED
[INFO] Apache Tika serialization .......................... SKIPPED
[INFO] Apache Tika batch .................................. SKIPPED
[INFO] Apache Tika language detection ..................... SKIPPED
[INFO] Apache Tika application ............................ SKIPPED
[INFO] Apache Tika OSGi bundle ............................ SKIPPED
[INFO] Apache Tika translate .............................. SKIPPED
[INFO] Apache Tika server ................................. SKIPPED
[INFO] Apache Tika examples ............................... SKIPPED
[INFO] Apache Tika Java-7 Components ...................... SKIPPED
[INFO] Apache Tika eval ................................... SKIPPED
[INFO] Apache Tika Deep Learning (powered by DL4J) ........ SKIPPED
[INFO] Apache Tika Natural Language Processing ............ SKIPPED
[INFO] Apache Tika ........................................ SKIPPED
[INFO] ------------------------------------------------------------------------
[INFO] BUILD FAILURE
[INFO] ------------------------------------------------------------------------
[INFO] Total time: 06:27 min
[INFO] Finished at: 2018-05-24T09:04:59-07:00
[INFO] Final Memory: 72M/1029M
[INFO] ------------------------------------------------------------------------
[ERROR] Failed to execute goal org.apache.maven.plugins:maven-surefire-plugin:2.18.1:test (default-test) on project tika-parsers: There are test failures.
[ERROR]
[ERROR] Please refer to /Users/mattmann/tmp/tika2.0.0/tika-parsers/target/surefire-reports for the individual test results.
[ERROR] -> [Help 1]
[ERROR]
[ERROR] To see the full stack trace of the errors, re-run Maven with the -e switch.
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :tika-parsers

 

Keeps failing for me.

nonas:tika2.0.0 mattmann$ java -version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
nonas:tika2.0.0 mattmann$

 

Any ideas?

 

Cheers,

Chris

 


Re: Branch_1x build broke?

David Meikle
Hey Chris,

This is happening to me with Tesseract enabled, but only on my MacBook.

Are you running this on OS X?

I've been trying to find some time to dig into it, as it works perfectly on my
Windows and Linux setups.

Cheers,
Dave



On Thu, 24 May 2018, 17:09 Chris Mattmann, <[hidden email]> wrote:

> Tim,
>
>
>
> Are you seeing this?
>
>
>

Re: Branch_1x build broke?

Chris Mattmann
Thanks Dave. Yes, I have Tesseract enabled, and this is on my MacBook.

Thanks for looking into it, Dave.

 

Cheers,

Chris

 

 

 

From: "[hidden email]" <[hidden email]>
Reply-To: "[hidden email]" <[hidden email]>
Date: Thursday, May 24, 2018 at 11:34 AM
To: "[hidden email]" <[hidden email]>
Subject: Re: Branch_1x build broke?

 

Hey Chris,

 

This is happening to me with Tesseract enabled but only on my MacBook.

 

Are you running this on OSX?

 

Been trying to get some time to dig into it as it works perfectly on my

Windows and Linux setups.

 

Cheers,

Dave

 

 

 

On Thu, 24 May 2018, 17:09 Chris Mattmann, <[hidden email]> wrote:

 

Tim,

 

 

 

Are you seeing this?

 

 

 



Re: Branch_1x build broke?

Tim Allison
Yes, you're probably running a different version of Tesseract than I was
running and getting different (worse) text out during OCR.  I guess we
could have the test also accept 'dehaystack'?
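
A minimal sketch of what "accept either string" could look like, in plain Java so it runs without the Tika test classes. The class name `OcrAssertSketch` and the helpers `containsAny` and `assertContainsAny` are hypothetical names made up for illustration; they are not part of `TikaTest`, and the actual fix in the test may look different.

```java
import java.util.Arrays;

public class OcrAssertSketch {

    /** Returns true if the haystack contains at least one of the candidate needles. */
    public static boolean containsAny(String haystack, String... needles) {
        for (String needle : needles) {
            if (haystack.contains(needle)) {
                return true;
            }
        }
        return false;
    }

    /** Fails with a readable message when none of the candidates appears. */
    public static void assertContainsAny(String haystack, String... needles) {
        if (!containsAny(haystack, needles)) {
            throw new AssertionError(
                    "none of " + Arrays.toString(needles) + " found in:\n" + haystack);
        }
    }

    public static void main(String[] args) {
        // OCR text as it appears in the failing run quoted in this thread.
        String ocr = "dehayslack dehaystack dehayslack dehaystack dehaystack dehaystack pd'";

        // Passes if either the expected text or the degraded OCR variant is present.
        assertContainsAny(ocr, "pdf_haystack", "dehaystack");
        System.out.println("ok");
    }
}
```

The trade-off is that every accepted variant loosens the test a little, so a list like this is probably best kept short and tied to OCR output that was actually observed.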

On Thu, May 24, 2018 at 12:09 PM, Chris Mattmann <[hidden email]>
wrote:

> Tim,
>
>
>
> Are you seeing this?
>
>
>

Re: Branch_1x build broke?

Tim Allison
Or it might be that you have the Python image preprocessing libraries
installed (and I don't).

Will fix today.
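
Since the suspects in this thread are a different Tesseract version or a different preprocessing setup, one way to compare machines is to record which Tesseract is on the PATH. The sketch below is a hedged example: the class name `TesseractVersionSketch` is hypothetical, and it assumes that `tesseract --version` prints a first line of the form `tesseract <version>`. The version string in the comment is only an illustration, not a value reported by anyone in this thread.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TesseractVersionSketch {

    // Assumption: the banner contains "tesseract <version>", e.g. "tesseract 3.05.01".
    private static final Pattern VERSION = Pattern.compile("tesseract\\s+(\\S+)");

    /** Extracts the version token from the banner text, or null if none is found. */
    public static String parseVersion(String banner) {
        Matcher m = VERSION.matcher(banner);
        return m.find() ? m.group(1) : null;
    }

    /** Runs "tesseract --version" and returns the parsed version, or null if unavailable. */
    public static String installedVersion() {
        try {
            Process p = new ProcessBuilder("tesseract", "--version")
                    .redirectErrorStream(true) // some builds print the banner on stderr
                    .start();
            StringBuilder out = new StringBuilder();
            try (BufferedReader r = new BufferedReader(
                    new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
                String line;
                while ((line = r.readLine()) != null) {
                    out.append(line).append('\n');
                }
            }
            p.waitFor();
            return parseVersion(out.toString());
        } catch (IOException e) {
            return null; // tesseract is not on the PATH or could not be started
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    public static void main(String[] args) {
        String version = installedVersion();
        System.out.println(version == null ? "tesseract not found" : "tesseract " + version);
    }
}
```

Running this on the Mac that fails and on a machine that passes would show whether the two builds are using the same Tesseract release. It does not detect the Python preprocessing libraries.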

On Thu, May 24, 2018 at 2:55 PM Tim Allison <[hidden email]> wrote:

> Y, you're probably running a different version of tesseract than I was
> running and getting different (worse) text out during ocr.  I guess we
> could add an or 'dehaystack'?
>
> On Thu, May 24, 2018 at 12:09 PM, Chris Mattmann <[hidden email]>
> wrote:
>
>> Tim,
>>
>>
>>
>> Are you seeing this?
>>
>>
>>