[VOTE] Apache Tika 1.14 Release Candidate #1

[VOTE] Apache Tika 1.14 Release Candidate #1

Chris Mattmann
Hi Folks,

A first candidate for the Tika 1.14 release is available at:

  https://dist.apache.org/repos/dist/dev/tika/

The release candidate is a zip archive of the sources in:

https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=687d7706c9778e4f49f2834a07e5a9d99b23042b 

The SHA1 checksum of the archive is:
ad9152392ffe6b620c8102ab538df0579b36c520

In addition, a staged maven repository is available here:

https://repository.apache.org/content/repositories/orgapachetika-1020/

Please vote on releasing this package as Apache Tika 1.14.
The vote is open for the next 72 hours and passes if a majority of at
least three +1 Tika PMC votes are cast.

[ ] +1 Release this package as Apache Tika 1.14
[ ] -1 Do not release this package because..

Cheers,
Chris

P.S. Of course here is my +1.
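
For anyone checking the archive against the SHA1 above, here is a minimal sketch of the comparison in Java. The class name `Sha1Check` and the default local file name `tika-1.14-src.zip` are assumptions for illustration only; pass the real path of the downloaded archive as the first argument.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class Sha1Check {

    /** Creates a SHA-1 digest; every standard JRE is required to provide one. */
    static MessageDigest newSha1() {
        try {
            return MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    /** Hex-encoded SHA-1 of an in-memory byte array. */
    static String sha1Hex(byte[] data) {
        return toHex(newSha1().digest(data));
    }

    /** Hex-encoded SHA-1 of a file, read in chunks so large archives are fine. */
    static String sha1Hex(Path file) throws IOException {
        MessageDigest md = newSha1();
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        return toHex(md.digest());
    }

    /** Lower-case hex encoding of a byte array. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumed local file name; pass the real path as the first argument.
        Path archive = Paths.get(args.length > 0 ? args[0] : "tika-1.14-src.zip");
        // Checksum quoted in the vote email.
        String expected = "ad9152392ffe6b620c8102ab538df0579b36c520";
        String actual = sha1Hex(archive);
        System.out.println(actual.equalsIgnoreCase(expected)
                ? "SHA-1 matches"
                : "SHA-1 MISMATCH: " + actual);
    }
}
```

A checksum only shows that the download is intact; checking the PGP signature against the KEYS file is a separate step.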






Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Julien Nioche-4
Hi

Am getting the following when running 'mvn clean package', have I forgotten
something obvious?

Julien

Failed tests:
  ForkParserIntegrationTest.testParserHandlingOfNonSerializable:210 expected:<Unable to serialize [ParseContext] to pass to the Fork...> but was:<Unable to serialize [AutoDetectParser] to pass to the Fork...>
Tests in error:
  ForkParserIntegrationTest.testAttachingADebuggerOnTheForkedParserShouldWork:234 » Tika
  ForkParserIntegrationTest.testForkedPDFParsing:257 » Tika Unable to serialize ...
  ForkParserIntegrationTest.testForkedTextParsing:66 » Tika Unable to serialize ...

Tests run: 755, Failures: 1, Errors: 3, Skipped: 17

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................ SUCCESS [4.368s]
[INFO] Apache Tika core .................................. SUCCESS [16.487s]
[INFO] Apache Tika parsers ............................... FAILURE [4:54.631s]



On 19 October 2016 at 19:48, Chris Mattmann <[hidden email]> wrote:

> Hi Folks,
>
> A first candidate for the Tika 1.14 release is available at:
>
>   https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
>
> https://git-wip-us.apache.org/repos/asf?p=tika.git;a=tree;hb=
> 687d7706c9778e4f49f2834a07e5a9d99b23042b
>
> The SHA1 checksum of the archive is:
> ad9152392ffe6b620c8102ab538df0579b36c520
>
> In addition, a staged maven repository is available here:
>
> https://repository.apache.org/content/repositories/orgapachetika-1020/
>
> Please vote on releasing this package as Apache Tika 1.14.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.14
> [ ] -1 Do not release this package because..
>
> Cheers,
> Chris
>
> P.S. Of course here is my +1.
>
>
>
>
>
>


--

*Open Source Solutions for Text Engineering*

http://www.digitalpebble.com
http://digitalpebble.blogspot.com/
#digitalpebble <http://twitter.com/digitalpebble>

RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
https://issues.apache.org/jira/browse/TIKA-2056

Perhaps?

-----Original Message-----
From: Julien Nioche [mailto:[hidden email]]
Sent: Thursday, October 20, 2016 8:34 AM
To: [hidden email]
Subject: Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Hi

Am getting the following when running 'mvn clean package', have I forgotten something obvious?

Julien

Failed tests:
  ForkParserIntegrationTest.testParserHandlingOfNonSerializable:210 expected:<Unable to serialize [ParseContext] to pass to the Fork...> but was:<Unable to serialize [AutoDetectParser] to pass to the Fork...>
Tests in error:
  ForkParserIntegrationTest.testAttachingADebuggerOnTheForkedParserShouldWork:234 » Tika
  ForkParserIntegrationTest.testForkedPDFParsing:257 » Tika Unable to serialize ...
  ForkParserIntegrationTest.testForkedTextParsing:66 » Tika Unable to serialize ...

Tests run: 755, Failures: 1, Errors: 3, Skipped: 17

[INFO] ------------------------------------------------------------------------
[INFO] Reactor Summary:
[INFO]
[INFO] Apache Tika parent ................................ SUCCESS [4.368s]
[INFO] Apache Tika core .................................. SUCCESS [16.487s]
[INFO] Apache Tika parsers ............................... FAILURE [4:54.631s]




Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Bob Paulin-2
In reply to this post by Chris Mattmann
+1 Builds and tests pass on Java 8 and Windows 10.


- Bob



RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
+1

builds on Windows 10/Java 8 with and without Tesseract, RHEL w/out tesseract

I reran the regression tests against a sample of 100k files.  We had a few more exceptions for truncated files due to changes in the DigestingParser, but these are expected.

I left a println in the AppleSingleFileParser, but that's not a show stopper.

Thank you, Chris!

-----Original Message-----
From: Bob Paulin [mailto:[hidden email]]
Sent: Thursday, October 20, 2016 9:46 AM
To: [hidden email]
Subject: Re: [VOTE] Apache Tika 1.14 Release Candidate #1

+1 Builds and tests pass on Java 8 and Windows 10.


- Bob



Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Julien Nioche-4
In reply to this post by Allison, Timothy B.
Hi Tim

I had exiftool installed indeed, so that might explain it. All tests now
pass. Will have a closer look at it all later.

Thanks

Julien

On 20 October 2016 at 13:45, Allison, Timothy B. <[hidden email]> wrote:

> https://issues.apache.org/jira/browse/TIKA-2056
>
> Perhaps?

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Oleg Tikhonov-2
Hi,
+1 for release.
Built on Ubuntu 16.04 and CentOS 7.0 x86_64.

All tests pass on Java 8.

BR,
Oleg

On Thu, Oct 20, 2016 at 5:54 PM, Julien Nioche <
[hidden email]> wrote:

> Hi Tim
>
> I had exiftool installed indeed, so that might explain it. All tests now
> pass. Will have a closer look at it all later.
>
> Thanks
>
> Julien

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

David Meikle
In reply to this post by Julien Nioche-4
Hello,

I am getting the same as Julien without exiftool installed on my Mac. Everything passes on Windows 10 and Ubuntu.

Will have a dig and see what I find.

Cheers,
Dave


Re: [VOTE] Apache Tika 1.14 Release Candidate #1

kkrugler
In reply to this post by Chris Mattmann
Just for grins, I pulled from git and checked out the 1.14-rc1 tag, then ran “mvn clean package”.

For me it fails with:

Running org.apache.tika.parser.strings.StringsParserTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.685 sec <<< FAILURE! - in org.apache.tika.parser.strings.StringsParserTest
testParse(org.apache.tika.parser.strings.StringsParserTest)  Time elapsed: 1.685 sec  <<< FAILURE!
java.lang.AssertionError: null
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.apache.tika.parser.strings.StringsParserTest.testParse(StringsParserTest.java:68)



Results :

Failed tests:
  StringsParserTest.testParse:68 null

Tests run: 755, Failures: 1, Errors: 0, Skipped: 18

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Konstantin Gribov
In reply to this post by David Meikle
Chris,

you have a new PGP key that is not present on your account in [1]. Could you
please update it there? Also, the `KEYS` file in `tika-1.14-src.zip` contains
only your old PGP key.

The SHA-1 and MD5 digests are fine, and `tika-server` and `tika-app` work fine on
Arch Linux with OpenJDK 8u112, with and without Tesseract.

The build (`mvn clean package verify`) fails the same way as Julien Nioche and Dave
mentioned, on Arch Linux with or without Tesseract. I have no exiftool, so
I'll try to investigate what else makes `AutoDetectParser` non-serializable.
I hope I'll have a bit of time this evening for this.

Also, one test, `testParserHandlingOfNonSerializable`, fails because the
exception message was `Unable to serialize [AutoDetectParser] to pass to
the Fork...` instead of `Unable to serialize [ParseContext] to pass to the
Fork...`. But it seems to be the same issue as above.

Neither issue is a strict blocker for me, but I'd ask you to extend the voting
period so we can dig into the non-serializable `AutoDetectParser` issue, if you
don't mind.

[1]: https://people.apache.org/keys/committer/mattmann.asc
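
To illustrate the kind of failure described above, here is a rough sketch (not Tika's actual code) of how one non-serializable member makes an otherwise serializable composite fail Java serialization. All class names below (`SerializationProbe`, `PlainParser`, `ExternalToolParser`, `CompositeParser`) are hypothetical stand-ins, and nothing here claims to show what really happens inside `AutoDetectParser` or `ForkParser`.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class SerializationProbe {

    /** Hypothetical stand-in for a parser that is safe to serialize. */
    static class PlainParser implements Serializable {
        private static final long serialVersionUID = 1L;
    }

    /** Hypothetical stand-in for a parser that does NOT implement Serializable. */
    static class ExternalToolParser {
    }

    /** Hypothetical composite: Serializable itself, but only if every delegate is too. */
    static class CompositeParser implements Serializable {
        private static final long serialVersionUID = 1L;
        final List<Object> delegates = new ArrayList<>();
    }

    /**
     * Tries to serialize the object into memory.
     * Returns null on success, otherwise the name of the class that blocked it.
     */
    static String firstNonSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            // The exception message carries the name of the offending class.
            return e.getMessage();
        } catch (IOException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        CompositeParser composite = new CompositeParser();
        composite.delegates.add(new PlainParser());
        System.out.println("plain only: " + firstNonSerializable(composite));

        // One non-serializable delegate is enough to break the whole object graph.
        composite.delegates.add(new ExternalToolParser());
        System.out.println("with external tool parser: " + firstNonSerializable(composite));
    }
}
```

A probe like this can help narrow down which member of a larger object graph is the one that cannot be serialized.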

On Mon, 24 Oct 2016 at 16:15, David Meikle <[hidden email]> wrote:

> Hello,
>
> I am getting the same as Julien without exiftool installed on my Mac.
> Everything passes on Windows 10 and Ubuntu.
>
> Will have a dig and see what I find.
>
> Cheers,
> Dave

--

Best regards,
Konstantin Gribov

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Konstantin Gribov
`ForkParser`-related tests fail when ffmpeg is present on my system. Dave,
please check whether `ffmpeg` is on your PATH. It seems to be TIKA-2056, as
Tim said above. I've excluded `ffmpeg` from `tika-external-parsers.xml` and
all tests pass after that.

Also, tested on ArchLinux w/ Grobid.
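
A quick way to run the PATH check suggested above is sketched below. This is only an illustration, not how Tika itself probes for external tools; the class name `PathProbe` and the helper `findOnPath` are made up for this example, and on Windows the tool name would need its extension (for example `ffmpeg.exe`).

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;
import java.util.regex.Pattern;

public class PathProbe {

    /**
     * Looks for an executable file called `tool` in each directory of `pathValue`
     * (a PATH-style string). Returns the first match, or empty if there is none.
     */
    static Optional<Path> findOnPath(String tool, String pathValue) {
        if (pathValue == null || pathValue.isEmpty()) {
            return Optional.empty();
        }
        for (String dir : pathValue.split(Pattern.quote(File.pathSeparator))) {
            if (dir.isEmpty()) {
                continue;
            }
            try {
                Path candidate = Paths.get(dir, tool);
                if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                    return Optional.of(candidate);
                }
            } catch (InvalidPathException e) {
                // Skip PATH entries that are not valid paths on this platform.
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Defaults to ffmpeg, the tool discussed in this thread.
        String tool = args.length > 0 ? args[0] : "ffmpeg";
        Optional<Path> hit = findOnPath(tool, System.getenv("PATH"));
        System.out.println(hit.isPresent()
                ? tool + " found at " + hit.get()
                : tool + " not found on PATH");
    }
}
```

Running it with `exiftool` as the argument gives the same check for the other tool mentioned earlier in the thread.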

On Mon, 24 Oct 2016 at 19:41, Konstantin Gribov <[hidden email]> wrote:

> Chris,
>
> you have a new PGP key that is not present on your account in [1]. Could you
> please update it there? Also, the `KEYS` file in `tika-1.14-src.zip` contains
> only your old PGP key.
>
> The SHA-1 and MD5 digests are fine, and `tika-server` and `tika-app` work fine on
> Arch Linux with OpenJDK 8u112, with and without Tesseract.
>
> The build (`mvn clean package verify`) fails the same way as Julien Nioche and
> Dave mentioned, on Arch Linux with or without Tesseract. I have no exiftool,
> so I'll try to investigate what else makes `AutoDetectParser`
> non-serializable. I hope I'll have a bit of time this evening for this.
>
> Also, one test, `testParserHandlingOfNonSerializable`, fails because the
> exception message was `Unable to serialize [AutoDetectParser] to pass to
> the Fork...` instead of `Unable to serialize [ParseContext] to pass to
> the Fork...`. But it seems to be the same issue as above.
>
> Neither issue is a strict blocker for me, but I'd ask you to extend the
> voting period so we can dig into the non-serializable `AutoDetectParser`
> issue, if you don't mind.
>
> [1]: https://people.apache.org/keys/committer/mattmann.asc
--

Best regards,
Konstantin Gribov

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

David Meikle
Hey.

Spot on Konstantin, thanks. Removed ffmpeg from path and it works.

I also noticed the same KEYS issue.

Just running some content tests now.

Cheers,
Dave

> On 24 Oct 2016, at 19:30, Konstantin Gribov <[hidden email]> wrote:
>
> `ForkParser` related tests fail in presence of ffmpeg on my system. Dave,
> check `ffmpeg` presence on the PATH, please. It seems to be TIKA-2056 as
> Tim said above. I've excluded `ffmpeg` from `tika-external-parsers.xml` and
> all tests pass after that.
>
> Also, tested on ArchLinux w/ Grobid.
>
> пн, 24 окт. 2016 г. в 19:41, Konstantin Gribov <[hidden email]>:
>
>> Chris,
>>
>> you have new PGP key which is not present your account in [1]. Could you
>> please update it there? Also, `KEYS` file in `tika-1.14-src.zip` contains
>> only your old PGP key.
>>
>> SHA-1 and MD5 digests are fine, `tika-server` and `tika-app` work fine on
>> Arch Linux, OpenJDK 8u112 with and without Tesseract.
>>
>> Build (`mvn clean package verify`) fails same way as Julien Nioche and
>> Dave mentioned on Arch Linux with or without tesseract. I have no exiftool,
>> so I'll try to investigate what else make `AutoDetectParser`
>> non-serializable. I hope, I'll have a bit time this evening for this.
>>
>> Also, one test fails `testParserHandlingOfNonSerializable` because
>> exception message was `Unable to serialize [AutoDetectParser] to pass to
>> the Fork...` instead of `Unable to serialize [ParseContext] to pass to
>> the Fork...`. But it seems the same issue as above.
>>
>> Both issues aren't strict blockers to me but I'd ask you to increase
>> voting time to dig into issue with non-serializable `AutoDetectParser` if
>> you don't mind.
>>
>> [1]: https://people.apache.org/keys/committer/mattmann.asc
>>
>> пн, 24 окт. 2016 г. в 16:15, David Meikle <[hidden email]>:
>>
>> Hello,
>>
>> I am getting the same as Julien without exiftool installed on my Mac.
>> Everything passes on Windows 10 and Ubuntu.
>>
>> Will have a dig and see what I find.
>>
>> Cheers,
>> Dave
>>
>> --
>>
>> Best regards,
>> Konstantin Gribov
>>
> --
>
> Best regards,
> Konstantin Gribov


RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
In reply to this post by Konstantin Gribov
Aside from the PGP key issues, do we need more time to understand the serialization issues with AutoDetectParser, or are we good to go?

-----Original Message-----
From: Konstantin Gribov [mailto:[hidden email]]
Sent: Monday, October 24, 2016 12:42 PM
To: [hidden email]
Subject: Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Chris,

You have a new PGP key that is not present in your account at [1]. Could you please update it there? Also, the `KEYS` file in `tika-1.14-src.zip` contains only your old PGP key.

The SHA-1 and MD5 digests are fine, and `tika-server` and `tika-app` work fine on Arch Linux with OpenJDK 8u112, with and without Tesseract.

The build (`mvn clean package verify`) fails on Arch Linux the same way Julien Nioche and Dave described, with or without Tesseract. I have no exiftool, so I'll try to investigate what else makes `AutoDetectParser` non-serializable.
I hope I'll have a bit of time for this this evening.

Also, `testParserHandlingOfNonSerializable` fails because the exception message was `Unable to serialize [AutoDetectParser] to pass to the Fork...` instead of `Unable to serialize [ParseContext] to pass to the Fork...`, but that seems to be the same issue as above.

Neither issue is a strict blocker for me, but I'd ask you to extend the voting time so we can dig into the non-serializable `AutoDetectParser` issue, if you don't mind.

[1]: https://people.apache.org/keys/committer/mattmann.asc
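
For anyone repeating the digest check mentioned above, here is a minimal sketch using only the JDK. The class name `DigestCheck` and its command-line arguments are hypothetical, chosen for illustration; this is not a Tika utility, and it only covers the digest comparison, not PGP signature verification.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper (not part of Tika): hex digest of a stream.
final class DigestCheck {

    // Reads the stream to the end and returns the lowercase hex digest
    // for a JCA algorithm name such as "SHA-1" or "MD5".
    static String hexDigest(InputStream in, String algorithm)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance(algorithm);
        byte[] buf = new byte[8192];
        int n;
        while ((n = in.read(buf)) != -1) {
            md.update(buf, 0, n);
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : md.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: DigestCheck <archive> <expected-sha1>");
            return;
        }
        // args[0]: path to the downloaded archive; args[1]: SHA-1 from the vote mail
        try (InputStream in = Files.newInputStream(Paths.get(args[0]))) {
            String actual = hexDigest(in, "SHA-1");
            System.out.println(actual.equalsIgnoreCase(args[1])
                    ? "SHA-1 OK"
                    : "SHA-1 MISMATCH: " + actual);
        }
    }
}
```

It could be run as `java DigestCheck.java tika-1.14-src.zip <sha1-from-the-vote-mail>` on a JDK that supports single-file source launch.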

On Mon, 24 Oct 2016 at 16:15, David Meikle <[hidden email]> wrote:

Hello,

I am getting the same as Julien without exiftool installed on my Mac.
Everything passes on Windows 10 and Ubuntu.

Will have a dig and see what I find.

Cheers,
Dave


--

Best regards,
Konstantin Gribov

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Mattmann, Chris A (3010)
I’m happy to roll with this RC – it definitely has to do with something that exiftool
installs or with exiftool itself. Recommend checking out the issue that we filed that
Tim linked earlier.
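
Since the failures seem to depend on which external tools are present, one quick way to see what a given machine exposes is to try launching them. The following is a minimal sketch, assuming Java 9+ (for `ProcessBuilder.Redirect.DISCARD`); the class and method names are hypothetical, and this is not claimed to be how Tika itself detects external tools.

```java
import java.io.IOException;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not Tika code): can an external command be started?
final class ExternalToolCheck {

    // Returns true if the command starts and exits within five seconds.
    // A missing executable makes start() throw IOException, reported as false.
    static boolean canRun(String... command) {
        try {
            Process p = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .redirectOutput(ProcessBuilder.Redirect.DISCARD)
                    .start();
            if (!p.waitFor(5, TimeUnit.SECONDS)) {
                p.destroyForcibly();
                return false;
            }
            return true;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("exiftool available: " + canRun("exiftool", "-ver"));
        System.out.println("ffmpeg available:   " + canRun("ffmpeg", "-version"));
    }
}
```

Including these two lines of output in a failure report would make it easier to compare environments where the build passes and where it fails.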

Konstantin, Tim, I will finish off the release this week if there are no further objections.
I think we have enough +1s to move forward. I’ll finish the RC on Friday if there are
no further comments.

Thanks,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Principal Data Scientist, Engineering Administrative Office (3010)
Manager, Open Source Projects Formulation and Development Office (8212)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 180-503E, Mailstop: 180-502
Email: [hidden email]
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Director, Information Retrieval and Data Science Group (IRDS)
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
WWW: http://irds.usc.edu/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 



RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
Thank you, Chris!



Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Konstantin Gribov
In reply to this post by Mattmann, Chris A (3010)
I've pushed a simple fix for TIKA-2056, which broke the build in the presence of
`ffmpeg` or `exiftool`. I don't know how many downstream users actually
use `ForkParser`, but the JIRA issue wasn't really hot, so here is my +1 for rc1.
Thanks, Chris.

[x] +1 Release this package as Apache Tika 1.14
[ ] -1 Do not release this package because..
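
For readers unfamiliar with this class of failure, the sketch below shows in plain JDK terms how a single non-serializable field makes an otherwise `Serializable` object fail to serialize, and how marking the field `transient` avoids it. All class names here are hypothetical stand-ins; this illustrates the general mechanism only and is not the actual TIKA-2056 change, which is not shown in this thread.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical illustration (not Tika code) of a non-serializable field.
final class SerializableProbe {

    // Stand-in for a helper object that does not implement Serializable.
    static class ToolHandle { }

    // Serializable class whose field drags in a non-serializable object.
    static class BrokenParser implements Serializable {
        private static final long serialVersionUID = 1L;
        final ToolHandle handle = new ToolHandle();
    }

    // Same shape, but the offending field is excluded from serialization.
    static class FixedParser implements Serializable {
        private static final long serialVersionUID = 1L;
        final transient ToolHandle handle = new ToolHandle();
    }

    // Returns null if the object serializes, otherwise the message of the
    // NotSerializableException (by default the offending class name).
    static String firstNonSerializable(Object o) throws IOException {
        try (ObjectOutputStream oos =
                     new ObjectOutputStream(new ByteArrayOutputStream())) {
            oos.writeObject(o);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("broken: " + firstNonSerializable(new BrokenParser()));
        System.out.println("fixed:  " + firstNonSerializable(new FixedParser()));
    }
}
```

A probe like `firstNonSerializable` can help narrow down which member of a larger object graph is responsible when the top-level error only names the outer class.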

--

Best regards,
Konstantin Gribov

Re: [VOTE] Apache Tika 1.14 Release Candidate #1

Mattmann, Chris A (3010)
Sounds awesome, thanks K.




Re: [VOTE] Apache Tika 1.14 Release Candidate #1

kkrugler
In reply to this post by Chris Mattmann
[Resending - has anyone else run into this same issue, when building from the 1.14-rc1 tag?]

Just for grins, I pulled from git and checked out the 1.14-rc1 tag, then ran “mvn clean package”.

For me it fails with:

Running org.apache.tika.parser.strings.StringsParserTest
Tests run: 1, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.685 sec <<< FAILURE! - in org.apache.tika.parser.strings.StringsParserTest
testParse(org.apache.tika.parser.strings.StringsParserTest)  Time elapsed: 1.685 sec  <<< FAILURE!
java.lang.AssertionError: null
        at org.junit.Assert.fail(Assert.java:86)
        at org.junit.Assert.assertTrue(Assert.java:41)
        at org.junit.Assert.assertTrue(Assert.java:52)
        at org.apache.tika.parser.strings.StringsParserTest.testParse(StringsParserTest.java:68)



Results :

Failed tests:
 StringsParserTest.testParse:68 null

Tests run: 755, Failures: 1, Errors: 0, Skipped: 18

— Ken


--------------------------
Ken Krugler
+1 530-210-6378
http://www.scaleunlimited.com
custom big data solutions & training
Hadoop, Cascading, Cassandra & Solr




RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
Ken,
  I don't have strings installed.  I suspect, though, that this file is now being handled by the DBF parser; this is the exception I get from that parser.


org.apache.tika.exception.TikaException: Expecting space or asterisk at beginning of record, not:10

        at org.apache.tika.parser.dbf.DBFReader.fillRow(DBFReader.java:165)
        at org.apache.tika.parser.dbf.DBFReader.next(DBFReader.java:138)
        at org.apache.tika.parser.dbf.DBFParser.parse(DBFParser.java:81)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.tika.TikaTest.getXML(TikaTest.java:186)
        at org.apache.tika.TikaTest.getXML(TikaTest.java:171)
        at org.apache.tika.parser.strings.StringsParserTest.testParse2(StringsParserTest.java:42)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at ...





RE: [VOTE] Apache Tika 1.14 Release Candidate #1

Allison, Timothy B.
Or, in other words, we need to find another test file, or modify the current test file, for strings since we now have a DBF parser.  I don't think this is a blocker, do you?

Given that this is a truncated file, I'd expect the exception from the DBFParser, but if we don't want that behavior, let's open a ticket and fix it.
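
For anyone unfamiliar with the format: in a DBF file every record starts with a one-byte deletion flag, a space for a live record or an asterisk for a deleted one, and the data section ends with a 0x1A marker. The sketch below is only an illustration of that check, assuming fixed-length records; the class and method names are made up for this example and it is not Tika's DBFReader. It shows how a byte value of 10 in the flag position would produce a message like the one in the stack trace.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical illustration only; not the code in org.apache.tika.parser.dbf.
public class DbfRecordFlagSketch {

    // Counts fixed-length records, validating the deletion flag of each one.
    public static int countRecords(InputStream in, int recordLength) throws IOException {
        int count = 0;
        int flag;
        // 0x1A is the DBF end-of-file marker; -1 means the stream ran out.
        while ((flag = in.read()) != -1 && flag != 0x1A) {
            if (flag != ' ' && flag != '*') {
                throw new IOException(
                        "Expecting space or asterisk at beginning of record, not:" + flag);
            }
            // Skip the remaining bytes of this record.
            for (int i = 1; i < recordLength; i++) {
                if (in.read() == -1) {
                    throw new EOFException("Truncated record " + count);
                }
            }
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        byte[] good = {' ', 'a', 'b', 'c', '*', 'd', 'e', 'f', 0x1A};
        System.out.println(countRecords(new ByteArrayInputStream(good), 4));

        byte[] bad = {10, 'a', 'b', 'c'}; // line feed where a flag is expected
        try {
            countRecords(new ByteArrayInputStream(bad), 4);
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Whether the real test file fails because of truncation or because it is not really DBF data is not something this sketch can tell us; it only shows why a reader that trusts the record layout will reject the first byte it sees in the flag position.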



