[VOTE] Apache Tika 0.4

[VOTE] Apache Tika 0.4

Mattmann, Chris A (3010)
Hi Folks,

I have posted a candidate for the Apache Tika 0.4 release at

http://people.apache.org/~mattmann/apache-tika-0.4/rc1/

See the included CHANGES.txt file for details on release contents and latest
changes. The release was made from the 0.4 branch at:

http://svn.apache.org/repos/asf/lucene/tika/branches/0.4/

Please vote on releasing these packages as Apache Tika 0.4. The vote is open
for the next 72 hours. Only votes from Lucene PMC are binding, but everyone
is welcome to check the release candidate and voice their approval or
disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Tika 0.4.

[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++




Re: [VOTE] Apache Tika 0.4

Karl Heinz Marbaise-3
Hi,

Just a question: is it usual to have no top-level directory inside the
.tar.gz file?

All other Apache software works this way:

apache-ant-1.7.0.tar.gz unpacks into a subfolder apache-ant-1.7.0

instead of putting all the files directly in the current folder.

The 0.3 release of Tika behaves exactly this way and contains a subfolder.
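
One way to check the layout before unpacking is to list the archive members and see whether they share a single top-level directory. The following is only a minimal Python sketch of that check, not part of the Tika build. The helper names `top_level_entries` and `make_demo_archive` and the demo file names are made up for illustration.

```python
import io
import tarfile


def top_level_entries(archive):
    """Return the set of top-level names found in an open tar archive."""
    names = set()
    for member in archive.getmembers():
        # Normalise "./foo/bar" and "foo/bar" to the first path component.
        parts = [p for p in member.name.split("/") if p not in ("", ".")]
        if parts:
            names.add(parts[0])
    return names


def make_demo_archive(paths):
    """Build a small in-memory .tar.gz of empty files (demo data only)."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path in paths:
            archive.addfile(tarfile.TarInfo(name=path))
    buffer.seek(0)
    return buffer


# Hypothetical archive that unpacks into its own directory.
nested = make_demo_archive(["apache-tika-0.4/pom.xml", "apache-tika-0.4/README.txt"])
# Hypothetical archive that puts its files straight into the current folder.
flat = make_demo_archive(["pom.xml", "README.txt"])

with tarfile.open(fileobj=nested, mode="r:gz") as archive:
    nested_roots = top_level_entries(archive)
with tarfile.open(fileobj=flat, mode="r:gz") as archive:
    flat_roots = top_level_entries(archive)
```

An archive with the expected layout yields exactly one top-level name; anything more means files would land directly in the current folder.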

Kind regards
Karl Heinz Marbaise

--
SoftwareEntwicklung Beratung Schulung    Tel.: +49 (0) 2405 / 415 893
Dipl.Ing.(FH) Karl Heinz Marbaise        ICQ#: 135949029
Hauptstrasse 177                         USt.IdNr: DE191347579
52146 Würselen                           http://www.soebes.de

[VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Mattmann, Chris A (3010)
Hi Folks,

I have posted a second release candidate for the Apache Tika 0.4 release at

http://people.apache.org/~mattmann/apache-tika-0.4/rc2/

This release candidate addresses Karl's comments regarding the src tarball
unpacking to an apache-tika-0.4 directory rather than just unpacking itself
in the current working directory.

See the included CHANGES.txt file for details on release contents and latest
changes. The release candidate was made from the 0.4 branch at:

http://svn.apache.org/repos/asf/lucene/tika/branches/0.4/

Please vote on releasing these packages as Apache Tika 0.4. The vote is open
for the next 72 hours. Only votes from Lucene PMC are binding, but everyone
is welcome to check the release candidate and voice their approval or
disapproval. The vote passes if at least three binding +1 votes are cast.

[ ] +1 Release the packages as Apache Tika 0.4.

[ ] -1 Do not release the packages because...

Thanks!

Cheers,
Chris


Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Grant Ingersoll-2
1. When I download and run "mvn compile" or "mvn test" from the top  
directory, I get:
<snip>
Missing:
----------
1) org.apache.tika:tika-core:jar:0.4

   Try downloading the file manually from the project website.

   Then, install it using the command:
       mvn install:install-file -DgroupId=org.apache.tika -DartifactId=tika-core -Dversion=0.4 -Dpackaging=jar -Dfile=/path/to/file

   Alternatively, if you host your own repository you can deploy the file there:
       mvn deploy:deploy-file -DgroupId=org.apache.tika -DartifactId=tika-core -Dversion=0.4 -Dpackaging=jar -Dfile=/path/to/file -Durl=[url] -DrepositoryId=[id]

   Path to dependency:
         1) org.apache.tika:tika-parsers:bundle:0.4
         2) org.apache.tika:tika-core:jar:0.4

----------
1 required artifact is missing.
</snip>

Not a show stopper, but not exactly a good out of the box experience,  
especially given reading READMEs is not likely to happen.

Note, this, to me, is a big Maven problem and we have the same issue  
over in Mahout.  How is it that Maven can get all the other  
dependencies, but it isn't smart enough to know when a dependency is  
in a local module?  Someone once pointed me at a means to make sure  
module dependencies build first, but I've never implemented it.

2. While not required, it would be good to publish your public key on  
a key server such as http://pgp.mit.edu:11371/ and also to get your  
key signed by someone else at the ASF.

3. Did something change such that CONTENT_LANGUAGE is now not being  
set for HTML?  We have a test in Solr that looks for that attribute,  
and it was passing with 0.3 but is now not passing in 0.4.
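
On point 1, the idea of building module dependencies first amounts to ordering the modules topologically. The sketch below only illustrates that idea in Python (3.9 or later, for `graphlib`). It makes no claim about how Maven itself resolves modules. The only dependency edge used is the tika-parsers to tika-core one shown in the error output, and `build_order` is a hypothetical helper name.

```python
from graphlib import TopologicalSorter

# Each module maps to the set of modules it depends on. The single edge
# (tika-parsers depends on tika-core) is taken from the error output above.
module_dependencies = {
    "tika-core": set(),
    "tika-parsers": {"tika-core"},
}


def build_order(dependencies):
    """Return the modules so that every dependency precedes its dependents."""
    return list(TopologicalSorter(dependencies).static_order())


order = build_order(module_dependencies)
```

With this input, tika-core comes out ahead of tika-parsers, which is the order a build would need.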

As of now, +0 due to #3 above.

-Grant



Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Karl Heinz Marbaise-3
Hi,
> 1. When I download and run "mvn compile" or "mvn test" from the top
You have to start with "mvn install" instead of the others...

Kind regards
Karl Heinz Marbaise

Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Grant Ingersoll-2
Hey Grant,

> 2. While not required, it would be good to publish your public key on
> a key server such as http://pgp.mit.edu:11371/ and also to get your
> key signed by someone else at the ASF.

+1, done.

>
> 3. Did something change such that CONTENT_LANGUAGE is now not being
> set for HTML?  We have a test in Solr that looks for that attribute,
> and it was passing with 0.3 but is now not passing in 0.4.

Can you point me at the test in Solr? I'll take a look. Jukka can probably
comment as well, since he's been intimately involved in recent developments in
Tika.

Thanks,
Chris



Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Grant Ingersoll-2
Hey Grant,

>
> 2. While not required, it would be good to publish your public key on
> a key server such as http://pgp.mit.edu:11371/ and also to get your
> key signed by someone else at the ASF.

To clarify, I was saying "done" to part 1 of your above request. As for part
2 (getting someone else at the ASF to sign it), is that something you can do
for me?

Thanks,
Chris


Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Grant Ingersoll-2
In reply to this post by Karl Heinz Marbaise-3

On Jul 15, 2009, at 9:04 AM, Karl Heinz Marbaise wrote:

> Hi,
>> 1. When I download and run "mvn compile" or "mvn test" from the top
> You have to start with "mvn install" instead of the others...

Yes, I know, it just isn't usually someone's first inclination to run  
install, IMO.  Just commenting that the out of the box experience is  
slightly annoying.  It is, however, Maven's fault, not Tika's.  We  
have the same problem in Mahout, that's why I said it wasn't a  
showstopper.

Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Jukka Zitting
In reply to this post by Grant Ingersoll-2
Hi,

[Oops, my reply went first just to general@lucene. Here's a copy to tika-dev@.]

On Wed, Jul 15, 2009 at 3:00 PM, Grant Ingersoll<[hidden email]> wrote:
> 3. Did something change such that CONTENT_LANGUAGE is now not being set for
> HTML?  We have a test in Solr that looks for that attribute, and it was
> passing with 0.3 but is now not passing in 0.4.

This is because of TIKA-208.

We used to use the ICU4J charset detection mechanism to automatically
detect the encoding of HTML files. ICU4J would also guess the content
language based on the detected encoding (e.g. a document encoded in
KOI8-R is most likely written in Russian).

However, this mechanism wasn't as accurate as the encoding detection
already present in NekoHtml and language detection based on just the
encoding is often incorrect.
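
To illustrate why guessing the language from the encoding alone is fragile, here is a minimal Python sketch. It is not ICU4J's algorithm. The lookup table and the function name `guess_language` are hypothetical and exist only to show the ambiguity.

```python
# Hypothetical lookup table: charset name -> languages commonly written in it.
CHARSET_LANGUAGES = {
    "KOI8-R": ["ru"],                        # the clear-cut case from above
    "ISO-8859-1": ["en", "fr", "de", "es"],  # many Western European languages
    "UTF-8": [],                             # any language at all, so no hint
}


def guess_language(charset):
    """Return a language code only when the charset points at exactly one."""
    candidates = CHARSET_LANGUAGES.get(charset.upper(), [])
    return candidates[0] if len(candidates) == 1 else None
```

Only the first entry gives a usable answer. For widely shared encodings the guess is either ambiguous or empty.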

See TIKA-209 for some ideas on how to make the language detection more
generic and accurate. For now I think it's better to ship Tika 0.4
without the earlier flawed CONTENT_LANGUAGE implementation for HTML.

BR,

Jukka Zitting

Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Jukka Zitting
In reply to this post by Mattmann, Chris A (3010)
Hi,

On Wed, Jul 15, 2009 at 6:45 AM, Mattmann, Chris
A<[hidden email]> wrote:
> Please vote on releasing these packages as Apache Tika 0.4.

[x] +1 Release the packages as Apache Tika 0.4.

Build and tests OK on Maven 2.2.0 / Sun Java 1.6.0_07 / Fedora Core 9.
Checksum (MD5: 368618df671ad6e9bf0f7f33843a3cd0) and signature OK.
Sources match tika/branches/0.4 as of revision 794268.
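
For anyone who wants to repeat the checksum part of this check without command-line tools, a minimal Python sketch using the standard `hashlib` module could look like this. The function names `file_digest` and `matches` are made up for illustration, and the path passed in is whatever local file you downloaded.

```python
import hashlib


def file_digest(path, algorithm="md5", chunk_size=65536):
    """Return the hex digest of a file, reading it in chunks."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex, algorithm="md5"):
    """Compare a file's digest with a published value, ignoring case."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

Calling `matches(path, "368618df671ad6e9bf0f7f33843a3cd0")` on the downloaded archive would confirm the MD5 value quoted above; passing `algorithm="sha1"` does the same for a SHA1 value.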

Some comments (none blocking):

* It would be good to have also a SHA1 checksum
(ad04d3e02be57a51b5f446c4f921d9280e5b11b9) of the release archive.

* As mentioned by Grant, it would be good to have you included in the
Apache Web of Trust. See
http://www.apache.org/dev/release-signing.html#link-into-wot for
details. Meanwhile, see below for my signature of the release archive.
You can append it to the .asc file.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEABECAAYFAkpd4PIACgkQpzBSnKNVpj68ygCgh9uRcqQLWUBNwi8Tnif+AxEW
xgYAniydVppX2W1KLi+is5XVr4R+G5lH
=bPdd
-----END PGP SIGNATURE-----

* Please drop the tika-app and tika-reactor directories from the
staged Maven repository before copying them over to
m2-ibiblio-rsync-repository. The licensing issues with the PDFBox
dependency make me prefer not to publish the pre-built tika-app jar,
and tika-reactor is of no use to any downstream project.

* It would be nice to have the release sources tagged in svn. Even if
you plan to create the final 0.4 tag only after the vote passes, it
would be good to have a tag like 0.4-rc2 for this candidate.

* I'd prefer if the next release was packaged as a source jar instead
of a tarball. We've seen a number of issues with people having trouble
unpacking the tarballs on windows.

BR,

Jukka Zitting

Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Grant Ingersoll-2
In reply to this post by Grant Ingersoll-2
OK, I change my vote to +1.  I'll update Solr as needed.


Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Mattmann, Chris A (3010)
Hi All,

So far I have +1's from Grant and from Jukka. I need one more +1 to push this out - any takers from the PMC???

Thanks!!

Cheers,
Chris



Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Michael McCandless-2
I'll have a look today!

Thanks for the reminder.

Mike


Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Michael McCandless-2
In reply to this post by Mattmann, Chris A (3010)
+1 to release

Signature checks out.

I tested on OpenSolaris 2009.06.  "mvn install" ran fine; I then took
tika-app-0.4.jar and swapped it into the TikaIndexer example from Lucene in
Action's 2nd edition, and it runs fine w/ no source code changes
required (nice back compat!).

Then I extracted all text from LIA2's current draft manuscripts (MS
Word 2003 docs) and it did great.

Good work!

Mike

On Sun, Jul 19, 2009 at 12:58 AM, Mattmann, Chris A
(388J)<[hidden email]> wrote:

> Hi All,
>
> So far I have +1's from Grant and from Jukka. I need one more +1 to push this out - any takers from the PMC???
>
> Thanks!!
>
> Cheers,
> Chris

Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Mattmann, Chris A (3010)
Alright, all,

The vote passes. For the record:

+1's from Lucene PMC members:

Jukka Zitting
Grant Ingersoll
Michael McCandless

Non-binding +1's:

Dave Meikle
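
The rule stated in the vote call (at least three binding +1 votes) can be written down as a small Python sketch. This is just an illustration of that one rule as worded in this thread, not a statement of ASF policy. The `Vote` class and `vote_passes` function are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1, 0 or -1
    binding: bool  # True for Lucene PMC members


def vote_passes(votes, required_binding=3):
    """Apply the rule from the vote call: enough binding +1 votes."""
    binding_plus_ones = sum(1 for v in votes if v.binding and v.value == 1)
    return binding_plus_ones >= required_binding


# The votes recorded above.
votes = [
    Vote("Jukka Zitting", 1, True),
    Vote("Grant Ingersoll", 1, True),
    Vote("Michael McCandless", 1, True),
    Vote("Dave Meikle", 1, False),
]
```

With the four votes listed, `vote_passes(votes)` is true. Dropping any one of the three binding votes makes it false, because the non-binding +1 does not count toward the threshold.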

I'll prepare to push the release out to the Apache mirrors later today and
prepare an announcement to send out to announce@.

Thanks to all who voted!

Cheers,
Chris




Fwd: [VOTE] Apache Tika 0.4 Release Candidate 2

Grant Ingersoll-2
I haven't seen an announcement yet. Has this been pushed out?

Begin forwarded message:

> From: "Mattmann, Chris A (388J)" <[hidden email]>
> Date: July 20, 2009 2:24:52 PM EDT
> To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
> Subject: Re: [VOTE] Apache Tika 0.4 Release Candidate 2
> Reply-To: [hidden email]
>
> Alright, all,
>
> The vote passes. For the record:
>
> +1¹s from Lucene PMC members:
>
> Jukka Zitting
> Grant Ingersoll
> Michael McCandless
>
> Non-binding +1's:
>
> Dave Meikle
>
> I'll prepare to push the release out to the Apache mirrors later  
> today and
> prepare an announce to send out to announce@
>
> Thanks to all who voted!
>
> Cheers,
> Chris
>
>
>
> On 7/18/09 9:58 PM, "Mattmann, Chris A (388J)"
> <[hidden email]> wrote:
>
>> Hi All,
>>
>> So far I have +1's from Grant and from Jukka. I need one more +1 to push this
>> out - any takers from the PMC???
>>
>> Thanks!!
>>
>> Cheers,
>> Chris
>>
>>
>> On 7/14/09 9:45 PM, "Mattmann, Chris A" <[hidden email]> wrote:
>>
>> Hi Folks,
>>
>> I have posted a second release candidate for the Apache Tika 0.4 release at
>>
>> http://people.apache.org/~mattmann/apache-tika-0.4/rc2/
>>
>> This release candidate addresses Karl's comments regarding the src tarball
>> unpacking to an apache-tika-0.4 directory rather than just unpacking itself
>> in the current working directory.
>>
>> See the included CHANGES.txt file for details on release contents and latest
>> changes. The release candidate was made from the 0.4 branch at:
>>
>> http://svn.apache.org/repos/asf/lucene/tika/branches/0.4/
>>
>> Please vote on releasing these packages as Apache Tika 0.4. The vote is open
>> for the next 72 hours. Only votes from Lucene PMC are binding, but everyone
>> is welcome to check the release candidate and voice their approval or
>> disapproval. The vote passes if at least three binding +1 votes are cast.
>>
>> [ ] +1 Release the packages as Apache Tika 0.4.
>>
>> [ ] -1 Do not release the packages because...
>>
>> Thanks!
>>
>> Cheers,
>> Chris

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: [VOTE] Apache Tika 0.4 Release Candidate 2

Mattmann, Chris A (3010)
In reply to this post by Jukka Zitting
Hey Jukka,

>
> Some comments (none blocking):
>
> * It would be good to have also a SHA1 checksum
> (ad04d3e02be57a51b5f446c4f921d9280e5b11b9) of the release archive.

+1, done.
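
For anyone checking the candidate against a published SHA1 value, here is a minimal sketch in Python of how that comparison could be done. It is not part of the Tika release tooling; the function names `sha1_of_file` and `matches_published_sha1` are made up for illustration, and the archive path is whatever file you downloaded.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA-1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_sha1(path, published_hex):
    """Compare a file's SHA-1 with a published value, ignoring case and
    surrounding whitespace in the published string."""
    return sha1_of_file(path) == published_hex.strip().lower()
```

Calling `matches_published_sha1(path_to_archive, "ad04d3e02be57a51b5f446c4f921d9280e5b11b9")` would return True only if the downloaded file hashes to the value Jukka quoted. A checksum only guards against a corrupted download; the PGP signature is still what ties the archive to the release manager.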

>
> * As mentioned by Grant, it would be good to have you included in the
> Apache Web of Trust. See
> http://www.apache.org/dev/release-signing.html#link-into-wot for
> details. Meanwhile, see below for my signature of the release archive.
> You can append it to the .asc file.
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (GNU/Linux)
>
> iEYEABECAAYFAkpd4PIACgkQpzBSnKNVpj68ygCgh9uRcqQLWUBNwi8Tnif+AxEW
> xgYAniydVppX2W1KLi+is5XVr4R+G5lH
> =bPdd
> -----END PGP SIGNATURE-----

+1, done as well. The .asc file for the source release archive now includes your
PGP sig. I'd love to attend a key signing party, too.

>
> * Please drop the tika-app and tika-reactor directories from the
> staged Maven repository before copying them over to
> m2-ibiblio-rsync-repository. The licensing issues with the PDFBox
> dependency make me prefer not to publish the pre-built tika-app jar,
> and tika-reactor is of no use to any downstream project.

+1, done.

>
> * It would be nice to have the release sources tagged in svn. Even if
> you plan to create the final 0.4 tag only after the vote passes, it
> would be good to have a tag like 0.4-rc2 for this candidate.

+1, done in r797816 and r797817.

>
> * I'd prefer if the next release was packaged as a source jar instead
> of a tarball. We've seen a number of issues with people having trouble
> unpacking the tarballs on windows.

+1, sorry I did it the old-fashioned way. If I'm the RM on 0.5, I'll make
sure to handle it as a source jar.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: [hidden email]
WWW:   http://sunset.usc.edu/~mattmann/
Phone: +1 (818) 354-8810
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++