[VOTE] Mahout 0.1

[VOTE] Mahout 0.1

Grant Ingersoll-2
Please review and vote for releasing Mahout 0.1.  This is our first  
release and is all new code.

The artifacts are located in:
http://people.apache.org/~gsingers/staging-repo/mahout/org/apache/mahout/

The mahout directory contains a tarball/zip of the whole project (for
building from source).
The core, examples and taste-web directories contain the artifacts for  
each of those components.
The other directories contain various dependencies and artifacts.


Thanks,
Grant

Re: [VOTE] Mahout 0.1

Grant Ingersoll-2
Just realized, I didn't add my +1, although it seems implied since I  
produced the candidate.

Anyway, +1

-Grant

On Mar 19, 2009, at 5:36 PM, Grant Ingersoll wrote:

> Please review and vote for releasing Mahout 0.1.  This is our first  
> release and is all new code.
>
> The artifacts are located in:
> http://people.apache.org/~gsingers/staging-repo/mahout/org/apache/mahout/
>
> The mahout directory contains a tarball/zip of the whole project  
> (for building from source)
> The core, examples and taste-web directories contain the artifacts  
> for each of those components.
> The other directories contain various dependencies and artifacts.
>
>
> Thanks,
> Grant

Re: [VOTE] Mahout 0.1

Jukka Zitting
Hi,

On Thu, Mar 19, 2009 at 10:36 PM, Grant Ingersoll <[hidden email]> wrote:
> Please review and vote for releasing Mahout 0.1.  This is our first release
> and is all new code.

-0 I get the following test failures on "mvn install":

    Running org.apache.mahout.ga.watchmaker.MahoutEvaluatorTest
    Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.896 sec <<< FAILURE!
    Running org.apache.mahout.clustering.kmeans.TestKmeansClustering
    Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.748 sec <<< FAILURE!
    Running org.apache.mahout.clustering.canopy.TestCanopyCreation
    Tests run: 19, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 0.951 sec <<< FAILURE!
    Running org.apache.mahout.clustering.meanshift.TestMeanShift
    Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.66 sec <<< FAILURE!
    Running org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering
    Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.747 sec <<< FAILURE!

I'm on Windows, running Maven 2.0.9 on Sun Java 1.6.0_03. Most of the
failures (see the end of this message for full details) seem
system-specific, so I'm not voting -1. Anyway, I don't feel
comfortable voting +1 when I see test failures.
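Most of the errors in the details at the end of this message share one root cause: `Expect one token as the result of whoami: oiva\jukka zitting`. On this Windows machine the account name contains a space, so splitting the `whoami` output on whitespace yields two tokens instead of one. The sketch below reproduces that kind of check in isolation. It is a simplified illustration, not Hadoop's actual `UnixUserGroupInformation` code, and the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical illustration of a "whoami must print exactly one token" check.
// This is NOT Hadoop's UnixUserGroupInformation code, only a simplified stand-in.
public class WhoamiTokenCheck {

    // Splits raw whoami output on runs of whitespace, ignoring leading/trailing blanks.
    public static String[] tokenize(String whoamiOutput) {
        String trimmed = whoamiOutput.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    // Returns the user name when the output is exactly one token; otherwise fails,
    // in the same spirit as the "Expect one token" LoginException in the test logs.
    public static String parseUserName(String whoamiOutput) {
        String[] tokens = tokenize(whoamiOutput);
        if (tokens.length != 1) {
            throw new IllegalStateException("Expect one token as the result of whoami: "
                    + whoamiOutput.trim() + " (got " + tokens.length + ": "
                    + Arrays.toString(tokens) + ")");
        }
        return tokens[0];
    }

    public static void main(String[] args) {
        // A typical Unix login name: one token, accepted.
        System.out.println(parseUserName("jukka\n"));
        // A Windows DOMAIN\user name containing a space: two tokens, rejected.
        try {
            parseUserName("oiva\\jukka zitting\r\n");
        } catch (IllegalStateException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

This is why the failures look environment-specific: the same tests would pass for a single-token user name.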

Based on a quick look all the licensing stuff seems to be pretty much
in order. However, note the following:

* The copyright year in NOTICE.txt should be 2009, not 2007

* The main purpose of the NOTICE.txt file is to list the copyright
attributions and other notes required by the licenses of external
software. For example, as required by the respective license, the
entry for the xpp3 package should include the copyright notice
"Copyright (c) 2002 Extreme! Lab, Indiana University" in addition to
the already existing part ("This product includes..."). Something like
the following should be fine:

    This product includes software developed by the Indiana University
    Extreme! Lab (http://www.extreme.indiana.edu/).
    Copyright (c) 2002 Extreme! Lab, Indiana University.

Also, some notes about the release structure:

* The mahout-0.1-project packages contain the following entries not
included in svn:

    Only in mahout-0.1/core: build
    Only in mahout-0.1/core: input
    Only in mahout-0.1/examples: input
    Only in mahout-0.1/examples: output

* Related to the above, see my comments on general@ and on nutch-dev@
about the nature of source releases. For me the source release is not
one release artefact among many, but the *source* of any other release
artefacts.
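The `Only in ...` lines above are the format `diff -r` prints when one tree has entries the other lacks. A rough Java equivalent for checking an unpacked source package against a clean checkout is sketched below; the class name and the idea of comparing against an `svn export` directory are assumptions for illustration, not part of Mahout's build. Like `diff -r`, it reports an extra directory once instead of listing everything inside it.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: reports entries that exist under 'candidate' but not under
// 'reference'. An extra directory is reported once and not descended into.
public class OnlyInReport {

    public static List<String> onlyIn(Path candidate, Path reference) throws IOException {
        List<String> extras = new ArrayList<>();
        walk(candidate, candidate, reference, extras);
        return extras;
    }

    private static void walk(Path dir, Path candidateRoot, Path referenceRoot, List<String> extras)
            throws IOException {
        List<Path> children;
        try (Stream<Path> stream = Files.list(dir)) {
            children = stream.sorted().collect(Collectors.toList());
        }
        for (Path child : children) {
            Path relative = candidateRoot.relativize(child);
            Path counterpart = referenceRoot.resolve(relative);
            if (!Files.exists(counterpart)) {
                // Normalize separators so the report looks the same on Windows and Unix.
                extras.add(relative.toString().replace('\\', '/'));
            } else if (Files.isDirectory(child)) {
                walk(child, candidateRoot, referenceRoot, extras);
            }
        }
    }

    // Usage: java OnlyInReport <unpacked-source-package> <clean-svn-export>
    public static void main(String[] args) throws IOException {
        Path candidate = Paths.get(args[0]);
        for (String extra : onlyIn(candidate, Paths.get(args[1]))) {
            System.out.println("Only in " + candidate.getFileName() + ": " + extra);
        }
    }
}
```

Run against the release candidate, a check like this would flag residue such as `core/build` or `examples/output` before the package is published.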

BR,

Jukka Zitting


-------------------------------------------------------------------------------
Test set: org.apache.mahout.ga.watchmaker.MahoutEvaluatorTest
-------------------------------------------------------------------------------
Tests run: 2, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.897 sec <<< FAILURE!
testEvaluate(org.apache.mahout.ga.watchmaker.MahoutEvaluatorTest)  Time elapsed: 0.717 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.ga.watchmaker.MahoutEvaluator.evaluate(MahoutEvaluator.java:70)
        at org.apache.mahout.ga.watchmaker.MahoutEvaluatorTest.testEvaluate(MahoutEvaluatorTest.java:49)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 31 more

-------------------------------------------------------------------------------
Test set: org.apache.mahout.clustering.kmeans.TestKmeansClustering
-------------------------------------------------------------------------------
Tests run: 5, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.749 sec <<< FAILURE!
testKMeansMRJob(org.apache.mahout.clustering.kmeans.TestKmeansClustering)  Time elapsed: 0.477 sec  <<< FAILURE!
junit.framework.AssertionFailedError: output dir exists?
        at junit.framework.Assert.fail(Assert.java:47)
        at junit.framework.Assert.assertTrue(Assert.java:20)
        at org.apache.mahout.clustering.kmeans.TestKmeansClustering.testKMeansMRJob(TestKmeansClustering.java:412)

-------------------------------------------------------------------------------
Test set: org.apache.mahout.clustering.canopy.TestCanopyCreation
-------------------------------------------------------------------------------
Tests run: 19, Failures: 0, Errors: 7, Skipped: 0, Time elapsed: 0.95 sec <<< FAILURE!
testCanopyGenManhattanMR(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.245 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testCanopyGenManhattanMR(TestCanopyCreation.java:475)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 31 more

testCanopyGenEuclideanMR(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.084 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testCanopyGenEuclideanMR(TestCanopyCreation.java:511)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 31 more

testClusteringManhattanMR(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.08 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.CanopyClusteringJob.runJob(CanopyClusteringJob.java:50)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testClusteringManhattanMR(TestCanopyCreation.java:681)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 32 more

testClusteringEuclideanMR(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.076 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.CanopyClusteringJob.runJob(CanopyClusteringJob.java:50)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testClusteringEuclideanMR(TestCanopyCreation.java:709)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 32 more

testClusteringManhattanMRWithPayload(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.109 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.CanopyClusteringJob.runJob(CanopyClusteringJob.java:50)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testClusteringManhattanMRWithPayload(TestCanopyCreation.java:739)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 32 more

testClusteringEuclideanMRWithPayload(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.085 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.CanopyClusteringJob.runJob(CanopyClusteringJob.java:50)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testClusteringEuclideanMRWithPayload(TestCanopyCreation.java:771)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 32 more

testUserDefinedDistanceMeasure(org.apache.mahout.clustering.canopy.TestCanopyCreation)  Time elapsed: 0.075 sec  <<< ERROR!
java.io.IOException: Failed to get the current user's information.
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:717)
        at org.apache.hadoop.mapred.JobClient.configureCommandLineOptions(JobClient.java:592)
        at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:788)
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1142)
        at org.apache.mahout.clustering.canopy.CanopyDriver.runJob(CanopyDriver.java:80)
        at org.apache.mahout.clustering.canopy.TestCanopyCreation.testUserDefinedDistanceMeasure(TestCanopyCreation.java:802)
Caused by: javax.security.auth.login.LoginException: Login failed: Expect one token as the result of whoami: oiva\jukka zitting
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:250)
        at org.apache.hadoop.security.UnixUserGroupInformation.login(UnixUserGroupInformation.java:275)
        at org.apache.hadoop.mapred.JobClient.getUGI(JobClient.java:715)
        ... 31 more

-------------------------------------------------------------------------------
Test set: org.apache.mahout.clustering.meanshift.TestMeanShift
-------------------------------------------------------------------------------
Tests run: 4, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 0.661 sec <<< FAILURE!
testCanopyEuclideanMRJob(org.apache.mahout.clustering.meanshift.TestMeanShift)  Time elapsed: 0.309 sec  <<< ERROR!
java.io.FileNotFoundException: File output/canopies-0/part-00000 does not exist.
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:420)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:244)
        at org.apache.hadoop.fs.FileSystem.getLength(FileSystem.java:679)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1417)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1412)
        at org.apache.mahout.clustering.meanshift.MeanShiftCanopyJob.isConverged(MeanShiftCanopyJob.java:103)
        at org.apache.mahout.clustering.meanshift.MeanShiftCanopyJob.runJob(MeanShiftCanopyJob.java:82)
        at org.apache.mahout.clustering.meanshift.TestMeanShift.testCanopyEuclideanMRJob(TestMeanShift.java:336)

-------------------------------------------------------------------------------
Test set: org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering
-------------------------------------------------------------------------------
Tests run: 6, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 0.747 sec <<< FAILURE!
testFuzzyKMeansMRJob(org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering)  Time elapsed: 0.444 sec  <<< FAILURE!
junit.framework.AssertionFailedError: output dir exists?
        at junit.framework.Assert.fail(Assert.java:47)
        at junit.framework.Assert.assertTrue(Assert.java:20)
        at org.apache.mahout.clustering.fuzzykmeans.TestFuzzyKmeansClustering.testFuzzyKMeansMRJob(TestFuzzyKmeansClustering.java:254)

Re: [VOTE] Mahout 0.1

Grant Ingersoll-2

On Mar 22, 2009, at 10:31 PM, Jukka Zitting wrote:

> Hi,
> Based on a quick look all the licensing stuff seems to be pretty much
> in order. However, note the following:
>
> * The copyright year in NOTICE.txt should be 2009, not 2007
>
> * The main purpose of the NOTICE.txt file is to list the copyright
> attributions and other notes required by the licenses of external
> software. For example, as required by the respective license, the
> entry for the xpp3 package should include the copyright notice
> "Copyright (c) 2002 Extreme! Lab, Indiana University" in addition to
> the already existing part ("This product includes..."). Something like
> the following should be fine:
>
>    This product includes software developed by the Indiana University
>    Extreme! Lab (http://www.extreme.indiana.edu/).
>    Copyright (c) 2002 Extreme! Lab, Indiana University.

I will fix these and generate a new candidate.

>
>
> Also, some notes about the release structure:
>
> * The mahout-0.1-project packages contain the following entries not
> included in svn:
>
>    Only in mahout-0.1/core: build
>    Only in mahout-0.1/core: input
>    Only in mahout-0.1/examples: input
>    Only in mahout-0.1/examples: output
>

Hmm, that's odd.  I'll look into it.  They can safely be ignored.

> * Related to the above, see my comments on general@ and on nutch-dev@
> about the nature of source releases. For me the source release is not
> one release artefact among many, but the *source* of any other release
> artefacts.

You mean about the size?  I figured this is a 0.1 release, so we're  
not going to put too much into packaging just yet.  I picked the  
easiest mvn target that produced something reasonable for people to use.


Re: [VOTE] Mahout 0.1

Jukka Zitting
Hi,

On Mon, Mar 23, 2009 at 8:51 AM, Grant Ingersoll <[hidden email]> wrote:
> I will fix these and generate a new candidate

OK, good. Let me know if you need a hand with those, I have a lot (too
much :-) of experience digging through license files.

> You mean about the size?  I figured this is a 0.1 release, so we're not
> going to put too much into packaging just yet.  I picked the easiest mvn
> target that produced something reasonable for people to use.

No, more about how the source release is produced and used. From where
I come the source release is the exact set of bits that the release
manager is using to build any additional release artifacts. So by
definition it can't be a build target among others. Having a build
target produce the source package can easily miss something or
introduce extra stuff like the build and test directories that Maven
picked up for this release candidate.

But as mentioned in the Nutch release vote, that's just a preference,
not something over which I'd -1 a release.

BR,

Jukka Zitting

Re: [VOTE] Mahout 0.1

hossman

: No, more about how the source release is produced and used. From where
: I come the source release is the exact set of bits that the release
: manager is using to build any additional release artifacts. So by
: definition it can't be a build target among others. Having a build
: target produce the source package can easily miss something or

+1 ... I remember suggesting something along these lines for lucene-java a
while back: create the source tgz, then unpack it and use it to create the
binary tgz.  (IIRC, we once had a problem where one of the build files
was being excluded from the release, so you couldn't build the demo.)
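A minimal sketch of that "source archive first" flow follows, assuming `sourceRoot` is a clean checkout or export. It uses a zip rather than a tgz only because the JDK has no built-in tar support, and the class name is hypothetical. The actual binary build (for example a Maven run inside the unpacked directory) is deliberately left out; the point is that it would see only what the source archive contains, so a missing build file or stray `build/` directory shows up before the release rather than after.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical sketch of a "source archive first" release flow: pack the source tree,
// unpack that archive into a fresh directory, and build binaries only from the copy.
public class SourceFirstRelease {

    // Packs every regular file under sourceRoot into zipFile, using '/'-separated entry names.
    public static void pack(Path sourceRoot, Path zipFile) throws IOException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(sourceRoot)) {
            files = walk.filter(Files::isRegularFile).sorted().collect(Collectors.toList());
        }
        try (ZipOutputStream zip = new ZipOutputStream(Files.newOutputStream(zipFile))) {
            for (Path file : files) {
                String name = sourceRoot.relativize(file).toString().replace('\\', '/');
                zip.putNextEntry(new ZipEntry(name));
                Files.copy(file, zip);
                zip.closeEntry();
            }
        }
    }

    // Unpacks zipFile into targetDir, refusing entries that would escape targetDir.
    public static void unpack(Path zipFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        try (ZipInputStream zip = new ZipInputStream(Files.newInputStream(zipFile))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                if (!out.startsWith(root)) {
                    throw new IOException("Entry escapes target directory: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    Files.copy(zip, out); // reads only the current entry's bytes
                }
            }
        }
    }

    // Usage: java SourceFirstRelease <source-tree> <archive.zip> <fresh-unpack-dir>
    public static void main(String[] args) throws IOException {
        Path archive = Paths.get(args[1]);
        pack(Paths.get(args[0]), archive);
        unpack(archive, Paths.get(args[2]));
        // The binary build would start here, working inside args[2] only.
        System.out.println("Build binary artifacts from: " + Paths.get(args[2]).toAbsolutePath());
    }
}
```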


-Hoss