
Trigger job from Java application causes ClassNotFound


Trigger job from Java application causes ClassNotFound

Steve Armstrong
Hello,

I'm trying to trigger a Mahout job from inside my Java application
(running in Eclipse), and get it running on my cluster. I have a main
class that simply contains:

// Imports shown here assume the Mahout 0.7-era package layout.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;
import org.apache.mahout.math.hadoop.similarity.cooccurrence.measures.VectorSimilarityMeasures;

String[] args = new String[] { "--input", "/input/triples.csv",
    "--output", "/output/vectors.txt",
    "--similarityClassname", VectorSimilarityMeasures.SIMILARITY_COOCCURRENCE.toString(),
    "--numRecommendations", "10000",
    "--tempDir", "temp/" + System.currentTimeMillis() };
Configuration conf = new Configuration();
ToolRunner.run(conf, new RecommenderJob(), args);

If I package the whole project up in a single jar (using Maven), copy
it to the namenode, and run it with "hadoop jar project.jar", it works
fine. But if I try to run it from my dev PC in Eclipse (where all the
same dependencies are still on the classpath), and add the three Hadoop
XML files to the classpath, it triggers Hadoop jobs, but they fail
with errors like:

12/07/26 14:42:09 INFO mapred.JobClient: Task Id :
attempt_201206261211_0173_m_000001_0, Status : FAILED
Error: java.lang.ClassNotFoundException: com.google.common.primitives.Longs
        at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
...

What I'm trying to create is a self-contained JAR that can be run from
the command line and launch the Mahout job on the cluster. I've got
this all working with embedded Pig scripts, but I can't get it working
here.

Any help is appreciated, or advice on better ways to trigger the jobs from code.

Thanks
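
A likely culprit here: when a job is launched with "hadoop jar", the job
client knows which JAR to ship to the task trackers, but when launched
from an IDE no such JAR exists, so the task JVMs never see Mahout or its
dependencies such as Guava. A minimal sketch of one workaround on a
Hadoop 1.x-era client, pointing the job at the fat JAR the Maven build
already produces; the path below is a placeholder, not a real file:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class LaunchFromIde {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // "mapred.jar" is the property JobConf.setJar() writes; the job
        // client uploads this JAR so the task JVMs can load its classes.
        conf.set("mapred.jar", "/path/to/project-jar-with-dependencies.jar");
        ToolRunner.run(conf, new RecommenderJob(), args);
    }
}

The same args array shown in the post above would be passed through
unchanged.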

Re: Trigger job from Java application causes ClassNotFound

in.abdul
Hi Steve,

I suspect you missed copying that specific jar into your Hadoop lib
directories. Have a look at your lib.
THANKS AND REGARDS, SYED ABDUL KATHER

Re: Trigger job from Java application causes ClassNotFound

Steve Armstrong
Hi Syed,

Do you mean I need to deploy the Mahout jars to the lib directory of
the master node? Or all the data nodes? Or is there a way to simply
tell the Hadoop job launcher to upload the jars itself?

Steve


Re: Trigger job from Java application causes ClassNotFound

in.abdul
Hi Steve,

> But if I try to run it from my dev PC in Eclipse (where all the
> same dependencies are still on the classpath), and add the three Hadoop
> XML files to the classpath, it triggers Hadoop jobs, but they fail
> with errors

There is a problem in your Eclipse build path. I faced the same problem
when I was trying to do clustering. Have a look at your build path.

In the Maven case, Maven downloads all the dependency jars from the
repository. If you want to execute it in Eclipse, you have to configure
the build path yourself.

I can suggest you look at
http://shuyo.wordpress.com/2011/02/01/mahout-development-environment-with-maven-and-eclipse-1/

which may help.
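
If the Eclipse build path was assembled by hand, one low-effort option
from this era is the maven-eclipse-plugin, which regenerates the
project's .classpath file from the pom's dependencies. Run it from the
project root:

mvn eclipse:eclipse

Then refresh the project in Eclipse so the regenerated .classpath is
picked up.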
Thanks and Regards,
S Syed Abdul Kather



Re: Trigger job from Java application causes ClassNotFound

John Armstrong-2
On 07/26/2012 09:20 PM, Steve Armstrong wrote:
> Do you mean I need to deploy the mahout jars to the lib directory of
> the master node? Or all the data nodes? Or is there a way to simply
> tell the hadoop job launcher to upload the jars itself?

Every node that runs a Task (mapper or reducer) needs access to your
libraries.

There are ways to tell Hadoop to use JARs on the "distributed classpath"
in HDFS, yes.  But I think most people find it simplest to create a "fat
JAR" with something like Maven's shade plugin that contains everything
the mappers and reducers need.  Then you just need to hand that one
library around.  I'd suggest starting with this simple approach just to
get your stuff working, and then go back to investigate things like the
distributed classpath.
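
For the fat-JAR route, a minimal sketch of a shade configuration for
the pom's <build><plugins> section (plugin version left to whatever is
current for your Maven):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
    </execution>
  </executions>
</plugin>

And for the distributed-classpath route, a sketch assuming the
dependency JARs have already been copied into HDFS; the paths and jar
names below are illustrative, not prescribed:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.util.ToolRunner;
import org.apache.mahout.cf.taste.hadoop.item.RecommenderJob;

public class LaunchWithDistributedClasspath {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Each JAR must already exist in HDFS; every task JVM pulls it
        // onto its classpath before the mappers and reducers start.
        DistributedCache.addFileToClassPath(new Path("/libs/mahout-core-0.7-job.jar"), conf);
        DistributedCache.addFileToClassPath(new Path("/libs/guava-r09.jar"), conf);
        ToolRunner.run(conf, new RecommenderJob(), args);
    }
}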

hth