nutch 2.2.1, job failed: name=generate: null


nutch 2.2.1, job failed: name=generate: null

javozzo
Hi,
I'm new to Nutch. I configured Nutch 1.7 with Solr, and now I'm trying to use Nutch 2.2.1 with Solr and Cassandra in a Java project.
When I launch this project (this is my code):

package web;

import java.util.StringTokenizer;

import org.apache.hadoop.util.ToolRunner;
import org.apache.nutch.crawl.Crawler;
import org.apache.nutch.util.NutchConfiguration;



public class MyCrawler {
        /**
         * @param args
         * @throws Exception
         */
        public static void main(String[] args) throws Exception {
                String crawlArg = "urls -dir crawl -depth 3 -topN 5";
                // Run Crawl tool
                Crawler crawl = new Crawler();
                crawl.setConf(NutchConfiguration.create());
                try {
                        ToolRunner.run(crawl, tokenize(crawlArg));
                } catch (Exception e) {
                        e.printStackTrace();
                        return;
                }
        }
               
                 
        /**
         * Helper function to convert a string into an array of strings by
         * splitting it on whitespace.
         *
         * @param str
         *            string to be tokenized
         * @return an array of strings, one per whitespace-separated word
         */
        public static String[] tokenize(String str) {
                StringTokenizer tok = new StringTokenizer(str);
                String tokens[] = new String[tok.countTokens()];
                int i = 0;
                while (tok.hasMoreTokens()) {
                        tokens[i] = tok.nextToken();
                        i++;
                }
                return tokens;
        }
}
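(Side note: the tokenize helper above behaves like splitting on runs of whitespace. A minimal self-contained equivalent, with a hypothetical class name just for illustration:)

```java
public class TokenizeDemo {
    // Split a command-line style string on runs of whitespace,
    // the same result the StringTokenizer-based helper produces.
    public static String[] tokenize(String str) {
        return str.trim().split("\\s+");
    }

    public static void main(String[] args) {
        String[] tokens = tokenize("urls -dir crawl -depth 3 -topN 5");
        System.out.println(tokens.length); // prints 7
    }
}
```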


I get this error:

2013-11-14 10:31:13,449 WARN  mapred.LocalJobRunner (LocalJobRunner.java:run(435)) - job_local1945753776_0002
java.lang.NullPointerException
        at org.apache.avro.util.Utf8.<init>(Utf8.java:37)
        at org.apache.nutch.crawl.GeneratorReducer.setup(GeneratorReducer.java:100)
        at org.apache.hadoop.mapreduce.Reducer.run(Reducer.java:174)
        at org.apache.hadoop.mapred.ReduceTask.runNewReducer(ReduceTask.java:649)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:418)
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:398)
2013-11-14 10:31:13,907 INFO  mapred.JobClient (JobClient.java:monitorAndPrintJob(1393)) -  map 100% reduce 0%
2013-11-14 10:31:13,907 INFO  mapred.JobClient (JobClient.java:monitorAndPrintJob(1448)) - Job complete: job_local1945753776_0002
2013-11-14 10:31:13,909 INFO  mapred.JobClient (Counters.java:log(585)) - Counters: 19
2013-11-14 10:31:13,909 INFO  mapred.JobClient (Counters.java:log(587)) -   File Input Format Counters
2013-11-14 10:31:13,910 INFO  mapred.JobClient (Counters.java:log(589)) -     Bytes Read=0
2013-11-14 10:31:13,910 INFO  mapred.JobClient (Counters.java:log(587)) -   FileSystemCounters
2013-11-14 10:31:13,910 INFO  mapred.JobClient (Counters.java:log(589)) -     FILE_BYTES_READ=885
2013-11-14 10:31:13,911 INFO  mapred.JobClient (Counters.java:log(589)) -     FILE_BYTES_WRITTEN=160588
2013-11-14 10:31:13,911 INFO  mapred.JobClient (Counters.java:log(587)) -   Map-Reduce Framework
2013-11-14 10:31:13,911 INFO  mapred.JobClient (Counters.java:log(589)) -     Reduce input groups=0
2013-11-14 10:31:13,911 INFO  mapred.JobClient (Counters.java:log(589)) -     Map output materialized bytes=256
2013-11-14 10:31:13,912 INFO  mapred.JobClient (Counters.java:log(589)) -     Combine output records=0
2013-11-14 10:31:13,912 INFO  mapred.JobClient (Counters.java:log(589)) -     Map input records=5
2013-11-14 10:31:13,912 INFO  mapred.JobClient (Counters.java:log(589)) -     Reduce shuffle bytes=0
2013-11-14 10:31:13,912 INFO  mapred.JobClient (Counters.java:log(589)) -     Physical memory (bytes) snapshot=0
2013-11-14 10:31:13,913 INFO  mapred.JobClient (Counters.java:log(589)) -     Reduce output records=0
2013-11-14 10:31:13,913 INFO  mapred.JobClient (Counters.java:log(589)) -     Spilled Records=2
2013-11-14 10:31:13,913 INFO  mapred.JobClient (Counters.java:log(589)) -     Map output bytes=245
2013-11-14 10:31:13,913 INFO  mapred.JobClient (Counters.java:log(589)) -     Total committed heap usage (bytes)=217055232
2013-11-14 10:31:13,914 INFO  mapred.JobClient (Counters.java:log(589)) -     CPU time spent (ms)=0
2013-11-14 10:31:13,914 INFO  mapred.JobClient (Counters.java:log(589)) -     Virtual memory (bytes) snapshot=0
2013-11-14 10:31:13,914 INFO  mapred.JobClient (Counters.java:log(589)) -     SPLIT_RAW_BYTES=815
2013-11-14 10:31:13,915 INFO  mapred.JobClient (Counters.java:log(589)) -     Map output records=2
2013-11-14 10:31:13,915 INFO  mapred.JobClient (Counters.java:log(589)) -     Combine input records=0
2013-11-14 10:31:13,916 INFO  mapred.JobClient (Counters.java:log(589)) -     Reduce input records=0
java.lang.RuntimeException: job failed: name=generate: null, jobid=job_local1945753776_0002
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:199)
        at org.apache.nutch.crawl.Crawler.runTool(Crawler.java:68)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:152)
        at org.apache.nutch.crawl.Crawler.run(Crawler.java:250)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
        at web.MyCrawler.main(MyCrawler.java:22)
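For what it's worth, Avro's Utf8(String) constructor dereferences the string it is given, so this trace means GeneratorReducer.setup read a null value out of the job configuration (in 2.2.1 it presumably reads the generate batch id there; a null often points to the Nutch conf directory, nutch-default.xml / nutch-site.xml, not being on the runtime classpath). A self-contained sketch of that failure mode, with a plain Map standing in for Hadoop's Configuration and hypothetical key names:

```java
import java.util.HashMap;
import java.util.Map;

public class MissingKeyDemo {
    // Minimal stand-in for Configuration.get(): returns null for keys
    // that were never loaded (e.g. when nutch-site.xml is off the classpath).
    static String lookup(Map<String, String> conf, String key) {
        return conf.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> conf = new HashMap<>();
        conf.put("http.agent.name", "MyCrawler"); // example key that did load
        String batchId = lookup(conf, "generate.batch.id"); // hypothetical key
        // A null here mirrors what setup() would pass to new Utf8(...),
        // which then throws NullPointerException in the Utf8 constructor.
        System.out.println(batchId == null);
    }
}
```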

Any ideas?
Thanks,
Danilo

Re: nutch 2.2.1, job failed: name=generate: null

akash2489
Hi,
Were you able to resolve this issue? I am getting the same exception.
Please reply.

Thanks,
Akash Makkar