All nutch jobs Failing | Nutch 2.3.1 + MongoDB


shubham.gupta
Hey

When I run the full Nutch workflow (Inject, Generate, Fetch, Parse,
Update), the following errors are logged:

*Generator Job*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.crawl.GeneratorMapper.map(GeneratorMapper.java:34)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)
2017-03-07 15:28:07,696 ERROR crawl.GeneratorJob - GeneratorJob: java.lang.RuntimeException: job failed: name=[rss_new]generate: 1488880683-1996901673, jobid=job_local78754654_0001
         at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:121)
         at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:233)
         at org.apache.nutch.crawl.GeneratorJob.generate(GeneratorJob.java:262)
         at org.apache.nutch.crawl.GeneratorJob.run(GeneratorJob.java:328)
         at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
         at org.apache.nutch.crawl.GeneratorJob.main(GeneratorJob.java:336)

*Fetcher Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.fetcher.FetcherJob$FetcherMapper.map(FetcherJob.java:96)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)

*Parser Job:*

java.lang.Exception: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
         at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.ClassCastException: org.bson.types.ObjectId cannot be cast to java.lang.String
         at org.apache.nutch.parse.ParserJob$ParserMapper.map(ParserJob.java:80)
         at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
         at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
         at org.apache.hadoop.mapred.MapTask.run(MapTask.java:340)
         at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
         at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
         at java.util.concurrent.FutureTask.run(FutureTask.java:266)
         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
         at java.lang.Thread.run(Thread.java:745)

The plugin.folder directory specified in conf/nutch-site.xml is correct.
When I check the source, the line numbers in the stack traces point to the
lines where the classes are declared, for example
public class GeneratorMapper.

What changes need to be made in the configuration files?

--
Thanks and Regards,
Shubham Gupta

Re: All nutch jobs Failing | Nutch 2.3.1 + MongoDB

shubham.gupta
Hey

I was inserting the data into a table named rss_webpage (the "webpage"
suffix is appended automatically by Nutch), but when I changed the table
to rss_one_webpage the error disappeared. Is the cause on the Nutch side
or the MongoDB side?
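
One possible explanation, offered as a guess and not a confirmed diagnosis: the old table may contain documents whose _id is an auto-generated ObjectId (for example, rows written directly by another program), while the Nutch mappers expect every key to be a String. The sketch below shows how such rows could be spotted. It works on plain dictionaries so it runs without a database; FakeObjectId, find_non_string_ids and the sample rows are hypothetical names and data made up for illustration.

```python
from typing import Any, Iterable, List, Mapping


class FakeObjectId:
    """Hypothetical stand-in for a MongoDB ObjectId, used only so this
    sketch runs without a MongoDB driver installed."""

    def __init__(self, hex_value: str) -> None:
        self.hex_value = hex_value

    def __repr__(self) -> str:
        return f"FakeObjectId({self.hex_value!r})"


def find_non_string_ids(docs: Iterable[Mapping[str, Any]]) -> List[Any]:
    """Return the _id values that are not plain strings.

    Assumption: the crawler expects every _id to be a string key, so any
    other type would trigger a cast failure like the one in the logs.
    """
    return [doc.get("_id") for doc in docs if not isinstance(doc.get("_id"), str)]


# Illustrative sample: one row keyed by a string, one by an ObjectId-like value.
sample = [
    {"_id": "com.example.www:http/", "status": 1},
    {"_id": FakeObjectId("0123456789abcdef01234567"), "title": "inserted by hand"},
]

bad_ids = find_non_string_ids(sample)
print(f"{len(bad_ids)} document(s) with a non-string _id: {bad_ids}")
```

Against a real database, the same function could be fed the documents returned by a MongoDB client query on the table in question. If it reports any rows, that would be consistent with the error going away on a fresh table, but it would not prove it.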

Thanks and Regards,
Shubham Gupta

On Wednesday 08 March 2017 12:44 PM, shubham.gupta wrote:

> Hey
>
> While I am running the whole process flow of Nutch i.e.
> Inject,Generate,Fetch,Parse,Update.
>
> The following errors are being logged:
>
> [...]