generator fail

generator fail

Ankit Goel
Hi,
I am using Nutch 1.13 with Solr 5.5.0. I have not started Hadoop on my system, and I'm trying to run this as a single node. When I run the Nutch crawl script, I get the following error:

$ bin/crawl -i -D solr.server.url=http://localhost:8983/solr/nutch ./urls/ TestCrawl2/  2
Injecting seed URLs
${NUTCH_RUNTIME_HOME}/bin/nutch inject TestCrawl2//crawldb ./urls/
Injector: starting at 2017-10-25 19:52:11
Injector: crawlDb: TestCrawl2/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.
Injector: Total urls rejected by filters: 0
Injector: Total urls injected after normalization and filtering: 0
Injector: Total urls injected but already in CrawlDb: 0
Injector: Total new urls injected: 0
Injector: finished at 2017-10-25 19:52:14, elapsed: 00:00:03
Wed Oct 25 19:52:14 IST 2017 : Iteration 1 of 2
Generating a new segment
${NUTCH_RUNTIME_HOME}/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true TestCrawl2//crawldb TestCrawl2//segments -topN 50000 -numFetchers 1 -noFilter
Generator: starting at 2017-10-25 19:52:16
Generator: Selecting best-scoring urls due for fetch.
Generator: filtering: false
Generator: normalizing: true
Generator: topN: 50000
Generator: java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
        at org.apache.nutch.crawl.Generator.generate(Generator.java:591)
        at org.apache.nutch.crawl.Generator.run(Generator.java:766)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.nutch.crawl.Generator.main(Generator.java:719)

Error running:
 ${NUTCH_RUNTIME_HOME}/bin/nutch generate -D mapreduce.job.reduces=2 -D mapred.child.java.opts=-Xmx1000m -D mapreduce.reduce.speculative=false -D mapreduce.map.speculative=false -D mapreduce.map.output.compress=true TestCrawl2//crawldb TestCrawl2//segments -topN 50000 -numFetchers 1 -noFilter
Failed with exit value 255.

I'm unsure why I am getting this error from the crawl generator. I followed the instructions on the Nutch tutorial page, and I never got this previously with 1.9 or 1.10.

Thanks,
Ankit Goel
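A clue is already visible in the injector output above: every counter is 0, so the crawldb ends up empty and the generator has nothing to select. As a quick sanity check, a sketch assuming the standard Nutch 1.x runtime layout and the TestCrawl2 directory from the command above:

$ bin/nutch readdb TestCrawl2/crawldb -stats
# With nothing injected, this should report "TOTAL urls: 0"
# (or fail outright if the crawldb was never created).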

Reply | Threaded
Open this post in threaded view
|

Re: generator fail

Sebastian Nagel
Hi,

the file hadoop.log should contain more information about the error.
It's located in ${NUTCH_RUNTIME_HOME}/logs/ or wherever $NUTCH_LOG_DIR points to.

Could you have a look at hadoop.log and, if possible, send the snippet
where the error is logged?
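For example, something like this should surface the relevant snippet
(a sketch assuming the default log location; adjust the path if
$NUTCH_LOG_DIR is set):

$ cd ${NUTCH_RUNTIME_HOME}
# The most recent failure is usually at the end of the log:
$ tail -n 50 logs/hadoop.log
# Or search for stack traces directly:
$ grep -n -E -B 2 -A 10 'ERROR|Exception' logs/hadoop.log | tail -n 40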

Thanks,
Sebastian


Re: generator fail

Ankit Goel
Hi Sebastian,
The error logs revealed a stray “\” in seed.txt. Thanks for pointing me there.
Regards,
Ankit Goel
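For anyone hitting the same problem: a seed file is plain text with one
URL per line, and a stray backslash can make every seed URL fail
normalization or filtering, which matches the injector reporting 0
injected URLs above. A quick check, as a sketch assuming the
urls/seed.txt path from the original command:

$ grep -n '\\' urls/seed.txt   # flags any line containing a literal backslash

A valid seed file simply lists one URL per line, e.g.:

http://nutch.apache.org/
https://example.org/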
