Apache Nutch CleaningJob failed


Apache Nutch CleaningJob failed

Anna Ente
Hi,


I am trying to install Apache Nutch and use it for web crawling.

I'm using Nutch 1.13 and Solr 5.5.5.


I'm following the steps on https://wiki.apache.org/nutch/NutchTutorial.

Everything seems to work properly until I get to the "Cleaning Solr" step. I use the command


bin/nutch clean crawl/crawldb/ http://localhost:8983/solr


I get the following exception:


SolrIndexer: deleting 1/1 documents
SolrIndexer: deleting 1/1 documents
ERROR CleaningJob: java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:865)
    at org.apache.nutch.indexer.CleaningJob.delete(CleaningJob.java:174)
    at org.apache.nutch.indexer.CleaningJob.run(CleaningJob.java:197)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.nutch.indexer.CleaningJob.main(CleaningJob.java:208)



I searched for a solution to this error but couldn't find anything helpful.


What might be the problem?


Thank you very much for your help.

Anna
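If the clean step is part of a larger crawl script, it helps to make its failure explicit instead of letting the script continue silently. The sketch below wraps the exact invocation from the post. It assumes that bin/nutch exits with a non-zero status when the CleaningJob fails (the "ERROR CleaningJob ... Job failed!" log suggests the job reports an error). The function name run_clean and the NUTCH_BIN override are hypothetical, not part of Nutch.

```shell
#!/usr/bin/env bash
# Sketch: run the tutorial's clean step and report a failure explicitly.
# run_clean and NUTCH_BIN are hypothetical names used only for illustration;
# NUTCH_BIN defaults to bin/nutch (relative to the Nutch runtime directory).

run_clean() {
  local crawldb="$1"   # e.g. crawl/crawldb/
  local solr_url="$2"  # e.g. http://localhost:8983/solr
  local nutch="${NUTCH_BIN:-bin/nutch}"

  # Same invocation as in the post: <nutch> clean <crawldb> <solr_url>
  # Assumes a non-zero exit status signals that the CleaningJob failed.
  if ! "$nutch" clean "$crawldb" "$solr_url"; then
    echo "clean step failed (CrawlDb: $crawldb, Solr: $solr_url)" >&2
    return 1
  fi
  echo "clean step finished"
}
```

Usage: `run_clean crawl/crawldb/ http://localhost:8983/solr || exit 1`. This only surfaces the failure; it does not work around the underlying bug discussed in the replies.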


Re: Apache Nutch CleaningJob failed

S L
Anna,

I've run into it too. This is a known problem that is scheduled to be fixed in Nutch 1.14.
See here:

https://www.mail-archive.com/user@.../msg15573.html

Sol

Re: Apache Nutch CleaningJob failed

Anna Ente
Hi Sol,


Thank you very much for your answer. If I understand correctly, the problem should be fixed in Nutch 1.14, and 1.14 was supposed to be released in October. But as far as I can see, it has not been released yet. So when is it expected to be released? And in the meantime, is there a working version of Nutch?


Thanks a lot,

Anna

________________________________
From: Sol Lederman <[hidden email]>
Sent: Friday, December 8, 2017 18:14:55
To: [hidden email]
Subject: Re: Apache Nutch CleaningJob failed


Re: Apache Nutch CleaningJob failed

S L
Hi Anna,

Sebastian knows the schedule; I don't, so I'll let him respond. In the
bug report I linked, Sebastian noted that there was a 1.14 snapshot
that you could build and run. I searched a bit for it but couldn't find it.

Sol

Re: Apache Nutch CleaningJob failed

Sebastian Nagel
Hi Anna, hi Sol,

Well, the schedule depends more on the community. Yes, we usually release
every 6 months. The last release (1.13) was in April this year. I've started
a discussion about releasing 1.14 on the @dev mailing list.

It's easy to build a snapshot release:
 ant package-bin
or just
 ant runtime
and set NUTCH_HOME to
 $PWD/runtime/local
(local mode) or
 $PWD/runtime/deploy
(deploy mode, running on a Hadoop cluster).
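The build steps above can be collected into a small script. This is a sketch under stated assumptions: the current directory is the root of a Nutch 1.x source checkout and Apache Ant is installed. The function names nutch_runtime_dir and build_nutch_snapshot are hypothetical helpers, not part of Nutch; the directory layout (runtime/local, runtime/deploy) is the one described in this message.

```shell
#!/usr/bin/env bash
# Sketch of the snapshot build described above. Assumes the current directory
# is the root of a Nutch 1.x source checkout and that Apache Ant is installed.
# Both function names are hypothetical helpers, not part of Nutch.

# Map a run mode to the runtime directory created by `ant runtime`.
nutch_runtime_dir() {
  case "$1" in
    local)  printf '%s\n' "$PWD/runtime/local" ;;   # single-machine (local) mode
    deploy) printf '%s\n' "$PWD/runtime/deploy" ;;  # Hadoop cluster (deploy) mode
    *)      echo "unknown mode: $1 (expected local or deploy)" >&2
            return 1 ;;
  esac
}

# Build the runtime (the alternative target is `ant package-bin`) and
# export NUTCH_HOME for the chosen mode.
build_nutch_snapshot() {
  local mode="${1:-local}"
  local dir
  ant runtime || return 1
  dir="$(nutch_runtime_dir "$mode")" || return 1
  export NUTCH_HOME="$dir"
  echo "NUTCH_HOME=$NUTCH_HOME"
}
```

Usage: source the script, then run `build_nutch_snapshot local` from the checkout root. Because the function exports NUTCH_HOME, it must run in the current shell (sourced), not as a separate script, for the variable to persist.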

Best,
Sebastian

On 12/08/2017 06:58 PM, Sol Lederman wrote: