Reduce tasks don't start


Reduce tasks don't start

Evgeny Zhulenev
Hi.

I'm trying to start Nutch with Hadoop using this tutorial:
http://wiki.apache.org/nutch/NutchHadoopTutorial. Everything is OK until I
start Nutch crawling; then I get this message:

nutch@linux:/nutch/search> bin/nutch crawl urls -dir crawled -depth 3
crawl started in: crawled
rootUrlDir = urls
threads = 10
depth = 3
Injector: starting
Injector: crawlDb: crawled/crawldb
Injector: urlDir: urls
Injector: Converting injected urls to crawl db entries.


and nothing more.

Hadoop JobTracker (http://localhost:50030/jobtracker.jsp) shows that the job
is running: 3 map tasks have completed, but the reduce tasks are pending.
Screenshots can be found here:
http://www.picoodle.com/view.php?img=/4/4/2/f_Screenshotm_7850321.png&srv=img27
and here:
http://www.picoodle.com/view.php?img=/4/4/2/f_Screenshot1m_baf6951.png&srv=img36

Re: Reduce tasks don't start

Evgeny Zhulenev
Hi again.

In hadoop.log I found this message:

2008-04-02 21:59:59,479 WARN  mapred.TaskTracker - Error running child
java.lang.NullPointerException
    at java.util.Hashtable.get(Hashtable.java:334)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1020)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:259)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)
2008-04-02 22:00:05,094 WARN  mapred.TaskTracker - Error running child
java.lang.NullPointerException
    at java.util.Hashtable.get(Hashtable.java:334)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1020)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:259)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)

Why could this happen?
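
For what it's worth, java.util.Hashtable.get throws a NullPointerException
when it is passed a null key (it calls key.hashCode() internally), so a trace
like the one above suggests fetchOutputs looked something up with a null key;
presumably a tracker host name that could not be resolved. A minimal sketch of
that Hashtable behavior in plain Java (not Hadoop code; the class name is made
up for the demo):

import java.util.Hashtable;

public class NullKeyDemo {
    public static void main(String[] args) {
        Hashtable<String, String> outputs = new Hashtable<String, String>();
        outputs.put("lab1", "http://lab1:50060/");
        // Unlike HashMap, Hashtable rejects null keys: the next line throws
        // NullPointerException from Hashtable.get, which is exactly the top
        // frame in the stack trace above.
        System.out.println(outputs.get(null));
    }
}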

Re: Reduce tasks don't start

Liu Yan
In reply to this post by Evgeny Zhulenev
1) Are you using nutch-0.9? It is based on hadoop-0.12.x, and I haven't
successfully gotten it running in a distributed deployment. I hit a file
system corruption error soon after I started running the Nutch crawl. However,
when I built Nutch from the current trunk, whose Hadoop version is 0.16.x,
the run succeeded in distributed mode. I was using 5 boxes.

2) Can your machines resolve each other's names? For example, I have
5 machines named lab1 through lab5, and I need to put all of those entries
in the /etc/hosts file on every box so their names can be properly resolved
by the others (see the sketch below). A DNS server could probably save some
configuration work, but I don't want to be blamed by my MIS :-)
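
For illustration, each box's /etc/hosts would carry one line per node,
something like this (the addresses here are hypothetical):

127.0.0.1      localhost
192.168.1.11   lab1
192.168.1.12   lab2
192.168.1.13   lab3
192.168.1.14   lab4
192.168.1.15   lab5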

Check logs/hadoop.log and the other relevant log files; there should
be more hints.

HTH,
Yan


Re: Reduce tasks don't start

Evgeny Zhulenev
2008/4/2, Liu Yan <[hidden email]>:

> 1) Are you using nutch-0.9? It is based on hadoop-0.12.x, and I haven't
> successfully gotten it running in a distributed deployment. I hit a file
> system corruption error soon after I started running the Nutch crawl. However,
> when I built Nutch from the current trunk, whose Hadoop version is 0.16.x,
> the run succeeded in distributed mode. I was using 5 boxes.

I'm using Nutch from trunk with Hadoop 0.16.x.

> 2) Can your machines resolve each other's names? For example, I have
> 5 machines named lab1 through lab5, and I need to put all of those entries
> in the /etc/hosts file on every box so their names can be properly resolved
> by the others. A DNS server could probably save some configuration work,
> but I don't want to be blamed by my MIS :-)

For now I'm trying to run Nutch on just one machine.

> Check logs/hadoop.log and the other relevant log files; there should
> be more hints.

In hadoop.log I found these errors:

2008-04-02 22:55:14,599 WARN  mapred.TaskTracker - Error running child
java.lang.NullPointerException
    at java.util.Hashtable.get(Hashtable.java:334)
    at org.apache.hadoop.mapred.ReduceTask$ReduceCopier.fetchOutputs(ReduceTask.java:1020)
    at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:259)
    at org.apache.hadoop.mapred.TaskTracker$Child.main(TaskTracker.java:2071)

After this error the job stops working, and after 10 minutes the Hadoop
JobTracker kills it:

Task task_200804022240_0002_r_000001_2 failed to report status for 601
seconds. Killing!
Exception in thread "main" java.io.IOException: Job failed!
        at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:894)
        at org.apache.nutch.crawl.Injector.inject(Injector.java:157)
        at org.apache.nutch.crawl.Crawl.main(Crawl.java:113)
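
As an aside, the 601 seconds matches Hadoop's default task timeout
(mapred.task.timeout, 600000 ms). Raising it in hadoop-site.xml, as sketched
below, only buys more debugging time; it would not fix the
NullPointerException itself:

<property>
  <name>mapred.task.timeout</name>
  <value>1200000</value>
  <description>
    Milliseconds before an unresponsive task is killed; the default
    is 600000 (ten minutes).
  </description>
</property>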

Have you got any ideas why this could happen?

Re: Reduce tasks don't start

Liu Yan
Can you post your hadoop-site.xml?

Yan


Re: Reduce tasks don't start

Evgeny Zhulenev
2008/4/3, Liu Yan <[hidden email]>:
>
> Can you post your hadoop-site.xml?
>

<configuration>

<property>
  <name>fs.default.name</name>
  <value>localhost:9000</value>
  <description>
    The name of the default file system. Either the literal string
    "local" or a host:port for NDFS.
  </description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:9001</value>
  <description>
    The host and port that the MapReduce job tracker runs at. If
    "local", then jobs are run in-process as a single map and
    reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>1</value>
  <description>
    define mapred.map tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>
    define mapred.reduce tasks to be number of slave hosts
  </description>
</property>

<property>
  <name>dfs.name.dir</name>
  <value>/nutch/filesystem/name</value>
</property>

<property>
  <name>dfs.data.dir</name>
  <value>/nutch/filesystem/data</value>
</property>

<property>
  <name>mapred.system.dir</name>
  <value>/nutch/filesystem/mapreduce/system</value>
</property>

<property>
  <name>mapred.local.dir</name>
  <value>/nutch/filesystem/mapreduce/local</value>
</property>

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

</configuration>
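
A couple of sanity checks against this configuration, assuming a standard
Hadoop 0.16 layout and run from the Nutch root, would be:

bin/hadoop dfsadmin -report   # DataNode should be up with non-zero capacity on localhost:9000
bin/hadoop fs -ls /           # HDFS should answer and list the root directory

If either command hangs or errors out, the problem is below Nutch, in the
Hadoop setup itself.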

Re: Reduce tasks don't start

Liu Yan
Please provide:

1) conf/crawl-urlfilter.txt
2) the files containing the URL list you used for the fetch
3) conf/slaves
4) the command you used to run Nutch

Yan


Re: Reduce tasks don't start

Evgeny Zhulenev
2008/4/3, Liu Yan <[hidden email]>:
> Please provide:
>
>  1) conf/crawl-urlfilter.txt

-^(file|ftp|mailto):
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
-[?*!@=]
+.

I also thought it could be a problem with URL filtering, so I disabled
filtering by domain by adding the rule "+."
Before that, the rules were:
+^http://([a-z0-9]*\.)*apache.org/
+^http://([a-z0-9]*\.)*familycar.com/
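
For what it's worth, the old domain rules do accept both seed URLs. A quick
check of the first rule with plain java.util.regex (the same pattern syntax
Nutch's regex URL filter uses; the class name is just for this demo):

import java.util.regex.Pattern;

public class FilterRuleCheck {
    public static void main(String[] args) {
        // The first old "+" rule from crawl-urlfilter.txt, as a Java regex.
        Pattern apache = Pattern.compile("^http://([a-z0-9]*\\.)*apache.org/");
        String[] seeds = {
            "http://lucene.apache.org/",  // matches: a subdomain of apache.org
            "http://familycar.com/"       // no match: covered by the second rule
        };
        for (String seed : seeds) {
            System.out.println(seed + " -> " + apache.matcher(seed).find());
        }
    }
}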




>  2) the files containing the URL list you used for the fetch

http://lucene.apache.org/
http://familycar.com/

>  3) conf/slaves

localhost

>  4) the command you used to run Nutch

First I tried:
bin/nutch crawl urls -dir crawled -depth 2

After that I discovered that Nutch fails while injecting URLs, so now I use
the command: bin/nutch inject crawldb /users/nutch/urls/urllist.txt, where
/users/nutch/urls/urllist.txt is a file with URLs located on HDFS.

P.S. Today I wrote a message, "Nutch inject fails on reduce", with some more
details about this problem; it can be found here:
http://www.mail-archive.com/nutch-user@.../msg10928.html. It
contains logs and some of my thoughts ;)

Thanks for the help

Re: Reduce tasks don't start

Liu Yan
I am out of ideas now. I set up a similar test here with one box running
everything, tried to fetch a couple of URLs with a small depth and topN,
and it succeeded. The only differences I can tell are:

1) in conf/hadoop-site.xml (spelled out as properties below):
-- mapred.map.tasks ==> 2
-- mapred.reduce.tasks ==> 2
-- mapred.child.java.opts ==> -Xmx512m

2) I used a LAN IP address (192.168.x.x) rather than "localhost" in the
hadoop-site.xml and slaves files

3) The command I used was:
bin/nutch crawl urls -dir crawled -depth 2 -topN 5
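
Spelled out as hadoop-site.xml properties, the settings from point 1 would
look roughly like this (a sketch, with the values listed above):

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
</property>

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx512m</value>
</property>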

Yan


Re: Reduce tasks don't start

Evgeny Zhulenev
Can you please provide a screenshot of the JobTracker after the job finished?
Something like
http://192.168.x.x:50030/jobdetails.jsp?jobid=job_200804032050_0002, for
your job.
I've got "Combine input records"/"Combine output records" at 0; it seems to me
that this could be the problem.



Re: Reduce tasks don't start

Evgeny Zhulenev
And please attach hadoop.log, plus the logs from the logs/history folder:
**jobname**.log, the job configuration **jobname**.xml, and JobHistory.log.
