How to configure nutch to crawl parallelly

xiao yang
Hi, All

I'm using Nutch-1.0 on a 12-node cluster and have configured
conf/hadoop-site.xml as follows:
  ...
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>20</value>
  </property>
  ...
but the "Running Jobs" section of the page
http://cluster0:50030/jobtracker.jsp never shows more than one item.
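These two properties limit how many map and reduce tasks each TaskTracker runs at the same time. The cluster-wide ceiling is therefore the per-node value multiplied by the number of TaskTrackers.

The minimal Python sketch below does that multiplication. It reads the property block quoted above, wrapped in the `<configuration>` root element that hadoop-site.xml uses. It assumes all 12 nodes run a TaskTracker with the same file. `read_properties` and `cluster_slots` are hypothetical helper names, not part of Nutch or Hadoop.

```python
import xml.etree.ElementTree as ET

# Property block from the post, wrapped in the <configuration> root element.
HADOOP_SITE = """
<configuration>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>20</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>20</value>
  </property>
</configuration>
"""


def read_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style configuration file."""
    root = ET.fromstring(xml_text.strip())
    return {p.findtext("name"): p.findtext("value") for p in root.iter("property")}


def cluster_slots(props, num_tasktrackers):
    """Total (map, reduce) slots, assuming every TaskTracker uses the same settings."""
    map_max = int(props["mapred.tasktracker.map.tasks.maximum"])
    reduce_max = int(props["mapred.tasktracker.reduce.tasks.maximum"])
    return map_max * num_tasktrackers, reduce_max * num_tasktrackers


props = read_properties(HADOOP_SITE)
map_slots, reduce_slots = cluster_slots(props, num_tasktrackers=12)
print(f"map slots: {map_slots}, reduce slots: {reduce_slots}")
# prints "map slots: 240, reduce slots: 240"
```

These numbers are upper limits on concurrent tasks. They do not by themselves make any job split into more tasks, and they say nothing about how many jobs run at once.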

Thanks!
Xiao

Re: How to configure nutch to crawl parallelly

Otis Gospodnetic
I don't recall off the top of my head exactly what jobtracker.jsp shows, but judging by its name, it lists your jobs. Each job is composed of multiple map and reduce tasks. Drill into your job and you should see multiple tasks running.
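Here is a way to make the job/task distinction concrete. As far as I know, `bin/nutch crawl` runs its steps (inject, generate, fetch, updatedb, and so on) as separate MapReduce jobs, one after another. So a single entry under "Running Jobs" at any moment is expected, and the parallelism lives inside each job.

The rough Python model below shows how one job's tasks fill the cluster's map slots. It assumes all slots are idle and every task takes the same time. The task counts and the `schedule` helper are hypothetical. The figure of 240 slots is simply 12 TaskTrackers x 20 map slots from the posted config.

```python
import math


def schedule(num_tasks, cluster_slots):
    """Return (tasks running at once, number of waves) for one job's tasks.

    Simplified model: all slots are free and every task takes equal time.
    """
    if num_tasks <= 0:
        return 0, 0
    concurrent = min(num_tasks, cluster_slots)
    waves = math.ceil(num_tasks / cluster_slots)
    return concurrent, waves


# Hypothetical task counts for a single job on a cluster with 240 map slots.
for tasks in (2, 100, 500):
    concurrent, waves = schedule(tasks, cluster_slots=240)
    print(f"{tasks} map tasks -> {concurrent} running at once, {waves} wave(s)")
```

If a job only creates a handful of tasks, raising mapred.tasktracker.map.tasks.maximum will not make it run any wider. The per-job task count comes from the input splits and the job's own settings, such as the mapred.map.tasks hint and mapred.reduce.tasks.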

Otis
--
Sematext is hiring -- http://sematext.com/about/jobs.html?mls
Lucene, Solr, Nutch, Katta, Hadoop, HBase, UIMA, NLP, NER, IR



----- Original Message ----

> From: xiao yang <[hidden email]>
> To: [hidden email]
> Sent: Fri, November 13, 2009 12:16:55 PM
> Subject: How to configure nutch to crawl parallelly