problem with inject url on mapred

Anton Potekhin
I tried to run mapred on 2 machines: 192.168.0.250 and 192.168.0.111.

In nutch-site.xml I specified the following parameters:

1) On both machines:

<property>
  <name>fs.default.name</name>
  <value>192.168.0.250:9009</value>
  <description>The name of the default file system.  Either the
  literal string "local" or a host:port for NDFS.</description>
</property>

<property>
  <name>mapred.job.tracker</name>
  <value>192.168.0.250:9010</value>
  <description>The host and port that the MapReduce job tracker runs
  at.  If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
</property>

<property>
  <name>mapred.map.tasks</name>
  <value>2</value>
  <description>The default number of map tasks per job.  Typically set
  to a prime several times greater than number of available hosts.
  Ignored when mapred.job.tracker is "local".
  </description>
</property>

<property>
  <name>mapred.tasktracker.tasks.maximum</name>
  <value>2</value>
  <description>The maximum number of tasks that will be run
  simultaneously by a task tracker.
  </description>
</property>

2) On 192.168.0.250:

<property>
  <name>mapred.reduce.tasks</name>
  <value>1</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

3) On 192.168.0.111:

<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
  <description>The default number of reduce tasks per job.  Typically set
  to a prime close to the number of available hosts.  Ignored when
  mapred.job.tracker is "local".
  </description>
</property>

On 192.168.0.111 I started:

1) bin/nutch-daemon.sh start tasktracker

On 192.168.0.250 I started:

2) bin/nutch-daemon.sh start datanode
3) bin/nutch-daemon.sh start namenode
4) bin/nutch-daemon.sh start jobtracker
5) bin/nutch-daemon.sh start tasktracker

I created a directory named seeds containing a file named urls, which holds 2 links.
Then I added that directory to NDFS (bin/nutch ndfs -put ./seeds seeds).
The directory was added successfully.

Then I ran the command:

bin/nutch crawl seeds -depth 2

As a result, the jobtracker wrote this log:

.....
051123 053118 Adding task 'task_m_z66npx' to set for tracker 'tracker_53845'
051123 053118 Adding task 'task_m_xaynqo' to set for tracker 'tracker_11518'
051123 053130 Task 'task_m_z66npx' has finished successfully.

Log written by tasktracker on 192.168.0.111:

......
051110 142607 task_m_z66npx 0.0% /user/root/seeds/urls:0+31
051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31
051110 142607 Task task_m_z66npx is done.

Log written by tasktracker on 192.168.0.250:

....
051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31
051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31
051123 053127 task_m_xaynqo -2129.9678% /user/root/seeds/urls:31+31
051123 053128 task_m_xaynqo -3483.0322% /user/root/seeds/urls:31+31
051123 053129 task_m_xaynqo -4976.2256% /user/root/seeds/urls:31+31
051123 053130 task_m_xaynqo -6449.1934% /user/root/seeds/urls:31+31
051123 053131 task_m_xaynqo -7898.258% /user/root/seeds/urls:31+31
051123 053132 task_m_xaynqo -9232.193% /user/root/seeds/urls:31+31
051123 053133 task_m_xaynqo -10694.3545% /user/root/seeds/urls:31+31
051123 053134 task_m_xaynqo -12139.226% /user/root/seeds/urls:31+31
051123 053135 task_m_xaynqo -13416.677% /user/root/seeds/urls:31+31
051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31

... and so on; the log kept recording ever-decreasing percentages.
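
To spot this symptom in a long tasktracker log, a small script can flag progress values that fall outside the expected range. This is a minimal sketch, not part of Nutch: the regular expression is inferred only from the log lines quoted above, and the assumption that the logged value is a fraction between 0 and 1 comes from the healthy task going from 0.0% to 1.0% just before it is reported done. The function name is hypothetical.

```python
import re

# Matches tasktracker progress lines such as:
#   051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31
# The pattern is inferred from the quoted log lines only.
PROGRESS_LINE = re.compile(
    r"^(?P<date>\d{6}) (?P<time>\d{6}) (?P<task>task_\w+) "
    r"(?P<progress>-?\d+(?:\.\d+)?)% "
    r"(?P<path>\S+):(?P<start>\d+)\+(?P<length>\d+)$"
)


def out_of_range_progress(log_lines):
    """Return (task, progress) pairs whose progress is outside [0, 1]."""
    bad = []
    for line in log_lines:
        match = PROGRESS_LINE.match(line.strip())
        if match is None:
            continue  # not a progress line (e.g. "Task ... is done.")
        progress = float(match.group("progress"))
        # Assumption: the value is a fraction of the split, so 1.0 means done.
        if not 0.0 <= progress <= 1.0:
            bad.append((match.group("task"), progress))
    return bad


sample = [
    "051110 142607 task_m_z66npx 1.0% /user/root/seeds/urls:0+31",
    "051110 142607 Task task_m_z66npx is done.",
    "051123 053125 task_m_xaynqo 0.12903225% /user/root/seeds/urls:31+31",
    "051123 053126 task_m_xaynqo -683.9677% /user/root/seeds/urls:31+31",
]
print(out_of_range_progress(sample))  # [('task_m_xaynqo', -683.9677)]
```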

 

I concluded that the inject step was split across the 2 machines as 2 tasks,
'task_m_z66npx' and 'task_m_xaynqo'. 'task_m_z66npx' finished successfully,
while 'task_m_xaynqo' ran into some problem.

Please help me find out what the problem is. What did I do wrong?

Re: problem with inject url on mapred

Paul E. Baclace
[regarding mapred ver 0.8]

Anton Potehin wrote:
> I tried to launch mapred on 2 machines: 192.168.0.250 and 192.168.0.111.

 > 051123 053136 task_m_xaynqo -14885.741% /user/root/seeds/urls:31+31

 > Please help me to find out what the problem is? And what I did wrong?

Is the problem the negative progress percentages (perhaps it also does not finish)?

I am going to write up a new wiki page about usage scenarios and it looks like you are using an unusual setup by having different configuration settings on each node.  Use the same settings (that's why it is called nutch-site.xml) on both (all) nodes.
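
One way to confirm that the nodes really share the same settings is to diff the property values of their nutch-site.xml files. The following is only a sketch under assumptions: the helper names are hypothetical, the `<conf>` root element in the sample strings is a placeholder (the code looks for `<property>` elements under whatever root the file has), and the two samples are trimmed copies of the settings posted above.

```python
import xml.etree.ElementTree as ET


def read_properties(xml_text):
    """Map each <property> name to its value, whatever the root element is."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def differing_properties(xml_a, xml_b):
    """Return {name: (value_a, value_b)} for properties that differ."""
    a, b = read_properties(xml_a), read_properties(xml_b)
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(set(a) | set(b))
        if a.get(name) != b.get(name)
    }


# Trimmed copies of the two nodes' settings; the <conf> root is a placeholder.
node_250 = """<conf>
  <property><name>mapred.map.tasks</name><value>2</value></property>
  <property><name>mapred.reduce.tasks</name><value>1</value></property>
</conf>"""
node_111 = """<conf>
  <property><name>mapred.map.tasks</name><value>2</value></property>
  <property><name>mapred.reduce.tasks</name><value>2</value></property>
</conf>"""

print(differing_properties(node_250, node_111))
# {'mapred.reduce.tasks': ('1', '2')}
```

A property that is present in only one file is reported with None on the other side.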

The bin/start_all.sh provides a perfect example of how to start up the ensemble.  The sh scripts are extremely elegant, but be careful to notice the difference between nutch-daemon.sh and nutch-daemons.sh; in your case, 192.168.0.250 is the master node, so you need to put the following into $HOME/.slaves

192.168.0.111
192.168.0.250

(Notice your master is also a slave; this should work, but the normal case is to have only namenode and jobtracker running on the master, and, of course, namenode and jobtracker are not run on any other nodes.) You also need to allow ssh to work from the master to the slaves without a password (either that, or customize bin/slaves.sh for rsh, etc., or unwind it and run it by hand).
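
The password-free ssh requirement can be checked before starting the daemons. The sketch below is an illustration, not one of the Nutch scripts: it assumes the slaves file holds one host per line, as in the example above, and it relies on OpenSSH's `BatchMode=yes` option, which makes ssh fail rather than prompt for a password. The function names are hypothetical.

```python
import subprocess


def parse_slaves(text):
    """Return host names from a slaves file, one per line, skipping blanks."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def ssh_check_command(host):
    """Build an ssh command that fails instead of prompting for a password."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def hosts_needing_setup(hosts, run=subprocess.run):
    """Return the hosts where the non-interactive ssh login did not succeed."""
    failed = []
    for host in hosts:
        result = run(ssh_check_command(host), capture_output=True)
        if result.returncode != 0:
            failed.append(host)
    return failed


# Contents of $HOME/.slaves as suggested above.
slaves_text = "192.168.0.111\n192.168.0.250\n"
print(parse_slaves(slaves_text))  # ['192.168.0.111', '192.168.0.250']
```

On the master, calling `hosts_needing_setup(parse_slaves(slaves_text))` would then list the slaves that still need a key installed; the `run` parameter exists only so the check can be exercised without a network.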

Paul
[hidden email]
RE: problem with inject url on mapred

Anton Potekhin
Yes, the problem is the negative progress percentages.

Both of my hosts have the same nutch-site.xml. The only difference is the
mapred.reduce.tasks parameter. As I understood it, mapred.reduce.tasks is the
number of tasktrackers. Maybe mapred.reduce.tasks is not that number, but then
what is it? And what is the reason for the negative progress percentages?

-----Original Message-----
From: Paul Baclace [mailto:[hidden email]]
Sent: Thursday, November 10, 2005 11:14 PM
To: [hidden email]
Subject: Re: problem with inject url on mapred
Importance: High

Re: problem with inject url on mapred

Doug Cutting-2
[hidden email] wrote:
> Yes, problem in negative progress percentages.

Is /usr/root/seeds/urls the same file on all hosts?  How big is it?

Doug
RE: problem with inject url on mapred

Anton Potekhin
No, seeds/urls is stored on NDFS. The file 'urls' contains only two URLs.

-----Original Message-----
From: Doug Cutting [mailto:[hidden email]]
Sent: Wednesday, November 16, 2005 9:39 PM
To: [hidden email]
Subject: Re: problem with inject url on mapred

RE: problem with inject url on mapred

Anton Potekhin
In reply to this post by Doug Cutting-2
I misunderstood you ;)
If I run the command 'bin/nutch ndfs -ls /user/root/seeds/urls' on both hosts, I
get the same result:
Found 1 items
/user/root/seeds/urls   62

This file contains only two URLs.
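
The 62-byte size lines up with the task logs: with mapred.map.tasks set to 2, two equal byte ranges of 31 bytes give exactly the 0+31 and 31+31 splits that the tasktrackers report. The sketch below only illustrates that arithmetic. It assumes equal-size contiguous splits and is not Nutch's actual splitting code; the function names are hypothetical.

```python
def byte_splits(file_length, num_splits):
    """Divide a file into contiguous (start, length) byte ranges.

    Assumption for illustration: ranges are equal in size and the last
    one absorbs any remainder.
    """
    base = file_length // num_splits
    splits = []
    for i in range(num_splits):
        start = i * base
        length = base if i < num_splits - 1 else file_length - start
        splits.append((start, length))
    return splits


def split_progress(position, start, length):
    """Fraction of a split consumed once the reader is at `position`."""
    return (position - start) / length


print(byte_splits(62, 2))  # [(0, 31), (31, 31)]
```

Under the same assumption, a reader 4 bytes into the second split would be at `split_progress(35, 31, 31)`, about 0.129, which matches the first value logged for task_m_xaynqo. That match is an inference from the numbers, not something the thread confirms.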
 