nutch 0.9, mapred-default.xml, hadoop-site.xml file usage on slaves

John Mendenhall
I currently have 3 servers, 1 serving as both master
and slave.  Each has a different amount of memory
available and a different processor type.

I configure the values for mapred.map.tasks,
mapred.reduce.tasks, and mapred.tasktracker.tasks.maximum
in the mapred-default.xml file.
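
For reference, here is roughly the sort of stanza I
mean, inside the <configuration> element of
mapred-default.xml (the values are just what I am
experimenting with, not recommendations):

    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>2</value>
    </property>
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>2</value>
    </property>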

I configure the values for mapred.child.java.opts
in the hadoop-site.xml file.
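
Again for reference, something like this (the heap
size is just a placeholder, not necessarily what I run):

    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx512m</value>
    </property>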

Am I putting the correct values in the correct files?

With each server having different capabilities,
is it best practice to have each server run using
as much of its capabilities as possible, maximizing
heap size and number of processes?

Or, is it best practice to choose values that are
the lowest common denominator?

Also, does each slave look at its own local mapred
configuration to determine the memory to use for each
child process?  And is the number of tasks taken from
the master configuration, or from the slave configurations?
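
For instance, I was imagining each slave could carry
its own hadoop-site.xml with higher values on the
larger machine, something like this (hypothetical
values, and assuming per-slave overrides work this way):

    <!-- hadoop-site.xml on the large-memory slave -->
    <property>
      <name>mapred.child.java.opts</name>
      <value>-Xmx1024m</value>
    </property>
    <property>
      <name>mapred.tasktracker.tasks.maximum</name>
      <value>4</value>
    </property>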

Thanks in advance for any pointers or rules of thumb
you can provide.

JohnM

--
john mendenhall
[hidden email]
surf utopia
internet services