Benchmarking question - how do you test a new cluster?

Steve Schlosser
Hello all

I've got a small hadoop cluster running (5 nodes today, going to 15+
soon), and I'd like to do some benchmarking.  My question to the group
is - what is the first benchmark you run on a new cluster?

I'd like to do some simple functionality, throughput, and, especially,
scaling experiments.  So far, the programs in the examples jar (grep,
wordcount, etc.) run fine.  I've had less success with the programs in
the test jar (DFSCIOTest, DistributedFSCheck, etc.).  Are some of
these deprecated?  Some have very similar names - are there
significant differences between them?

Thanks!

-steve
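
For the scaling experiments mentioned above, one simple way to compare runs is to compute speedup and parallel efficiency from the wall-clock times of the same job at two cluster sizes. The sketch below is only an illustration: the function names are hypothetical, the timings are placeholders rather than measurements from this thread, and it assumes the job processes the same total amount of data in both runs.

```python
def speedup(base_seconds, scaled_seconds):
    """Ratio of the baseline runtime to the runtime on the larger cluster."""
    if base_seconds <= 0 or scaled_seconds <= 0:
        raise ValueError("runtimes must be positive")
    return base_seconds / scaled_seconds


def scaling_efficiency(base_nodes, base_seconds, scaled_nodes, scaled_seconds):
    """Measured speedup divided by the ideal speedup (the node ratio).

    A value of 1.0 would mean perfectly linear scaling.
    """
    if base_nodes <= 0 or scaled_nodes <= 0:
        raise ValueError("node counts must be positive")
    ideal = scaled_nodes / base_nodes
    return speedup(base_seconds, scaled_seconds) / ideal


if __name__ == "__main__":
    # Placeholder timings for one fixed-size job on 5 and then 15 nodes.
    s = speedup(3600.0, 1500.0)
    e = scaling_efficiency(5, 3600.0, 15, 1500.0)
    print(f"speedup: {s:.2f}x, efficiency: {e:.0%}")
```

An efficiency well below 1.0 suggests that something other than node count (network, a single reducer, disk contention) is limiting the job.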

Re: Benchmarking question - how do you test a new cluster?

Owen O'Malley

On Apr 23, 2007, at 7:39 AM, Steve Schlosser wrote:

> I've got a small hadoop cluster running (5 nodes today, going to 15+
> soon), and I'd like to do some benchmarking.  My question to the group
> is - what is the first benchmark you run on a new cluster?

I usually use random-writer to generate some random data (it defaults
to 10 GB per node) and then use sort to sort it. Sort provides a pretty
decent, simple test case for moving a lot of data through map/reduce.

-- Owen
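
The random-writer-then-sort sequence described above could be driven and timed with a small script like the sketch below. Several things here are assumptions rather than details from this thread: the program names `randomwriter` and `sort` are the ones commonly registered in the Hadoop examples jar, the jar file name varies by release and is therefore a parameter, and the helper names and directory names are hypothetical.

```python
import subprocess
import time


def randomwriter_cmd(hadoop_bin, examples_jar, out_dir):
    """Command line that writes random data into out_dir (assumed program name)."""
    return [hadoop_bin, "jar", examples_jar, "randomwriter", out_dir]


def sort_cmd(hadoop_bin, examples_jar, in_dir, out_dir):
    """Command line that sorts in_dir into out_dir (assumed program name)."""
    return [hadoop_bin, "jar", examples_jar, "sort", in_dir, out_dir]


def timed_run(cmd, runner=subprocess.run):
    """Run cmd, raise if it fails, and return elapsed wall-clock seconds."""
    start = time.monotonic()
    runner(cmd, check=True)
    return time.monotonic() - start


def benchmark(hadoop_bin, examples_jar, data_dir, sorted_dir, runner=subprocess.run):
    """Generate random data, sort it, and report how long each step took."""
    write_s = timed_run(randomwriter_cmd(hadoop_bin, examples_jar, data_dir), runner)
    sort_s = timed_run(sort_cmd(hadoop_bin, examples_jar, data_dir, sorted_dir), runner)
    return {"randomwriter_seconds": write_s, "sort_seconds": sort_s}


# Example call on a machine with Hadoop installed (paths are placeholders):
#   benchmark("bin/hadoop", "path/to/examples.jar", "rand", "rand-sort")
```

The `runner` parameter exists only so the command construction can be checked without a cluster; in normal use it stays at its default.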

Re: Benchmarking question - how do you test a new cluster?

Dennis Kubes
You will also want to run hardware tests on the machines in your cluster
to make sure memory, disk, and network are working properly.  We use
tools such as memtest86+ and Doug Ledford's Memory Test Script to do
burn-ins.  Here is a link to different test programs for Linux.

http://linuxquality.sunsite.dk/articles/testsuites/

Dennis Kubes
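
As a rough complement to the burn-in tools mentioned above, a quick sequential-write check can flag a node whose disk is far slower than its peers. The sketch below is not a substitute for memtest86+ or a real test suite; the function name, the default sizes, and the idea of comparing the figure across nodes are all assumptions made for illustration.

```python
import os
import tempfile
import time


def write_throughput_mb_s(directory, total_mb=64, chunk_mb=4):
    """Write total_mb of random data to a temp file in directory; return MB/s.

    The data is flushed and fsync'ed before the clock stops, so the figure
    is less likely to reflect only the page cache.
    """
    if chunk_mb <= 0 or total_mb < chunk_mb:
        raise ValueError("need chunk_mb > 0 and total_mb >= chunk_mb")
    chunk = os.urandom(chunk_mb * 1024 * 1024)
    chunks = total_mb // chunk_mb
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        start = time.monotonic()
        with os.fdopen(fd, "wb") as f:
            for _ in range(chunks):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())
        elapsed = time.monotonic() - start
    finally:
        os.remove(path)  # always clean up the scratch file
    # Guard against a zero interval on very coarse clocks.
    return (chunks * chunk_mb) / max(elapsed, 1e-9)
```

Running it against each data directory on each node and comparing the numbers is one way to spot an outlier before starting longer benchmarks.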
