Question on Critical Region size for SequenceFile next/write - 0.15.1

Jason Venner-2
We have relatively heavyweight objects that we pass around the cluster
for our map/reduce tasks.
We have noticed that when we use the multithreaded mapper, we
don't get very high utilization of either CPU or disk.

On investigating, we discovered that the entirety of next(key, value)
and the entirety of write(key, value) are synchronized on the file
object.

This causes all threads to back up on the serialization/deserialization.

Before we start coding, are there any current patches floating around
that shrink this critical region? It is pretty straightforward for
write, but not so simple for next.

We run multithreaded mappers because we have more CPUs than disk arms
on our cluster machines, and some of our tasks are inherently threaded,
so we can't just raise the maximum task count.

Thanks -- Jason

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Doug Cutting
Jason Venner wrote:
> On investigating, we discovered that the entirety of next(key, value)
> and the entirety of write(key, value) are synchronized on the file
> object.
>
> This causes all threads to back up on the serialization/deserialization.

I'm not sure what you want to happen here.  If you've got a bunch of
threads writing to a single file, and that's your performance
bottleneck, I don't see how to improve the situation except to write to
multiple files on different drives, or to spread your load across a
larger cluster (another way to get more drives).

Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Jason Venner-2
We have been monitoring the performance of our jobs using "slaves.sh vmstat 5".

When we are running very simple mappers that basically read input,
do very little, and write output, neither the CPU nor the disk is
fully utilized. We expect to saturate on either CPU or disk. It
may be that we are saturating on network, but our network read speed is
about the same as our disk read speed, ~50mb/sec.
We only see about 1/5 of the disk bandwidth and 1/5 of the CPU being
utilized, and increasing the number of threads doesn't change the
utilization.

Our theory is that the serialization time (not the disk write time) and
the deserialization time (not the disk read time) are the bottleneck.
I have some test code nearly ready to go; if it changes the machine
utilization on my standard job, I will let you know.


Doug Cutting wrote:

> Jason Venner wrote:
>> On investigating, we discovered that the entirety of
>> next(key, value) and the entirety of write(key, value) are
>> synchronized on the file object.
>>
>> This causes all threads to back up on the serialization/deserialization.
>
> I'm not sure what you want to happen here.  If you've got a bunch of
> threads writing to a single file, and that's your performance
> bottleneck, I don't see how to improve the situation except to write
> to multiple files on different drives, or to spread your load across a
> larger cluster (another way to get more drives).
>
> Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Ted Dunning-3


It seems reasonable that (de)-serialization could be done in threaded
fashion and then just block on the (read) write itself.

That would explain the utilization, which I suspect is close to 1/N, where N
is the number of processors.
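
A rough way to see where 1/N comes from, as a sketch under stated assumptions: suppose each record costs CPU time $t_s$ inside the synchronized call and $t_p$ outside it, and that the time under the lock is CPU-bound rather than spent waiting on I/O. The symbols $t_s$, $t_p$, $R$ and $U$ are illustrative and do not come from the thread.

```latex
% Only one thread can hold the lock, so record throughput R is capped:
R \le \frac{1}{t_s}
% Each record uses t_s + t_p of CPU, spread over N processors, so utilization U is:
U = \frac{R\,(t_s + t_p)}{N} \le \min\!\left(1,\ \frac{t_s + t_p}{N\,t_s}\right)
% If nearly all per-record work sits inside the lock (t_p \approx 0):
U \lesssim \frac{1}{N}
```

Under the same assumptions, moving serialization out of the lock shrinks $t_s$ and grows $t_p$, which raises the bound.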


On 12/12/07 2:07 PM, "Jason Venner" <[hidden email]> wrote:

> Our theory is that the serialization time (not the disk write time) and
> the deserialization time (not the disk read time) are the bottleneck.
> I have some test code nearly ready to go; if it changes the machine
> utilization on my standard job, I will let you know.

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Doug Cutting
Ted Dunning wrote:
> It seems reasonable that (de)-serialization could be done in threaded
> fashion and then just block on the (read) write itself.

That would require a buffer per thread, e.g., replacing Writer#buffer
with a ThreadLocal of DataOutputBuffers.  The deflater-related objects
would also need to be accessed through ThreadLocals.  That could work.

Doug
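
The buffer-per-thread idea could look roughly like the sketch below. It is not the SequenceFile.Writer code: Record and NarrowLockWriter are hypothetical names, plain java.io streams stand in for Hadoop's DataOutputBuffer, compression (and so the deflater objects) is left out, and ThreadLocal.withInitial assumes Java 8 or later.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical stand-in for a heavyweight key or value type. */
interface Record {
    void write(DataOutputStream out) throws IOException;
}

/** Sketch of the locking pattern only; not Hadoop's writer. */
class NarrowLockWriter {
    // One scratch buffer per thread, playing the role of Writer#buffer.
    private final ThreadLocal<ByteArrayOutputStream> buffer =
        ThreadLocal.withInitial(ByteArrayOutputStream::new);

    private final DataOutputStream file;

    NarrowLockWriter(OutputStream out) {
        this.file = new DataOutputStream(out);
    }

    void append(Record key, Record value) throws IOException {
        // Serialize outside the lock, into this thread's own buffer.
        ByteArrayOutputStream buf = buffer.get();
        buf.reset();
        DataOutputStream scratch = new DataOutputStream(buf);
        key.write(scratch);
        int keyLength = buf.size();
        value.write(scratch);
        int recordLength = buf.size();

        // Hold the lock only while copying finished bytes to the file.
        synchronized (this) {
            file.writeInt(recordLength);
            file.writeInt(keyLength);
            buf.writeTo(file);
        }
    }
}
```

With this shape, threads contend only for the byte copy, so several records can be serialized at once while one is being written.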

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Jason Venner-2
I have the write side working; the read side seems to be more complex,
and I am digging into it.
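
One possible shape for the read side is sketched below, assuming records are length-prefixed so the raw bytes can be pulled from the file without decoding them. RecordDecoder and NarrowLockReader are hypothetical names, and the sketch ignores compression, sync markers and reuse of key/value objects, any of which may make the real next more involved.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical callback that turns raw key and value bytes into an object. */
interface RecordDecoder<T> {
    T decode(DataInputStream keyIn, DataInputStream valueIn) throws IOException;
}

/** Sketch of the locking pattern only; not Hadoop's reader. */
class NarrowLockReader {
    private final DataInputStream file;

    NarrowLockReader(InputStream in) {
        this.file = new DataInputStream(in);
    }

    /** Returns the next decoded record, or null at end of file. */
    <T> T next(RecordDecoder<T> decoder) throws IOException {
        byte[] raw;
        int keyLength;
        // Hold the lock only while copying one record's raw bytes.
        synchronized (this) {
            int recordLength;
            try {
                recordLength = file.readInt();
            } catch (EOFException eof) {
                return null;
            }
            keyLength = file.readInt();
            raw = new byte[recordLength];
            file.readFully(raw);
        }
        // Deserialize outside the lock, so threads can decode in parallel.
        DataInputStream keyIn = new DataInputStream(
            new ByteArrayInputStream(raw, 0, keyLength));
        DataInputStream valueIn = new DataInputStream(
            new ByteArrayInputStream(raw, keyLength, raw.length - keyLength));
        return decoder.decode(keyIn, valueIn);
    }
}
```

The cost of this approach is one extra byte-array allocation and copy per record, traded for decoding that no longer serializes the threads.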

Doug Cutting wrote:
> Ted Dunning wrote:
>> It seems reasonable that (de)-serialization could be done in threaded
>> fashion and then just block on the (read) write itself.
>
> That would require a buffer per thread, e.g., replacing Writer#buffer
> with a ThreadLocal of DataOutputBuffers.  The deflater-related objects
> would also need to be accessed through ThreadLocals.  That could work.
>
> Doug

Re: Question on Critical Region size for SequenceFile next/write - 0.15.1

Jason Venner-2
Our first cut at this is generating about 4x the I/O; we are now
saturating on disk.

These results are not definitive; they are eyeball estimates.


Jason Venner wrote:

> I have the write side working; the read side seems to be more complex,
> and I am digging into it.
>
> Doug Cutting wrote:
>> Ted Dunning wrote:
>>> It seems reasonable that (de)-serialization could be done in threaded
>>> fashion and then just block on the (read) write itself.
>>
>> That would require a buffer per thread, e.g., replacing Writer#buffer
>> with a ThreadLocal of DataOutputBuffers.  The deflater-related
>> objects would also need to be accessed through ThreadLocals.  That could
>> work.
>>
>> Doug