
java.lang.OutOfMemoryError: GC overhead limit exceeded

18 messages

java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
Greetings,

I'm running into a brain-numbing problem on Elastic MapReduce. I'm
running a decent-size task (22,000 mappers, a ton of GZipped input
blocks, ~1TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores").

I get failures at random: sometimes at the end of my 6-step process,
sometimes at the first reducer phase, sometimes in the mapper. It
seems to fail in multiple areas, mostly in the reducers. Any ideas?

Here are the settings I've changed:
-Xmx400m
6 max mappers
1 max reducer
1GB swap partition
mapred.job.reuse.jvm.num.tasks=50
mapred.reduce.parallel.copies=3
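
(For reference: a minimal sketch, assuming the Hadoop 0.20-era JobConf API, of how the settings listed above are typically applied. The property names are the ones quoted in this thread; the slot maximums are tasktracker-side settings that normally live in mapred-site.xml rather than in job code.)

import org.apache.hadoop.mapred.JobConf;

public class EmrJobSettings {
    public static JobConf configure() {
        JobConf conf = new JobConf();
        conf.set("mapred.child.java.opts", "-Xmx400m");    // per-task heap
        conf.setInt("mapred.job.reuse.jvm.num.tasks", 50); // tasks per JVM
        conf.setInt("mapred.reduce.parallel.copies", 3);   // shuffle copier threads
        // Slot counts ("6 max mappers", "1 max reducer") are per-tasktracker:
        conf.setInt("mapred.tasktracker.map.tasks.maximum", 6);
        conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
        return conf;
    }
}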


java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.CharBuffer.wrap(CharBuffer.java:350)
        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
        at java.lang.StringCoding.decode(StringCoding.java:173)
        at java.lang.String.<init>(String.java:443)
        at java.lang.String.<init>(String.java:515)
        at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
        at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
        at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
        at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
        at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
        at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
        at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
        at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
        at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
        at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social
Media, and Computer Science

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
I'm going to try running it on high-RAM boxes with -Xmx4096m or so,
see if that helps.


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
Nope, that didn't seem to help.


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Ted Yu
Have you tried lowering mapred.job.reuse.jvm.num.tasks?


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Chris K Wensel
In reply to this post by Bradford Stephens
FWIW:

I run m2.xlarge slaves, using the default mappers/reducers (4/2, I think).

with swap
  --bootstrap-action s3://elasticmapreduce/bootstrap-actions/create-swap-file.rb --args "-E,/mnt/swap,1000"

Historically I've run with this property with no issues, but I should probably re-research the GC setting (comments, please):
 "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"

I haven't co-installed Ganglia to look at utilization lately, but more than 4 mappers or more than 2 reducers has always given me headaches.

ckw


--
Chris K Wensel
[hidden email]
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support, and licensing for Cascading


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Ted Dunning
The old GC routinely gives me lower performance than modern GC.  The default
is now quite good for batch programs.

On Sun, Sep 26, 2010 at 8:10 AM, Chris K Wensel <[hidden email]> wrote:

> "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Ted Dunning
In reply to this post by Bradford Stephens
My feeling is that you have some kind of leak going on in your mappers or
reducers, and that reducing the number of times the JVM is re-used would
improve matters.

"GC overhead limit exceeded" indicates that your (tiny) heap is full and
collection is not reducing that.

On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[hidden email]> wrote:

> mapred.job.reuse.jvm.num.tasks=50
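
(Illustrative only, not Bradford's actual code: a minimal sketch of the failure mode Ted describes. Static state survives from task to task whenever mapred.job.reuse.jvm.num.tasks > 1, so the heap slowly fills until the collector gives up with "GC overhead limit exceeded".)

import java.io.IOException;
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

public class LeakyMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, Text> {

    // Static state is shared by every task that runs in this reused JVM.
    private static final Set<String> seen = new HashSet<String>();

    public void map(LongWritable key, Text value,
                    OutputCollector<Text, Text> output,
                    Reporter reporter) throws IOException {
        seen.add(value.toString()); // grows forever; never cleared between tasks
        output.collect(new Text("group"), value);
    }
}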

Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
In reply to this post by Ted Yu
Hrm.... no. I've lowered it to -1, but I can try 1 again.


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Ted Yu
-1 means there is no limit on reuse.
At the same time, you can generate a heap dump on OOME and analyze it with
YourKit, etc.

Cheers
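
(A minimal sketch of wiring Ted's suggestion into the child JVM options, again assuming the 0.20-era JobConf API. The flags are standard HotSpot options; the dump path is a hypothetical example.)

import org.apache.hadoop.mapred.JobConf;

public class HeapDumpSettings {
    public static void apply(JobConf conf) {
        // Each task that dies with an OOME leaves a .hprof file behind
        // for offline analysis with YourKit, MAT, jhat, etc.
        conf.set("mapred.child.java.opts",
                 "-Xmx400m -XX:+HeapDumpOnOutOfMemoryError"
                 + " -XX:HeapDumpPath=/mnt/dumps"); // hypothetical path
    }
}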


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
In reply to this post by Ted Dunning
Sadly, making Chris's changes didn't help.

Here's the Cascading code; it's pretty simple, but it uses the new
"combiner"-like functionality:

http://pastebin.com/ccvDmLSX




Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Chris K Wensel
Try using a lower threshold value (the number of values to cache in the LRU). This is the tradeoff of this approach.

ckw
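
(A generic illustration of the tradeoff Chris describes, not Cascading's actual implementation: a bounded LRU map doing "combiner"-like partial aggregation on the map side. A larger threshold aggregates more but holds more heap; a smaller one stays safe but emits more partial results to the reducers. The "1,000" Bradford tries in the next message is this threshold.)

import java.util.LinkedHashMap;
import java.util.Map;

public class BoundedAggregateCache extends LinkedHashMap<String, Long> {
    private final int threshold;

    public BoundedAggregateCache(int threshold) {
        super(16, 0.75f, true); // access order: least-recently-used entry is eldest
        this.threshold = threshold;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<String, Long> eldest) {
        // Evict once the cache exceeds the threshold. A real map-side
        // aggregator would emit the evicted entry downstream (a partial
        // result for the reducer to finish) before dropping it from the heap.
        return size() > threshold;
    }
}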


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
Yup, I've turned it down to 1,000. Let's see if that helps!


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Alex Kozlov
In reply to this post by Chris K Wensel
Hi Bradford,

Sometimes the reducers do not handle merging large chunks of data well.
How many reducers do you have? Try increasing the number of reducers (you can
always merge the data later if you are worried about too many output files).
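
(A one-line sketch of Alex's suggestion with the 0.20-era API; the count of 40 is illustrative, e.g. one reducer per node instead of the single max reducer above.)

import org.apache.hadoop.mapred.JobConf;

public class ReducerCount {
    public static void apply(JobConf conf) {
        conf.setNumReduceTasks(40); // illustrative: e.g. one reducer per node
    }
}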

--
Alex Kozlov
Solutions Architect
Cloudera, Inc
twitter: alexvk2009

Hadoop World 2010, October 12, New York City - Register now:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/



Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
One of the problems with this data set is that I'm grouping by a
category that has only, say, 20 different values. Then I'm doing a
unique count of Facebook user IDs per group. I imagine that's not
pleasant for the reducers.
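
(An illustrative workaround, not something proposed in the thread: with only ~20 group values there are at most 20 reduce keys, so a handful of reducers receive enormous value lists while the rest sit idle. "Salting" the key spreads each category across several partitions; a second pass then merges the per-salt partial results. The helper below is hypothetical.)

public class SaltedKey {
    // numSalts partitions per category; the user ID's hash picks the partition.
    public static String salt(String category, String userId, int numSalts) {
        int bucket = Math.abs(userId.hashCode() % numSalts);
        return category + "#" + bucket; // e.g. "sports#13"
    }
}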


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Ted Dunning
If there are combiners, the reducers shouldn't get any lists longer than a
small multiple of the number of maps.
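
(A minimal sketch of Ted's point using the plain Hadoop 0.20 API, not Bradford's Cascading flow: registering the same summing Reducer as the combiner makes each map task pre-aggregate its own output, so a reducer sees only a few values per key per map task, Ted's "small multiple", instead of one per record.)

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class SumCombiner extends MapReduceBase
        implements Reducer<Text, LongWritable, Text, LongWritable> {

    public void reduce(Text key, Iterator<LongWritable> values,
                       OutputCollector<Text, LongWritable> output,
                       Reporter reporter) throws IOException {
        long sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        // Emits one value per key from this (combine or reduce) invocation.
        output.collect(key, new LongWritable(sum));
    }
}

It is wired up with conf.setCombinerClass(SumCombiner.class) and conf.setReducerClass(SumCombiner.class) on the JobConf.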


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Vitaliy Semochkin
In reply to this post by Bradford Stephens
Hi,

"[..]if more than 98% of the total time is spent in garbage collection
and less than 2% of the heap is recovered, an OutOfMemoryError will be
thrown. This feature is designed to prevent applications from running
for an extended period of time while making little or no progress
because the heap is too small. If necessary, this feature can be
disabled by adding the option -XX:-UseGCOverheadLimit to the command
line."

This is what often happens in MapReduce jobs when you process a lot of data.
I recommend trying:

<property>
  <name>mapred.child.java.opts</name>
  <value>-Xmx1024m -XX:-UseGCOverheadLimit</value>
</property>


Also, from my personal experience, when processing a lot of data it is often
much cheaper to kill the JVM than to wait for GC. For that reason, if you have
a lot of BIG tasks rather than tons of small tasks, do not reuse the JVM:
killing the JVM and starting it again is often much cheaper than trying to GC
1 GB of RAM (I don't know why; it just turned out that way in my tests).

<property>
  <name>mapred.job.reuse.jvm.num.tasks</name>
  <value>1</value>
</property>

Regards,
Vitaliy S


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bradford Stephens
In reply to this post by Ted Dunning
It turned out to be a deployment issue: an old version of the code had been
deployed. Ted and Chris's suggestions were spot-on.

I can't believe how BRILLIANT these combiners from Cascading are. They've
cut my processing time down from 20 hours to 50 minutes. AND I cut out
about 80% of my hand-crafted code.

Bravo. I look smart now. (Almost).

-B


Re: java.lang.OutOfMemoryError: GC overhead limit exceeded

Bharath Mundlapudi
In reply to this post by Bradford Stephens
A couple of things you can try (a combined sketch follows below):

1. Increase the heap size for the tasks.

2. Since your OOM is happening randomly, try setting
-XX:+HeapDumpOnOutOfMemoryError in your child JVM parameters. At the least you
can detect why your heap is growing: is it a leak, or do you simply need to
increase the heap size for your mappers or reducers? The heap dump analysis
will tell you.

3. Another possible cause is poor JVM GC tuning. Sometimes the default
collector can't keep up with the garbage being created, and that needs some
GC tuning.

-Bharath
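
(A sketch combining the three suggestions above into one child-JVM option string; the values are illustrative, and the collector choice is one era-appropriate possibility rather than a recommendation from this thread.)

import org.apache.hadoop.mapred.JobConf;

public class TunedChildOpts {
    public static void apply(JobConf conf) {
        conf.set("mapred.child.java.opts",
                 "-Xmx1024m"                           // 1. bigger task heap
                 + " -XX:+HeapDumpOnOutOfMemoryError"  // 2. dump on OOM
                 + " -XX:+UseConcMarkSweepGC");        // 3. a different collector
    }
}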



