Greetings,
I'm running into a brain-numbing problem on Elastic MapReduce. I'm running a decent-sized job (22,000 mappers, a ton of gzipped input blocks, ~1 TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores").

I get failures randomly: sometimes at the end of my 6-step process, sometimes in the first reducer phase, sometimes in the mapper. It seems to fail in multiple areas, mostly in the reducers. Any ideas?

Here are the settings I've changed:

-Xmx400m
6 max mappers
1 max reducer
1 GB swap partition
mapred.job.reuse.jvm.num.tasks=50
mapred.reduce.parallel.copies=3

java.lang.OutOfMemoryError: GC overhead limit exceeded
        at java.nio.CharBuffer.wrap(CharBuffer.java:350)
        at java.nio.CharBuffer.wrap(CharBuffer.java:373)
        at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138)
        at java.lang.StringCoding.decode(StringCoding.java:173)
        at java.lang.String.<init>(String.java:443)
        at java.lang.String.<init>(String.java:515)
        at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116)
        at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144)
        at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154)
        at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101)
        at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75)
        at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33)
        at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74)
        at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34)
        at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142)
        at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55)
        at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373)
        at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136)
        at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103)
        at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335)
        at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350)
        at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645)
        at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586)

--
Bradford Stephens,
Founder, Drawn to Scale
drawntoscalehq.com
727.697.7528

http://www.drawntoscalehq.com -- The intuitive, cloud-scale data solution. Process, store, query, search, and serve all your data.

http://www.roadtofailure.com -- The Fringes of Scalability, Social Media, and Computer Science
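For reference, those knobs correspond roughly to the following 0.20-era job-setup calls. This is only a sketch: the max mapper/reducer counts are normally per-node tasktracker settings rather than per-job ones, and the property names should be checked against the Hadoop version EMR runs.

    import org.apache.hadoop.mapred.JobConf;

    public class ReportedSettings {
        public static void apply(JobConf conf) {
            // Child task heap; applies to both mappers and reducers.
            conf.set("mapred.child.java.opts", "-Xmx400m");
            // Per-node task slot limits (tasktracker-side settings).
            conf.setInt("mapred.tasktracker.map.tasks.maximum", 6);
            conf.setInt("mapred.tasktracker.reduce.tasks.maximum", 1);
            // Number of tasks each child JVM runs before being recycled.
            conf.setInt("mapred.job.reuse.jvm.num.tasks", 50);
            // Parallel fetch threads in the reducer's shuffle/copy phase.
            conf.setInt("mapred.reduce.parallel.copies", 3);
        }
    }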
I'm going to try running it on high-RAM boxes with -Xmx4096m or so,
see if that helps.
Nope, that didn't seem to help.
Have you tried lowering mapred.job.reuse.jvm.num.tasks?
On Sun, Sep 26, 2010 at 3:30 AM, Bradford Stephens <[hidden email]> wrote:
> Nope, that didn't seem to help.
fwiw
I run m2.xlarge slaves, using the default mappers/reducers (4/2, I think), with swap:

    --bootstrap-action s3://elasticmapreduce/bootstrap-actions/create-swap-file.rb --args "-E,/mnt/swap,1000"

Historically I've run this property with no issues, but I should probably re-research the GC setting (comments please):

    "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"

I haven't co-installed Ganglia to look at utilization lately, but more than 4 mappers or more than 2 reducers has always given me headaches.

ckw

On Sep 26, 2010, at 12:55 AM, Bradford Stephens wrote:
> I get failures randomly: sometimes at the end of my 6-step process,
> sometimes in the first reducer phase, sometimes in the mapper.
--
Chris K Wensel
[hidden email]
http://www.concurrentinc.com

-- Concurrent, Inc. offers mentoring, support, and licensing for Cascading
The old GC routinely gives me lower performance than modern GC. The default
is now quite good for batch programs.

On Sun, Sep 26, 2010 at 8:10 AM, Chris K Wensel <[hidden email]> wrote:
> "mapred.child.java.opts", "-server -Xmx2000m -XX:+UseParallelOldGC"
My feeling is that you have some kind of leak going on in your mappers or
reducers, and that reducing the number of times the JVM is re-used would improve matters.

"GC overhead limit exceeded" indicates that your (tiny) heap is full and that collection is not freeing much of it.

On Sun, Sep 26, 2010 at 12:55 AM, Bradford Stephens <[hidden email]> wrote:
> mapred.job.reuse.jvm.num.tasks=50
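A minimal sketch of what turning reuse off could look like from job-setup code, assuming the 0.20-era property name:

    import org.apache.hadoop.mapred.JobConf;

    public class NoJvmReuse {
        public static void disableReuse(JobConf conf) {
            // 1 = one task per child JVM, so a leak in a mapper or
            // reducer cannot accumulate across tasks; -1 means
            // unlimited reuse.
            conf.setInt("mapred.job.reuse.jvm.num.tasks", 1);
        }
    }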
Hrm.... no. I've lowered it to -1, but I can try 1 again.
On Sun, Sep 26, 2010 at 6:47 AM, Ted Yu <[hidden email]> wrote:
> Have you tried lowering mapred.job.reuse.jvm.num.tasks?
-1 means there is no limit on JVM reuse.
At the same time, you can generate a heap dump on OOME and analyze it with YourKit, etc.

Cheers

On Sun, Sep 26, 2010 at 1:19 PM, Bradford Stephens <[hidden email]> wrote:
> Hrm.... no. I've lowered it to -1, but I can try 1 again.
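A sketch of one way to wire that up through the child JVM options; the heap size is the one from the original post, and the dump path is a made-up example that would have to exist on the task nodes:

    import org.apache.hadoop.mapred.JobConf;

    public class HeapDumpOnOome {
        public static void enable(JobConf conf) {
            // Dump the child task heap on OutOfMemoryError so the dump
            // can be copied off the node and opened in YourKit or a
            // similar analyzer. /mnt/dumps is a hypothetical path.
            conf.set("mapred.child.java.opts",
                    "-Xmx400m -XX:+HeapDumpOnOutOfMemoryError"
                    + " -XX:HeapDumpPath=/mnt/dumps");
        }
    }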
Sadly, making Chris's changes didn't help.
Here's the Cascading code; it's pretty simple, but uses the new "combiner"-like functionality:

http://pastebin.com/ccvDmLSX

On Sun, Sep 26, 2010 at 9:37 AM, Ted Dunning <[hidden email]> wrote:
> My feeling is that you have some kind of leak going on in your mappers or
> reducers, and that reducing the number of times the JVM is re-used would
> improve matters.
Try using a lower threshold value (the number of values to cache in the LRU); this is the tradeoff of this approach.
ckw

On Sep 26, 2010, at 4:46 PM, Bradford Stephens wrote:
> Here's the Cascading code; it's pretty simple, but uses the new
> "combiner"-like functionality:
>
> http://pastebin.com/ccvDmLSX
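For reference, the threshold Chris mentions is exposed as a constructor argument on the AggregateBy-style subassemblies. A sketch with made-up field names; the exact overloads should be checked against the Cascading release in use:

    import cascading.pipe.Pipe;
    import cascading.pipe.assembly.CountBy;
    import cascading.tuple.Fields;

    public class LowerThreshold {
        public static Pipe countWithSmallCache(Pipe upstream) {
            // The threshold caps how many distinct grouping keys the
            // map-side LRU holds before flushing partial counts,
            // trading combining efficiency for a smaller heap footprint.
            return new CountBy(upstream, new Fields("category"),
                    new Fields("count"), 1000);
        }
    }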
Yup, I've turned it down to 1,000. Let's see if that helps!
On Sun, Sep 26, 2010 at 5:09 PM, Chris K Wensel <[hidden email]> wrote:
> Try using a lower threshold value (the number of values to cache in the
> LRU); this is the tradeoff of this approach.
Hi Bradford,
Sometimes the reducers do not handle merging large chunks of data too well. How many reducers do you have? Try increasing the number of reducers; you can always merge the data later if you are worried about too many output files.

--
Alex Kozlov
Solutions Architect
Cloudera, Inc
twitter: alexvk2009

Hadoop World 2010, October 12, New York City - Register now:
http://www.cloudera.com/company/press-center/hadoop-world-nyc/
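In Cascading, one place the reducer count can be set is in the properties handed to the FlowConnector. A sketch; 40 is an assumed starting point, roughly one reducer per node in this cluster:

    import java.util.Properties;
    import cascading.flow.FlowConnector;

    public class MoreReducers {
        public static FlowConnector connector() {
            Properties props = new Properties();
            // With a single reducer, one task merges all ~22,000 map
            // outputs by itself; more reducers keep each merge pass
            // smaller.
            props.setProperty("mapred.reduce.tasks", "40");
            return new FlowConnector(props);
        }
    }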
One of the problems with this data set is that I'm grouping by a
category that has only, say, 20 different values. Then I'm doing a unique count of Facebook user IDs per group. I imagine that's not pleasant for the reducers.

On Sun, Sep 26, 2010 at 5:41 PM, Alex Kozlov <[hidden email]> wrote:
> Try increasing the number of reducers; you can always merge the data
> later if you are worried about too many output files.
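One common way to make that shape tractable is to dedupe map-side before counting, so the 20 reducer groups never see repeated IDs. A sketch with made-up field names, assuming the stock Unique and CountBy subassemblies:

    import cascading.pipe.Pipe;
    import cascading.pipe.assembly.CountBy;
    import cascading.pipe.assembly.Unique;
    import cascading.tuple.Fields;

    public class DistinctUsersPerCategory {
        public static Pipe assemble(Pipe events) {
            // Unique keeps an LRU of recently seen tuples on the map
            // side, dropping duplicate (category, user_id) pairs early.
            Pipe deduped = new Unique(events,
                    new Fields("category", "user_id"));
            // Partial-aggregating count of surviving pairs per category
            // yields the distinct-user count for each category.
            return new CountBy(deduped, new Fields("category"),
                    new Fields("distinct_users"));
        }
    }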
If there are combiners, the reducers shouldn't get any lists longer than a
small multiple of the number of maps.

On Sun, Sep 26, 2010 at 6:01 PM, Bradford Stephens <[hidden email]> wrote:
> One of the problems with this data set is that I'm grouping by a
> category that has only, say, 20 different values. Then I'm doing a
> unique count of Facebook user IDs per group.
Hi,
"[..]if more than 98% of the total time is spent in garbage collection and less than 2% of the heap is recovered, an OutOfMemoryError will be thrown. This feature is designed to prevent applications from running for an extended period of time while making little or no progress because the heap is too small. If necessary, this feature can be disabled by adding the option -XX:-UseGCOverheadLimit to the command line." This is what often happens in MapReduce operations when u process a lot of data. I recommend to try <property> <name>mapred.child.java.opts</name> <value>-Xmx1024m -XX:-UseGCOverheadLimit</value> </property> also from my personal experience when process a lot of data often it is much cheaper to kill JVM rather than wait for GC. For that reason if you have a lot of BIG tasks rather than tons of small tasks do not reuse JVM, killing JVM and starting it again often much cheaper than trying to GC 1GB of ram(don't know why, it just tuned out in my tests). <property> <name>mapred.job.reuse.jvm.num.tasks</name> <value>1</value> </description> Regards, Vitaliy S On Sun, Sep 26, 2010 at 11:55 AM, Bradford Stephens <[hidden email]> wrote: > Greetings, > > I'm running into a brain-numbing problem on Elastic MapReduce. I'm > running a decent-size task (22,000 mappers, a ton of GZipped input > blocks, ~1TB of data) on 40 c1.xlarge nodes (7 gb RAM, ~8 "cores"). > > I get failures randomly --- sometimes at the end of my 6-step process, > sometimes at the first reducer phase, sometimes in the mapper. It > seems to fail in multiple areas. Mostly in the reducers. Any ideas? > > Here's the settings I've changed: > -Xmx400m > 6 max mappers > 1 max reducer > 1GB swap partition > mapred.job.reuse.jvm.num.tasks=50 > mapred.reduce.parallel.copies=3 > > > java.lang.OutOfMemoryError: GC overhead limit exceeded > at java.nio.CharBuffer.wrap(CharBuffer.java:350) > at java.nio.CharBuffer.wrap(CharBuffer.java:373) > at java.lang.StringCoding$StringDecoder.decode(StringCoding.java:138) > at java.lang.StringCoding.decode(StringCoding.java:173) > at java.lang.String.(String.java:443) > at java.lang.String.(String.java:515) > at org.apache.hadoop.io.WritableUtils.readString(WritableUtils.java:116) > at cascading.tuple.TupleInputStream.readString(TupleInputStream.java:144) > at cascading.tuple.TupleInputStream.readType(TupleInputStream.java:154) > at cascading.tuple.TupleInputStream.getNextElement(TupleInputStream.java:101) > at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:75) > at cascading.tuple.hadoop.TupleElementComparator.compare(TupleElementComparator.java:33) > at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:74) > at cascading.tuple.hadoop.DelegatingTupleElementComparator.compare(DelegatingTupleElementComparator.java:34) > at cascading.tuple.hadoop.DeserializerComparator.compareTuples(DeserializerComparator.java:142) > at cascading.tuple.hadoop.GroupingSortingComparator.compare(GroupingSortingComparator.java:55) > at org.apache.hadoop.mapred.Merger$MergeQueue.lessThan(Merger.java:373) > at org.apache.hadoop.util.PriorityQueue.downHeap(PriorityQueue.java:136) > at org.apache.hadoop.util.PriorityQueue.adjustTop(PriorityQueue.java:103) > at org.apache.hadoop.mapred.Merger$MergeQueue.adjustPriorityQueue(Merger.java:335) > at org.apache.hadoop.mapred.Merger$MergeQueue.next(Merger.java:350) > at org.apache.hadoop.mapred.Merger.writeFile(Merger.java:156) > at 
org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.doInMemMerge(ReduceTask.java:2645) > at org.apache.hadoop.mapred.ReduceTask$ReduceCopier$InMemFSMergeThread.run(ReduceTask.java:2586) > > -- > Bradford Stephens, > Founder, Drawn to Scale > drawntoscalehq.com > 727.697.7528 > > http://www.drawntoscalehq.com -- The intuitive, cloud-scale data > solution. Process, store, query, search, and serve all your data. > > http://www.roadtofailure.com -- The Fringes of Scalability, Social > Media, and Computer Science > |
It turned out to be a deployment issue: an old version of the code was running. Ted and
Chris's suggestions were spot-on.

I can't believe how BRILLIANT these combiners from Cascading are. They've cut my processing time down from 20 hours to 50 minutes. AND I cut out about 80% of my hand-crafted code. Bravo. I look smart now. (Almost.)

-B

On Sun, Sep 26, 2010 at 7:00 PM, Ted Dunning <[hidden email]> wrote:
> If there are combiners, the reducers shouldn't get any lists longer than a
> small multiple of the number of maps.
A couple of things you can try:
1. Increase the heap size for the tasks.

2. Since your OOM is happening randomly, try setting -XX:+HeapDumpOnOutOfMemoryError in your child JVM parameters. At the least you can detect why your heap is growing; the heap dump analysis will tell you whether it is a leak, or whether you simply need to increase the heap size for your mappers or reducers.

3. Another cause is poor JVM GC tuning. Sometimes the default collector can't keep up with the garbage being created; that calls for some GC tuning.

-Bharath

From: [hidden email]
Sent: Sunday, September 26, 2010 12:55:15 AM
Subject: java.lang.OutOfMemoryError: GC overhead limit exceeded

> I'm running into a brain-numbing problem on Elastic MapReduce. I'm
> running a decent-sized job (22,000 mappers, a ton of gzipped input
> blocks, ~1 TB of data) on 40 c1.xlarge nodes (7 GB RAM, ~8 "cores").