Dump not copied to HDFS | Taking memory dumps of Hadoop tasks



Akshay Aggarwal
Hey,

I was following a blog post on copying the heap dump to HDFS when a container goes OOM -

But for some reason the dump is not getting pushed to HDFS; I get the following logs -

--

Log Type: stderr

Log Upload Time: Wed Aug 17 12:01:55 +0530 2016

Log Length: 833

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/grid/6/yarn/local/usercache/fk-fdp-cdm/filecache/2682/slf4j-log4j12-1.7.7.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/grid/7/yarn/local/filecache/10/mapreduce.tar.gz/hadoop/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
./copy_dump.sh: 2: ./copy_dump.sh: Bad substitution


Log Type: stdout

Log Upload Time: Wed Aug 17 12:01:55 +0530 2016

Log Length: 272

java.lang.OutOfMemoryError: Java heap space
Dumping heap to ./heapdump.hprof ...
Heap dump file created [1906572521 bytes in 7.933 secs]
#
# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="./copy_dump.sh"
#   Executing /bin/sh -c "./copy_dump.sh"...
--

copy_dump.sh looks like this -
#!/bin/sh
hadoop fs -copyFromLocal heapdump.hprof /tmp/heapdump_akshay_aggarwal/${PWD//\//_}.hprof
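The `Bad substitution` line in stderr makes me suspect the `${PWD//\//_}` expansion: that pattern-replace syntax is a bash-ism, so if `/bin/sh` on the nodes is dash it would fail at exactly line 2 of the script. A POSIX-safe sketch of the same renaming, using `tr` instead (untested on the cluster), would be:

```shell
#!/bin/sh
# Sketch: build the destination name without bash's ${var//pat/rep},
# which plain /bin/sh (dash) rejects with "Bad substitution".
# tr maps every '/' in the working directory path to '_'.
name=$(printf '%s' "$PWD" | tr '/' '_')
echo "target: /tmp/heapdump_akshay_aggarwal/${name}.hprof"
# The actual copy would stay the same as before:
# hadoop fs -copyFromLocal heapdump.hprof "/tmp/heapdump_akshay_aggarwal/${name}.hprof"
```

Alternatively, changing the shebang to `#!/bin/bash` should also avoid the error, assuming bash is installed on all the NodeManager hosts.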

And I've added the following params to my job -
        -files hdfs:///user/fk-fdp-cdm/scripts/copy_dump.sh#copy_dump.sh \
        -archives ${metadata_archive} \
        -D mapred.create.symlink=yes \
        -D mapreduce.reduce.java.opts='-Xmx2048m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=./heapdump.hprof -XX:OnOutOfMemoryError=./copy_dump.sh' \

Any pointers to what might be wrong here?

Thanks,
Akshay Aggarwal