Random NodeManager crashes with SIGSEGV

Daniel Haviv

Hi,

Over the last 24 hours, our NodeManagers have kept crashing with SIGSEGV.

The only information I could find was in the hs_err_XXXX.pid files, which include the following Java stack:

Stack: [0x00007f756a30f000,0x00007f756a410000],  sp=0x00007f756a40dea0,  free space=1019k
Native frames: (J=compiled Java code, j=interpreted, Vv=VM code, C=native code)
C  [libleveldbjni-64-1-5625225739273738004.8+0x2aaac]  leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0x7c

Java frames: (J=compiled Java code, j=interpreted, Vv=VM code)
j  org.fusesource.leveldbjni.internal.NativeDB$DBJNI.Put(JLorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)J+0
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeSlice;Lorg/fusesource/leveldbjni/internal/NativeSlice;)V+11
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;Lorg/fusesource/leveldbjni/internal/NativeBuffer;Lorg/fusesource/leveldbjni/internal/NativeBuffer;)V+18
j  org.fusesource.leveldbjni.internal.NativeDB.put(Lorg/fusesource/leveldbjni/internal/NativeWriteOptions;[B[B)V+36
j  org.fusesource.leveldbjni.internal.JniDB.put([B[BLorg/iq80/leveldb/WriteOptions;)Lorg/iq80/leveldb/Snapshot;+28
j  org.fusesource.leveldbjni.internal.JniDB.put([B[B)V+10
j  org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.storeDeletionTask(ILorg/apache/hadoop/yarn/proto/YarnServerNodemanagerRecoveryProtos$DeletionServiceDeleteTaskProto;)V+32
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.recordDeletionTaskInStateStore(Lorg/apache/hadoop/yarn/server/nodemanager/DeletionService$FileDeletionTask;)V+245
j  org.apache.hadoop.yarn.server.nodemanager.DeletionService.delete(Ljava/lang/String;Lorg/apache/hadoop/fs/Path;[Lorg/apache/hadoop/fs/Path;)V+44
j  org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService$LocalizerRunner.run()V+271
v  ~StubRoutines::call_stub

The culprit seems to be the top native frame: [libleveldbjni-64-1-5625225739273738004.8+0x2aaac] leveldb::log::Writer::EmitPhysicalRecord(leveldb::log::RecordType, char const*, unsigned long)+0x7c
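
To check whether every crash really ends in the same native frame, something like the following could be run over the collected crash files. This is only a rough sketch in Python: the function names (top_native_symbol, summarize) are made up for illustration, and it assumes each file has a "Native frames:" section laid out like the one above. It compares the symbol name rather than the library name, since the extracted libleveldbjni file name appears to carry a numeric suffix that may differ between processes.

```python
import re
import sys
from collections import Counter
from pathlib import Path

# Matches a frame line such as:
#   C  [libfoo.so+0x1234]  some::symbol(args)+0x7c
# The layout is assumed from the crash log quoted in this post.
FRAME_RE = re.compile(r"^[A-Za-z]\s+\[(?P<lib>[^\]]+)\]\s*(?P<sym>.*)$")


def top_native_symbol(text):
    """Return the symbol of the first frame under 'Native frames:', or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if not line.startswith("Native frames:"):
            continue
        for candidate in lines[i + 1:]:
            if not candidate.strip():
                break  # a blank line ends the native frame list
            match = FRAME_RE.match(candidate)
            if match:
                # Drop the trailing "+0x..." offset so identical symbols group together.
                sym = re.sub(r"\+0x[0-9a-fA-F]+$", "", match.group("sym").strip())
                return sym or match.group("lib")
        return None
    return None


def summarize(paths):
    """Count how many crash files share each top native symbol."""
    counts = Counter()
    for path in paths:
        text = Path(path).read_text(errors="replace")
        counts[top_native_symbol(text) or "<no native frame found>"] += 1
    return counts


if __name__ == "__main__":
    # Usage: python this_script.py <crash file> [<crash file> ...]
    for symbol, count in summarize(sys.argv[1:]).most_common():
        print(f"{count:4d}  {symbol}")
```

If all files report leveldb::log::Writer::EmitPhysicalRecord, that would at least suggest the crashes share a single cause in the leveldb write path rather than being unrelated.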

 

Any ideas on what this is and how to solve it?

Thank you.

Daniel