Running a HA HDFS cluster on alpine linux

classic Classic list List threaded Threaded
2 messages Options
F21
Reply | Threaded
Open this post in threaded view
|

Running a HA HDFS cluster on alpine linux

F21
Hi all,

I am currently experimenting with running a HA HDFS cluster in docker
containers. I have successfully created an HA cluster using Ubuntu as my
base image for running the namenode, datanode and journalnodes. The
zookeeper instance runs on an image built using Alpine linux as the base
and works pretty well.

I attempted to get the namenode, datanode and journalnodes running using
Alpine linux as the base image. The datanode and journalnodes seem to
work fine. However, while the namenodes start correctly, they seem to
disconnect from Zookeeper quite often and will transition into neutral
mode. This results in the namenodes being in "startup mode" forever.

These are the logs from the active namenode:
16/08/25 22:40:28 INFO blockmanagement.CacheReplicationMonitor: Starting
CacheReplicationMonitor with interval 30000 milliseconds
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: Namenode trash
configuration: Deletion interval = 1440 minutes, Emptier interval = 0
minutes.
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: The configured checkpoint
interval is 0 minutes. Using an interval of 1440 minutes that is used
for deletion instead
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Total number of
blocks            = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of invalid
blocks          = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of
under-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of
over-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of blocks
being written    = 0
16/08/25 22:40:28 INFO hdfs.StateChange: STATE* Replication Queue
initialization scan for invalid, over- and under-replicated blocks
completed in 687 msec
16/08/25 22:40:28 INFO ha.ZKFailoverController: Successfully
transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:40:30 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
mkdir: `/tmp': File exists
16/08/25 22:40:32 WARN util.NativeCodeLoader: Unable to load
native-hadoop library for your platform... using builtin-java classes
where applicable
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command mkdirs is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command listStatus is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command * is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command setPermission is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command setPermission is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for
command setPermission is: 1
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional
data from server sessionid 0x156c3dc679a0003, likely server has closed
socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected.
Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection
to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to
authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection
established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment
complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid
= 0x156c3dc679a0003, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Checking for any old
active which needs to be fenced...
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Old node exists:
0a036e733112036e6e311a136d3965646435312d6e6e312e6d39656464353120d43e28d33e
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: But old node has our own
data, so don't need to fence it.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Writing znode
/mycluster/ns1/ActiveBreadCrumb to indicate that the local node is the
most recent active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Trying to make NameNode
at m9edd51-nn1.m9edd51/172.18.0.7:8020 active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully
transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command mkdirs is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command listStatus is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command * is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command setPermission is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command setPermission is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for
command setPermission is: 1
16/08/25 22:42:28 INFO namenode.FSNamesystem: Roll Edit Log from 172.18.0.8
16/08/25 22:42:28 INFO namenode.FSEditLog: Rolling edit logs
16/08/25 22:42:28 INFO namenode.FSEditLog: Ending log segment 1
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4
Total time for transactions(ms): 16 Number of transactions batched in
Syncs: 0 Number of syncs: 3 SyncTimes(ms): 40 17
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4
Total time for transactions(ms): 16 Number of transactions batched in
Syncs: 0 Number of syncs: 4 SyncTimes(ms): 50 18
16/08/25 22:42:28 INFO namenode.FileJournalManager: Finalizing edits
file /var/lib/hadoop/name/current/edits_inprogress_0000000000000000001
->
/var/lib/hadoop/name/current/edits_0000000000000000001-0000000000000000004
16/08/25 22:42:28 INFO namenode.FSEditLog: Starting log segment at 5

 From the standby namenode:
16/08/25 22:40:33 INFO ha.HealthMonitor: Entering state SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: Local service NameNode
at m9edd51-nn2.m9edd51/172.18.0.8:8020 entered state: SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: ZK Election indicated
that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:40:33 INFO ha.ZKFailoverController: Successfully
transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby
state
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional
data from server sessionid 0x156c3dc679a0004, likely server has closed
socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected.
Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection
to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to
authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection
established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment
complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid
= 0x156c3dc679a0004, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ZKFailoverController: ZK Election indicated
that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully
transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby
state
16/08/25 22:42:28 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:42:28 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3b1f16f0
expecting start txid #1
16/08/25 22:42:28 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 1
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 1
16/08/25 22:42:29 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 133 edits # 4 loaded in 0 seconds
16/08/25 22:42:29 INFO ha.EditLogTailer: Loaded 4 edits starting from txid 0
16/08/25 22:44:29 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:44:29 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c92162
expecting start txid #5
16/08/25 22:44:29 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 5
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 5
16/08/25 22:44:29 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:44:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 4
16/08/25 22:46:29 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:46:29 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1f890896
expecting start txid #7
16/08/25 22:46:29 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 7
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 7
16/08/25 22:46:29 INFO namenode.FSImage: Edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:46:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 6
16/08/25 22:48:29 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:48:29 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4da07bde
expecting start txid #9
16/08/25 22:48:29 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 9
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 9
16/08/25 22:48:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:48:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 8
16/08/25 22:50:30 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:50:30 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@350cea58
expecting start txid #11
16/08/25 22:50:30 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 11
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 11
16/08/25 22:50:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:50:30 INFO ha.EditLogTailer: Loaded 2 edits starting from
txid 10
16/08/25 22:52:30 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:52:30 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6a6c8f51
expecting start txid #13
16/08/25 22:52:30 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 13
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 13
16/08/25 22:52:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:52:30 INFO
ha.EditLogTailer: Loaded 2 edits starting from txid 12
16/08/25 22:54:30 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:54:30 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@8212238
expecting start txid #15
16/08/25 22:54:30 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 15
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 15
16/08/25 22:54:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:54:30 INFO ha.EditLogTailer: Loaded 2 edits starting from
txid 14
16/08/25 22:56:30 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:56:30 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@58d1e7d8
expecting start txid #17
16/08/25 22:56:30 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 17
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 17
16/08/25 22:56:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:56:30 INFO
ha.EditLogTailer: Loaded 2 edits starting from txid 16
16/08/25 22:58:30 INFO ha.EditLogTailer: Triggering log roll on remote
NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:58:30 INFO namenode.FSImage: Reading
org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@791c80ff
expecting start txid #19
16/08/25 22:58:30 INFO namenode.FSImage: Start loading edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 19
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding
stream
'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' 
to transaction ID 19
16/08/25 22:58:30 INFO namenode.FSImage: Edits file
http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster,
http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster 
of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:58:30 INFO
ha.EditLogTailer: Loaded 2 edits starting from txid 18

Any ideas where I can start to debug this? This seems to be a problem
specific to the zookeeper failover controller. I tried this docker image
that starts a standalone cluster with 2 namenodes, datanodes and journal
nodes, but no zookeeper failover controller and it works correctly on
Alpline linux: https://github.com/gustavonalle/yarn-docker

Cheers,
Francis

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: Running a HA HDFS cluster on alpine linux

Rakesh Radhakrishnan-2
Hi Francis,

There could be cases of connection fluctuations between ZKFC and ZK server, I've observed the following message from your logs. I'd suggest you to start analyzing all your ZooKeeper servers log messages and see ZooKeeper cluster status during this period. BTW, could you tell me the ZK cluster size.

16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x156c3dc679a0003, likely server has closed socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected. 
Entering neutral mode...

Regards,
Rakesh

On Sat, Aug 27, 2016 at 5:16 PM, F21 <[hidden email]> wrote:
Hi all,

I am currently experimenting with running a HA HDFS cluster in docker containers. I have successfully created an HA cluster using Ubuntu as my base image for running the namenode, datanode and journalnodes. The zookeeper instance runs on an image built using Alpine linux as the base and works pretty well.

I attempted to get the namenode, datanode and journalnodes running using Alpine linux as the base image. The datanode and journalnodes seem to work fine. However, while the namenodes start correctly, they seem to disconnect from Zookeeper quite often and will transition into neutral mode. This results in the namenodes being in "startup mode" forever.

These are the logs from the active namenode:
16/08/25 22:40:28 INFO blockmanagement.CacheReplicationMonitor: Starting CacheReplicationMonitor with interval 30000 milliseconds
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: Namenode trash configuration: Deletion interval = 1440 minutes, Emptier interval = 0 minutes.
16/08/25 22:40:28 INFO fs.TrashPolicyDefault: The configured checkpoint interval is 0 minutes. Using an interval of 1440 minutes that is used for deletion instead
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Total number of blocks            = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of invalid blocks          = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of under-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of over-replicated blocks = 0
16/08/25 22:40:28 INFO blockmanagement.BlockManager: Number of blocks being written    = 0
16/08/25 22:40:28 INFO hdfs.StateChange: STATE* Replication Queue initialization scan for invalid, over- and under-replicated blocks completed in 687 msec
16/08/25 22:40:28 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:40:30 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
mkdir: `/tmp': File exists
16/08/25 22:40:32 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 0
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:36 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x156c3dc679a0003, likely server has closed socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid = 0x156c3dc679a0003, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Checking for any old active which needs to be fenced...
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Old node exists: 0a036e733112036e6e311a136d3965646435312d6e6e312e6d39656464353120d43e28d33e
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: But old node has our own data, so don't need to fence it.
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Writing znode /mycluster/ns1/ActiveBreadCrumb to indicate that the local node is the most recent active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Trying to make NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 active...
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn1.m9edd51/172.18.0.7:8020 to active state
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 0
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command getfileinfo is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command mkdirs is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command listStatus is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command * is: 1
16/08/25 22:41:57 INFO window.RollingWindowManager: topN size for command setPermission is: 1
16/08/25 22:42:28 INFO namenode.FSNamesystem: Roll Edit Log from 172.18.0.8
16/08/25 22:42:28 INFO namenode.FSEditLog: Rolling edit logs
16/08/25 22:42:28 INFO namenode.FSEditLog: Ending log segment 1
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total time for transactions(ms): 16 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 40 17
16/08/25 22:42:28 INFO namenode.FSEditLog: Number of transactions: 4 Total time for transactions(ms): 16 Number of transactions batched in Syncs: 0 Number of syncs: 4 SyncTimes(ms): 50 18
16/08/25 22:42:28 INFO namenode.FileJournalManager: Finalizing edits file /var/lib/hadoop/name/current/edits_inprogress_0000000000000000001 -> /var/lib/hadoop/name/current/edits_0000000000000000001-0000000000000000004
16/08/25 22:42:28 INFO namenode.FSEditLog: Starting log segment at 5

From the standby namenode:
16/08/25 22:40:33 INFO ha.HealthMonitor: Entering state SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: Local service NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 entered state: SERVICE_HEALTHY
16/08/25 22:40:33 INFO ha.ZKFailoverController: ZK Election indicated that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:40:33 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
16/08/25 22:41:40 INFO zookeeper.ClientCnxn: Unable to read additional data from server sessionid 0x156c3dc679a0004, likely server has closed socket, closing socket connection and attempting reconnect
16/08/25 22:41:40 INFO ha.ActiveStandbyElector: Session disconnected. Entering neutral mode...
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Opening socket connection to server m9edd51-zookeeper.m9edd51/172.18.0.2:2181. Will not attempt to authenticate using SASL (unknown error)
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Socket connection established to m9edd51-zookeeper.m9edd51/172.18.0.2:2181, initiating session
16/08/25 22:41:41 INFO zookeeper.ClientCnxn: Session establishment complete on server m9edd51-zookeeper.m9edd51/172.18.0.2:2181, sessionid = 0x156c3dc679a0004, negotiated timeout = 5000
16/08/25 22:41:41 INFO ha.ActiveStandbyElector: Session connected.
16/08/25 22:41:41 INFO ha.ZKFailoverController: ZK Election indicated that NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 should become standby
16/08/25 22:41:41 INFO ha.ZKFailoverController: Successfully transitioned NameNode at m9edd51-nn2.m9edd51/172.18.0.8:8020 to standby state
16/08/25 22:42:28 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:42:28 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@3b1f16f0 expecting start txid #1
16/08/25 22:42:28 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
16/08/25 22:42:28 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 1
16/08/25 22:42:29 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=1&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 133 edits # 4 loaded in 0 seconds
16/08/25 22:42:29 INFO ha.EditLogTailer: Loaded 4 edits starting from txid 0
16/08/25 22:44:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:44:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@7c92162 expecting start txid #5
16/08/25 22:44:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
16/08/25 22:44:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 5
16/08/25 22:44:29 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=5&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:44:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 4
16/08/25 22:46:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:46:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@1f890896 expecting start txid #7
16/08/25 22:46:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
16/08/25 22:46:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 7
16/08/25 22:46:29 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=7&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:46:29 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 6
16/08/25 22:48:29 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:48:29 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@4da07bde expecting start txid #9
16/08/25 22:48:29 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
16/08/25 22:48:29 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 9
16/08/25 22:48:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=9&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:48:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 8
16/08/25 22:50:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:50:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@350cea58 expecting start txid #11
16/08/25 22:50:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
16/08/25 22:50:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 11
16/08/25 22:50:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=11&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:50:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 10
16/08/25 22:52:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:52:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@6a6c8f51 expecting start txid #13
16/08/25 22:52:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
16/08/25 22:52:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 13
16/08/25 22:52:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=13&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:52:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 12
16/08/25 22:54:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:54:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@8212238 expecting start txid #15
16/08/25 22:54:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
16/08/25 22:54:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 15
16/08/25 22:54:30 INFO namenode.FSImage: Edits file http://m9edd51-jn3.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=15&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds
16/08/25 22:54:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 14
16/08/25 22:56:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:56:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@58d1e7d8 expecting start txid #17
16/08/25 22:56:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
16/08/25 22:56:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 17
16/08/25 22:56:30 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=17&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:56:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 16
16/08/25 22:58:30 INFO ha.EditLogTailer: Triggering log roll on remote NameNode m9edd51-nn1.m9edd51/172.18.0.7:8020
16/08/25 22:58:30 INFO namenode.FSImage: Reading org.apache.hadoop.hdfs.server.namenode.RedundantEditLogInputStream@791c80ff expecting start txid #19
16/08/25 22:58:30 INFO namenode.FSImage: Start loading edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
16/08/25 22:58:30 INFO namenode.EditLogInputStream: Fast-forwarding stream 'http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster' to transaction ID 19
16/08/25 22:58:30 INFO namenode.FSImage: Edits file http://m9edd51-jn1.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster, http://m9edd51-jn2.m9edd51:8480/getJournal?jid=ns1&segmentTxId=19&storageInfo=-63%3A2072890455%3A0%3Amycluster of size 42 edits # 2 loaded in 0 seconds 16/08/25 22:58:30 INFO ha.EditLogTailer: Loaded 2 edits starting from txid 18

Any ideas where I can start to debug this? This seems to be a problem specific to the zookeeper failover controller. I tried this docker image that starts a standalone cluster with 2 namenodes, datanodes and journal nodes, but no zookeeper failover controller and it works correctly on Alpline linux: https://github.com/gustavonalle/yarn-docker

Cheers,
Francis

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]