Restarting a resource manager kills the other in HA

Nikhil-2
Hi,

In a YARN ResourceManager HA setup, HA worked fine when I first configured it. After some time, however, I noticed that restarting one ResourceManager causes the other ResourceManager to be stopped/killed. Below is what I see in the logs on the killed ResourceManager instance. I am using Hadoop version 2.5.1, if that helps.

Has anyone seen this before? Any ideas on how to troubleshoot this?

thanks,
Nikhil

-----

2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector: Deleting bread-crumb of active node...
2015-02-24 16:47:37,555 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ZooKeeper: Session: 0x14b997543fd001e closed
2015-02-24 16:47:37,580 WARN org.apache.hadoop.ha.ActiveStandbyElector: Ignoring stale result from old client with sessionId 0x14b997543fd001e
2015-02-24 16:47:37,580 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-02-24 16:47:37,580 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2015-02-24 16:47:37,581 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping ResourceManager metrics system...
2015-02-24 16:47:37,587 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system stopped.
2015-02-24 16:47:37,588 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: ResourceManager metrics system shutdown complete.
2015-02-24 16:47:37,588 INFO org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore: org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore$VerifyActiveStatusThread thread interrupted! Exiting!
2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ZooKeeper: Session: 0x24b13ab5b4c069a closed
2015-02-24 16:47:37,616 INFO org.apache.zookeeper.ClientCnxn: EventThread shut down
2015-02-24 16:47:37,616 INFO org.apache.hadoop.yarn.event.AsyncDispatcher: AsyncDispatcher is draining to stop, igonring any new events.
2015-02-24 16:47:37,617 WARN org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher: org.apache.hadoop.yarn.server.resourcemanager.amlauncher.ApplicationMasterLauncher$LauncherThread interrupted. Returning.
2015-02-24 16:47:37,618 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8032
2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2015-02-24 16:47:37,623 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8030
2015-02-24 16:47:37,627 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-02-24 16:47:37,629 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8031
2015-02-24 16:47:37,633 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2015-02-24 16:47:37,634 INFO org.apache.hadoop.yarn.util.AbstractLivelinessMonitor: NMLivelinessMonitor thread interrupted

-----
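
A quick way to read an excerpt like the one above is to pull out the few events that matter: whether the elector yielded, whether the RM logged a transition to standby, and which IPC ports were stopped. The following is only a rough sketch, assuming the log format shown in this post; the helper names (`parse`, `summarize`) and the `SAMPLE` lines are illustrative, with `SAMPLE` copied from the excerpt.

```python
import re

# Matches lines of the form shown in the post:
# "<date> <time>,<ms> <LEVEL> <logger>: <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"(?P<level>[A-Z]+) "
    r"(?P<logger>[\w.$]+): "
    r"(?P<msg>.*)$"
)
STOPPED_PORT = re.compile(r"^Stopping server on (\d+)$")


def parse(lines):
    """Yield (timestamp, level, short logger name, message) for each matching line."""
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            yield m["ts"], m["level"], m["logger"].rsplit(".", 1)[-1], m["msg"]


def summarize(lines):
    """Collect the handful of events that distinguish a failover from other stops."""
    events = list(parse(lines))
    ports = set()
    for _ts, _level, _logger, msg in events:
        m = STOPPED_PORT.match(msg)
        if m:
            ports.add(int(m.group(1)))
    return {
        "yielded_election": any(
            msg.startswith("Yielding from election") for *_, msg in events
        ),
        "went_standby": any(
            "Transitioning to standby state" in msg for *_, msg in events
        ),
        "stopped_ports": sorted(ports),
    }


# A few lines copied from the excerpt above.
SAMPLE = """\
2015-02-24 16:47:37,555 INFO org.apache.hadoop.ha.ActiveStandbyElector: Yielding from election
2015-02-24 16:47:37,580 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Transitioning to standby state
2015-02-24 16:47:37,618 INFO org.apache.hadoop.ipc.Server: Stopping server on 8032
2015-02-24 16:47:37,622 INFO org.apache.hadoop.ipc.Server: Stopping server on 8030
2015-02-24 16:47:37,629 INFO org.apache.hadoop.ipc.Server: Stopping server on 8031
""".splitlines()

print(summarize(SAMPLE))
```

Ports 8032, 8030 and 8031 are the default RM client, scheduler and resource-tracker ports, so this excerpt is consistent with the active services being stopped during a transition to standby. It does not by itself show whether the RM process exited afterwards.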

Re: Restarting a resource manager kills the other in HA

daemeon reiydelle

Only one RM is active at a time; the other is in standby. When you started the new RM, the configuration files directed the "new" RM to come up and take over, and the old primary goes to standby (or should!). This is working as designed, except that you will see a slowdown in scheduling. I suspect what you want is for the new RM to come up in standby and not take over, no?

So ... I see normal messages for a switchover. However, you should still see the standby RM receiving status from the new active RM if HA is configured.
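
To confirm which RM is active and which is in standby after a restart, one option is to query each RM with `yarn rmadmin -getServiceState <rm-id>`. Below is a minimal sketch, assuming the `yarn` command is on the PATH and that the RM ids are `rm1` and `rm2` (hypothetical values; use whatever `yarn.resourcemanager.ha.rm-ids` is set to in your yarn-site.xml). The `fake_run` function is only a stand-in so the example runs without a cluster.

```python
import subprocess
from types import SimpleNamespace


def get_rm_state(rm_id, run=subprocess.run):
    """Return the state printed by `yarn rmadmin -getServiceState <rm_id>`.

    Returns "unreachable" if the command fails or prints nothing,
    e.g. because that RM process is down.
    """
    result = run(
        ["yarn", "rmadmin", "-getServiceState", rm_id],
        capture_output=True,
        text=True,
        timeout=30,
    )
    lines = result.stdout.strip().splitlines()
    if result.returncode != 0 or not lines:
        return "unreachable"
    return lines[-1].strip()


def check_pair(rm_ids, run=subprocess.run):
    """Return ({rm_id: state}, healthy), where healthy means exactly one RM is active."""
    states = {rm_id: get_rm_state(rm_id, run) for rm_id in rm_ids}
    active = [rm_id for rm_id, state in states.items() if state == "active"]
    return states, len(active) == 1


def fake_run(cmd, **kwargs):
    """Stand-in for subprocess.run so the sketch can be tried without a cluster."""
    canned = {"rm1": "standby", "rm2": "active"}  # hypothetical ids and states
    return SimpleNamespace(returncode=0, stdout=canned[cmd[-1]] + "\n")


# Demo with the stand-in; drop `run=fake_run` to query a real cluster.
states, healthy = check_pair(["rm1", "rm2"], run=fake_run)
print(states, healthy)
```

If one RM reports `standby` and the other `active`, the restart caused a normal switchover. If one of them comes back as unreachable, that RM process actually went down and its log should be checked for a fatal error.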
