Journal nodes in HA

Journal nodes in HA

Konstantinos Tsakalozos
Hi everyone,

In an HA setup, do you tend to co-host the journal service with other services rather than running it on separate dedicated machines? If so, which services do you pack together?

Thank you,
Konstantinos 
Re: Journal nodes in HA

Rakesh Radhakrishnan-2
Hi Konstantinos,

The typical deployment is three JournalNodes (JNs): two of the three can be co-located on the machines where the NameNodes (2 NNs) are running, and the third can be deployed on a machine where a ZooKeeper (ZK) server is running (assuming the ZK cluster has 3 nodes). I'd recommend a dedicated disk on each JN server for the edit log path, as edit logs are written continuously.
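As a rough illustration of the three-JN layout described above, here is a minimal Python sketch. It builds the `qjournal://` URI that HDFS HA with the Quorum Journal Manager takes in `dfs.namenode.shared.edits.dir` and computes how many JournalNode failures a quorum of a given size can tolerate. The host names, the journal ID `mycluster`, and the helper function names are hypothetical and chosen only for illustration; 8485 is the default JournalNode RPC port.

```python
DEFAULT_JN_PORT = 8485  # default JournalNode RPC port


def qjournal_uri(hosts, journal_id, port=DEFAULT_JN_PORT):
    """Build a value for dfs.namenode.shared.edits.dir (hypothetical helper)."""
    if not hosts:
        raise ValueError("at least one JournalNode host is required")
    # JournalNode addresses are separated by semicolons in the URI authority.
    authority = ";".join(f"{host}:{port}" for host in hosts)
    return f"qjournal://{authority}/{journal_id}"


def tolerated_failures(num_journal_nodes):
    """Edits must reach a majority of JNs, so (N - 1) // 2 JNs may be down."""
    return (num_journal_nodes - 1) // 2


# Hypothetical hosts: two NameNode machines plus one ZooKeeper machine.
jn_hosts = ["nn1.example.com", "nn2.example.com", "zk1.example.com"]
uri = qjournal_uri(jn_hosts, "mycluster")
print(uri)
print("JN failures tolerated:", tolerated_failures(len(jn_hosts)))
```

With three JNs, one can fail while the active NameNode keeps writing edits. Adding a fourth does not raise that number, which is why an odd count is the usual choice.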

It would be helpful if you could give more details of your Hadoop cluster size and components including ZK service etc.

Thanks,
Rakesh

(In reply to Konstantinos Tsakalozos's message of Fri, Aug 12, 2016 at 3:12 PM.)

Re: Journal nodes in HA

Konstantinos Tsakalozos
+ the hadoop list 

On Fri, Aug 12, 2016 at 3:25 PM, Konstantinos Tsakalozos <[hidden email]> wrote:
Hi Rakesh,

Thank you for your prompt reply.

In the Juju big data team we bundle Hadoop and a set of "peripheral" helper services so that any interested user can easily deploy the full environment in an automated way.
The deployment bundle looks like this: https://jujucharms.com/hadoop-processing/ . On the right side of the bundle you see a client service that can be replaced with any other service the user wishes (e.g. Hive, Pig, etc.). We also decided to go with Ganglia and rsyslog for monitoring. Would you like to see anything more there? In the next release we will be adding Apache ZooKeeper, which will give us HA, and this is why I am asking where it would be best to place the journal nodes.

In our case it would be preferable to "waste" one more "namenode" machine (machine = unit in Juju terminology) to place the third journal service by itself. The deployment would be cleaner and easier to reach. Also, I very much appreciate your advice on dedicated storage. Are there any performance benchmarks showing what bandwidth we can sustain with shared vs. dedicated storage for the journal nodes?

Thank you,
Konstantinos

(In reply to Rakesh Radhakrishnan's message of Fri, Aug 12, 2016 at 2:26 PM.)

Re: Journal nodes in HA

Rakesh Radhakrishnan-2
Hi Konstantinos,

Nice documentation! I wish you every success in expanding to Hadoop HA mode.

I'd say the JournalNodes should be co-located on machines with other Hadoop master daemons, for example the NameNodes, the YARN ResourceManager, etc. These machines are attractive because they are already well provisioned and see little unpredictable user activity, and the master daemons are generally light on disk usage compared to the worker daemons (DataNode, NodeManager, etc.). In general, dedicating a disk drive on each of these machines to the JournalNode helps avoid disk spindle contention with other services. Sorry, I don't have any benchmark reports at hand. Perhaps other folks can pitch in with performance benchmark results, if they have any. For the ZooKeeper server, you can refer to the http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html and https://wiki.apache.org/hadoop/ZooKeeper/ServiceLatencyOverview pages.
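The placement advice above can be written down as a simple sanity check. The Python sketch below rests on several assumptions: the host names, the role labels, and the `check_jn_placement` helper are hypothetical and are not part of Hadoop or Juju. It flags JournalNodes that share a machine with worker daemons, and JournalNode counts that are below three or even.

```python
WORKER_ROLES = {"datanode", "nodemanager"}  # hypothetical role labels


def check_jn_placement(plan):
    """Return warnings for a plan mapping host name -> list of role names."""
    warnings = []
    jn_hosts = [host for host, roles in plan.items() if "journalnode" in roles]

    # An odd number of JNs (three or more) gives a usable majority quorum.
    if len(jn_hosts) < 3 or len(jn_hosts) % 2 == 0:
        warnings.append(
            f"expected an odd number of JournalNode hosts (>= 3), found {len(jn_hosts)}"
        )

    # Worker machines carry heavy, unpredictable disk and user activity.
    for host in jn_hosts:
        shared = WORKER_ROLES & set(plan[host])
        if shared:
            warnings.append(
                f"{host}: JournalNode shares a machine with worker daemons {sorted(shared)}"
            )
    return warnings


# Hypothetical plan: JNs on the two NameNode machines and a third master machine.
good_plan = {
    "master1": ["namenode", "journalnode"],
    "master2": ["namenode", "journalnode"],
    "master3": ["resourcemanager", "zookeeper", "journalnode"],
    "worker1": ["datanode", "nodemanager"],
}

# Same cluster, but the third JN was placed on a worker machine instead.
bad_plan = dict(good_plan)
bad_plan["master3"] = ["resourcemanager", "zookeeper"]
bad_plan["worker1"] = ["datanode", "nodemanager", "journalnode"]

for warning in check_jn_placement(bad_plan):
    print(warning)
```

A plan that puts the third JournalNode alone on its own machine, as proposed earlier in the thread, also passes this check. The check says nothing about whether each JournalNode has a dedicated disk.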

Thanks,
Rakesh

(In reply to Konstantinos Tsakalozos's message of Fri, Aug 12, 2016 at 5:56 PM.)