Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Zhe Zhang-2
+1 (binding)

Thanks Konstantin for leading the merge effort!

I worked very closely with Chen, Konstantin, and Erik in the testing stage,
and I feel confident that the feature now delivers its designed
functionality and has proven to be stable.

Great team work with contributors from multiple companies!

On Wed, Dec 5, 2018 at 5:27 PM Konstantin Shvachko <[hidden email]>
wrote:

> Hi Hadoop developers,
>
> I would like to propose to merge to trunk the feature branch HDFS-12943 for
> Consistent Reads from Standby Node. The feature is intended to scale read
> RPC workloads. On large clusters reads comprise 95% of all RPCs to the
> NameNode. We should be able to accommodate higher overall RPC workloads (up
> to 4x by some estimates) by adding multiple ObserverNodes.
>
> The main functionality has been implemented; see sub-tasks of HDFS-12943.
> We followed up with the test plan. Testing was done on two independent
> clusters (see HDFS-14058 and HDFS-14059) with security enabled.
> We ran standard HDFS commands, MR jobs, admin commands including manual
> failover.
> We know of one cluster running this feature in production.
>
> There are a few outstanding issues:
> 1. Need to provide proper documentation - a user guide for the new feature
> 2. Need to fix automatic failover with ZKFC. Currently it doesn't
> know about ObserverNodes and tries to convert them to SBNs.
> 3. Scale testing and performance fine-tuning
> 4. As testing progresses, we continue fixing non-critical bugs like
> HDFS-14116.
>
> I attached a unified patch to the umbrella jira for the review and Jenkins
> build.
> Please vote on this thread. The vote will run for 7 days until Wed Dec 12.
>
> Thanks,
> --Konstantin
>
--
Zhe Zhang
Apache Hadoop Committer
http://zhe-thoughts.github.io/about/ | @oldcap

Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Yongjun Zhang-3
Great work guys.

Wonder if we can elaborate on what the impact is of not having #2 fixed, and
why #2 is not needed for the feature to be complete?
2. Need to fix automatic failover with ZKFC. Currently it doesn't
know about ObserverNodes and tries to convert them to SBNs.

Thanks.
--Yongjun



Re: [VOTE] Merge HDFS-12943 branch to trunk - Consistent Reads from Standby

Yongjun Zhang-3
Hi Konstantin,

Thanks for addressing my other question about failover.

Some thoughts to share about the suggestion Daryn made. It seems we could try
this: let the ObserverNode throw a RetriableException back to the client saying
it has not yet reached the transaction ID needed to serve the client, maybe even
including the transaction ID gap in the exception. When the client receives the
RetriableException, it can decide whether to send the request to the observer
node again, or to the active NN when the gap is too big.
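
To make the idea concrete, here is a rough, self-contained Java sketch. It is
not code from the HDFS-12943 branch and does not use the Hadoop IPC classes:
ObserverBehindException, serveRead, nextTarget and the maxGap threshold are
all hypothetical names chosen for illustration, and a plain IOException
subclass stands in for Hadoop's RetriableException.

```java
import java.io.IOException;

/** Illustrative sketch only; all names here are hypothetical, not HDFS APIs. */
public class ObserverRetrySketch {

    /** Hypothetical stand-in for a RetriableException that carries the txid gap. */
    static class ObserverBehindException extends IOException {
        final long txidGap;

        ObserverBehindException(long clientTxid, long observerTxid) {
            super("Observer at txid " + observerTxid + ", client needs " + clientTxid);
            this.txidGap = clientTxid - observerTxid;
        }
    }

    enum Target { OBSERVER, ACTIVE }

    /** Observer side: serve the read only if the observer has caught up. */
    static String serveRead(long clientTxid, long observerTxid)
            throws ObserverBehindException {
        if (observerTxid < clientTxid) {
            throw new ObserverBehindException(clientTxid, observerTxid);
        }
        return "ok";
    }

    /** Client side: retry on the observer for small gaps, else go to the active NN. */
    static Target nextTarget(ObserverBehindException e, long maxGap) {
        return e.txidGap > maxGap ? Target.ACTIVE : Target.OBSERVER;
    }

    public static void main(String[] args) {
        long maxGap = 100; // assumed threshold, would need tuning on a real cluster
        try {
            serveRead(1_500, 1_000);
        } catch (ObserverBehindException e) {
            System.out.println("gap=" + e.txidGap + " -> retry on " + nextTarget(e, maxGap));
        }
    }
}
```

The point of carrying the gap in the exception is that the fallback decision
stays entirely on the client, so the server never has to hold or requeue the
call.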

Though saving another RPC would help performance with the current
implementation, I expect the above-mentioned exception to happen only
infrequently, so the performance won't be too bad. Plus, the client gets a
chance to try the ANN when it knows the observer is too far behind in the
extreme case.

I wonder how different the performance of these two approaches is in a
cluster with a real workload.

Comments?

--Yongjun

On Fri, Dec 7, 2018 at 4:10 PM Konstantin Shvachko <[hidden email]>
wrote:

> Hi Daryn,
>
> Wanted to back up Chen's earlier response to your concerns about rotating
> calls in the call queue.
> Our design
> 1. directly targets the livelock problem by rejecting calls on the Observer
> that are not likely to be responded to in a timely manner: HDFS-13873.
> 2. rotates the call queue only on Observers, and never on the active NN,
> so the active NN stays free of attacks like the one you suggest.
>
> If this is a satisfactory mitigation for the problem, could you please
> reconsider your -1, so that people can continue voting on this thread.
>
> Thanks,
> --Konst
>
> On Thu, Dec 6, 2018 at 10:38 AM Daryn Sharp <[hidden email]> wrote:
>
> > -1 pending additional info.  After a cursory scan, I have serious concerns
> > regarding the design.  This seems like a feature that should have been
> > purely implemented in hdfs w/o touching the common IPC layer.
> >
> > The biggest issue is the alignment context.  Its purpose appears to be
> > to allow handlers to reinsert calls back into the call queue.  That's
> > completely unacceptable.  A buggy or malicious client can easily cause
> > livelock in the IPC layer with handlers only looping on calls that never
> > satisfy the condition.  Why is this not implemented via
> > RetriableExceptions?
> >
> > --
> >
> > Daryn
> >
>