Re: Getting close to a vote on a merge of S3Guard., HADOOP-13345

Andrew Wang
Thanks for the reply Steve, that aligns with what Aaron said above. The
sooner the better for this branch merge :)

On Thu, Aug 17, 2017 at 6:49 AM, Steve Loughran <[hidden email]>
wrote:

>
> On 16 Aug 2017, at 18:39, Andrew Wang <[hidden email]> wrote:
>
> Hi Steve,
>
> What's the target release vehicle, and the timeline for merging this? The
> target date for beta1 is mid-September, so any large code movements make me
> nervous.
>
>
> Code targets trunk, current state is ready to go in.
>
> I've also got it building & running against branch-2: all the code is
> Java-7 and the classpath problems were dealt with by Mingliang.
>
>
> Could you comment on testing and API stability of this branch? I'm
> trusting the judgement of the contributors involved, since there isn't much
> time to fix things before beta1.
>
>
>
> This is all working in the s3 code, and it's something you have to
> explicitly enable; I'm confident that when disabled it doesn't cause
> problems.
>
> There are two modes of use in production (as well as a local DynamoDB for
> testing):
>
> * DynamoDB as cache, "non-authoritative"
> * DynamoDB as store of record, "authoritative"
>
> I'm fairly happy with non-auth; but as auth assumes that all clients are
> using s3guard, it's the one with the most risks. That one I'd be cautious
> over. But it does deliver the best speedup. And it lets you use the v1/v2
> algorithms to commit output, as now you get the consistent directory
> listings you need. There's still the O(data) COPY call, but at least the
> risk of incomplete listings -> incomplete copy operation is eliminated.
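
A minimal sketch of how the two production modes above might be expressed as configuration. The property names and the DynamoDB store class name are recalled from the s3guard.md document on the HADOOP-13345 branch and should be treated as assumptions to check against that document; the `S3GuardModes` class and its helpers are purely illustrative and use no Hadoop APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class S3GuardModes {
    // Assumed property names and class name; verify against s3guard.md before use.
    static final String STORE_IMPL = "fs.s3a.metadatastore.impl";
    static final String AUTHORITATIVE = "fs.s3a.metadatastore.authoritative";
    static final String DYNAMO_STORE =
        "org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore";

    /** Builds the settings for one mode: DynamoDB as cache (false) or as store of record (true). */
    static Map<String, String> settings(boolean authoritative) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(STORE_IMPL, DYNAMO_STORE);
        conf.put(AUTHORITATIVE, Boolean.toString(authoritative));
        return conf;
    }

    /** Renders the settings as core-site.xml style property entries. */
    static String toXml(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("<property>\n")
              .append("  <name>").append(e.getKey()).append("</name>\n")
              .append("  <value>").append(e.getValue()).append("</value>\n")
              .append("</property>\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("<!-- non-authoritative: DynamoDB as cache -->");
        System.out.print(toXml(settings(false)));
        System.out.println("<!-- authoritative: DynamoDB as store of record -->");
        System.out.print(toXml(settings(true)));
    }
}
```

Under these assumptions the only difference between the two modes is the one boolean, which is why the authoritative mode carries the extra risk noted above: it is only safe if every client writing to the bucket goes through S3Guard.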
>
> We've had a preview version up for a while, with large Hive/LLAP tests in
> particular running happily against it, and my Spark & cloud testing has
> shown all is well (indeed, I can show how all isn't well if you enable the
> inconsistent FS client and *don't* turn S3Guard on).
>
> After the initial merge there is more work to do, mostly around metrics,
> diagnostics, and the new committer work, which depends on consistent
> listings for one of the committers but doesn't make *any* API calls into
> S3Guard itself. All it needs is a consistent S3 endpoint, be it AWS S3 &
> S3Guard, or something else like the WDC cloud store. The committer work is
> not going to be ready for Beta 1.
>
> -Steve
>
>
>
>
> Best,
> Andrew
>
> On Wed, Aug 16, 2017 at 5:25 AM, Steve Loughran <[hidden email]>
> wrote:
>
>>
>> FYI, we're getting ready to merge the current S3Guard branch,
>> HADOOP-13345, via a patch:
>> https://issues.apache.org/jira/browse/HADOOP-13998
>>
>> After that's done, we do plan to have a second iteration: work on a
>> 0-rename committer (HADOOP-13786) along with all the other tuning and
>> improvements. We'd add a new uber-JIRA & move stuff over, maybe branch,
>> and/or do things patch-by-patch.
>>
>> Anyway, now is a great time for people to download and play
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/s3guard.md
>>
>> testing this
>>
>> https://github.com/apache/hadoop/blob/HADOOP-13345/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/testing.md
>>
>> The Inconsistent AWS Client is also something everyone is free to use for
>> injecting inconsistencies (and soon faults) into their own apps by way of
>> 2-3 config options. Want to know how your code handles S3A being observably
>> inconsistent? We'll let you do that.
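
To make "observably inconsistent" concrete, here is a toy model of a store whose listings lag behind writes, plus a wrapper that records each write in a consistent side table and merges it into listings. It is a sketch of the general idea only: `LaggyStore`, `GuardedStore` and the tick-based clock are hypothetical names invented for illustration, not the Inconsistent AWS Client or the S3Guard implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class ListingDemo {
    /** Toy object store: a new key shows up in listings only after `delay` ticks. */
    static class LaggyStore {
        private final Map<String, Long> createdAt = new LinkedHashMap<>();
        private final long delay;
        private long now = 0;

        LaggyStore(long delay) { this.delay = delay; }

        void put(String key) { createdAt.put(key, now); }

        void tick(long ticks) { now += ticks; }

        List<String> list() {
            List<String> visible = new ArrayList<>();
            for (Map.Entry<String, Long> e : createdAt.entrySet()) {
                if (now - e.getValue() >= delay) {
                    visible.add(e.getKey());
                }
            }
            return visible;
        }
    }

    /** Toy guard: remembers every write and unions those keys into each listing. */
    static class GuardedStore {
        private final LaggyStore store;
        private final Set<String> written = new TreeSet<>();

        GuardedStore(LaggyStore store) { this.store = store; }

        void put(String key) {
            store.put(key);
            written.add(key);
        }

        List<String> list() {
            Set<String> merged = new TreeSet<>(store.list());
            merged.addAll(written);
            return new ArrayList<>(merged);
        }
    }

    public static void main(String[] args) {
        LaggyStore raw = new LaggyStore(5);
        GuardedStore guarded = new GuardedStore(raw);
        guarded.put("out/part-0000");
        guarded.put("out/part-0001");

        // A copy driven by the raw listing right after the writes would miss both files.
        System.out.println("raw listing:     " + raw.list());
        System.out.println("guarded listing: " + guarded.list());

        raw.tick(5);
        System.out.println("raw, later:      " + raw.list());
    }
}
```

Any code that lists a directory and then copies or renames what it finds is exposed to the gap between the raw and guarded listings, which is the failure mode the inconsistency injection is meant to surface.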
>>
>> -Steve
>>
>>
>>
>
>