Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Sebastian Nagel
Hi,

recently, after upgrading to CDH 5.8.0, I've run into a performance
issue when reading data from AWS S3 (via s3a).

A job [1] reads tens of thousands of files ("objects") from S3 and writes extracted
data back to S3. Every file/object is about 1 GB in size; processing
is CPU-intensive and takes a couple of minutes per file/object. Each
file/object is processed by one task using FilenameInputFormat.

After the upgrade to CDH 5.8.0, the job showed slow progress, 5-6
times slower overall than in previous runs. A significant number
of tasks hung without progress for up to one hour. These hung tasks
dominated the runtime, and most nodes in the cluster showed little or
no CPU utilization. Tasks are not killed/restarted because the task
timeout is set to a very large value (S3 is known to be slow at times).
Attaching to a couple of the hung tasks with jstack showed that they
were stuck reading from S3 [3].
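
(The thread dumps were taken by simply attaching jstack to the pid of a
hung task JVM:
  % jstack <pid-of-hung-task-jvm>
with the pid as a placeholder here.)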

The problem was finally fixed by setting
  fs.s3a.connection.timeout = 30000  (default: 200000 ms)
  fs.s3a.attempts.maximum = 5        (default: 20)
Tasks now take 20 min. in the worst case; the majority finish within minutes.
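
In case it helps others: the same overrides expressed as a core-site.xml
snippet (only a sketch; any client-side Hadoop configuration mechanism
should work):

  <property>
    <name>fs.s3a.connection.timeout</name>
    <value>30000</value>   <!-- milliseconds -->
  </property>
  <property>
    <name>fs.s3a.attempts.maximum</name>
    <value>5</value>
  </property>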

Is this the correct way to fix the problem?
The default values of these settings were recently increased in HADOOP-12346 [2].
What could be the drawbacks of a lower timeout?

Thanks,
Sebastian

[1]
https://github.com/commoncrawl/ia-hadoop-tools/blob/master/src/main/java/org/archive/hadoop/jobs/WEATGenerator.java

[2] https://issues.apache.org/jira/browse/HADOOP-12346

[3] "main" prio=10 tid=0x00007fad64013000 nid=0x4ab5 runnable [0x00007fad6b274000]
   java.lang.Thread.State: RUNNABLE
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:152)
        at java.net.SocketInputStream.read(SocketInputStream.java:122)
        at
com.cloudera.org.apache.http.impl.io.AbstractSessionInputBuffer.read(AbstractSessionInputBuffer.java:204)
        at
com.cloudera.org.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:182)
        at com.cloudera.org.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:138)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at com.cloudera.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at com.cloudera.com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:151)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at com.cloudera.com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:108)
        at com.cloudera.com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:72)
        at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:160)
        - locked <0x00000007765604f8> (a org.apache.hadoop.fs.s3a.S3AInputStream)
        at java.io.DataInputStream.read(DataInputStream.java:149)
        ...


Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Chris Nauroth
Hello Sebastian,

This is an interesting finding.  Thank you for reporting it.

Are you able to share a bit more about your deployment architecture?  Are these EC2 VMs?  If so, are they co-located in the same AWS region as the S3 bucket?  If the cluster is not running in EC2 (e.g. on-premises physical hardware), then are there any notable differences on nodes that experienced this problem (e.g. smaller capacity on the outbound NIC)?

This is just a theory, but if your bandwidth to the S3 service is intermittently saturated, throttled, or somehow compromised, then I could see how longer timeouts and more retries might increase overall job time.  With the shorter settings, individual task attempts would fail sooner.  If the next attempt then gets scheduled to a node with better bandwidth to S3, it starts making progress sooner, and overall job execution might be faster as a result.

--Chris Nauroth


Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Dheeren Bebortha
Did you change the Java JDK version as well, as part of the upgrade?
Dheeren


Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Sebastian Nagel
Hi Dheeren, hi Chris,


>> Are you able to share a bit more about your deployment architecture?  Are these EC2 VMs?  If so,
>> are they co-located in the same AWS region as the S3 bucket?

We are running a cluster of 100 m1.xlarge EC2 instances with Ubuntu 14.04 (ami-41a20f2a).
The cluster runs in a single availability zone (us-east-1d); the S3 bucket
is in the same region (us-east-1).

% lsb_release -d
Description:    Ubuntu 14.04.3 LTS

% uname -a
Linux ip-10-91-235-121 3.13.0-61-generic #100-Ubuntu SMP Wed Jul 29 11:21:34 UTC 2015 x86_64 x86_64
x86_64 GNU/Linux

> Did you change the Java JDK version as well, as part of the upgrade?

Java is taken as provided by Ubuntu:

% java -version
java version "1.7.0_111"
OpenJDK Runtime Environment (IcedTea 2.6.7) (7u111-2.6.7-0ubuntu0.14.04.3)
OpenJDK 64-Bit Server VM (build 24.111-b01, mixed mode)

Cloudera CDH is installed from
  http://archive.cloudera.com/cdh5/one-click-install/trusty/amd64/cdh5-repository_1.0_all.deb

After the jobs are done, the cluster is shut down and bootstrapped anew on demand (bash + cloud-init).
A new launch of the cluster may, of course, include updates of
 - the underlying Amazon machine image
 - Ubuntu packages
 - Cloudera packages

So the real cause of the problem may lie in any of these changes.
The update to Cloudera CDH 5.8.0 is just the most obvious suspect, since the
problems appeared shortly after it (first seen 2016-08-01).

>> If the cluster is not running in EC2 (e.g. on-premises physical hardware), then are there any
>> notable differences on nodes that experienced this problem (e.g. smaller capacity on the outbound NIC)?

Probably not, although I cannot exclude it. Over the last few days I've run into problems which could be
related: a few tasks are slow or even seem to hang, e.g. reducers during the copy phase. But that looks
more like a Hadoop (configuration) problem. Network throughput between nodes measured with iperf is
not stellar but generally ok (5-20 MBit/s).
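
For reference, a basic point-to-point iperf measurement looks like this
(host name and test duration below are placeholders):

  % iperf -s                       # on the receiving node
  % iperf -c <other-node> -t 30    # on the sending node, 30-second test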

>> This is just a theory, but if your bandwidth to the S3 service is intermittently saturated,
>> throttled, or somehow compromised, then I could see how longer timeouts and more retries might
>> increase overall job time.  With the shorter settings, individual task attempts would fail sooner.
>> If the next attempt then gets scheduled to a node with better bandwidth to S3, it starts making
>> progress sooner, and overall job execution might be faster as a result.

That's also my assumption. When connecting to S3, a server is chosen that is fast at that moment.
While copying 1 GB, which takes a couple of minutes simply because of the general network throughput,
that server may become more loaded; on reconnect a better server is chosen.

Btw., tasks do not fail with a moderate timeout - 30 sec. is fine; with lower
values (a few seconds) the file uploads frequently fail.

I've seen the same behavior with a simple distcp from S3: with the default values it took 1 day to copy
300 GB from S3 to HDFS. After choosing a shorter timeout the job finished within 5 hours.
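
In case someone wants to reproduce the comparison, the overrides can be passed
to distcp as generic options (bucket and paths below are placeholders):

  % hadoop distcp \
        -D fs.s3a.connection.timeout=30000 \
        -D fs.s3a.attempts.maximum=5 \
        s3a://<bucket>/<prefix>/ hdfs:///<target-dir>/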

Thanks,
Sebastian


Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

max scalf
Just out of curiosity, have you enabled an S3 endpoint for this?  Hopefully you are running this cluster inside a VPC; if so, an endpoint would help, as the S3 traffic would not go out over the Internet...
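
For reference, a gateway endpoint for S3 can be created with something along
these lines (the VPC and route-table IDs are placeholders):

  % aws ec2 create-vpc-endpoint \
        --vpc-id vpc-xxxxxxxx \
        --service-name com.amazonaws.us-east-1.s3 \
        --route-table-ids rtb-xxxxxxxx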

Have any new policies been put in place for your S3 bucket?  Others have mentioned something about throttling.


Re: Slow read from S3 on CDH 5.8.0 (includes HADOOP-12346)

Sebastian Nagel
Hi Max,

the cluster isn't running inside a VPC. It's a web crawl which is then
published as a public data set at s3://commoncrawl/. There's no need for a
VPC, since all the data is open.

But thanks for the pointer; a VPC and an S3 endpoint could be an option
for the future.

> Have any new policies been put in place for your S3 bucket?
> Others have mentioned something about throttling.

No, the policies have been unchanged for several months,
since long before the problems appeared. And no throttling
is configured for that bucket.

The only "throttling" I've observed in the last week is a low
bandwidth (120 kbit/s) between nodes for about 20 seconds; it looks
as if the bandwidth is only ramped up under higher demand:
  https://forums.aws.amazon.com/thread.jspa?threadID=237530
I hope to get an answer from AWS about this phenomenon.

Whether this also applies to the connection between
cluster nodes and the S3 frontend servers, I don't know.
It could be related, of course.

Thanks,
Sebastian

