[jira] Created: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
Block reports from all datanodes arrive at the namenode within a small band of time
-----------------------------------------------------------------------------------

                 Key: HADOOP-1501
                 URL: https://issues.apache.org/jira/browse/HADOOP-1501
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


I have a 2000-node cluster and the block report interval is set to 1 hour. Most block reports arrive within a few minutes of one another. For example, I have seen block reports from all 2000 nodes arrive within 5 minutes of one another. This causes CPU overload on the namenode, which in turn causes dropped calls in the call queue.
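
To put rough numbers on the burst described above, here is a small arithmetic sketch. The class and method names are hypothetical and used only for illustration; the figures (2000 nodes, 5 minutes, 1 hour) are taken from the description.

```java
// Illustrative arithmetic only; BlockReportLoad is a hypothetical name.
class BlockReportLoad {
    /** Average block reports per second when `nodes` reports land within `windowSeconds`. */
    static double reportsPerSecond(int nodes, double windowSeconds) {
        return nodes / windowSeconds;
    }

    public static void main(String[] args) {
        int nodes = 2000;
        double burst = reportsPerSecond(nodes, 5 * 60);   // all reports within 5 minutes
        double spread = reportsPerSecond(nodes, 60 * 60); // reports spread evenly over 1 hour
        System.out.printf("burst:  %.2f reports/s%n", burst);
        System.out.printf("spread: %.2f reports/s%n", spread);
        System.out.printf("ratio:  %.0fx%n", burst / spread);
    }
}
```

If the reports were spread evenly over the hour, the namenode would see about 0.56 reports per second on average instead of about 6.67 during the 5-minute burst, a factor of 12.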

My proposal is to make the datanode send a block report as soon as the datanode starts. Then, it waits for a random time between 0 and 1 hour (the configured value) before sending the next block report. From then on, block reports from that datanode are sent once every 1 hour (the configured value).

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1501:
-------------------------------------

    Attachment: randomBlockReportInterval.patch

1. The first block report goes out as soon as the datanode starts.
2. The second block report goes out after a random delay in the range [0 .. dfs.blockReportInterval].
3. Succeeding block reports are generated once every dfs.blockreport.Interval time period.
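
The three steps above can be sketched roughly as follows. This is not the code from randomBlockReportInterval.patch: the class name, method name, and constructor are hypothetical, and the random delay here is drawn from [0, interval) rather than a closed range.

```java
import java.util.Random;

// Hypothetical sketch of the three-step schedule; not the attached patch.
class BlockReportSchedule {
    private final long intervalMs; // configured block report interval
    private final Random random;
    private int reportsSent = 0;   // saturates at 2, which is all the schedule needs

    BlockReportSchedule(long intervalMs, Random random) {
        if (intervalMs <= 0) {
            throw new IllegalArgumentException("interval must be positive");
        }
        this.intervalMs = intervalMs;
        this.random = random;
    }

    /** Delay in milliseconds to wait before sending the next block report. */
    long nextDelayMs() {
        long delay;
        if (reportsSent == 0) {
            delay = 0;                                         // step 1: report at startup
        } else if (reportsSent == 1) {
            delay = (long) (random.nextDouble() * intervalMs); // step 2: uniform in [0, interval)
        } else {
            delay = intervalMs;                                // step 3: fixed period
        }
        if (reportsSent < 2) {
            reportsSent++;
        }
        return delay;
    }

    public static void main(String[] args) {
        BlockReportSchedule schedule = new BlockReportSchedule(60L * 60L * 1000L, new Random());
        for (int i = 0; i < 4; i++) {
            System.out.println("delay before report " + (i + 1) + ": " + schedule.nextDelayMs() + " ms");
        }
    }
}
```

Because only the second delay is random, each datanode keeps a fixed phase offset from its start time for every later report, which is what spreads the load when many datanodes start together.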

[jira] Commented: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
In reply to this post by Markus Jelsma (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506607 ]

Raghu Angadi commented on HADOOP-1501:
--------------------------------------

Does this mean we could get rid of the {{blockReportIntervalBasis}} calculation? Currently it sets the interval to a random value between 90% and 100% of the configured interval. It is no longer necessary.

+1 for current patch.
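
For readers unfamiliar with the 90-100% calculation mentioned above, a minimal sketch of that kind of computation might look like this. The class and method names are hypothetical, and the actual code around {{blockReportIntervalBasis}} may differ in detail.

```java
import java.util.Random;

// Hypothetical sketch; not the actual datanode code.
class IntervalJitter {
    /** Returns an interval between roughly 90% and 100% of basisMs. */
    static long jitteredIntervalMs(long basisMs, Random random) {
        long maxReduction = basisMs / 10;                             // up to 10% shorter
        long reduction = (long) (random.nextDouble() * maxReduction); // in [0, maxReduction)
        return basisMs - reduction;
    }

    public static void main(String[] args) {
        long basisMs = 60L * 60L * 1000L; // 1 hour, as in the issue description
        System.out.println("interval: " + jitteredIntervalMs(basisMs, new Random()) + " ms");
    }
}
```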

[jira] Updated: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
In reply to this post by Markus Jelsma (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1501:
-------------------------------------

    Assignee: dhruba borthakur
      Status: Patch Available  (was: Open)

Thanks for the review. I have not yet changed the 90-100% variability in the periodicity of the block reports. I would like to keep that variability because it provides another randomization factor, so that block reports do not all arrive at the namenode at around the same time.
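
A small sketch of why a per-datanode interval adds randomization: two datanodes that report at the same instant drift apart on later reports if their intervals differ. It assumes each datanode picks its interval once and keeps it, which is an assumption made for illustration; the names are hypothetical.

```java
// Hypothetical sketch; assumes each datanode keeps one fixed interval.
class ReportDrift {
    /** Time in ms of the k-th periodic report for a datanode whose schedule starts at time 0. */
    static long reportTimeMs(int k, long intervalMs) {
        return k * intervalMs;
    }

    /** Gap in ms between the k-th reports of two datanodes that started in phase. */
    static long gapMs(int k, long intervalA, long intervalB) {
        return Math.abs(reportTimeMs(k, intervalA) - reportTimeMs(k, intervalB));
    }

    public static void main(String[] args) {
        long full = 3_600_000L;   // 100% of a 1 hour interval
        long shorter = 3_420_000L; // 95% of a 1 hour interval
        System.out.println(gapMs(1, full, shorter)); // prints 180000 (3 minutes)
        System.out.println(gapMs(5, full, shorter)); // prints 900000 (15 minutes)
    }
}
```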

[jira] Commented: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
In reply to this post by Markus Jelsma (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506628 ]

Hadoop QA commented on HADOOP-1501:
-----------------------------------

+1

http://issues.apache.org/jira/secure/attachment/12360084/randomBlockReportInterval.patch applied and successfully tested against trunk revision r548794.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/312/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/312/console

[jira] Updated: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
In reply to this post by Markus Jelsma (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1501:
---------------------------------

       Resolution: Fixed
    Fix Version/s: 0.14.0
           Status: Resolved  (was: Patch Available)

I just committed this.  Thanks, Dhruba!

[jira] Commented: (HADOOP-1501) Block reports from all datanodes arrive at the namenode within a small band of time

Markus Jelsma (Jira)
In reply to this post by Markus Jelsma (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12506871 ]

Hudson commented on HADOOP-1501:
--------------------------------

Integrated in Hadoop-Nightly #131 (See [http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Nightly/131/])
