Created: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org
DFS Scalability: When the namenode is restarted it consumes 80% CPU
-------------------------------------------------------------------

                 Key: HADOOP-1117
                 URL: https://issues.apache.org/jira/browse/HADOOP-1117
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.12.0
            Reporter: dhruba borthakur
         Assigned To: dhruba borthakur


When the namenode is restarted, the datanodes register and each block is inserted into neededReplication. When the namenode exits safemode, it starts processing neededReplication. It picks up a block from neededReplication, sees that it already has the required number of replicas, and continues to the next block in neededReplication. The blocks remain in neededReplication permanently; the namenode worker thread scans this huge list of blocks once every 3 seconds. This consumes plenty of CPU on the namenode.
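
To make the failure mode concrete, here is a minimal Java sketch of the scan loop being described. The class and method names are placeholders chosen for illustration, not the actual FSNamesystem code: fully replicated blocks are skipped on every cycle but never removed, so each 3-second pass rescans the entire list.

    import java.util.HashSet;
    import java.util.Set;

    class ReplicationScanSketch {
        // Stand-in for neededReplication; after a restart it holds every block.
        static Set<String> neededReplication = new HashSet<String>();
        static final int REQUIRED_REPLICAS = 3;

        // Placeholder: the real namenode would look this up in its block map.
        static int replicaCount(String block) {
            return REQUIRED_REPLICAS;
        }

        // Runs once every 3 seconds in the namenode worker thread.
        static void replicationCheck() {
            for (String block : neededReplication) {
                if (replicaCount(block) >= REQUIRED_REPLICAS) {
                    continue; // enough replicas: skipped, but never removed from the set
                }
                // ...schedule re-replication for genuinely under-replicated blocks...
            }
            // neededReplication never shrinks, so the next pass scans everything again.
        }
    }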

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Attachment: CpuPendingTransfer.patch

pendingTransfer removes a block from neededReplications if that block already has the required number of replicas.
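
For illustration, a rough sketch of that idea with placeholder names rather than the exact FSNamesystem code: while walking neededReplications, a block that already meets its replication target is removed from the set instead of being skipped and rescanned forever.

    import java.util.Iterator;
    import java.util.Set;

    class PendingTransferSketch {
        // Hypothetical simplification of the pendingTransfer path described above.
        static void pendingTransfer(Set<String> neededReplications, int requiredReplicas) {
            for (Iterator<String> it = neededReplications.iterator(); it.hasNext(); ) {
                String block = it.next();
                if (currentReplicaCount(block) >= requiredReplicas) {
                    it.remove(); // already fully replicated: drop it from the set
                    continue;
                }
                // ...otherwise pick target datanodes and schedule the transfer...
            }
        }

        // Placeholder; the namenode would consult its block map here.
        static int currentReplicaCount(String block) {
            return 3;
        }
    }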

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Priority: Blocker  (was: Major)

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-1117:
--------------------------------

    Fix Version/s: 0.12.1

Dhruba meant to assign this to 0.12.1.

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Attachment:     (was: CpuPendingTransfer.patch)

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Attachment: CpuPendingTransfer2.patch

pendingTransfer does not cause any replication if the replication factor has already been achieved. Also, addStoredBlock() removes blocks from neededReplication if the replication factor has already been achieved.
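
A rough sketch of the two changes, again with placeholder names and types rather than the real FSNamesystem signatures: pendingTransfer schedules nothing for a block whose replication target is already met, and addStoredBlock() drops a block from neededReplication as soon as a newly reported replica reaches the target.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.Map;
    import java.util.Set;

    class ReplicationTargetSketch {
        static Set<String> neededReplication = new HashSet<String>();
        static Map<String, Integer> replicaCounts = new HashMap<String, Integer>();
        static final int TARGET_REPLICAS = 3;

        // Do not schedule any replication work for a block that already meets its target.
        static void pendingTransfer(String block) {
            if (replicaCounts.getOrDefault(block, 0) >= TARGET_REPLICAS) {
                return;
            }
            // ...pick target datanodes and schedule the transfer...
        }

        // Called when a datanode reports a stored replica of the block.
        static void addStoredBlock(String block, String datanode) {
            int replicas = replicaCounts.getOrDefault(block, 0) + 1;
            replicaCounts.put(block, replicas);
            if (replicas >= TARGET_REPLICAS) {
                neededReplication.remove(block); // target reached: stop rescanning it
            }
        }
    }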

Commented: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12480884 ]

Hairong Kuang commented on HADOOP-1117:
---------------------------------------

The patch looks good. It would be better to remove the logging, since neededReplication already does it.

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Attachment:     (was: CpuPendingTransfer2.patch)

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Attachment: CpuPendingTransfer3.patch

Removed some logging messages from the previous patch.

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-1117:
-------------------------------------

    Status: Patch Available  (was: Open)

Code reviewed by Hairong.

Updated: (HADOOP-1117) DFS Scalability: When the namenode is restarted it consumes 80% CPU

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/HADOOP-1117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tom White updated HADOOP-1117:
------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I've just committed this. Thanks Dhruba!
