[jira] Created: (HADOOP-1463) dfs should report total size of all the space that dfs is using


[jira] Created: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
dfs should report total size of all the space that dfs is using
---------------------------------------------------------------

                 Key: HADOOP-1463
                 URL: https://issues.apache.org/jira/browse/HADOOP-1463
             Project: Hadoop
          Issue Type: Improvement
          Components: dfs
    Affects Versions: 0.12.3
            Reporter: Hairong Kuang
             Fix For: 0.14.0


Currently namenode reports two statistics back to the client:
1. The total capacity of dfs. This is the sum of all datanodes' capacities, each of which the datanode calculates by summing the disk space of all its data directories.
2. The total remaining space of dfs. This is the sum of all datanodes' remaining space. Each datanode's remaining space is calculated with the following formula: remaining space = unused space - capacity*unusableDiskPercentage - reserved space. So the remaining space shows how much space dfs can still use, but it does not show the size of the unused space.

Each dfs client calculates the total dfs used space by subtracting the remaining space from the total capacity. So the used space does not accurately show the space that dfs is using, yet it is a very important number that dfs should provide.
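
To make the problem concrete, here is a minimal Python sketch of the accounting described above. The function names, the clamp at zero, and the sample numbers are assumptions made for illustration; they are not taken from the Hadoop source.

```python
def remaining_space(unused, capacity, unusable_disk_percentage, reserved):
    """Per-datanode formula as stated in the issue description:
    remaining = unused - capacity * unusableDiskPercentage - reserved.
    Clamping at zero is an assumption of this sketch.
    """
    return max(0, unused - capacity * unusable_disk_percentage - reserved)


def client_side_used(capacity, remaining):
    """What a dfs client derives: used = capacity - remaining."""
    return capacity - remaining


# Hypothetical datanode: 100 GB disk, 60 GB unused, 2% unusable, 10 GB reserved.
capacity_gb = 100
unused_gb = 60
remaining_gb = remaining_space(unused_gb, capacity_gb, 0.02, 10)
derived_used_gb = client_side_used(capacity_gb, remaining_gb)
```

With these numbers the derived "used" figure also counts the reserved and unusable space plus anything written by other applications, so it says little about how much space dfs itself occupies.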

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502000 ]

Raghu Angadi commented on HADOOP-1463:
--------------------------------------


The meaning of "Reserved" that the current code implements is: "Space that Datanode should try to keep *free* for immediate or future use by either Datanode or some other application". In that sense I think the current code calculates it well. Note that this calculation does not require a costly du.

After chatting with Hairong, I think Koji's impression of "Reserved" is: "Disk space that Datanode avoids in its calculations whether it is used or free". E.g., if there is a partition of 100 GB, 50% is reserved, DFS occupies 40 GB, and 25 GB is "available", then "remaining" should be 10 GB. This requires the equivalent of 'du' for the Datanode's data. With the previous interpretation, remaining would be zero.

If we want the latter, then we need to either run 'du' or maintain a disk-used count, to be sent with every heartbeat.
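
The two readings of "Reserved" can be compared with a small Python sketch using the 100 GB example. The function names are hypothetical, and both the clamp at zero and the cap at the currently available space are assumptions of this sketch rather than something stated in the thread.

```python
def remaining_keep_free(available, reserved):
    """Reading 1: 'reserved' is space the datanode tries to keep free."""
    return max(0, available - reserved)


def remaining_excluded(partition_size, reserved, dfs_used, available):
    """Reading 2: 'reserved' is excluded from the datanode's share,
    whether that space is currently used or free. Needs dfs_used,
    i.e. the equivalent of 'du' on the datanode's data."""
    dfs_share = partition_size - reserved
    return max(0, min(dfs_share - dfs_used, available))


# 100 GB partition, 50% (50 GB) reserved, DFS occupies 40 GB, 25 GB available.
reading_1_gb = remaining_keep_free(available=25, reserved=50)
reading_2_gb = remaining_excluded(partition_size=100, reserved=50,
                                  dfs_used=40, available=25)
```

Under the first reading the datanode reports nothing remaining; under the second it may still use another 10 GB.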





[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502033 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

I feel that doing "du" to get the size of data directories is too costly. Current code does this every 3 seconds. What we can do is to have a counter keeping track of the size of all blocks at each datanode. It gets updated whenever a block is written or deleted. It gets reset by summing up all block sizes when a block report is sent.
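
A minimal Python sketch of such a counter follows. The class and method names are hypothetical, and the lock is an assumption on the grounds that writes, deletes, and block reports may run on different threads.

```python
import threading


class UsedSpaceCounter:
    """Tracks the total size of all blocks stored on one datanode."""

    def __init__(self):
        self._lock = threading.Lock()
        self._used = 0

    def block_written(self, num_bytes):
        # Called after a block has been written.
        with self._lock:
            self._used += num_bytes

    def block_deleted(self, num_bytes):
        # Called after a block has been deleted.
        with self._lock:
            self._used -= num_bytes

    def reset_from_block_report(self, block_sizes):
        # Recompute from scratch when a block report is sent,
        # discarding any drift accumulated since the last report.
        with self._lock:
            self._used = sum(block_sizes)

    def used(self):
        with self._lock:
            return self._used
```

The incremental updates keep the heartbeat path cheap, while the periodic reset bounds any error in the running total.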


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502045 ]

Raghu Angadi commented on HADOOP-1463:
--------------------------------------

+1. Except that during a block report we should probably reset with 'du' instead of summing over block sizes, so that it takes all the other overhead of the Datanode directory into account (native filesystem, directories, 'previous' directories, metadata files, tmp directory, etc.). It could still be updated with block sizes as you described. The accrued error until the next block report would be small.
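
As a rough illustration of why a directory scan and a sum over block sizes differ, here is a Python sketch. It sums apparent file sizes with `os.walk`, which is only a stand-in for 'du' (real 'du' reports allocated disk blocks, including directory overhead). The file names `blk_1`, `blk_1.meta`, and `tmp/blk_2` are hypothetical.

```python
import os
import tempfile


def directory_size(root):
    """Sum apparent file sizes under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


# A toy data directory: one finished block, its metadata file,
# and a partially written block in a tmp subdirectory.
with tempfile.TemporaryDirectory() as data_dir:
    os.makedirs(os.path.join(data_dir, "tmp"))
    with open(os.path.join(data_dir, "blk_1"), "wb") as f:
        f.write(b"x" * 1000)
    with open(os.path.join(data_dir, "blk_1.meta"), "wb") as f:
        f.write(b"m" * 20)
    with open(os.path.join(data_dir, "tmp", "blk_2"), "wb") as f:
        f.write(b"y" * 300)
    block_bytes = 1000                       # what summing block sizes would give
    on_disk_bytes = directory_size(data_dir)  # what a directory scan sees
```

The scan picks up the metadata and tmp files that a pure block-size sum would miss.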



[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12502138 ]

Koji Noguchi commented on HADOOP-1463:
--------------------------------------

> if we want the latter,

Yes I prefer the latter.  (Datanode utilizing another 10G of space)

As a side note, we might want to update the config description for "dfs.datanode.du.pct".
It seems to be different from what it actually does.

<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.98f</value>
  <description>When calculating remaining space, only use this percentage of the real available space
  </description>
</property>




[jira] Assigned: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-1463:
-------------------------------------

    Assignee: Hairong Kuang


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508291 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

To summarize what we have discussed:

each data node's disk space = dfs used space + reserved space + remaining space

where dfs used space is a summation of all data dir sizes, reserved space is reserved for non-dfs usage whether it is used or unused, and remaining space is for future dfs usage.

dfs capacity = dfs used space + remaining space

data node sends dfs capacity and remaining space to namenode at each heartbeat.

I plan to run "df" when the datanode starts to get the data node's disk space and the reserved space. I plan to keep track of dfs used space by running a "du" when a block report is sent and adjusting it when a block is written or deleted.
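
The equations above can be sketched in Python as follows. The `Heartbeat` tuple and the function names are hypothetical, and clamping the remaining space at zero is an assumption of this sketch.

```python
from collections import namedtuple

# The two numbers the datanode would send at each heartbeat.
Heartbeat = namedtuple("Heartbeat", ["capacity", "remaining"])


def build_heartbeat(disk_space, dfs_used, reserved):
    """disk space = dfs used + reserved + remaining, solved for remaining;
    dfs capacity = dfs used + remaining."""
    remaining = max(0, disk_space - dfs_used - reserved)
    capacity = dfs_used + remaining
    return Heartbeat(capacity=capacity, remaining=remaining)


def used_from_heartbeat(heartbeat):
    """The receiver recovers dfs used space as capacity - remaining."""
    return heartbeat.capacity - heartbeat.remaining


# Hypothetical node: 100 GB disk, 40 GB used by dfs, 30 GB reserved.
hb = build_heartbeat(disk_space=100, dfs_used=40, reserved=30)
```

Because capacity is defined as used plus remaining, subtracting the two reported numbers now yields the space dfs actually occupies.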

Please comment if you have any other opinion.

Regarding the reserved space, currently hadoop-default.xml supports the following two properties. Shall we enforce that only one of them is non-zero?
<code>
<property>
  <name>dfs.datanode.du.reserved</name>
  <value>0</value>
  <description>Reserved space in bytes. Always leave this much space free for non dfs use
  </description>
</property>

<property>
  <name>dfs.datanode.du.pct</name>
  <value>0.98f</value>
  <description>When calculating remaining space, only use this percentage of the real available space
  </description>
</property>
</code>





[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508301 ]

Doug Cutting commented on HADOOP-1463:
--------------------------------------

> run "df" when datanode gets started to get the data node's disk space and the reserved space

We should also run "df" periodically, since other applications may be using disk space too (like MapReduce).  Probably once-per-heartbeat is excessive, but not less than once every few minutes.
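
One way to run a "df"-style query periodically rather than on every heartbeat is to cache the result and refresh it after an interval. The Python sketch below uses `shutil.disk_usage` as a stand-in for "df"; the class name, the 60-second default, and the `refresh_count` attribute are assumptions for illustration.

```python
import shutil
import time


class CachedDiskUsage:
    """Caches disk usage for a path and refreshes it at most once
    per refresh_seconds."""

    def __init__(self, path, refresh_seconds=60.0, clock=time.monotonic):
        self._path = path
        self._refresh_seconds = refresh_seconds
        self._clock = clock          # injectable so the interval can be tested
        self._last_refresh = None
        self._usage = None
        self.refresh_count = 0       # how many times the disk was queried

    def get(self):
        now = self._clock()
        stale = (self._usage is None
                 or now - self._last_refresh >= self._refresh_seconds)
        if stale:
            # Returns a named tuple with total, used and free bytes.
            self._usage = shutil.disk_usage(self._path)
            self._last_refresh = now
            self.refresh_count += 1
        return self._usage
```

A heartbeat can call `get()` as often as it likes; the underlying disk query only happens when the cached value is older than the interval.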


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508333 ]

Raghu Angadi commented on HADOOP-1463:
--------------------------------------

'df' is very cheap, so calculating at every heartbeat is not bad either.

> Shall we enforce that only one of them is non-zero?
Not necessarily; it depends on how 'remaining space' and 'reserved space' in your equations above are calculated. How are those calculated?

According to the requirements of this jira, this is how I understand the calculation of available space for a datanode:
{noformat}
remaining space for Datanode =
      Min( (dfs.datanode.du.pct * Total_Capacity -  cur_space_used_by_Datanode),
               cur_disk_available) - dfs.datanode.du.reserved
{noformat}
cur_disk_available (and total capacity) comes from 'df' and cur_space_used_by_Datanode is based on 'du'.
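
The formula above translates directly into a short Python sketch. The parameter names mirror the terms in the formula; clamping the result at zero and the sample numbers are assumptions of this sketch.

```python
def remaining_for_datanode(total_capacity, cur_space_used_by_datanode,
                           cur_disk_available, du_pct, du_reserved):
    """Min(du_pct * Total_Capacity - used, available) - du_reserved."""
    budget = du_pct * total_capacity - cur_space_used_by_datanode
    return max(0, min(budget, cur_disk_available) - du_reserved)


# Hypothetical values in GB: 100 total, 40 used by the datanode, 25 available.
tight_budget = remaining_for_datanode(100, 40, 25, du_pct=0.5, du_reserved=5)
tight_disk = remaining_for_datanode(100, 40, 25, du_pct=0.98, du_reserved=5)
```

In the first call the percentage budget is the binding limit (10 GB, minus 5 GB reserved); in the second the free disk space is (25 GB, minus 5 GB reserved).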

The descriptions of these config variables should probably change.



[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511860 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

Not sure if Raghu's formula is correct. The key is what's the definition of reserved space. From my understanding dfs.datanode.du.pct and dfs.datanode.du.reserved are two different ways of specifying reserved space. dfs.datanode.du.reserved gives an absolute value while dfs.datanode.du.pct gives a percentage. That's why I said that we need only one of them. Reserved space is for non-dfs usage including used space for map/reduce.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511864 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

The current implementation seems to define reserved space for future non-dfs usage.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511871 ]

Koji Noguchi commented on HADOOP-1463:
--------------------------------------

> From my understanding dfs.datanode.du.pct and dfs.datanode.du.reserved are two different ways of specifying reserved space.
>
I used to think the same. But there shouldn't be two config variables that serve the same purpose.
So, assuming "dfs.datanode.du.reserved" is the one for "space reserved for non-dfs usage *whether it is used or unused*"
I'd want

{noformat}
MIN( Total_Capacity -  cur_space_used_by_Datanode - dfs.datanode.du.reserved, cur_disk_available)
{noformat}

I'm not sure where "dfs.datanode.du.pct" should fit.  Maybe

{noformat}
MIN( Total_Capacity -  cur_space_used_by_Datanode - dfs.datanode.du.reserved, cur_disk_available) * dfs.datanode.du.pct
{noformat}
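
For comparison, the two candidate formulas above can be written as a Python sketch. The function names and sample numbers are hypothetical, and clamping at zero is an assumption of this sketch.

```python
def remaining_proposal_1(total_capacity, used_by_datanode,
                         du_reserved, disk_available):
    """MIN(Total_Capacity - used - reserved, cur_disk_available)."""
    return max(0, min(total_capacity - used_by_datanode - du_reserved,
                      disk_available))


def remaining_proposal_2(total_capacity, used_by_datanode,
                         du_reserved, disk_available, du_pct):
    """Proposal 1 scaled by dfs.datanode.du.pct."""
    return remaining_proposal_1(total_capacity, used_by_datanode,
                                du_reserved, disk_available) * du_pct


# Hypothetical values in GB: 100 total, 40 used, 50 reserved, 25 available.
p1_gb = remaining_proposal_1(100, 40, 50, 25)
p2_gb = remaining_proposal_2(100, 40, 50, 25, du_pct=0.98)
```

In the second form the percentage acts as a safety margin on top of the absolute reservation, rather than as a second way of specifying reserved space.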



[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12511887 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

+1 on Koji's proposal 2.

I am reading more of the code. The current implementation interprets the reserved space as the space reserved per volume. We want it to be the space reserved per datanode, right?
I also found out that the period for running "df" is configurable in dfs by setting the value of "dfs.df.interval". The default value is 3000 msec. Should we change the default value to 1 min?


[jira] Updated: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1463:
----------------------------------

    Attachment: usedSpace.patch

A patch for review:
1. The datanode sends the namenode (dfs used space + remaining space, remaining space) per heartbeat. dfs remaining space and used space are calculated as we discussed.
2. fix a bug in printing a double in FSShell.


[jira] Updated: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-1463:
----------------------------------

    Status: Patch Available  (was: Open)

Although this issue is still awaiting review comments, I am marking it as Patch Available so that it can be committed for release 0.14.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12512685 ]

Hadoop QA commented on HADOOP-1463:
-----------------------------------

-1, build or testing failed

2 attempts failed to build and test the latest attachment http://issues.apache.org/jira/secure/attachment/12361823/usedSpace.patch against trunk revision r555813.

Test results:   http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/413/testReport/
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/413/console

Please note that this message is automatically generated and may represent a problem with the automation system and not the patch.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513079 ]

dhruba borthakur commented on HADOOP-1463:
------------------------------------------

+1. Code looks good. Is it possible to write a unit test (maybe later) for this one?


[jira] Updated: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Doug Cutting updated HADOOP-1463:
---------------------------------

    Status: Open  (was: Patch Available)

Unfortunately this patch no longer applies cleanly to trunk.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513085 ]

dhruba borthakur commented on HADOOP-1463:
------------------------------------------

Hairong is uploading a new patch that is merged with the latest trunk and includes a small bug fix.


[jira] Commented: (HADOOP-1463) dfs should report total size of all the space that dfs is using

kuladeep (Jira)
In reply to this post by kuladeep (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1463?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12513108 ]

Hairong Kuang commented on HADOOP-1463:
---------------------------------------

Koji is examining the interface and I am waiting for his comment.
