[jira] Created: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

hdfs does not honor dfs.du.reserved setting
-------------------------------------------

                 Key: HADOOP-2549
                 URL: https://issues.apache.org/jira/browse/HADOOP-2549
             Project: Hadoop
          Issue Type: Bug
          Components: dfs
    Affects Versions: 0.14.4
         Environment: FC Linux.
            Reporter: Joydeep Sen Sarma
            Priority: Critical


running 0.14.4. one of our drives is smaller and is always getting disk full. i reset the disk reservation to 1Gig - but it was filled quickly again.

i put in some tracing in getNextVolume. the blocksize argument is 0, so every volume (regardless of available space) qualifies. here's the trace:

/* root disk chosen with 0 available bytes. format is <available>:<blocksize>*/
2008-01-08 08:08:51,918 WARN org.apache.hadoop.dfs.DataNode: Volume /var/hadoop/tmp/dfs/data/current:0:0

/* some other disk chosen with 300G space. */
2008-01-08 08:09:21,974 WARN org.apache.hadoop.dfs.DataNode: Volume /mnt/d1/hdfs/current:304725631026:0

i am going to default blocksize to something reasonable when it's zero for now.
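The stop-gap described here — substituting a sensible default when the blockSize argument arrives as 0 — can be sketched as follows. This is an illustrative sketch, not the actual DataNode code; the class, method names, and the 64 MB constant are assumptions.

```java
// Illustrative sketch (hypothetical names): if the transfer protocol did not
// supply a block size (it arrives as 0), substitute a conservative default so
// that full volumes no longer qualify for new blocks.
public class VolumeChoiceSketch {
    // Assumed default block size for this era of Hadoop (64 MB).
    static final long DEFAULT_BLOCK_SIZE = 64L * 1024 * 1024;

    // Size to use when checking a volume's available space.
    static long effectiveBlockSize(long requestedBlockSize) {
        return requestedBlockSize > 0 ? requestedBlockSize : DEFAULT_BLOCK_SIZE;
    }

    // A volume qualifies only if it has room for the (effective) block.
    static boolean volumeQualifies(long availableBytes, long requestedBlockSize) {
        return availableBytes >= effectiveBlockSize(requestedBlockSize);
    }

    public static void main(String[] args) {
        // The root disk from the trace: 0 bytes available, blockSize 0.
        System.out.println(volumeQualifies(0L, 0L));            // false
        // The 300 GB disk from the trace.
        System.out.println(volumeQualifies(304725631026L, 0L)); // true
    }
}
```

With this workaround, the zero-byte root volume from the trace would be skipped while the 300 GB volume still qualifies.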

this is driving us nuts since our automounter starts failing when we run out of space. so everything's broken.


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12556959#action_12556959 ]

Ted Dunning commented on HADOOP-2549:
-------------------------------------


This happens in 15.1 as well.  It is particularly problematic when there is one small and one large partition available for storage.  If the smaller partition is listed first, then it will be filled without any reference to available space and the status display will show available space because the larger partition is still free.

Aggressive rebalancing can stave off the problem, but that is more of a band-aid than a solution.

It is also a real problem that when the disk fills up, the file system is corrupted in a way that is very difficult to recover from.




[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557063#action_12557063 ]

Hairong Kuang commented on HADOOP-2549:
---------------------------------------

The cause of the block size being 0 is that the block size is not passed as a parameter in the block transfer protocol. So when a Block object is initialized, its block size is set to zero, which leads to a parameter of zero when getNextVolume is called. There are three options:
1. Change the DatanodeProtocol to pass the expected block size as well.
2. Do not pass the block size in the protocol, but use the default block size. The problem with this approach is that the block size is a client-side configuration.
3. Use a big number like 128 MB as the block size. This may not work for bigger block sizes but should work most of the time.


[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557068#action_12557068 ]

Raghu Angadi commented on HADOOP-2549:
--------------------------------------

"reserved" space is incremented by the default block size to compensate for the fact that the block size is not transferred in the protocol.
Should we add a different version of volume.getAvailable() that returns a negative number if the leftover space is less than the reserved amount? Currently it returns 0.
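Raghu's suggested variant — a getAvailable() that can go negative when non-DFS usage has eaten into the reservation — might look like the sketch below. The class and method names are invented for illustration; this is not the actual FSVolume code.

```java
// Sketch of a reservation-aware availability check (hypothetical names).
// If non-DFS files have consumed part of the reserved space, the signed
// variant goes negative instead of clamping at 0, so callers can see how
// far over the reservation the volume already is.
public class ReservedSpaceSketch {
    final long reservedBytes;

    ReservedSpaceSketch(long reservedBytes) {
        this.reservedBytes = reservedBytes;
    }

    // Current behavior: never reports below zero.
    long getAvailableClamped(long freeOnDisk) {
        return Math.max(0L, freeOnDisk - reservedBytes);
    }

    // Proposed variant: may return a negative number when the free space
    // on disk has already dipped below the reservation.
    long getAvailableSigned(long freeOnDisk) {
        return freeOnDisk - reservedBytes;
    }

    public static void main(String[] args) {
        ReservedSpaceSketch v = new ReservedSpaceSketch(1L << 30); // 1 GB reserved
        System.out.println(v.getAvailableClamped(512L << 20)); // 0
        System.out.println(v.getAvailableSigned(512L << 20));  // -536870912
    }
}
```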



[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557072#action_12557072 ]

Raghu Angadi commented on HADOOP-2549:
--------------------------------------

or change '>=' to '>' in
{code}
if (volume.getAvailable() >= blockSize) {
  return volume;
}
{code}


[jira] Issue Comment Edited: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557068#action_12557068 ]

rangadi edited comment on HADOOP-2549 at 1/8/08 3:28 PM:
--------------------------------------------------------------

"reserved" space is incremented by the default block size to compensate for the fact that the block size is not transferred in the protocol.
Should we add a different version of volume.getAvailable() that returns a negative number if the leftover space is less than the reserved amount? Currently it returns 0.


      was (Author: rangadi):
    "reserved" space is incremented by default block size to compensate for the fact that block size is not transfered in protocol.
Should we add a different version volume.getAvailable() that returns less negative number if left over space is less than the reserved? Currently it returns 0.

 


[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12557075#action_12557075 ]

Raghu Angadi commented on HADOOP-2549:
--------------------------------------

'>' does make sense even if we implement the other options: we do in fact need more space than the block size (checksum, native filesystem overhead, etc.).
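To put a rough number on "more space than the block size": HDFS keeps a CRC checksum alongside each block, by default 4 bytes per 512 bytes of data, so even ignoring native-filesystem overhead a block needs slightly more disk than its nominal size. The sketch below is a back-of-envelope estimate, not exact on-disk accounting:

```java
// Back-of-envelope estimate of the extra disk a block needs beyond its
// nominal size: a 4-byte CRC per 512 data bytes, ignoring metadata-file
// headers and native-filesystem allocation overhead.
public class ChecksumOverheadSketch {
    static final int BYTES_PER_CHECKSUM = 512; // io.bytes.per.checksum default
    static final int CHECKSUM_SIZE = 4;        // CRC-32

    // Checksum bytes stored for a block of the given size (rounded up
    // to whole 512-byte chunks).
    static long checksumBytes(long blockBytes) {
        long chunks = (blockBytes + BYTES_PER_CHECKSUM - 1) / BYTES_PER_CHECKSUM;
        return chunks * CHECKSUM_SIZE;
    }

    public static void main(String[] args) {
        long block = 64L * 1024 * 1024; // 64 MB block
        System.out.println(checksumBytes(block)); // 524288, about 0.78% extra
    }
}
```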



[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment: diskfull.patch

The patch takes option 3.

Regarding Raghu's comments: although the reserved space is incremented by the default block size, the default block size is not reserved when non-DFS usage takes more space than the reserved space (i.e., when available is less than remaining). Yes, I agree that it makes sense to change >= to >.



[jira] Assigned: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang reassigned HADOOP-2549:
-------------------------------------

    Assignee: Hairong Kuang


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment: diskfull1.patch

Approach 3 breaks some JUnit tests that use SimulatedDataSet. This new patch takes approach 2 instead. It also fixes a minor error in SimulatedDataSet.


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment:     (was: diskfull1.patch)


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment: diskfull1.patch


[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560189#action_12560189 ]

Konstantin Shvachko commented on HADOOP-2549:
---------------------------------------------

Yes, that should turn on volume switching if one of them is full.
Some comments:
- It is better to move the declaration of estimateBlockSize up together with all the other member declarations.
- Use a JavaDoc-style comment for estimateBlockSize instead of a regular one. That way I can see the description whenever I move the cursor over the variable in Eclipse.
- Do we plan to apply it to previous releases (0.14 or 0.15)? If not, then could you please also remove unused pieces of code:
-# import org.apache.hadoop.io.Text;
-# private void enumerateThreadGroup()
-# short opStatus
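Konstantin's JavaDoc point, illustrated: a /** ... */ doc comment on a member is what IDEs such as Eclipse display on hover, whereas a regular // or /* ... */ comment is not. The field name comes from the patch discussion; its initializer here is just a placeholder:

```java
// Minimal illustration of the comment-style suggestion. The field name
// estimateBlockSize is from the patch discussion; the 64 MB value is a
// placeholder, not the value the patch actually uses.
public class CommentStyleSketch {
    /**
     * An estimate of the block size to use when the transfer protocol
     * does not supply one. Because this is a JavaDoc comment, Eclipse
     * shows it when hovering over the variable.
     */
    static long estimateBlockSize = 64L * 1024 * 1024;

    public static void main(String[] args) {
        System.out.println(estimateBlockSize); // 67108864
    }
}
```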




[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment: diskfull2.patch

The patch incorporates most of Konstantin's comments, except that I did not remove the unused method and variable. I need some time to investigate why they are there without being used.


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Fix Version/s: 0.16.0
           Status: Patch Available  (was: Open)


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment:     (was: diskfull2.patch)


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

Jorge Spinsanti (Jira)
In reply to this post by Jorge Spinsanti (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hairong Kuang updated HADOOP-2549:
----------------------------------

    Attachment: diskfull2.patch


[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

Jorge Spinsanti (Jira)
In reply to this post by Jorge Spinsanti (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560590#action_12560590 ]

Konstantin Shvachko commented on HADOOP-2549:
---------------------------------------------

Yes, if there is a doubt about whether we should remove these two warnings, let's not do it as a part of this patch.
But let's not forget to investigate later on.
+1


[jira] Commented: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

Jorge Spinsanti (Jira)
In reply to this post by Jorge Spinsanti (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12560698#action_12560698 ]

Hadoop QA commented on HADOOP-2549:
-----------------------------------

+1 overall.  Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12373542/diskfull2.patch
against trunk revision r613359.

    @author +1.  The patch does not contain any @author tags.

    javadoc +1.  The javadoc tool did not generate any warning messages.

    javac +1.  The applied patch does not generate any new compiler warnings.

    findbugs +1.  The patch does not introduce any new Findbugs warnings.

    core tests +1.  The patch passed core unit tests.

    contrib tests +1.  The patch passed contrib unit tests.

Test results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1650/testReport/
Findbugs warnings: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1650/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1650/artifact/trunk/build/test/checkstyle-errors.html
Console output: http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch/1650/console

This message is automatically generated.


[jira] Updated: (HADOOP-2549) hdfs does not honor dfs.du.reserved setting

Jorge Spinsanti (Jira)
In reply to this post by Jorge Spinsanti (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-2549?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Konstantin Shvachko updated HADOOP-2549:
----------------------------------------

    Resolution: Fixed
        Status: Resolved  (was: Patch Available)

I just committed this. Thank you Hairong.
