Mounting HDFS as local file system

Mounting HDFS as local file system

Mark Kerzner
Hi, guys,

I see that there is MountableHDFS <http://wiki.apache.org/hadoop/MountableHDFS>,
and I know that it works, but my questions are as follows:

   - How reliable is it for large storage?
   - Doesn't it hide the real design questions? We are dealing with a
   NameNode after all, but trying to use it as a regular file system.
   - For example, HDFS is not optimized for many small files that get
   written and deleted, but a mounted system will lure one in this direction.


Thanks a bunch for your opinions.

Sincerely,
Mark
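
For readers following the MountableHDFS link above: a fuse-dfs mount
typically looks something like the sketch below. The hostname, port, and
mount point are illustrative, and the wrapper-script location and fstab
syntax vary by release, so treat the wiki page as the authority for your
build.

  # Mount HDFS at /mnt/hdfs over FUSE (illustrative paths; usually run as root)
  ./fuse_dfs_wrapper.sh dfs://namenode.example.com:9000 /mnt/hdfs

  # Roughly equivalent /etc/fstab entry, so mount(8) can manage it:
  # fuse_dfs#dfs://namenode.example.com:9000 /mnt/hdfs fuse allow_other,rw 0 0

  # Sanity-check, then unmount when done
  df -h /mnt/hdfs
  umount /mnt/hdfs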

Re: Mounting HDFS as local file system

Steve Loughran
On 02/12/10 03:01, Mark Kerzner wrote:
> Hi, guys,
>
> I see that there is MountableHDFS <http://wiki.apache.org/hadoop/MountableHDFS>,
> and I know that it works, but my questions are as follows:
>
>     - How reliable is it for large storage?

Shouldn't be any worse than normal HDFS operations.

>     - Doesn't it hide the real design questions? We are dealing with a
>     NameNode after all, but trying to use it as a regular file system.
>     - For example, HDFS is not optimized for many small files that get
>     written and deleted, but a mounted system will lure one in this direction.

Like you say, it's not a conventional POSIX fs. It hates small files;
other systems may be a better fit for those.

Re: Mounting HDFS as local file system

Brian Bockelman

On Dec 2, 2010, at 5:16 AM, Steve Loughran wrote:

>> - How reliable is it for large storage?
>
> Shouldn't be any worse than normal HDFS operations.
I would comment that it's extremely reliable.  There's at least one slow memory leak in fuse-dfs that I haven't been able to squash, and I typically remount things after a month or two of *heavy* usage.

Across all the nodes in our cluster, we probably do a few billion HDFS operations per day over FUSE.

Brian


Re: Mounting HDFS as local file system

Mark Kerzner
Thank you, Brian.

I found your paper "Using Hadoop as grid storage," and it was very useful.

One thing I did not understand in it is your file usage pattern: do you
deal with small or large files, and how often do you delete them? My
question was, in part: can you use HDFS as a regular file system with
frequent file deletes? Does it not become fragmented and unreliable?

Thank you,
Mark


Re: Mounting HDFS as local file system

Brian Bockelman


We don't have any fragmentation issues.  We frequently delete files (we're supposed to be able to turn over 500TB in 2 weeks).  We use quotas and have daily monitoring to watch for users who abuse the system.  The only directories without quotas are the ones we populate centrally; user directories (which we don't control) can quite easily get 1-20TB, but users have to provide a strong justification to get more than 10k files.

Because HDFS has limited write semantics (but close enough to POSIX read semantics), our users love it but understand it's "special".

It's been a matter of user training:
- Do you want high-performance storage that can handle lots of small files?  If so, the cost is $X / TB.
- Do you want high-throughput storage where you have limited write semantics and need to use large files?  If so, the cost is $Y / TB.
X is roughly 5-10x Y, so the group leaders can budget appropriately.  We then purchase Hadoop and our Other Storage System in appropriate amounts.

User education goes a long way.  However, if they don't want to be bothered to be educated, they can always pay more money.

Brian
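
(As a rough sketch of the quota setup Brian describes: HDFS name quotas
cap the number of files and directories under a path, space quotas cap
raw bytes, and both are managed through dfsadmin. The directory and
limits below are made up for illustration.)

  # Cap a user directory at 10k names, then inspect usage and remaining quota
  hadoop dfsadmin -setQuota 10000 /user/alice
  hadoop fs -count -q /user/alice

  # Space quota in bytes (~20TB here); newer releases also accept suffixes like 20t
  hadoop dfsadmin -setSpaceQuota 21990232555520 /user/alice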


Re: Mounting HDFS as local file system

Mark Kerzner
Brian,

that almost answers my question. Still, are you saying that the problem of
"Hadoop hates small files" does not exist?

Mark


Re: Mounting HDFS as local file system

Michael Thomas
In reply to this post by Brian Bockelman
On 12/02/2010 05:10 AM, Brian Bockelman wrote:

> I would comment that it's extremely reliable.  There's at least one slow memory leak in fuse-dfs that I haven't been able to squash, and I typically remount things after a month or two of *heavy* usage.
Using the automounter with fuse-dfs has helped us a lot.  After 5
minutes of no activity, the fuse-dfs process goes away and the memory
leak is cleaned up automatically.  We only see the problem when there is
constant HDFS usage for days at a time, which, unfortunately, has been
the rule rather than the exception lately.

--Mike
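
(For anyone wanting to copy this setup, an autofs configuration for
fuse-dfs might look roughly like the sketch below. The map syntax is an
assumption, since autofs FUSE support and the fuse_dfs invocation differ
across versions, and the hostname and paths are illustrative.)

  # /etc/auto.master: automount HDFS under /hdfs, expire after 5 idle minutes
  /hdfs  /etc/auto.hdfs  --timeout=300

  # /etc/auto.hdfs: hand the actual mount off to fuse_dfs
  cluster  -fstype=fuse,allow_other,rw  :fuse_dfs\#dfs\://namenode.example.com\:9000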


Re: Mounting HDFS as local file system

Brian Bockelman
In reply to this post by Mark Kerzner


Well, I'd say "hates" is too strong a word.  Several of the "costs" in HDFS (NN memory, latency, efficiency) are a function of the number of files, and one needs to plan appropriately.  Some users can accept this fact and work with it; other, less sophisticated, users simply need to be told "don't save anything less than 10MB".

Over the Thanksgiving holidays, we had a process go awry and write 900,000 files smaller than a kilobyte into one XFS directory.  This was definitely costly in system resources and inefficient, but I wouldn't say XFS hates small files.

So you need to account for file-size costs in your planning.  If you ignore this variable, you will likely be bitten by these issues.

Brian
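
(If users do end up with piles of small files, one mitigation in this era
of Hadoop is a Hadoop Archive, which packs many small files into a few
large HDFS files while leaving them readable through the har:// scheme.
The paths below are made up for illustration.)

  # Pack /user/alice/small into a single archive stored under /user/alice
  hadoop archive -archiveName small.har -p /user/alice small /user/alice

  # Archived files stay listable and readable, at the cost of an index lookup
  hadoop fs -ls har:///user/alice/small.har/small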



HDFS and distcp issue??

hadoopman
In reply to this post by Brian Bockelman
I've run into an interesting problem with syncing a couple of clusters
using distcp.  We've validated that it works to a local installation
from our remote cluster.  I suspect our firewalls may be responsible
for the problem we're experiencing.  We're using ports 9000, 9001 and
50010.  I've verified all three ports are available to the namenodes and
datanodes in both directions.  Is there something else we're missing?

Looks like it gets to 80% before it fails.  Here's what we're seeing.

user@hnn1:~$ hadoop distcp hdfs://hnn1:9000/user/testing hdfs://hnn2:9000/user

10/12/03 15:58:10 INFO tools.DistCp: srcPaths=[hdfs://hnn1:9000/user/testing]
10/12/03 15:58:10 INFO tools.DistCp: destPath=hdfs://hnn2:9000/user
10/12/03 15:58:11 INFO tools.DistCp: srcCount=6
10/12/03 15:58:11 INFO mapred.JobClient: Running job: job_201011221457_0019
10/12/03 15:58:12 INFO mapred.JobClient:  map 0% reduce 0%
10/12/03 15:58:36 INFO mapred.JobClient:  map 19% reduce 0%
10/12/03 15:58:45 INFO mapred.JobClient:  map 39% reduce 0%
10/12/03 15:59:03 INFO mapred.JobClient:  map 60% reduce 0%
10/12/03 15:59:12 INFO mapred.JobClient:  map 80% reduce 0%
10/12/03 15:59:32 INFO mapred.JobClient: Task Id : attempt_201011221457_0019_m_000000_0, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 5
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:572)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)
10/12/03 15:59:33 INFO mapred.JobClient:  map 0% reduce 0%
10/12/03 15:59:55 INFO mapred.JobClient:  map 19% reduce 0%
10/12/03 16:00:04 INFO mapred.JobClient:  map 39% reduce 0%
10/12/03 16:00:22 INFO mapred.JobClient:  map 60% reduce 0%
10/12/03 16:00:31 INFO mapred.JobClient:  map 80% reduce 0%
10/12/03 16:00:51 INFO mapred.JobClient: Task Id : attempt_201011221457_0019_m_000000_1, Status : FAILED
java.io.IOException: Copied: 0 Skipped: 0 Failed: 5
        at org.apache.hadoop.tools.DistCp$CopyFilesMapper.close(DistCp.java:572)
        at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:57)
        at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
        at org.apache.hadoop.mapred.Child.main(Child.java:170)

Thanks!
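
(One thing worth checking, given the firewall suspicion: the distcp map
tasks run on the source cluster's worker nodes, so every tasktracker
host, not just the namenodes, must be able to reach the destination
namenode and datanodes. A quick probe from a worker node, with the ports
from the job above; "dnode1" is a placeholder for a destination datanode:)

  telnet hnn2 9000      # destination namenode IPC
  telnet dnode1 50010   # datanode data-transfer port
  telnet dnode1 50020   # datanode IPC port (default 50020, easy to overlook)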

Re: HDFS and distcp issue??

Dmitriy Ryaboy
Do you have the failing task's log?

-Dmitriy


--
Dmitriy V Ryaboy
Twitter Analytics
http://twitter.com/squarecog

Re: HDFS and distcp issue??

hadoopman
On 12/06/2010 07:48 PM, Dmitriy Ryaboy wrote:
> Do you have the failing task's log?

I'll have to look for it.  This is my first full-blown installation of
Hadoop.  Still a LOT to learn.

Is that the name it's typically called?  Yeah, I know, it's a beginner's
question :-)

Thanks!
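
(For what it's worth: on a 0.20-era install the per-attempt logs usually
live on the tasktracker node that ran the attempt, and they are also
linked from the JobTracker web UI, by default on port 50030. The layout
below is typical but depends on your HADOOP_LOG_DIR.)

  # On the tasktracker that ran the failed attempt
  ls $HADOOP_HOME/logs/userlogs/attempt_201011221457_0019_m_000000_0/
  # typically contains: stdout  stderr  syslog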