hdfsOpenFile() API

7 messages
hdfsOpenFile() API

alakshman
Hi

Can hdfsOpenFile() only open files in read-only or write-only mode? How do I
do appends? If I am using this for writing logs, I want to append to the
file rather than overwrite it, which is what write-only mode does.

Thanks
A
Re: hdfsOpenFile() API

Briggs
No appending, AFAIK.  Hadoop is not intended for writing in this way.
It's more of a write-once, read-many system. Such granular writes would
be inefficient.

On 6/13/07, Phantom <[hidden email]> wrote:

> Hi
>
> Can hdfsOpenFile() only open files in read-only or write-only mode? How do I
> do appends? If I am using this for writing logs, I want to append to the
> file rather than overwrite it, which is what write-only mode does.
>
> Thanks
> A
>


--
"Conscious decisions by conscious minds are what make reality real"
Re: hdfsOpenFile() API

alakshman
Hmm, I was under the impression that HDFS, like GFS, is optimized for appends,
although GFS also supports random writes. So let's say I want to process logs
using Hadoop. The only way I can do it is to move the entire log into Hadoop
from somewhere else and then run Map/Reduce jobs against it. That seems to
kind of defeat the purpose. Am I missing something?

Thanks
A

On 6/13/07, Briggs <[hidden email]> wrote:

>
> No appending, AFAIK.  Hadoop is not intended for writing in this way.
> It's more of a write-once, read-many system. Such granular writes would
> be inefficient.
>
Re: hdfsOpenFile() API

Briggs
Yeah, you are right about the Google FS.

I have also heard on this list that some people are planning to add
append functionality to Hadoop, but it's just not there yet.  I am not
sure why.

Perhaps my "inefficient" comment was premature.  The term "logging"
stuck in my head, and I had preconceived ideas about what you are doing.
I am thinking that continuously writing extremely small chunks to a
distributed file system would cause a lot of latency and would probably
slow your system down considerably. But again, I am not sure of your
situation.

As for the way Hadoop is now, you would have to use "copyFromLocal",
which probably sucks in your situation.  I can understand your pain in
this area.

Anyone else have any ideas?


On 6/13/07, Phantom <[hidden email]> wrote:

> Hmm, I was under the impression that HDFS, like GFS, is optimized for appends,
> although GFS also supports random writes. So let's say I want to process logs
> using Hadoop. The only way I can do it is to move the entire log into Hadoop
> from somewhere else and then run Map/Reduce jobs against it. That seems to
> kind of defeat the purpose. Am I missing something?
>
> Thanks
> A
>


--
"Conscious decisions by conscious minds are what make reality real"
Re: hdfsOpenFile() API

Owen O'Malley

On Jun 13, 2007, at 3:29 PM, Phantom wrote:

> Hmm, I was under the impression that HDFS, like GFS, is optimized for
> appends, although GFS also supports random writes.

HDFS doesn't support appends. There has been discussion of  
implementing single-writer appends, but it hasn't reached the top of  
anyone's priority list. Some people (me included) aren't thrilled by  
the semantics of atomic append in GFS. To me, it seems like atomic  
append is basically a poor-man's map/reduce. *smile*

-- Owen
Re: hdfsOpenFile() API

alakshman
Which would mean that if I want my logs to reside in HDFS, I will have to
move them using copyFromLocal or some variant thereof and then run a
Map/Reduce job against them? Am I right?

Thanks
Avinash

On 6/13/07, Owen O'Malley <[hidden email]> wrote:

>
>
> On Jun 13, 2007, at 3:29 PM, Phantom wrote:
>
> > Hmm, I was under the impression that HDFS, like GFS, is optimized for
> > appends, although GFS also supports random writes.
>
> HDFS doesn't support appends. There has been discussion of
> implementing single-writer appends, but it hasn't reached the top of
> anyone's priority list. Some people (me included) aren't thrilled by
> the semantics of atomic append in GFS. To me, it seems like atomic
> append is basically a poor-man's map/reduce. *smile*
>
> -- Owen
>
Re: hdfsOpenFile() API

Doug Cutting
Phantom wrote:
> Which would mean that if I want my logs to reside in HDFS, I will have to
> move them using copyFromLocal or some variant thereof and then run a
> Map/Reduce job against them? Am I right?

Yes.  HDFS is probably not currently suitable for directly storing log
output as it is generated.  But I don't think append is actually the
missing feature you need.  Rather, the problem is that, currently in
HDFS, until a file is closed, it does not exist.  So if your server
crashes and does not close its log, the log would disappear, which is
probably not what you'd want.

If copying log files to HDFS is prohibitive, an alternative might be to
make them available via HTTP and to write an HttpFileSystem through which
they could be accessed directly as MapReduce inputs (assuming that's what
you want). An HttpFileSystem should be easy to implement and would be
useful for lots of things.  It need not implement operations like 'delete',
'rename', or even 'create', but just 'open' and 'list', so it could only
be used for inputs.

Doug