NDFS/MapReduce questions



Joanna Harpell
A few questions about NDFS:

- Can a single NDFS file have multiple concurrent writers?
- Can the block size be changed in a running NDFS file system?
- Are there any plans to localize Map tasks to block-resident nodes?
- Is there any reason that big files (>>10TB?) wouldn't work?
- Is there any reason that big blocks (1GB?) wouldn't work?

Thank you.

Re: NDFS/MapReduce questions

Doug Cutting
Joanna Harpell wrote:
>  A few questions about NDFS:
>
>  - Can a single NDFS file have multiple concurrent writers?

No.  This is not supported.

> - Can the block size be changed in a running NDFS file system?

Yes, that is the intent.
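(A minimal sketch of what a configurable block size looks like. The
property name "dfs.block.size" and the Configuration API are taken from
later Hadoop releases and are assumptions here, not the actual NDFS
interface of this era. The key point is that block size is a
creation-time parameter: changing it affects new files only.)

    import org.apache.hadoop.conf.Configuration;

    public class BlockSizeExample {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // 64 MB for files created from now on; files already written
            // keep the block size they were created with.
            conf.setLong("dfs.block.size", 64L * 1024 * 1024);
        }
    }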

> - Are there any plans to localize Map tasks to block-resident nodes?

Yes.  I think MapReduce & NDFS are now getting stable enough that we can
begin work on performance enhancements like this.  Other optimizations
I'd like to get to soon: one copy of reduce output should be written to
the local node if possible; reduce tasks should start as soon as the
first map task is complete, copying and sorting map output in parallel
with the remainder of the map tasks; and the job tracker should assign
new tasks to nodes that are working on the fewest tasks.  Each of these
should make a significant performance improvement.
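(A rough sketch of two of those ideas combined -- prefer a node that
holds a replica of the task's input block, and break ties toward the
node running the fewest tasks. Every class and name below is invented
for illustration; this is not the actual job tracker code.)

    import java.util.HashMap;
    import java.util.Map;
    import java.util.Set;

    public class LocalityAwareAssigner {

        // Pick a node for a map task: prefer a node holding a replica of
        // the task's input block; among equally-local candidates, pick
        // the one currently running the fewest tasks.
        static String assign(Set<String> blockHosts, Map<String, Integer> load) {
            String best = null;
            for (Map.Entry<String, Integer> e : load.entrySet()) {
                if (best == null) {
                    best = e.getKey();
                    continue;
                }
                boolean candLocal = blockHosts.contains(e.getKey());
                boolean bestLocal = blockHosts.contains(best);
                if ((candLocal && !bestLocal)
                        || (candLocal == bestLocal && e.getValue() < load.get(best))) {
                    best = e.getKey();
                }
            }
            return best;
        }

        public static void main(String[] args) {
            Map<String, Integer> load = new HashMap<>();
            load.put("nodeA", 3);
            load.put("nodeB", 1);
            load.put("nodeC", 2);
            // nodeC holds the input block, so it wins even though nodeB is idler.
            System.out.println(assign(Set.of("nodeC"), load)); // prints nodeC
        }
    }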

> - Is there any reason that big files (>>10TB?) wouldn't work?

Not that I can think of.

> - Is there any reason that big blocks (1GB?) wouldn't work?

Not that I can think of.

The total number of blocks must not get too great, since the name ->
blockId* mapping is kept in RAM on the namenode.
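(A back-of-envelope count makes the constraint concrete. The 150 bytes
of namenode RAM per block entry below is an assumed figure for
illustration, not something stated in this thread.)

    public class BlockCountEstimate {
        public static void main(String[] args) {
            long fileBytes = 10L * 1024 * 1024 * 1024 * 1024;  // one 10 TB file
            long[] blockSizes = { 32L << 20, 1L << 30 };       // 32 MB vs 1 GB blocks
            long bytesPerEntry = 150;                          // assumed per-block RAM cost
            for (long bs : blockSizes) {
                long blocks = (fileBytes + bs - 1) / bs;       // ceiling division
                System.out.printf("%4d MB blocks -> %,8d blocks, ~%,d KB of namenode RAM%n",
                        bs >> 20, blocks, blocks * bytesPerEntry / 1024);
            }
        }
    }

With 1 GB blocks even a 10 TB file contributes only ~10,000 entries,
roughly 32x fewer than with 32 MB blocks, which is why big blocks help
keep the in-RAM table small.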

Doug