match multiple dfs output files in reduce

Hao Zheng
I want to partition my data using R reduce tasks to produce R reduce
output files, where each reduce task also writes a binary file for its
partition on DFS. Is there an easy way to generate matching file names
for the reduce output and the extra file?

For example:

reduce output: part-0 part-1 ... part-<R-1>
extra file:    file-0 file-1 ... file-<R-1>
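A minimal sketch of the naming scheme above, assuming the reduce task already knows its partition index: both names are derived from the same index, so they cannot drift apart. The class and method names are hypothetical. The five-digit zero padding mirrors the part-00000 style names Hadoop's output files typically use, rather than the unpadded names in the example.

```java
import java.util.Locale;

// Hypothetical helper: derive the reduce output name and the matching
// side-file name from one partition index so the two always line up.
public final class PartitionFileNames {
    private PartitionFileNames() {}

    // Reduce output name for a partition, assuming Hadoop-style "part-00003" padding.
    public static String partName(int partition) {
        return String.format(Locale.ROOT, "part-%05d", partition);
    }

    // Extra binary file for the same partition, using the same padding.
    public static String sideFileName(int partition) {
        return String.format(Locale.ROOT, "file-%05d", partition);
    }

    public static void main(String[] args) {
        int numReduces = 3; // assumed R, for illustration only
        for (int i = 0; i < numReduces; i++) {
            System.out.println(partName(i) + " <-> " + sideFileName(i));
        }
    }
}
```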

I can't find a way to access the task ID in reduce.

Thanks.

--hao

Re: match multiple dfs output files in reduce

Owen O'Malley-4

On Aug 5, 2007, at 8:28 PM, Hao Zheng wrote:

> I can't find how to access task id in reduce.

For now, the best way is to look in the config via
conf.get("mapred.task.id").

It is documented here:
http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment

-- Owen
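A minimal sketch of turning that value into a matching side-file name, assuming the task ID ends in `_r_<partition>_<attempt>` (for example the illustrative `task_0001_r_000003_0`; the exact format differs between Hadoop versions, so check it on your cluster). In the old org.apache.hadoop.mapred API, a reducer extending MapReduceBase can read the value in configure(JobConf); here the string is passed in directly so the sketch runs without Hadoop. The class name and the file-naming pattern are hypothetical.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch: recover the reduce partition number from the value of
// conf.get("mapred.task.id") and use it to name the side file.
// ASSUMPTION: the id ends in "_r_<partition>_<attempt>".
public final class ReduceTaskId {
    private static final Pattern REDUCE_ID = Pattern.compile("_r_(\\d+)_(\\d+)$");

    private ReduceTaskId() {}

    // Returns the partition number, or throws if the id is not a reduce task id.
    public static int partition(String taskId) {
        Matcher m = REDUCE_ID.matcher(taskId);
        if (!m.find()) {
            throw new IllegalArgumentException("not a reduce task id: " + taskId);
        }
        return Integer.parseInt(m.group(1));
    }

    // Hypothetical side-file name padded like the reduce's part-NNNNN output.
    public static String sideFileName(String taskId) {
        return String.format(Locale.ROOT, "file-%05d", partition(taskId));
    }

    public static void main(String[] args) {
        // In a real reducer this string would come from
        // conf.get("mapred.task.id") inside configure(JobConf).
        String id = args.length > 0 ? args[0] : "task_0001_r_000003_0";
        System.out.println(sideFileName(id));
    }
}
```

Only the partition field is used; the trailing attempt number is ignored, so every attempt of reduce 3 names its file file-00003.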