How can I get the current file name in the Map function of WC example?

howard chen
For example, in the Word Count example:

    public void map(WritableComparable key, Writable value,
                    OutputCollector output,
                    Reporter reporter) throws IOException {
      String line = ((Text) value).toString();
      StringTokenizer itr = new StringTokenizer(line);
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        output.collect(word, one);
      }
    }


How can I get the name of the file that the value belongs to?

Thanks.

Re: How can I get the current file name in the Map function of WC example?

Owen O'Malley-5

On Dec 10, 2006, at 6:19 AM, howard chen wrote:

> How can I get the name of the file that the value belongs to?

Yes, it is set as a property in the JobConf. Look at the wiki page:

http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment

under "localized properties".

-- Owen
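
A minimal sketch of what this looks like in practice. It assumes the localized property in question is named "map.input.file" (check the wiki page above for the name your Hadoop version uses). To keep the example runnable without Hadoop on the classpath, java.util.Properties stands in for JobConf, and the class and method names are hypothetical. In a real job the same lookup would go in the mapper's configure(JobConf job) method, using job.get(...).

```java
import java.util.Properties;

/**
 * Illustrative stand-in for a mapper that remembers which input file
 * its task is reading. Not Hadoop code: Properties plays the role of JobConf.
 */
class InputFileAwareMapper {
    // Assumed property name; verify it against your Hadoop version.
    static final String INPUT_FILE_PROPERTY = "map.input.file";

    private String inputFile = "unknown";

    /** Mirrors configure(JobConf): called once per task, before any map() call. */
    void configure(Properties job) {
        inputFile = job.getProperty(INPUT_FILE_PROPERTY, "unknown");
    }

    /** Example use inside map(): prefix each word with its source file. */
    String tag(String word) {
        return inputFile + ":" + word;
    }
}
```

Reading the property once in configure and caching it in a field avoids repeating the lookup for every record passed to map.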

Re: How can I get the current file name in the Map function of WC example?

howard chen
On 12/11/06, Owen O'Malley <[hidden email]> wrote:

>
> On Dec 10, 2006, at 6:19 AM, howard chen wrote:
>
> > How can I get the name of the file that the value belongs to?
>
> Yes, it is set as a property in the JobConf. Look at the wiki page:
>
> http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment
>
> under "localized properties".
>
> -- Owen
>

Thanks!

I have another problem...

In the reduce function of the WC example:

    public void reduce(WritableComparable key, Iterator values,
                       OutputCollector output,
                       Reporter reporter) throws IOException {
      int sum = 0;
      while (values.hasNext()) {
        sum += ((IntWritable) values.next()).get();
      }
      output.collect(key, new IntWritable(sum));
    }


Rather than writing to part-00000..., is it possible to write to a
separate file per key (filename = key), whose content is the count?

Thanks.

Re: How can I get the current file name in the Map function of WC example?

Arif Iqbal
I also want similar functionality, but was wondering whether it's possible.

On 12/10/06, howard chen <[hidden email]> wrote:

>
> On 12/11/06, Owen O'Malley <[hidden email]> wrote:
> >
> > On Dec 10, 2006, at 6:19 AM, howard chen wrote:
> >
> > > How can I get the name of the file that the value belongs to?
> >
> > Yes, it is set as a property in the JobConf. Look at the wiki page:
> >
> > http://wiki.apache.org/lucene-hadoop/TaskExecutionEnvironment
> >
> > under "localized properties".
> >
> > -- Owen
> >
>
> Thanks!
>
> I have another problem...
>
> in the reduce function of WC example
>
> public void reduce(WritableComparable key, Iterator values,
>         OutputCollector output,
>         Reporter reporter) throws IOException {
>       int sum = 0;
>       while (values.hasNext()) {
>         sum += ((IntWritable) values.next()).get();
>       }
>       output.collect(key, new IntWritable(sum));
>     }
>
>
> Rather than writing to part-00000..., is it possible to write to a
> separate file per key (filename = key), whose content is the count?
>
> Thanks.
>

Re: How can I get the current file name in the Map function of WC example?

Owen O'Malley-5
In reply to this post by howard chen
> in the reduce function of WC example
>
> public void reduce(WritableComparable key, Iterator values,
>        OutputCollector output,
>        Reporter reporter) throws IOException {
>      int sum = 0;
>      while (values.hasNext()) {
>        sum += ((IntWritable) values.next()).get();
>      }
>      output.collect(key, new IntWritable(sum));
>    }
>
>
> Rather than writing to part-00000..., is it possible to write to a
> separate file per key (filename = key), whose content is the count?

That is possible by just creating a new file in each invocation of reduce.

void reduce(WritableComparable key, Iterator values,
            OutputCollector output, Reporter reporter) throws IOException {
   ... compute sum of values ...
   // conf is the task's JobConf, e.g. saved in configure(JobConf)
   Path outFile = new Path(conf.getOutputDirectory(), key.toString());
   OutputStream out = outFile.getFileSystem(conf).create(outFile);
   ... write sum to out ...
   out.close();
}

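You should also turn off speculative execution or use the phased file
system. Otherwise two attempts of the same reduce task can run at the
same time and both try to write the same file.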

-- Owen
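
The per-key file idea above can be sketched on the local filesystem as follows. This is an illustration under stated assumptions, not Hadoop code: java.nio.file stands in for the Hadoop FileSystem and Path classes, the class name is hypothetical, and keys are assumed to be safe file names (no "/" or ".." components). It also ignores the speculative execution concern, since only one writer exists here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Iterator;

/** Local-filesystem sketch of "one output file per key". */
class PerKeyFileReducer {
    // Stands in for the job's output directory.
    private final Path outputDir;

    PerKeyFileReducer(Path outputDir) {
        this.outputDir = outputDir;
    }

    /** Sums the counts for one key and writes the total to outputDir/key. */
    Path reduce(String key, Iterator<Integer> values) throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next();
        }
        Files.createDirectories(outputDir);
        Path outFile = outputDir.resolve(key);
        // Files.write creates the file, or truncates it if it already exists.
        Files.write(outFile, Integer.toString(sum).getBytes(StandardCharsets.UTF_8));
        return outFile;
    }
}
```

Note that a word count job can produce a very large number of distinct keys, so this approach creates one small file per word; that is worth keeping in mind before applying it to a large input.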