conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20


Yuanyuan Tian


I have a problem getting the input file name in the mapper when using
MultipleInputs. I need MultipleInputs to support different formats
for the inputs to my MapReduce job, and inside each mapper I also need
to know the exact input file that the mapper is processing. However,
conf.get("map.input.file") returns null. Can anybody help me solve this
problem? Thanks in advance.

public class Test extends Configured implements Tool {

        static class InnerMapper extends MapReduceBase implements Mapper<Writable, Writable, NullWritable, Text>
        {
                ................
                ................

                public void configure(JobConf conf)
                {
                        String inputName = conf.get("map.input.file");
                        .......................................
                }

        }

        public int run(String[] arg0) throws Exception {
                JobConf job;
                job = new JobConf(Test.class);
                ...........................................

                MultipleInputs.addInputPath(job, new Path("A"), TextInputFormat.class);
                MultipleInputs.addInputPath(job, new Path("B"), SequenceFileInputFormat.class);
                ...........................................
        }
}

Yuanyuan
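
A minimal sketch of a defensive check for the lookup in configure() above. java.util.Properties stands in for JobConf so the example runs without Hadoop on the classpath; the class name InputFileGuard and the method requireInputFile are hypothetical. Failing fast with a descriptive message makes the missing property easier to diagnose than a NullPointerException later in map().

```java
import java.util.Properties;

// Hypothetical helper; Properties is a stand-in for Hadoop's JobConf.
class InputFileGuard {
    static final String INPUT_FILE_KEY = "map.input.file";

    // Returns the configured input file, or throws if the property is absent.
    static String requireInputFile(Properties conf) {
        String inputName = conf.getProperty(INPUT_FILE_KEY);
        if (inputName == null) {
            throw new IllegalStateException(
                INPUT_FILE_KEY + " is not set for this task");
        }
        return inputName;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty(INPUT_FILE_KEY, "A/part-00000"); // illustrative value
        System.out.println(requireInputFile(conf));

        try {
            requireInputFile(new Properties()); // property missing
        } catch (IllegalStateException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```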

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Tom White-2
Hi Yuanyuan,

I think you've found a bug - could you file a JIRA issue for this please?

Thanks,
Tom

On Wed, Apr 28, 2010 at 11:04 PM, Yuanyuan Tian <[hidden email]> wrote:

> conf.get("map.input.file") returns null. Can anybody help me solve this
> problem? Thanks in advance.

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Yuanyuan Tian

Hi Tom,

I have filed a JIRA ticket (MAPREDUCE-1743) for this issue. In the meantime, can you suggest an alternative approach to achieve what I want (supporting different input formats and getting the input file name in each mapper)?

Yuanyuan

Tom White ---04/29/2010 09:42:44 AM---Hi Yuanyuan, I think you've found a bug - could you file a JIRA issue for this please?


Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Tom White-2
Hi Yuanyuan,

Thanks for filing an issue. To work around the issue could you use a
regular FileInputFormat in a set of map-only jobs (which can read the
input file names) so you can create a common input for a final MR job?
This is admittedly less efficient since it needs more jobs.

Cheers,
Tom
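
The tagging step of this workaround might look like the sketch below. It is an illustration under stated assumptions, not code from Hadoop: each map-only job is assumed to prefix every record with its input file name and a tab, and the final job splits on the first tab to recover both parts. The class TaggedRecords, the methods tag and untag, and the tab separator are hypothetical choices; the Hadoop mapper plumbing is omitted so the example runs standalone.

```java
// Hypothetical record format for the "common input" of the final job.
class TaggedRecords {
    static final char SEP = '\t';

    // Prefix a record with the name of the file it came from.
    static String tag(String inputFile, String record) {
        if (inputFile.indexOf(SEP) >= 0) {
            throw new IllegalArgumentException("file name contains separator");
        }
        return inputFile + SEP + record;
    }

    // Split a tagged line at the first separator: [inputFile, record].
    static String[] untag(String line) {
        int i = line.indexOf(SEP);
        if (i < 0) {
            throw new IllegalArgumentException("untagged record: " + line);
        }
        return new String[] { line.substring(0, i), line.substring(i + 1) };
    }

    public static void main(String[] args) {
        // The record itself may contain tabs; only the first one is the tag.
        String line = tag("A/part-00000", "key1\tvalue1");
        String[] parts = untag(line);
        System.out.println(parts[0] + " -> " + parts[1]);
    }
}
```

In a real job, tag() would presumably be called from map() in each map-only job with the value of conf.get("map.input.file"), which those jobs can read because they use a regular FileInputFormat.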

On Thu, Apr 29, 2010 at 10:37 AM, Yuanyuan Tian <[hidden email]> wrote:

>
> Hi Tom,
>
> I have filed a JIRA ticket (MAPREDUCE-1743) for this issue. In the meantime, can you suggest an alternative approach to achieve what I want (supporting different input formats and getting the input file name in each mapper)?
>
> Yuanyuan

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Farhan Husain-3
Can you try the following code?

((FileSplit) context.getInputSplit()).getPath().getName()

Thanks,
Farhan
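
For reference, Path.getName() returns only the final component of the path, not the full location. The sketch below mimics that behaviour with java.net.URI so it runs without Hadoop; the class InputFileNames, the method lastComponent, and the example location are hypothetical. The expression above relies on the Context object of the newer org.apache.hadoop.mapreduce API.

```java
import java.net.URI;

// Hypothetical stand-in for what getPath().getName() yields.
class InputFileNames {
    // Return the last path component of a file location.
    static String lastComponent(String location) {
        String path = URI.create(location).getPath();
        // Ignore trailing slashes so "/data/A/" yields "A".
        int end = path.length();
        while (end > 1 && path.charAt(end - 1) == '/') {
            end--;
        }
        path = path.substring(0, end);
        return path.substring(path.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // prints "part-00000"
        System.out.println(
            lastComponent("hdfs://namenode:8020/data/A/part-00000"));
    }
}
```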

On Thu, Apr 29, 2010 at 12:46 PM, Tom White <[hidden email]> wrote:

> Hi Yuanyuan,
>
> Thanks for filing an issue. To work around the issue could you use a
> regular FileInputFormat in a set of map-only jobs (which can read the
> input file names) so you can create a common input for a final MR job?
> This is admittedly less efficient since it needs more jobs.
>
> Cheers,
> Tom

Re: conf.get("map.input.file") returns null when using MultipleInputs in Hadoop 0.20

Yuanyuan Tian

Hi Farhan,

I believe I have to use the old JobConf MapReduce interface in order to use MultipleInputs. As a result, I cannot do as you suggested.

Yuanyuan

Farhan Husain ---04/30/2010 11:46:17 PM---Can you try the following code? ((FileSplit) context.getInputSplit()).getPath().getName()

