How to ask hadoop not to split the input

Rui Shi
Hi,

My input is a bunch of gz files on the local file system. I don't want Hadoop to split them across mappers. How should I specify that?

Thanks,

Rui




RE: How to ask hadoop not to split the input

Runping Qi-2

If your files have .gz as extension, they will split.

Runping


> -----Original Message-----
> From: Rui Shi [mailto:[hidden email]]
> Sent: Thursday, December 13, 2007 2:53 PM
> To: [hidden email]
> Subject: How to ask hadoop not to split the input
>
> Hi,
>
> My input is a bunch of gz files on local file system. I don't want hadoop
> to split them for mappers. How should I specify that?
>
> Thanks,
>
> Rui

Re: How to ask hadoop not to split the input

Owen O'Malley-4

On Dec 13, 2007, at 3:03 PM, Runping Qi wrote:

>
> If your files have .gz as extension, they will split.


They will _not_ split.

Re: How to ask hadoop not to split the input

Rui Shi

Hi,

I think the problem was that I wrote my own LineReader. In that case, the corresponding InputFormat has to declare that the input is not splittable by overriding the isSplitable() method. I have fixed that now.

Thanks,

Rui

----- Original Message ----
From: Owen O'Malley <[hidden email]>
To: [hidden email]
Sent: Thursday, December 13, 2007 3:19:58 PM
Subject: Re: How to ask hadoop not to split the input



On Dec 13, 2007, at 3:03 PM, Runping Qi wrote:

>
> If your files have .gz as extension, they will split.


They will _not_ split.
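
For readers landing on this thread: the fix Rui describes, overriding isSplitable() in a custom InputFormat, can be illustrated with a small self-contained sketch. It does not use Hadoop's classes. SketchInputFormat, WholeFileInputFormat and getSplits below are hypothetical stand-ins written only to show the pattern, and the split arithmetic is deliberately simplified. In Hadoop's old mapred API the real hook is, as far as I know, the protected method isSplitable(FileSystem, Path) on org.apache.hadoop.mapred.FileInputFormat; check the Javadoc for your version before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

class NoSplitSketch {

    /** Hypothetical stand-in for a file-based input format; NOT a Hadoop class. */
    static class SketchInputFormat {

        /** Hook playing the role of isSplitable(): may this file be cut into pieces? */
        protected boolean isSplitable(String fileName) {
            return true;
        }

        /** Returns {offset, length} pairs describing the splits for one file. */
        List<long[]> getSplits(String fileName, long fileLength, long splitSize) {
            List<long[]> splits = new ArrayList<>();
            if (!isSplitable(fileName)) {
                // Non-splittable input: one split covering the whole file,
                // so a single mapper reads it from start to end.
                splits.add(new long[] {0L, fileLength});
                return splits;
            }
            // Splittable input: cut the file into chunks of at most splitSize bytes.
            for (long offset = 0; offset < fileLength; offset += splitSize) {
                splits.add(new long[] {offset, Math.min(splitSize, fileLength - offset)});
            }
            return splits;
        }
    }

    /** Subclass that opts out of splitting, as described in the message above. */
    static class WholeFileInputFormat extends SketchInputFormat {
        @Override
        protected boolean isSplitable(String fileName) {
            return false;
        }
    }

    public static void main(String[] args) {
        long fileLength = 250L; // illustrative sizes, not taken from the thread
        long splitSize = 100L;
        // Default behaviour: chunks at offsets 0, 100, 200 -> prints 3
        System.out.println(new SketchInputFormat().getSplits("part-0.gz", fileLength, splitSize).size());
        // With the override: a single whole-file split -> prints 1
        System.out.println(new WholeFileInputFormat().getSplits("part-0.gz", fileLength, splitSize).size());
    }
}
```

The point of the pattern is that the decision lives in one overridable method, so a custom reader that cannot start in the middle of a file (for example, one reading a gzip stream) only needs its InputFormat to return false there.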





