Non local mapper .. Is it worth it?


Non local mapper .. Is it worth it?

jay vyas
If a job has input files f1 and f2, and a mapper is running on a machine (m1) against a file (f2) that is stored far from m1, is the overhead of copying f2 over to m1 worth it?

That is, is the cost of reading the data from a remote machine (m2) worth it? Or would it be better if that remote machine (m2) simply processed both files (f1, f2) in turn?

Jay Vyas
http://jayunit100.blogspot.com

Re: Non local mapper .. Is it worth it?

Bertrand Dechoux
The short answer is yes, it can be worth it, because your job can finish
faster if you allow non-local mappers as well as local ones. But this is of
course a trade-off. The best performance (in terms of throughput, not
latency) is obtained when using only local mappers. You should read about
delay scheduling, which lets the user choose what counts as 'best'. The Fair
Scheduler has it in Hadoop 1, and the Capacity Scheduler has it too, but only
in Hadoop 2.
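
To make the trade-off concrete, here is a rough back-of-the-envelope sketch in Python. It is not Hadoop code. It assumes two equally sized files that both live on m2, a hypothetical per-file processing time `process_s`, and a hypothetical extra cost `remote_penalty_s` for reading a file over the network; the function name and both parameters are made up for illustration.

```python
def compare(process_s, remote_penalty_s):
    """Toy model: files f1 and f2 are both stored on machine m2.

    process_s        -- assumed seconds to map one file when read locally
    remote_penalty_s -- assumed extra seconds when the file is read remotely
    """
    # Local only: m2 maps f1 and then f2, one after the other; m1 stays idle.
    local_only_finish = 2 * process_s
    local_only_busy = 2 * process_s

    # Mixed: m2 maps f1 locally while m1 maps f2 over the network, in parallel.
    mixed_finish = max(process_s, process_s + remote_penalty_s)
    mixed_busy = process_s + (process_s + remote_penalty_s)

    return {
        "local_only_finish": local_only_finish,
        "mixed_finish": mixed_finish,
        "local_only_busy": local_only_busy,
        "mixed_busy": mixed_busy,
    }


result = compare(process_s=60, remote_penalty_s=15)
print(result)
```

With these made-up numbers the mixed plan finishes sooner (75 s instead of 120 s) but consumes more total machine time (135 s instead of 120 s), which is one way to read the latency versus throughput distinction above. Real clusters also contend for network and disk, which this model ignores.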

Regards

Bertrand

On Thu, Dec 6, 2012 at 6:14 AM, <[hidden email]> wrote:

> If there is a job with files f1 and f2, and a Mapper (m1) is running
> against a file (f2) which is far from the local machine(m1), will the
> overhead of copying f2 over to m1 be worth it?.
>
> That is .... - is the amount of resources required to read data off a
> remote machine (m2)  worth it? Or would it be better if that remote (m2)
> now simply processed both files (f1, f2) in turn?
>
> Jay Vyas
> http://jayunit100.blogspot.com




--
Bertrand Dechoux

Re: Non local mapper .. Is it worth it?

jay vyas
Hmmmm.... but how can the scheduler affect the performance of a mapper if there are no competing jobs?

I thought the scheduler only affected how resources are divided among separate jobs. In my example, there are 2 mappers, 2+n files, and 1 job.
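
For context on this question, a simplified sketch of the delay scheduling idea mentioned in the reply may help. Even with a single job, something has to decide which pending map task goes to whichever node has a free slot, and that is where locality comes in. The code below is not taken from Hadoop; `pick_task`, `skipped`, and `max_skips` are hypothetical names, and the logic is only a minimal approximation of the published idea: decline a slot that has no local data, up to a limit, before accepting a non-local task.

```python
def pick_task(node, pending, skipped, max_skips):
    """Decide what to run on `node`, which has just offered a free slot.

    node      -- name of the machine asking for work
    pending   -- list of (task_name, set_of_nodes_holding_its_data)
    skipped   -- how many slot offers this job has already declined
    max_skips -- hypothetical limit on declined offers before non-local is allowed

    Returns (task_name_or_None, updated_skipped).
    """
    # Prefer any task whose input data is stored on the requesting node.
    for name, hosts in pending:
        if node in hosts:
            return name, 0  # local launch resets the skip counter

    # No local task: accept a non-local one only after waiting long enough.
    if pending and skipped >= max_skips:
        return pending[0][0], skipped

    # Otherwise decline this slot and wait for a better-placed one.
    return None, skipped + 1


# One job, two files, both stored on m2 (as in the question).
pending = [("f1", {"m2"}), ("f2", {"m2"})]
print(pick_task("m1", pending, skipped=0, max_skips=2))  # m1 has no local data
print(pick_task("m2", pending, skipped=1, max_skips=2))  # m2 holds f1
print(pick_task("m1", pending, skipped=2, max_skips=2))  # limit reached
```

With `max_skips` set to 0 the job takes any free slot immediately, and with a very large value it effectively runs only local mappers; the knob sits between those two extremes.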

Jay Vyas
http://jayunit100.blogspot.com

On Dec 6, 2012, at 4:39 AM, Bertrand Dechoux <[hidden email]> wrote:

> The short answer is yes it can be worth it because your job can finish
> faster if you are not only allowing local mappers. But this is of course a
> trade off. The best performance (but not latency) can be obtained when
> using only local mappers. You should read about delay scheduling which
> allows the user to pick what is the 'best'. Fair scheduler has it for
> hadoop 1 and capacity scheduler has it but for hadoop 2.
>
> Regards
>
> Bertrand