Commented: (HADOOP-248) locating map outputs via random probing is inefficient


Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12475160 ]

Hadoop QA commented on HADOOP-248:
----------------------------------

-1, because the patch command could not apply the latest attachment (http://issues.apache.org/jira/secure/attachment/12351812/248-fixed1.patch) as a patch to trunk revision r510644. Please note that this message is automatically generated and may represent a problem with the automation system and not the patch. Results are at http://lucene.zones.apache.org:8080/hudson/job/Hadoop-Patch

> locating map outputs via random probing is inefficient
> ------------------------------------------------------
>
>                 Key: HADOOP-248
>                 URL: https://issues.apache.org/jira/browse/HADOOP-248
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: mapred
>    Affects Versions: 0.2.1
>            Reporter: Owen O'Malley
>         Assigned To: Devaraj Das
>             Fix For: 0.12.0
>
>         Attachments: 248-9.patch, 248-fixed1.patch, 248-initial7.patch, 248-initial8.patch
>
>
> Currently the ReduceTaskRunner polls the JobTracker for a random list of map tasks asking for their output locations. It would be better if the JobTracker kept an ordered log and the interface was changed to:
> class MapLocationResults {
>    public int getTimestamp();
>    public MapOutputLocation[] getLocations();
> }
> interface InterTrackerProtocol {
>   ...
>   MapLocationResults locateMapOutputs(int prevTimestamp);
> }
> with the intention that each time a ReduceTaskRunner calls locateMapOutputs, it passes back the "timestamp" that it got from the previous result. That way, reduces can easily find the new MapOutputs. This should help the "ramp up" when the maps first start finishing.
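
The proposal quoted above can be sketched as an append-only log whose length doubles as the "timestamp". The following is only an illustration under stated assumptions, not the code from the attached patches: the class MapOutputLogSketch, the method mapCompleted, and the fields of MapOutputLocation are hypothetical names, and only getTimestamp, getLocations and locateMapOutputs follow the signatures given in the issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: apart from the method signatures quoted in the issue,
// none of these names come from the Hadoop source or the attached patches.
class MapOutputLogSketch {

    /** Stand-in for MapOutputLocation: which map finished and where its output lives. */
    static final class MapOutputLocation {
        final int mapId;
        final String host;

        MapOutputLocation(int mapId, String host) {
            this.mapId = mapId;
            this.host = host;
        }
    }

    /** Result type mirroring the MapLocationResults class proposed in the issue. */
    static final class MapLocationResults {
        private final int timestamp;
        private final MapOutputLocation[] locations;

        MapLocationResults(int timestamp, MapOutputLocation[] locations) {
            this.timestamp = timestamp;
            this.locations = locations;
        }

        public int getTimestamp() { return timestamp; }

        public MapOutputLocation[] getLocations() { return locations; }
    }

    // Append-only, ordered log of completed map outputs.
    // Assumption: the "timestamp" is simply the log length at the time of the call.
    private final List<MapOutputLocation> log = new ArrayList<>();

    /** Records a finished map task at the end of the log. */
    synchronized void mapCompleted(int mapId, String host) {
        log.add(new MapOutputLocation(mapId, host));
    }

    /** Returns every entry appended since prevTimestamp, plus the timestamp to pass next time. */
    synchronized MapLocationResults locateMapOutputs(int prevTimestamp) {
        // Clamp so that a stale or out-of-range timestamp cannot index outside the log.
        int from = Math.max(0, Math.min(prevTimestamp, log.size()));
        List<MapOutputLocation> fresh = log.subList(from, log.size());
        return new MapLocationResults(log.size(), fresh.toArray(new MapOutputLocation[0]));
    }
}
```

In this sketch a reduce would start with a timestamp of 0 and pass back the value of getTimestamp() from each result on its next call, so each completed map is reported to it once and an idle poll returns an empty array rather than a random sample.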

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.