[jira] Created: (NUTCH-181) mapred.local.dir temp dir. space allocation limited by smallest area

JIRA jira@apache.org
mapred.local.dir temp dir. space allocation limited by smallest area
--------------------------------------------------------------------

         Key: NUTCH-181
         URL: http://issues.apache.org/jira/browse/NUTCH-181
     Project: Nutch
        Type: Bug
  Components: indexer  
    Versions: 0.8-dev    
 Environment: all
    Reporter: Paul Baclace


When mapred.local.dir is used to specify multiple temp dir. areas, space allocation is limited by the smallest area, because the temp dir. selection algorithm is "round robin starting from a randomish point". When round robin is used with approximately constant-sized chunks, every area fills at about the same rate, so the smallest area runs out of space first, and this is a fatal error.
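
To make the failure mode concrete, here is a small standalone simulation (not Nutch code). The class name, the area sizes, and the chunk size are all made up for illustration, and the chunk size is assumed to be positive. It writes constant-sized chunks round robin and reports how much was written before some area could not take the next chunk.

```java
import java.util.Arrays;

public class RoundRobinExhaustion {

    /**
     * Writes fixed-size chunks round robin across the areas, starting at
     * index start, until the area whose turn it is cannot hold the next
     * chunk. Returns the total amount written before that failure.
     * Assumes chunkSize > 0.
     */
    public static long writtenBeforeFailure(long[] freeSpace, long chunkSize, int start) {
        long[] free = Arrays.copyOf(freeSpace, freeSpace.length); // do not modify the input
        long written = 0;
        int i = start % free.length;
        while (free[i] >= chunkSize) {
            free[i] -= chunkSize;
            written += chunkSize;
            i = (i + 1) % free.length; // next area, wrapping around
        }
        return written;
    }

    public static void main(String[] args) {
        long[] freeSpace = {100, 500, 1000}; // hypothetical free space, arbitrary units
        long total = Arrays.stream(freeSpace).sum();
        long written = writtenBeforeFailure(freeSpace, 10, 0);
        // prints: written before failure: 300 of 1600
        System.out.println("written before failure: " + written + " of " + total);
    }
}
```

With these numbers the job fails after using roughly the smallest area's size times the number of areas, even though most of the total space is still free.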

Workaround: only list local fs dirs in mapred.local.dir with similarly-sized available areas.

I wrote a patch to JobConf (currently being tested) which uses df to check available space (once a minute or less often) and then uses an efficient roulette selection to do allocation weighted by magnitude of available space.
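
For readers unfamiliar with roulette selection, the following is a minimal sketch of the general technique, not the patch itself. The class and method names are hypothetical, the free-space figures are supplied by the caller (the periodic df refresh is not shown), and the use of cumulative sums with a binary search is just one plausible way to make the selection efficient.

```java
import java.util.Random;

public class WeightedDirSelector {
    private final String[] dirs;
    private final long[] cumulative; // cumulative[i] = free space of dirs[0..i] summed
    private final Random random;

    /** freeSpace[i] is the available space of dirs[i], in any consistent unit. */
    public WeightedDirSelector(String[] dirs, long[] freeSpace, Random random) {
        if (dirs.length == 0 || dirs.length != freeSpace.length) {
            throw new IllegalArgumentException("need one free-space value per dir");
        }
        this.dirs = dirs.clone();
        this.cumulative = new long[dirs.length];
        long total = 0;
        for (int i = 0; i < freeSpace.length; i++) {
            if (freeSpace[i] < 0) {
                throw new IllegalArgumentException("negative free space");
            }
            total += freeSpace[i];
            cumulative[i] = total;
        }
        if (total <= 0) {
            throw new IllegalStateException("no free space in any area");
        }
        this.random = random;
    }

    /** Binary search: index of the first entry strictly greater than point. */
    public static int indexFor(long[] cumulative, long point) {
        int lo = 0;
        int hi = cumulative.length - 1;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            if (cumulative[mid] > point) {
                hi = mid;
            } else {
                lo = mid + 1;
            }
        }
        return lo;
    }

    /** Picks a dir with probability proportional to its free space. */
    public String next() {
        long total = cumulative[cumulative.length - 1];
        // Spin the wheel: a point in [0, total), clamped against rounding.
        long point = Math.min((long) (random.nextDouble() * total), total - 1);
        return dirs[indexFor(cumulative, point)];
    }

    public static void main(String[] args) {
        String[] dirs = {"/tmp/a", "/tmp/b", "/tmp/c"}; // hypothetical paths
        long[] free = {100, 500, 1000};                  // hypothetical free space
        WeightedDirSelector selector = new WeightedDirSelector(dirs, free, new Random());
        System.out.println(selector.next());
    }
}
```

An area with zero free space gets a zero-width slice of the wheel and is never chosen, and each selection costs O(log n) in the number of areas. The weights only need rebuilding when the free-space figures are refreshed.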



--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira