Should nutch try to reduce first?

Rod Taylor-2
When you run multiple commands within Nutch, it seems to process the
pending tasks in the order that they were added to the queue.  In some
cases this means you may be 50% through many jobs (map complete but
reduce not yet run) while it processes maps for yet more jobs.

I think Nutch should prioritize a pending reduce before a pending map, as
that keeps jobs moving through to completion (other processes may depend
on the results) and allows temporary disk space to be freed.
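
To make the suggestion concrete, here is a minimal sketch of a pending-task
queue that hands out reduces before maps and otherwise keeps the order in
which tasks were added.  It is not Nutch's job-tracker code: the class
ReduceFirstQueue, the Task type and the job labels are all hypothetical,
and it assumes a reduce is only added once its job's maps have finished.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration only; not part of Nutch.
public class ReduceFirstQueue {
    // REDUCE is declared first so that it sorts ahead of MAP.
    enum Kind { REDUCE, MAP }

    static final class Task {
        final String job;   // illustrative job label
        final Kind kind;
        final long seq;     // order in which the task was queued

        Task(String job, Kind kind, long seq) {
            this.job = job;
            this.kind = kind;
            this.seq = seq;
        }

        @Override
        public String toString() {
            return job + ":" + kind;
        }
    }

    private long nextSeq = 0;

    // Order by kind first (reduces before maps), then by arrival order.
    private final PriorityQueue<Task> pending = new PriorityQueue<>(
        Comparator.comparing((Task t) -> t.kind)
                  .thenComparingLong(t -> t.seq));

    // Assumes the caller only adds a reduce when it is ready to run.
    void add(String job, Kind kind) {
        pending.add(new Task(job, kind, nextSeq++));
    }

    // Returns the next task to run, or null if nothing is pending.
    Task next() {
        return pending.poll();
    }

    public static void main(String[] args) {
        ReduceFirstQueue q = new ReduceFirstQueue();
        q.add("jobA", Kind.MAP);
        q.add("jobB", Kind.MAP);
        q.add("jobC", Kind.REDUCE);   // queued last, runs first
        Task t;
        while ((t = q.next()) != null) {
            System.out.println(t);
        }
    }
}
```

With a plain first-in, first-out queue the reduce for jobC would wait behind
both maps; here it is handed out first, so jobC can finish and release its
temporary files sooner.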
--
Rod Taylor <[hidden email]>

Re: Should nutch try to reduce first?

Doug Cutting-2
Rod,

Mike is in the middle of a major revision of the job-tracker that I
believe addresses this issue, as well as a lot of others.  Tasks will be
prioritized by job.
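
As a rough illustration of what ordering tasks by job could look like (one
possible reading only, not a description of the revised job-tracker), the
sketch below sorts pending tasks by job first and by arrival order second.
The class JobOrderedQueue and the numeric job ids are hypothetical, and a
lower id is assumed to mean an earlier-submitted, higher-priority job.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical illustration only; not the actual job-tracker.
public class JobOrderedQueue {
    static final class Task {
        final int jobId;    // assumption: lower id = submitted earlier
        final String name;  // e.g. "map" or "reduce"
        final long seq;     // order in which the task was queued

        Task(int jobId, String name, long seq) {
            this.jobId = jobId;
            this.name = name;
            this.seq = seq;
        }
    }

    private long nextSeq = 0;

    // Order by job first, then by arrival order within a job.
    private final PriorityQueue<Task> pending = new PriorityQueue<>(
        Comparator.comparingInt((Task t) -> t.jobId)
                  .thenComparingLong(t -> t.seq));

    void add(int jobId, String name) {
        pending.add(new Task(jobId, name, nextSeq++));
    }

    // Returns "jobId:name" for the next task, or null if none is pending.
    String next() {
        Task t = pending.poll();
        return t == null ? null : t.jobId + ":" + t.name;
    }

    public static void main(String[] args) {
        JobOrderedQueue q = new JobOrderedQueue();
        q.add(1, "map");
        q.add(2, "map");
        q.add(1, "reduce");   // queued after job 2's map, runs before it
        String s;
        while ((s = q.next()) != null) {
            System.out.println(s);
        }
    }
}
```

Under this ordering, the reduce for job 1 runs before the map for job 2
even though it was queued later, so the earlier job completes first.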

Thanks,

Doug

Rod Taylor wrote:
> When you run multiple commands within Nutch, it seems to process the
> pending tasks in the order that they were added to the queue.  In some
> cases this means you may be 50% through many jobs (map complete but
> reduce not yet run) while it processes maps for yet more jobs.
>
> I think Nutch should prioritize a pending reduce before a pending map, as
> that keeps jobs moving through to completion (other processes may depend
> on the results) and allows temporary disk space to be freed.