generate.max.per.host is per reduce task

Chris Schneider
Gang,

I just noticed that the generate.max.per.host property is only
enforced on a "per reduce task" basis during the first generate job
(see Generator.Selector.reduce for details). At a minimum, it should
probably be documented this way in nutch-default.xml.template.

Thoughts?

- Chris
--
------------------------
Chris Schneider
TransPac Software, Inc.
[hidden email]
------------------------

Re: generate.max.per.host is per reduce task

Doug Cutting
Chris Schneider wrote:
> I just noticed that the generate.max.per.host property is only enforced
> on a "per reduce task" basis during the first generate job (see
> Generator.Selector.reduce for details). At a minimum, it should probably
> be documented this way in nutch-default.xml.template.

Yes, but all URLs with the same host are handled by a single reduce task,
since it is generating host-disjoint fetch lists.

Doug
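To illustrate why a per-reduce-task limit ends up acting as a per-host limit, here is a minimal, self-contained Java sketch. It is not Nutch's Generator code: the class and method names (HostLimitSketch, hostOf, partitionForHost, selectForReduceTask) are hypothetical. It assumes URLs are routed to reduce tasks by hashing only the host, in the style of Hadoop's HashPartitioner. In the sketch, a negative limit means "no limit". Because every URL of a host lands in the same partition, a counter that is local to one reduce task still sees all of that host's URLs.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: names are illustrative, not Nutch's actual API.
public class HostLimitSketch {

    // Extract the lower-cased host of a URL; returns "" if it cannot be parsed.
    static String hostOf(String url) {
        try {
            String host = URI.create(url).getHost();
            return host == null ? "" : host.toLowerCase();
        } catch (IllegalArgumentException e) {
            return "";
        }
    }

    // Hash-partition on the host (not the full URL), so every URL of a
    // given host is sent to the same reduce task.
    static int partitionForHost(String url, int numReduceTasks) {
        return (hostOf(url).hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    // Selection inside ONE reduce task: the counter map is local to the task,
    // so the limit is only enforced per task. With host partitioning, that
    // local count covers all URLs of each host, so it is effectively global.
    static List<String> selectForReduceTask(List<String> urls, int maxPerHost) {
        Map<String, Integer> countPerHost = new HashMap<>();
        List<String> selected = new ArrayList<>();
        for (String url : urls) {
            String host = hostOf(url);
            int count = countPerHost.getOrDefault(host, 0);
            // Assumption for this sketch: a negative limit means "unlimited".
            if (maxPerHost < 0 || count < maxPerHost) {
                countPerHost.put(host, count + 1);
                selected.add(url);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        int numReduceTasks = 4;
        int maxPerHost = 2;
        List<String> urls = List.of(
            "http://a.example.com/1", "http://a.example.com/2", "http://a.example.com/3",
            "http://b.example.org/1", "http://b.example.org/2");

        // Route each URL to a reduce task by host.
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numReduceTasks; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String url : urls) {
            partitions.get(partitionForHost(url, numReduceTasks)).add(url);
        }

        // Apply the per-task limit independently in each partition.
        int total = 0;
        for (List<String> part : partitions) {
            total += selectForReduceTask(part, maxPerHost).size();
        }
        System.out.println("selected " + total + " URLs"); // prints "selected 4 URLs"
    }
}
```

If the partitioner hashed the full URL instead of the host, one host's URLs could be spread over several reduce tasks. Each task would then apply the limit separately, so the per-host total could exceed the configured value.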