Creating a throttle


Creating a throttle

Fankhauser, Alain
        Hello

        I'm thinking about creating a throttle that lets us decide, for each
day, at which speed (MB/s) and with how many connections (one thread =
one connection) the fetcher fetches. That means we have settings for
every day, and if there are no settings for a given time of day, the
fetcher will pause until we get settings.
        The goal of this throttle is to control the fetcher speed.
        If we fetch too fast, we just put a few threads to sleep (a
percentage calculation). If we are too slow, we just wake a few threads up.
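
        For illustration, a minimal sketch of that pause/wake logic. The class
and field names (FetcherThrottle, ThrottleSetting, activeThreads) are
hypothetical and not taken from Nutch's actual fetcher code:

    import java.util.concurrent.atomic.AtomicInteger;

    public class FetcherThrottle {

        /** Hypothetical per-day setting: target speed and connection count. */
        static class ThrottleSetting {
            final double targetMbPerSec;   // desired fetch rate in MB/s
            final int maxConnections;      // one thread = one connection
            ThrottleSetting(double mbPerSec, int connections) {
                this.targetMbPerSec = mbPerSec;
                this.maxConnections = connections;
            }
        }

        private final AtomicInteger activeThreads = new AtomicInteger(1);

        /** Called periodically with the measured rate; scales the number of
         *  running threads toward the per-day target (the "percentage
         *  calculation" from the proposal). Returns the new thread count. */
        int adjust(double measuredMbPerSec, ThrottleSetting setting) {
            if (setting == null) {
                // No setting for this time of day: pause everything until one appears.
                activeThreads.set(0);
                return 0;
            }
            int active = Math.max(1, activeThreads.get());
            double ratio = measuredMbPerSec / setting.targetMbPerSec;
            int target = (int) Math.round(active / Math.max(ratio, 0.01));
            target = Math.min(setting.maxConnections, Math.max(1, target));
            activeThreads.set(target);   // threads above this count sleep, the rest run
            return target;
        }
    }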


        About using the throttle, I also have a few ideas (a small sketch
follows the list):
* The first idea is to set the throttle with -throttleDescription
[path of throttleDescription], so the throttle reads in the description.
* -throttleDescription without anything, so the throttle takes the
path from the conf file nutch-site.xml.
* The user doesn't want to use my throttle, so he doesn't add any
parameter.
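
        A sketch of how the three cases above could be resolved. The
-throttleDescription flag and the "fetcher.throttle.description" property are
part of this proposal, not existing Nutch options, and the Configuration
argument is assumed to be the job's already-loaded conf:

    import org.apache.hadoop.conf.Configuration;

    public class ThrottleOptions {
        /** Resolves the throttle description path: an explicit path after the
         *  flag, a fallback read from nutch-site.xml when the flag has no value,
         *  or null when the user did not ask for the throttle at all. */
        static String resolveDescriptionPath(String[] args, Configuration conf) {
            for (int i = 0; i < args.length; i++) {
                if ("-throttleDescription".equals(args[i])) {
                    if (i + 1 < args.length && !args[i + 1].startsWith("-")) {
                        return args[i + 1];                            // path given on the command line
                    }
                    return conf.get("fetcher.throttle.description");   // hypothetical property in nutch-site.xml
                }
            }
            return null;   // no flag: run the fetcher without the throttle
        }
    }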

        Maybe you think this is a good idea.
        Please give me your feedback.

        Thanks and greetings,
        Alain

JobTrackerInfoServer and nutch*.jar

Anton Potekhin
Why do JSP scripts launched under the JobTrackerInfoServer not see classes from
nutch*.jar? How can we point the JobTrackerInfoServer to use nutch*.jar?



Re: Creating a throttle

Thomas Delnoij-3
In reply to this post by Fankhauser, Alain
I think something like this has already been done (apart from the daily
changes you suggest): http://issues.apache.org/jira/browse/NUTCH-207

Rgrds, Thomas



mapred question

Anton Potekhin
As far as we understood from the MapReduce documentation, all reduce tasks must
be launched after the last map task has finished, i.e. map and reduce must not
run simultaneously. But we often see records like "map 80%, reduce 10%" in the
logs, and many more records where map is less than 100% but reduce is more than
0%. How should we interpret this?



Re: mapred question

Doug Cutting

Hadoop includes the "shuffle" stage in reduce.  Currently, the first 25% of
a reduce task's progress is copying map outputs to the reduce node.
These copies can start as soon as any map task completes, so that, when
the last map task completes, there is very little data remaining to be
copied and the rest of the reduce work can start quickly.

Doug
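
To make the arithmetic concrete, here is a back-of-the-envelope illustration of
how a report like "map 80%, reduce 10%" can arise under the 25%/75% split
described above; the formula and numbers are illustrative, not Hadoop's actual
progress code:

    public class ReduceProgressExample {

        /** Reduce-task progress assuming the first 25% is copying map outputs
         *  and the remaining 75% is the reduce work itself. */
        static double reduceProgress(int outputsCopied, int totalMaps, double reduceWorkDone) {
            return 0.25 * ((double) outputsCopied / totalMaps) + 0.75 * reduceWorkDone;
        }

        public static void main(String[] args) {
            // 100 map tasks, 80 finished, 40 of their outputs already copied to
            // the reduce node, no reducing done yet: the job reports
            // "map 80%, reduce 10%" even though maps are still running.
            System.out.println(reduceProgress(40, 100, 0.0));   // prints 0.1
        }
    }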