how to make fetcher to use the full bandwidth

how to make fetcher to use the full bandwidth

AJ Chen-2
I am trying to fetch as fast as possible by using more threads on a large fetch
list. But the fetcher starts downloading at a speed much lower than the full
bandwidth allows, and the starting download speed varies a lot from run to run,
200kb/s to 1200kb/s on my DSL line. This variation also happens on a T1 line
that I just tested.
Could someone share experience on how to make the fetcher use the full
bandwidth? We know the speed drops gradually during a long fetch run. But
can the fetch achieve the highest speed allowed by the bandwidth when the fetch
starts?

AJ
Re: how to make fetcher to use the full bandwidth

Rod Taylor-2
On Thu, 2005-10-13 at 13:35 -0700, AJ Chen wrote:
> I try to fetch as fast as it can by using more threads on a large fetch
> list. But, the fetcher starts download at speed much lower than the full
> bandwidth allows. And the start download speed varies a lot from run to run,
> 200kb/s to 1200kb/s on my DSL line. This variation also happens on T1 line
> that I just tested.
> Could someone share experience on how to make fetcher use the full
> bandwidth? We know the speed drops gradually during a long fetch run. But,
> can the fetch achieve the highest speed allowed by the bandwidth when fetch
> starts?

I found that for high bandwidth (50 Mbit/s and above) DNS seems to be a
limiting factor.

4000 threads with a local caching DNS server seem to be enough to fill
the pipe, though.

--
Rod Taylor <[hidden email]>
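
To illustrate why cached lookups matter when thousands of threads hit the same hosts, here is a minimal Python sketch of an in-process lookup cache. It is only a stand-in for the local caching DNS server described above, not a replacement for one, and it is not how Nutch itself resolves names. The names `CachingResolver` and `resolve_all` are hypothetical.

```python
import socket
import threading
from concurrent.futures import ThreadPoolExecutor


class CachingResolver:
    """Remember hostname lookups so repeat requests skip the DNS server.

    Two threads that miss the cache at the same moment may both call the
    underlying resolver; after that, the answer is served from memory.
    """

    def __init__(self, resolve=socket.gethostbyname):
        self._resolve = resolve      # real lookup; swappable for testing
        self._cache = {}
        self._lock = threading.Lock()

    def lookup(self, host):
        with self._lock:
            if host in self._cache:
                return self._cache[host]
        address = self._resolve(host)  # slow path, done outside the lock
        with self._lock:
            self._cache.setdefault(host, address)
            return self._cache[host]


def resolve_all(hosts, resolver, workers=50):
    """Resolve each distinct host once, using a pool of worker threads."""
    unique = list(dict.fromkeys(hosts))  # drop duplicates, keep order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        addresses = list(pool.map(resolver.lookup, unique))
    return dict(zip(unique, addresses))
```

Timing `resolve_all` over the hosts of a fetch list, once cold and once warm, gives a rough idea of how much of the start-up time goes to DNS.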

Re: how to make fetcher to use the full bandwidth

Rod Taylor-2
On Thu, 2005-10-13 at 16:42 -0400, Rod Taylor wrote:

> 4000 threads with a local caching DNS server seems to be enough to fill
> the pipe though

You're also limited by the number of servers you are connecting out to,
since Nutch will by default limit itself to asking for a single page at
a time from a single server.

--
Rod Taylor <[hidden email]>
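
The one-page-per-server limit puts a ceiling on useful concurrency: threads beyond the number of distinct hosts have nothing to do. A back-of-envelope sketch in Python, where the function names and all the numbers in the usage note (seconds per page, page size) are illustrative assumptions and not measurements from this thread:

```python
def max_fetch_rate(threads, hosts, seconds_per_page, per_host_limit=1):
    """Upper bound on pages per second.

    At most `per_host_limit` requests run against one host at a time, so
    the number of requests in flight is capped by both the thread count
    and the host count. `seconds_per_page` should include any politeness
    delay between requests to the same host.
    """
    in_flight = min(threads, hosts * per_host_limit)
    return in_flight / seconds_per_page


def max_bandwidth_kbytes(threads, hosts, seconds_per_page, page_kbytes,
                         per_host_limit=1):
    """Upper bound on download rate in kilobytes per second."""
    rate = max_fetch_rate(threads, hosts, seconds_per_page, per_host_limit)
    return rate * page_kbytes
```

For example, with 1500 hosts, an assumed 2 seconds per page and 10 KB pages, `max_bandwidth_kbytes(500, 1500, 2.0, 10)` gives 2500.0, while raising the thread count to 4000 only reaches 7500.0 because the host count becomes the cap.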

Re: how to make fetcher to use the full bandwidth

AJ Chen-2
In reply to this post by Rod Taylor-2
Thanks, Rod. Were you always able to fill the pipe under the same
conditions? I'm puzzled by the difference in fetch speed even when the same
number of threads and root urls are used.

I don't have a local DNS server yet. To avoid overwhelming my ISP's DNS server, I
use only 10 threads for the first fetch run, so the fetch speed is not
expected to be great in that run. But in the second fetch run I use 500
threads, and it can fill the pipe sometimes but most of the time uses 1/5 of the
pipe. The number of hosts, >1500, may be small. How many hosts are usually
used in your crawl?

AJ
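
One way to check whether the host count is the constraint is to tally URLs per host in the fetch list: a list dominated by a few hosts behaves like a much smaller one. A minimal Python sketch, assuming the fetch list has been exported as plain URL strings (Nutch keeps it in its own segment format, so this input and the name `urls_per_host` are hypothetical):

```python
from collections import Counter
from urllib.parse import urlsplit


def urls_per_host(urls):
    """Count how many URLs in the list belong to each hostname."""
    counts = Counter()
    for url in urls:
        host = urlsplit(url).hostname  # lowercased; None if absent
        if host:
            counts[host] += 1
    return counts
```

`len(counts)` is the number of distinct hosts, and `counts.most_common(10)` shows whether a handful of hosts hold most of the URLs.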


Re: how to make fetcher to use the full bandwidth

Jon Shoberg


I like to use vnstat to monitor bandwidth.  Just keep adding threads as
long as the CPU/memory/pipe keeps holding up.

http://humdi.net/vnstat/


-j
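
If vnstat is not available, a rough throughput reading can be taken from the kernel's interface counters while the thread count is being raised. This Python sketch is not vnstat and makes no use of it; it assumes a Linux system where `/proc/net/dev` exists, and the function names and the `eth0` interface name are illustrative.

```python
import time


def read_counters(text, iface):
    """Return (rx_bytes, tx_bytes) for iface from /proc/net/dev text."""
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and name.strip() == iface:
            fields = rest.split()
            # Field 0 is received bytes, field 8 is transmitted bytes.
            return int(fields[0]), int(fields[8])
    raise KeyError(iface)


def rate_kbytes_per_s(before, after, seconds):
    """Convert a byte-counter difference into kilobytes per second."""
    return (after - before) / 1024 / seconds


def sample_download_rate(iface="eth0", interval=5.0, path="/proc/net/dev"):
    """Measure the average download rate over `interval` seconds."""
    with open(path) as f:
        rx_before, _ = read_counters(f.read(), iface)
    time.sleep(interval)
    with open(path) as f:
        rx_after, _ = read_counters(f.read(), iface)
    return rate_kbytes_per_s(rx_before, rx_after, interval)
```

Calling `sample_download_rate()` between fetch runs with different thread counts shows where adding threads stops helping.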
