protocol-httpclient; maximum total connections


orkunt.sabuncu
Hi,

The protocol-httpclient plugin sets the maximum total number of connections
of the underlying commons-httpclient to the value of the
"fetcher.threads.fetch" configuration parameter. However, passing the
-threads argument to the fetcher does not change fetcher.threads.fetch. So
whatever number of threads is given with -threads, httpclient still uses the
default maximum total connections (10), which limits crawling performance.
It seems to be a bug. Any comments on this?
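The mismatch described above can be reproduced in miniature. This is a hedged sketch, not Nutch code: `Fetcher`, `HttpClientPlugin` and the `Properties`-based `CONF` are hypothetical stand-ins for the real fetcher, the protocol-httpclient plugin and NutchConf. It only shows that when `setThreadCount` updates a field but not the shared configuration, anything that sizes its pool from the configuration keeps seeing the default of 10.

```java
import java.util.Properties;

// Hypothetical stand-ins for NutchConf, the Fetcher and the
// protocol-httpclient plugin; these are NOT the real Nutch classes.
public class ThreadCountMismatch {

    // Shared configuration, playing the role of NutchConf.
    static final Properties CONF = new Properties();
    static {
        CONF.setProperty("fetcher.threads.fetch", "10"); // default value
    }

    // The fetcher keeps its own thread count; -threads only updates this field.
    static class Fetcher {
        int threadCount = Integer.parseInt(CONF.getProperty("fetcher.threads.fetch"));

        void setThreadCount(int threadCount) {
            this.threadCount = threadCount; // configuration is left untouched
        }
    }

    // The plugin sizes its connection pool from the configuration only.
    static class HttpClientPlugin {
        final int maxTotalConnections =
            Integer.parseInt(CONF.getProperty("fetcher.threads.fetch"));
    }

    public static void main(String[] args) {
        Fetcher fetcher = new Fetcher();
        fetcher.setThreadCount(50); // as if the fetcher were run with -threads 50
        HttpClientPlugin plugin = new HttpClientPlugin();
        System.out.println("fetcher threads:       " + fetcher.threadCount);
        System.out.println("max total connections: " + plugin.maxTotalConnections);
    }
}
```

Running `main` prints 50 fetcher threads but only 10 total connections, so most fetch threads would end up waiting for a free connection.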

A possible solution is to add the following line to the setThreadCount
method of the Fetcher class:
 NutchConf.get().setInt("fetcher.threads.fetch", threadCount);
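The proposed one-line fix can be sketched with the same hypothetical stand-ins (`CONF` in place of NutchConf; `Fetcher` and `HttpClientPlugin` are illustrative, not Nutch classes). One assumption worth checking in the real code: writing the value back only helps if the plugin reads `fetcher.threads.fetch` after `setThreadCount` has run. A plugin that builds its connection manager earlier would keep the old value, as the `early` instance below shows.

```java
import java.util.Properties;

// Hypothetical stand-ins; CONF plays the role of NutchConf.get().
public class ThreadCountFix {

    static final Properties CONF = new Properties();
    static {
        CONF.setProperty("fetcher.threads.fetch", "10"); // default value
    }

    static class Fetcher {
        int threadCount = Integer.parseInt(CONF.getProperty("fetcher.threads.fetch"));

        void setThreadCount(int threadCount) {
            this.threadCount = threadCount;
            // Equivalent of the proposed fix:
            // NutchConf.get().setInt("fetcher.threads.fetch", threadCount);
            CONF.setProperty("fetcher.threads.fetch", Integer.toString(threadCount));
        }
    }

    // Reads the configuration once, when the plugin instance is created.
    static class HttpClientPlugin {
        final int maxTotalConnections =
            Integer.parseInt(CONF.getProperty("fetcher.threads.fetch"));
    }

    public static void main(String[] args) {
        HttpClientPlugin early = new HttpClientPlugin(); // created before -threads is applied
        Fetcher fetcher = new Fetcher();
        fetcher.setThreadCount(50);                      // as if run with -threads 50
        HttpClientPlugin late = new HttpClientPlugin();  // created afterwards
        System.out.println("early plugin: " + early.maxTotalConnections);
        System.out.println("late plugin:  " + late.maxTotalConnections);
    }
}
```

Here the early plugin still reports 10 and the late one reports 50, so the ordering of configuration updates and plugin initialization matters as much as the fix itself.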

Also, the fetcher seems to use a lot of memory, possibly because of a memory
leak. It starts at 10-15%, and after several hours the Linux top command
reports it using 50-70% of total memory. Is anyone else seeing this behaviour?
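One way to narrow this down is to look at heap usage from inside the JVM rather than only at top. top shows the memory of the whole process, and a JVM often keeps heap it has already claimed from the OS even after garbage collection frees it, so growth toward the -Xmx limit alone does not prove a leak. A used-heap figure that keeps climbing across many hours would be more telling. The sketch below is a hypothetical helper built only on the standard `Runtime` methods; how it would be hooked into the fetcher is left open.

```java
// Hypothetical diagnostic helper; uses only java.lang.Runtime.
public class HeapUsage {

    /** Used heap as a percentage of the maximum heap the JVM may use (-Xmx). */
    static double usedHeapPercent() {
        Runtime rt = Runtime.getRuntime();
        long used = rt.totalMemory() - rt.freeMemory(); // heap currently occupied
        return 100.0 * used / rt.maxMemory();
    }

    public static void main(String[] args) throws InterruptedException {
        // Take a few samples; in a long-running fetch this would be logged
        // periodically (e.g. every few minutes) alongside the fetch threads.
        for (int i = 0; i < 3; i++) {
            System.out.printf("heap used: %.1f%% of max%n", usedHeapPercent());
            Thread.sleep(500);
        }
    }
}
```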

Thanks,
-orkunt.

Re: protocol-httpclient; maximum total connections

Stefan Groschupf-2
Thanks for finding this bug. Please open a bug report in JIRA, and if you
like, patches are always welcome. :-)

(In reply to the message from [hidden email] on 23.01.2006 at 15:00.)

---------------------------------------------------------------
company:  http://www.media-style.com
forum:    http://www.text-mining.org
blog:     http://www.find23.net