[jira] Created: (NUTCH-69) fetcher.threads.per.host ignored

[jira] Created: (NUTCH-69) fetcher.threads.per.host ignored

Sergey Smolyakov (Jira)
fetcher.threads.per.host ignored
--------------------------------

         Key: NUTCH-69
         URL: http://issues.apache.org/jira/browse/NUTCH-69
     Project: Nutch
        Type: Bug
  Components: fetcher  
    Reporter: Matthias Jaekle


Fetcher ignores 'maximum threads per host'.
If you fetch from only a few domains with multiple threads, some web servers feel attacked or can no longer serve you.
So you lose lots of existing pages in your segments.

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Resolved: (NUTCH-69) fetcher.threads.per.host ignored

Sergey Smolyakov (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-69?page=all ]
     
Andrzej Bialecki  resolved NUTCH-69:
------------------------------------

    Resolution: Invalid

This behaviour is caused by improper configuration. When crawling fewer hosts than (fetcher threads / threads per host), some threads will always be blocked. Solution: change the configuration to use fewer threads or more threads per host, or increase max.http.delay so that blocked threads wait longer.
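
The arithmetic behind that threshold can be sketched as follows. This is only an illustration of the (fetcher threads / threads per host) rule stated above, not Nutch code: the function names are hypothetical, and the model assumes a simple hard cap per host, ignoring delays and retries.

```python
def blocked_threads(total_threads: int, threads_per_host: int, hosts: int) -> int:
    """Number of fetcher threads that cannot be active at the same time.

    Assumption: each host accepts at most `threads_per_host` concurrent
    threads, so at most hosts * threads_per_host threads can work at once.
    """
    if total_threads < 0 or threads_per_host < 1 or hosts < 0:
        raise ValueError("expected total_threads >= 0, threads_per_host >= 1, hosts >= 0")
    max_active = hosts * threads_per_host
    return max(0, total_threads - max_active)


def min_hosts_to_keep_busy(total_threads: int, threads_per_host: int) -> int:
    """Smallest number of hosts for which no thread has to be blocked."""
    if total_threads < 0 or threads_per_host < 1:
        raise ValueError("expected total_threads >= 0, threads_per_host >= 1")
    # Ceiling division without floating point.
    return -(-total_threads // threads_per_host)


if __name__ == "__main__":
    # Hypothetical setup: 100 fetcher threads, 2 threads per host, 10 hosts.
    print(blocked_threads(100, 2, 10))     # 80 threads are always waiting
    print(min_hosts_to_keep_busy(100, 2))  # 50 hosts needed to use all threads
```

With these example numbers, either lowering the thread count to 20 or raising the per-host limit to 10 would bring the blocked count to zero, which matches the suggested solution.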

> fetcher.threads.per.host ignored
> --------------------------------
>
>          Key: NUTCH-69
>          URL: http://issues.apache.org/jira/browse/NUTCH-69
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Reporter: Matthias Jaekle

>
> Fetcher ignores 'maximum threads per host'.
> If you fetch from only a few domains with multiple threads, some web servers feel attacked or can no longer serve you.
> So you lose lots of existing pages in your segments.
