Fetcher freezes

Fetcher freezes

Aisha-2
Hi,

I don't know why, but I have had no answer on the 3 forums where I posted my problem.
The Fetcher freezes every time I try to fetch my file system, so I can't imagine that I am the only one who has this problem. As I said in my last e-mail, I found many mails about it, but no solution seems to have been found.
It is a big problem, so I don't understand why nobody seems interested in it.

I am trying to crawl my file system, but the crawl never finishes; it aborts
with the message "Aborting with 3 hung threads".

The number of hung threads is not the same each time I retry.

I changed the configuration to increase the number of threads, but that doesn't solve the problem.

Could somebody please help me?
I can't crawl my file system.

Thanks in advance.
Aïcha

Re: Fetcher freezes

Stefan Groschupf
Hi,

Try running without any regular expression filter and check whether this helps.
Let me know if this solves the problem.
You may also want to do a thread dump and send the log to the list to
check where exactly the fetcher freezes.

Stefan

On 03.11.2006 at 15:53, Aisha wrote:

>
> Hi,
>
> I don't know why, but I have had no answer on the 3 forums where I
> posted my problem.
> The Fetcher freezes every time I try to fetch my file system, so I
> can't imagine that I am the only one who has this problem. As I said
> in my last e-mail, I found many mails about it, but no solution seems
> to have been found.
> It is a big problem, so I don't understand why nobody seems
> interested in it.
>
> I am trying to crawl my file system, but the crawl never finishes; it
> aborts with the message "Aborting with 3 hung threads".
>
> The number of hung threads is not the same each time I retry.
>
> I changed the configuration to increase the number of threads, but
> that doesn't solve the problem.
>
> Could somebody please help me?
> I can't crawl my file system.
>
> Thanks in advance.
> Aïcha
>
> --
> View this message in context:
> http://www.nabble.com/Fetcher-freezes-tf2568287.html#a7158776
> Sent from the Nutch - Dev mailing list archive at Nabble.com.
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
search tech for web 2.1
Menlo Park, California
http://www.101tec.com
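
For readers following along, here is a minimal sketch of what "no regular expression filter" could look like in nutch-site.xml: the plugin.includes value simply contains no urlfilter-regex entry. The shortened plugin list is modelled on the configuration posted later in this thread, and it is an assumption that no other configuration file re-enables the filter.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Sketch only: urlfilter-regex is deliberately absent from the list,
       so the regular expression URL filter plugin is not activated.
       The remaining plugins are illustrative, not a recommendation. -->
  <property>
    <name>plugin.includes</name>
    <value>protocol-file|parse-(text|html)|index-basic|query-basic|summary-basic|scoring-opic</value>
  </property>
</configuration>
```

If the freeze still occurs with the filter absent, the regular expression filter is presumably not the cause, and a thread dump becomes the more useful next step.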




Re: Fetcher freezes

Aisha-2
Hi,

I am not in my office, so I will try on Monday and send you the logs and the configuration file I use.
However, the freeze does not seem to be linked to a particular file, because in the logs the freezes don't occur
at the same time.

Thank you for your answer.
I will contact you on Monday.
Have a good week.
Aïcha


Re: Fetcher freezes

Aisha-2
In reply to this post by Stefan Groschupf
Hi,

I am not sure I understood "no regular expression filter" correctly, but I removed the urlfilter from my nutch-site.xml.

This is my nutch-site.xml configuration:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<configuration>

<property>
  <name>plugin.includes</name>
  <value>protocol-file|parse-(text|msword|msexcel|mspowerpoint|rtf|xml|html|js|pdf|oo)|index-basic|query-basic|summary-basic|scoring-opic</value>
</property>

<property>
  <name>file.content.ignored</name>
  <value>false</value>
</property>

<property>
  <name>file.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>db.ignore.external.links</name>
  <value>true</value>
</property>

<property>
  <name>fetcher.threads.fetch</name>
  <value>1000</value>
</property>

<property>
  <name>fetcher.threads.per.host</name>
  <value>1000</value>
  <description>This number is the maximum number of threads that
    should be allowed to access a host at one time.</description>
</property>

<property>
  <name>fetcher.verbose</name>
  <value>true</value>
  <description>If true, fetcher will log more verbosely.</description>
</property>

<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value>
  <description>The number of seconds the fetcher will delay between
    successive requests to the same server.</description>
</property>

<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
</property>

<property>
  <name>indexer.max.tokens</name>
  <value>Integer.MAX_VALUE</value>
</property>

<property>
  <name>db.max.outlinks.per.page</name>
  <value>10000</value>
</property>

<property>
  <name>db.max.anchor.length</name>
  <value>200</value>
  <description>The maximum number of characters permitted in an anchor.
  </description>
</property>

</configuration>


The fetcher freezes after 2 hours.
As I said, the logs don't give useful information, because each time I run it the freeze occurs on a different directory or file.
Do I have to change something in my configuration?

Thanks in advance,
Aïcha
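
A note on the configuration above, offered as a sketch of things to try rather than a confirmed fix. Nutch appears to read indexer.max.tokens as an integer, so the literal text Integer.MAX_VALUE is unlikely to be parsed as intended; 2147483647 is the numeric value of that Java constant. The thread counts of 10 are an arbitrary assumption for a crawl of a single local disk, not a value recommended anywhere in this thread, and whether they change the freezing behaviour is untested.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Numeric value of Java's Integer.MAX_VALUE; the symbolic name is
       plain text in an XML value and is not evaluated. -->
  <property>
    <name>indexer.max.tokens</name>
    <value>2147483647</value>
  </property>

  <!-- Assumption: far fewer threads than 1000 for one local file system.
       The value 10 is illustrative only. -->
  <property>
    <name>fetcher.threads.fetch</name>
    <value>10</value>
  </property>

  <property>
    <name>fetcher.threads.per.host</name>
    <value>10</value>
  </property>
</configuration>
```

Changing one property at a time between runs makes it easier to tell which setting, if any, influences the number of hung threads.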


Re: Fetcher freezes

Aisha-2
Hi,

My configuration is as Dennis Kubes suggested on the nutch-user forum, but I still have the problem.
I think the problem was fixed for the http protocol by NUTCH-344 together with this configuration:
<property>
  <name>http.max.delays</name>
  <value>30</value>
</property>

but adding this configuration:
<property>
  <name>fetcher.max.crawl.delay</name>
  <value>30</value>
</property>

doesn't fix the problem when crawling the file system.
To repeat, I am using the Nutch nightly build from 19/10/2006.