RegexUrlFilter hangs up


RegexUrlFilter hangs up

Marko Bauhardt-2
Hi all,
I use nutch-mapred from the svn branch. Sometimes the reduce job of
the fetch process hangs. The thread dump shows that the
RegexURLFilter is running at that point.
In regex-urlfilter.txt I commented out the line
#-[?*!@=]

because I want to fetch dynamic URLs such as JSPs.
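
To make the effect of that change concrete, here is a minimal sketch of how such a rule file is usually read: a leading '-' rejects, a leading '+' accepts, '#' starts a comment, and the first matching rule decides. The class UrlRuleSketch, its accepts method, and the trailing "+." rule are hypothetical names and values chosen for illustration. This is not the code of Nutch's RegexURLFilter, and it uses java.util.regex (on a current JDK) instead of the ORO matcher seen in the thread dump.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustrative only: mimics the rule semantics of regex-urlfilter.txt
// ('+' accepts, '-' rejects, '#' is a comment, first matching rule wins).
// Not Nutch's RegexURLFilter; uses java.util.regex rather than ORO.
class UrlRuleSketch {

    /** Returns true if the first rule that matches the URL is a '+' rule. */
    static boolean accepts(List<String> ruleLines, String url) {
        for (String line : ruleLines) {
            String rule = line.trim();
            if (rule.isEmpty() || rule.startsWith("#")) {
                continue; // blank lines and commented-out rules are ignored
            }
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + rule);
            }
            // find() looks for the pattern anywhere in the URL
            if (Pattern.compile(rule.substring(1)).matcher(url).find()) {
                return sign == '+';
            }
        }
        return false; // assumption: a URL that matches no rule is rejected
    }

    public static void main(String[] args) {
        String url = "http://example.com/page.jsp?id=1";
        // Rule active: the '?' in the URL matches, so the URL is rejected.
        System.out.println(accepts(Arrays.asList("-[?*!@=]", "+."), url)); // false
        // Rule commented out: the URL falls through to "+." and is accepted.
        System.out.println(accepts(Arrays.asList("#-[?*!@=]", "+."), url)); // true
    }
}
```

With the rule commented out, URLs containing '?' or '=' reach the later rules in the file, so every remaining pattern is evaluated against them as well.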



Here is the thread dump.

051017 151123 reduce > reduce
Full thread dump Java HotSpot(TM) Client VM (1.4.2_08-b03 mixed mode):

"MultiThreadedHttpConnectionManager cleanup" daemon prio=1 tid=0x08249fa0 nid=0x7645 in Object.wait() [6d489000..6d489868]
         at java.lang.Object.wait(Native Method)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
         - locked <0x753a19c0> (a java.lang.ref.ReferenceQueue$Lock)
         at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run(MultiThreadedHttpConnectionManager.java:1100)

"Thread-1" prio=1 tid=0x082149b0 nid=0x7645 runnable [6efc3000..6efc3868]
         at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.__match(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.__tryExpression(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.__interpret(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
         at org.apache.oro.text.regex.Perl5Matcher.contains(Unknown Source)
         at org.apache.nutch.net.RegexURLFilter.filter(RegexURLFilter.java:114)
         - locked <0x753d8cc8> (a org.apache.nutch.net.RegexURLFilter)
         at org.apache.nutch.net.URLFilters.filter(URLFilters.java:77)
         at org.apache.nutch.crawl.ParseOutputFormat$1.write(ParseOutputFormat.java:71)
         at org.apache.nutch.crawl.FetcherOutputFormat$1.write(FetcherOutputFormat.java:78)
         at org.apache.nutch.mapred.ReduceTask$2.collect(ReduceTask.java:247)
         at org.apache.nutch.mapred.lib.IdentityReducer.reduce(IdentityReducer.java:41)
         at org.apache.nutch.mapred.ReduceTask.run(ReduceTask.java:260)
         at org.apache.nutch.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:90)

"Signal Dispatcher" daemon prio=1 tid=0x080a6ff8 nid=0x7645 waiting on condition [0..0]

"Finalizer" daemon prio=1 tid=0x080933e8 nid=0x7645 in Object.wait() [70159000..70159868]
         at java.lang.Object.wait(Native Method)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:111)
         - locked <0x75350780> (a java.lang.ref.ReferenceQueue$Lock)
         at java.lang.ref.ReferenceQueue.remove(ReferenceQueue.java:127)
         at java.lang.ref.Finalizer$FinalizerThread.run(Finalizer.java:159)

"Reference Handler" daemon prio=1 tid=0x08091978 nid=0x7645 in Object.wait() [701da000..701da868]
         at java.lang.Object.wait(Native Method)
         at java.lang.Object.wait(Object.java:429)
         at java.lang.ref.Reference$ReferenceHandler.run(Reference.java:115)
         - locked <0x753507e8> (a java.lang.ref.Reference$Lock)

"main" prio=1 tid=0x0805c0d8 nid=0x7645 waiting on condition [bfffb000..bfffb41c]
         at java.lang.Thread.sleep(Native Method)
         at org.apache.nutch.mapred.JobClient.runJob(JobClient.java:294)
         at org.apache.nutch.crawl.Fetcher.fetch(Fetcher.java:333)
         at org.apache.nutch.crawl.Fetcher.main(Fetcher.java:362)

"VM Thread" prio=1 tid=0x08090718 nid=0x7645 runnable

"VM Periodic Task Thread" prio=1 tid=0x6fb01420 nid=0x7645 waiting on condition

"Suspend Checker Thread" prio=1 tid=0x080a65f0 nid=0x7645 runnable



Re: RegexUrlFilter hangs up

Doug Cutting-2
What makes you think that the fetcher is hung?

Doug


Re: RegexUrlFilter hangs up

Marko Bauhardt-2

On 18.10.2005 at 17:55, Doug Cutting wrote:

> What makes you think that the fetcher is hung?




Neither the log file nor the size of the segment grew. I waited
about 8 hours, but the last entry in my log file is still
'051017 151123 reduce > reduce'.
I use mapred on the local fs.
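
One way to check whether a particular pattern/URL pair is what keeps the matcher busy is to run the match with a time limit outside the job. The following is only a sketch under assumptions: RegexTimeoutSketch, findWithTimeout, and InterruptibleCharSequence are hypothetical names, the code targets a current JDK (not the 1.4.2 VM from the dump), and it uses java.util.regex rather than ORO, so timings will not be identical to those inside Nutch.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.regex.Pattern;

// Hypothetical diagnostic helper, not part of Nutch.
class RegexTimeoutSketch {

    /** CharSequence wrapper whose charAt() aborts once the thread is interrupted. */
    static final class InterruptibleCharSequence implements CharSequence {
        private final CharSequence inner;

        InterruptibleCharSequence(CharSequence inner) { this.inner = inner; }

        public char charAt(int index) {
            if (Thread.currentThread().isInterrupted()) {
                throw new IllegalStateException("regex match interrupted");
            }
            return inner.charAt(index);
        }

        public int length() { return inner.length(); }

        public CharSequence subSequence(int start, int end) {
            return new InterruptibleCharSequence(inner.subSequence(start, end));
        }

        public String toString() { return inner.toString(); }
    }

    /** Returns the result of find(), or null if the match did not finish in time. */
    static Boolean findWithTimeout(String regex, String url, long timeoutMillis) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            final Pattern pattern = Pattern.compile(regex);
            Callable<Boolean> task =
                () -> pattern.matcher(new InterruptibleCharSequence(url)).find();
            Future<Boolean> result = executor.submit(task);
            try {
                return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                result.cancel(true); // interrupts the worker; charAt() then throws
                return null;
            } catch (InterruptedException | ExecutionException e) {
                throw new RuntimeException(e);
            }
        } finally {
            executor.shutdownNow();
        }
    }
}
```

Calling findWithTimeout for each rule in regex-urlfilter.txt against the URLs of the stuck segment, and logging every pair that returns null, would narrow down which rule is responsible.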