Nutch frozen but not exiting

Paul Tomblin
My nutch crawl just stopped.  The process is still there, and doesn't
respond to a "kill -TERM" or a "kill -HUP", but it hasn't written
anything to the log file in the last 40 minutes.  The last thing it
logged was some calls to my custom url filter.  Nothing has been
written in the hadoop directory or the crawldir/crawldb or the
segments dir in that time.

How can I tell what's going on and why it's stopped?

--
http://www.linkedin.com/in/paultomblin
http://careers.stackoverflow.com/ptomblin

Re: Nutch frozen but not exiting

Andrzej Białecki-2
Paul Tomblin wrote:
> My nutch crawl just stopped.  The process is still there, and doesn't
> respond to a "kill -TERM" or a "kill -HUP", but it hasn't written
> anything to the log file in the last 40 minutes.  The last thing it
> logged was some calls to my custom url filter.  Nothing has been
> written in the hadoop directory or the crawldir/crawldb or the
> segments dir in that time.
>
> How can I tell what's going on and why it's stopped?

If you run in distributed / pseudo-distributed mode, you can check the
status in the JobTracker UI. If you are running in "local" mode, then
it's likely that the process is in a (single) reduce phase sorting the
data. With larger jobs in "local" mode the sorting phase may take a very
long time because of heavy disk I/O (and while it is in disk-wait state
the process may be uninterruptible).

Try to generate a thread dump to see what code is being executed.
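
From outside the process, "jstack <pid>" or "kill -QUIT <pid>" are the usual ways to get a dump. If the crawl is started from your own main class, the same information can also be collected in-process. The following is only a minimal sketch of that idea: the class name ThreadDumpSketch is hypothetical, and it relies on nothing beyond the standard Thread.getAllStackTraces() call.

```java
import java.util.Map;

public class ThreadDumpSketch {

    /** Builds a plain-text dump of every live thread in this JVM. */
    public static String dumpAllThreads() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<Thread, StackTraceElement[]> e
                : Thread.getAllStackTraces().entrySet()) {
            Thread t = e.getKey();
            // Header line: thread name and its java.lang.Thread.State
            out.append('"').append(t.getName()).append("\" state=")
               .append(t.getState()).append('\n');
            // One line per stack frame, innermost first
            for (StackTraceElement frame : e.getValue()) {
                out.append("    at ").append(frame).append('\n');
            }
            out.append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(dumpAllThreads());
    }
}
```

Calling dumpAllThreads() from a timer or watchdog thread and writing the result to the log would be one way to use it, but it only helps while the JVM can still schedule that thread.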

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

Re: Nutch frozen but not exiting

Paul Tomblin
On Sat, Nov 28, 2009 at 4:45 PM, Andrzej Bialecki <[hidden email]> wrote:
> Paul Tomblin wrote:

>> How can I tell what's going on and why it's stopped?

> Try to generate a thread dump to see what code is being executed.

I didn't do any sort of distributed mode because I've only got one
core.  I had to do a "jstack -F" to force a stack dump, and here's
what it says:

-bash-3.2$ jstack -F 32507
Attaching to process ID 32507, please wait...
Debugger attached successfully.
Server compiler detected.
JVM version is 14.3-b01
Deadlock Detection:

No deadlocks found.

Thread 21558: (state = IN_NATIVE_TRANS)
 - java.lang.UNIXProcess.forkAndExec(byte[], byte[], int, byte[], int,
byte[], boolean, java.io.FileDescriptor, java.io.FileDescriptor,
java.io.FileDescriptor) @bci=0 (Interpreted frame)
 - java.lang.UNIXProcess.access$500(java.lang.UNIXProcess, byte[],
byte[], int, byte[], int, byte[], boolean, java.io.FileDescriptor,
java.io.FileDescriptor, java.io.FileDescriptor) @bci=18, line=20
(Interpreted frame)
 - java.lang.UNIXProcess$1$1.run() @bci=93, line=109 (Interpreted frame)


Thread 21548: (state = BLOCKED)
 - sun.misc.Unsafe.park(boolean, long) @bci=0 (Interpreted frame)
 - java.util.concurrent.locks.LockSupport.park(java.lang.Object)
@bci=14, line=158 (Interpreted frame)
 - java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await()
@bci=42, line=1925 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer$SpillThread.run()
@bci=55, line=882 (Interpreted frame)


Thread 21545: (state = BLOCKED_TRANS)
 - java.lang.Thread.sleep(long) @bci=0 (Interpreted frame)
 - org.apache.hadoop.mapred.Task$1.run() @bci=31, line=403 (Interpreted frame)
 - java.lang.Thread.run() @bci=11, line=619 (Interpreted frame)


Thread 21540: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Interpreted frame)
 - java.lang.UNIXProcess$Gate.waitForExit() @bci=10, line=64 (Interpreted frame)
 - java.lang.UNIXProcess.<init>(byte[], byte[], int, byte[], int,
byte[], boolean) @bci=74, line=145 (Interpreted frame)
 - java.lang.ProcessImpl.start(java.lang.String[], java.util.Map,
java.lang.String, boolean) @bci=182, line=65 (Interpreted frame)
 - java.lang.ProcessBuilder.start() @bci=112, line=452 (Interpreted frame)
 - org.apache.hadoop.util.Shell.runCommand() @bci=52, line=149
(Interpreted frame)
 - org.apache.hadoop.util.Shell.run() @bci=23, line=134 (Interpreted frame)
 - org.apache.hadoop.fs.DF.getAvailable() @bci=1, line=73 (Interpreted frame)
 - org.apache.hadoop.fs.LocalDirAllocator$AllocatorPerContext.getLocalPathForWrite(java.lang.String,
long, org.apache.hadoop.conf.Configuration) @bci=187, line=321
(Interpreted frame)
 - org.apache.hadoop.fs.LocalDirAllocator.getLocalPathForWrite(java.lang.String,
long, org.apache.hadoop.conf.Configuration) @bci=16, line=124
(Interpreted frame)
 - org.apache.hadoop.mapred.MapOutputFile.getSpillFileForWrite(org.apache.hadoop.mapred.TaskAttemptID,
int, long) @bci=50, line=107 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.sortAndSpill()
@bci=78, line=930 (Compiled frame)
 - org.apache.hadoop.mapred.MapTask$MapOutputBuffer.flush() @bci=104,
line=842 (Interpreted frame)
 - org.apache.hadoop.mapred.MapTask.run(org.apache.hadoop.mapred.JobConf,
org.apache.hadoop.mapred.TaskUmbilicalProtocol) @bci=391, line=343
(Interpreted frame)
 - org.apache.hadoop.mapred.LocalJobRunner$Job.run() @bci=282,
line=138 (Interpreted frame)


Thread 32521: (state = BLOCKED_TRANS)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118
(Interpreted frame)
 - org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$ReferenceQueueThread.run()
@bci=9, line=1082 (Interpreted frame)


Thread 32516: (state = BLOCKED_TRANS)


Thread 32515: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove(long) @bci=44, line=118
(Interpreted frame)
 - java.lang.ref.ReferenceQueue.remove() @bci=2, line=134 (Compiled frame)
 - java.lang.ref.Finalizer$FinalizerThread.run() @bci=3, line=159
(Compiled frame)


Thread 32514: (state = BLOCKED)
 - java.lang.Object.wait(long) @bci=0 (Interpreted frame)
 - java.lang.Object.wait() @bci=2, line=485 (Compiled frame)
 - java.lang.ref.Reference$ReferenceHandler.run() @bci=46, line=116
(Compiled frame)


Thread 32508: (state = IN_VM_TRANS)
 - org.apache.hadoop.mapred.JobStatus.getRunState() @bci=0, line=199
(Interpreted frame)
 - org.apache.hadoop.mapred.JobClient$NetworkedJob.isComplete()
@bci=8, line=278 (Interpreted frame)
 - org.apache.hadoop.mapred.JobClient.runJob(org.apache.hadoop.mapred.JobConf)
@bci=149, line=1155 (Interpreted frame)
 - org.apache.nutch.crawl.CrawlDb.update(org.apache.hadoop.fs.Path,
org.apache.hadoop.fs.Path[], boolean, boolean, boolean, boolean)
@bci=363, line=94 (Interpreted frame)
 - com.lucidityworks.nutch.crawler.Crawler.crawlIt(java.io.File,
org.apache.hadoop.fs.Path, org.apache.hadoop.fs.Path,
org.apache.hadoop.fs.Path, java.io.File, java.io.OutputStreamWriter)
@bci=531, line=448 (Interpreted frame)
 - com.lucidityworks.nutch.crawler.Crawler.crawlSite(java.io.File,
java.io.File, java.io.File, java.io.OutputStreamWriter) @bci=609,
line=381 (Interpreted frame)
 - com.lucidityworks.nutch.crawler.Crawler.crawlCategory(java.io.File,
boolean) @bci=325, line=255 (Interpreted frame)
 - com.lucidityworks.nutch.crawler.Crawler.crawl(java.lang.String,
boolean) @bci=51, line=166 (Interpreted frame)
 - com.lucidityworks.nutch.crawler.Crawler.main(java.lang.String[])
@bci=44, line=724 (Interpreted frame)


-bash-3.2$ java -version
java version "1.6.0_17"
Java(TM) SE Runtime Environment (build 1.6.0_17-b04)
Java HotSpot(TM) Server VM (build 14.3-b01, mixed mode)
-bash-3.2$
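
Reading the dump above: thread 21540 is the local map task. It is in sortAndSpill(), which asks LocalDirAllocator for a spill path, which calls DF.getAvailable(), which goes through Shell.runCommand() into ProcessBuilder.start() and is waiting for the fork to finish, while thread 21558 sits in the native forkAndExec(). The main thread (32508) is just polling for job completion. So the job appears to be waiting on a fork of an external command. As a rough illustration of that pattern, here is a sketch that asks for free disk space both ways: by forking a command and by calling File.getUsableSpace() inside the JVM. The "df -k" command line is an assumption about what Hadoop runs at that point, and the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;

public class FreeSpaceSketch {

    /** Usable bytes on the partition holding dir; no child process is created. */
    public static long usableBytesInProcess(File dir) {
        return dir.getUsableSpace();
    }

    /** Asks the same question by forking "df -k" (assumed command) and returns its raw output. */
    public static String dfOutput(File dir) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder("df", "-k", dir.getAbsolutePath());
        pb.redirectErrorStream(true);
        Process p = pb.start(); // the fork/exec happens here
        StringBuilder out = new StringBuilder();
        BufferedReader r = new BufferedReader(new InputStreamReader(p.getInputStream()));
        try {
            String line;
            while ((line = r.readLine()) != null) {
                out.append(line).append('\n');
            }
        } finally {
            r.close();
        }
        p.waitFor();
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(args.length > 0 ? args[0] : ".");
        System.out.println("in-process: " + usableBytesInProcess(dir) + " bytes usable");
        System.out.print(dfOutput(dir));
    }
}
```

This does not change what Hadoop does; it only shows where the fork in the dump comes from and that the forked variant needs the operating system to create a child of a large JVM.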





Re: Nutch frozen but not exiting

Andrzej Białecki-2
Paul Tomblin wrote:

> On Sat, Nov 28, 2009 at 4:45 PM, Andrzej Bialecki <[hidden email]> wrote:
>> Paul Tomblin wrote:
>
>>> How can I tell what's going on and why it's stopped?
>
>> Try to generate a thread dump to see what code is being executed.
>
> I didn't do any sort of distributed mode because I've only got one
> core.  I had to do a "jstack -F" to force a stack dump, and here's
> what it says:
>
> -bash-3.2$ jstack -F 32507
> Attaching to process ID 32507, please wait...

Hm, I can't see anything obviously wrong with that thread dump. What's
the CPU and swap usage, and loadavg?




Re: Nutch frozen but not exiting

Paul Tomblin
On Sat, Nov 28, 2009 at 5:48 PM, Andrzej Bialecki <[hidden email]> wrote:
> Paul Tomblin wrote:

>> -bash-3.2$ jstack -F 32507
>> Attaching to process ID 32507, please wait...
>
> Hm, I can't see anything obviously wrong with that thread dump. What's the
> CPU and swap usage, and loadavg?

The process is using a lot of CPU.  loadavg is up over 5.

top - 15:12:19 up 22 days,  4:06,  2 users,  load average: 5.01, 5.00, 4.93
Tasks:  48 total,   2 running,  45 sleeping,   0 stopped,   1 zombie
Cpu(s):  1.0% us, 99.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
Mem:   3170584k total,  2231700k used,   938884k free,        0k buffers
Swap:        0k total,        0k used,        0k free,        0k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
32507 discover  16   0 1163m 974m 8604 S 394.7 31.5 719:40.71 java

Actually, the memory is a real annoyance - the hosting company doesn't
give me any swap, so when hadoop does a fork/exec just to do a
"whoami", I have to leave as much memory free as the crawl reserves
with -Xmx for itself.
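
To put numbers on that rule of thumb, here is a small sketch using the figures from the top output above. The -Xmx value itself isn't stated here, so VIRT and RES of PID 32507 are used as stand-ins, which is an assumption. Whether a fork really fails or stalls in this situation depends on the kernel's overcommit policy, which I don't know for this host. The class and method names are hypothetical.

```java
public class ForkHeadroomSketch {

    /**
     * Free memory minus the parent's footprint, in KiB.
     * A negative result means the "leave as much free as the JVM reserves"
     * rule of thumb is not met.
     */
    public static long headroomKb(long freeKb, long parentMb) {
        return freeKb - parentMb * 1024L;
    }

    public static void main(String[] args) {
        long freeKb = 938884L; // "free" on the Mem: line of top
        long virtMb = 1163L;   // VIRT of PID 32507 (assumed stand-in for the -Xmx reservation)
        long resMb = 974L;     // RES of PID 32507

        System.out.println("headroom vs VIRT: " + headroomKb(freeKb, virtMb) + " KiB"); // -252028 KiB
        System.out.println("headroom vs RES:  " + headroomKb(freeKb, resMb) + " KiB");  // -58492 KiB
    }
}
```

By either measure the free memory is smaller than the Java process, so at the moment of this snapshot the rule is not satisfied.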



Re: Nutch frozen but not exiting

Andrzej Białecki-2
Paul Tomblin wrote:

> On Sat, Nov 28, 2009 at 5:48 PM, Andrzej Bialecki <[hidden email]> wrote:
>> Paul Tomblin wrote:
>
>>> -bash-3.2$ jstack -F 32507
>>> Attaching to process ID 32507, please wait...
>> Hm, I can't see anything obviously wrong with that thread dump. What's the
>> CPU and swap usage, and loadavg?
>
> The process is using a lot of CPU.  loadavg is up over 5.
>
> top - 15:12:19 up 22 days,  4:06,  2 users,  load average: 5.01, 5.00, 4.93
> Tasks:  48 total,   2 running,  45 sleeping,   0 stopped,   1 zombie
> Cpu(s):  1.0% us, 99.0% sy,  0.0% ni,  0.0% id,  0.0% wa,  0.0% hi,  0.0% si
> Mem:   3170584k total,  2231700k used,   938884k free,        0k buffers
> Swap:        0k total,        0k used,        0k free,        0k cached
>
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> 32507 discover  16   0 1163m 974m 8604 S 394.7 31.5 719:40.71 java
>
> Actually, the memory is a real annoyance - the hosting company doesn't
> give me any swap, so when hadoop does a fork/exec just to do a
> "whoami", I have to leave as much memory free as the crawl reserves
> with -Xmx for itself.

Hm, the curious thing here is that the java process is sleeping, and 99%
of cpu is in "system" time ... usually this would indicate swapping, but
since there is no swap in your setup I'm stumped. Still, this may be
related to the weird memory/swap setup on that machine - try decreasing
the heap size and see what happens.
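
One way to narrow down where the "system" time goes is to look at CPU time per thread. From the shell, "top -H -p <pid>" shows threads individually. From inside the JVM, the standard ThreadMXBean can report total and user-mode CPU time per thread, and the difference is roughly the time spent in the kernel on that thread's behalf. A minimal sketch, assuming the JVM supports thread CPU timing (the class name ThreadCpuSketch is hypothetical):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadCpuSketch {

    /** One line per live thread: total CPU, user CPU, and the remainder (roughly system time). */
    public static String cpuReport() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        if (!mx.isThreadCpuTimeSupported() || !mx.isThreadCpuTimeEnabled()) {
            return "thread CPU time not available on this JVM\n";
        }
        StringBuilder out = new StringBuilder();
        for (long id : mx.getAllThreadIds()) {
            ThreadInfo info = mx.getThreadInfo(id);
            long cpuNs = mx.getThreadCpuTime(id);   // user + system, nanoseconds
            long userNs = mx.getThreadUserTime(id); // user mode only, nanoseconds
            if (info == null || cpuNs < 0 || userNs < 0) {
                continue; // thread ended between the calls
            }
            out.append(String.format("%-40s total=%dms user=%dms other=%dms%n",
                    info.getThreadName(),
                    cpuNs / 1000000L,
                    userNs / 1000000L,
                    (cpuNs - userNs) / 1000000L));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(cpuReport());
    }
}
```

A thread whose "other" column keeps growing between two reports would be the one to match against the thread dump.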

--
Best regards,
Andrzej Bialecki     <><
  ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Nutch frozen but not exiting

Paul Tomblin
On Sat, Nov 28, 2009 at 8:25 PM, Andrzej Bialecki <[hidden email]> wrote:

> Hm, the curious thing here is that the java process is sleeping, and 99% of
> cpu is in "system" time ... usually this would indicate swapping, but since
> there is no swap in your setup I'm stumped. Still, this may be related to
> the weird memory/swap setup on that machine - try decreasing the heap size
> and see what happens.

When I decrease the heap size, it dies pretty early on.
