Hanging shell commands question, and strange delays in processing

Hanging shell commands question, and strange delays in processing

C G-4
I'm working on a 4-node grid at the moment (physical iron, not virtual) running Hadoop 0.15.0, to test out a prototype system before deployment onto a larger grid.  I've noticed a few odd behaviors within Hadoop itself.  I'm wondering whether others have seen these, whether they are bugs, or whether there is a way to "tune around" some of these problems:
   
  1.  Hanging shell commands:  Our system is all script-driven.  The first thing our main driver does when it starts up is delete old datasets in preparation for creating new ones.  A command like this will hang forever about 5% of the time:
   
              bin/hadoop dfs -rmr /import/data/20071030
   
  2.  Long lags during job processing:  I'm using smallish datasets (several megabytes, 300,000 - 500,000 rows of data) for testing/evaluation purposes.  With many of the M/R jobs I run, I see very long delays in processing where nothing appears to be running (i.e. CPU activity on all 4 nodes is basically zero).  Then system activity picks up again.  I am wondering if these delays are attributable to some sort of scheduler latency issue, or perhaps something else.
   
  3.  I see jobs hang sometimes, and inspection of the task tracker log on the master node shows the following:

  2007-11-23 17:37:13,447 INFO org.apache.hadoop.mapred.TaskTracker: task_200711191216_0344_r_000007_0 0.16666667% reduce > copy (1 of 2 at 0.26 MB/s) >
  2007-11-23 17:37:16,450 INFO org.apache.hadoop.mapred.TaskTracker: task_200711191216_0344_r_000007_0 0.16666667% reduce > copy (1 of 2 at 0.26 MB/s) >
  2007-11-23 17:37:18,501 INFO org.apache.hadoop.mapred.TaskTracker: task_200711191216_0344_r_000001_0 0.16666667% reduce > copy (1 of 2 at 0.10 MB/s) >

  repeating forever.  I've left the system running in this state for several hours to see if the copy will complete and it never does.
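
  To tell a reduce that is stuck in the copy phase from one that is merely slow, it may help to scan the TaskTracker log for task attempts whose reported progress never changes.  The following is a minimal sketch, assuming the log line layout shown above (task attempt ID in the fifth whitespace-separated field, progress in the sixth).  The function name stalled_tasks and the repeat threshold are hypothetical, not part of Hadoop.

```shell
#!/bin/sh
# stalled_tasks MIN_REPEATS < tasktracker.log
# Hypothetical helper: prints "task_id progress repeats" for every task
# attempt in the "reduce > copy" phase whose most recent progress value has
# been reported at least MIN_REPEATS times in a row (default 3).
stalled_tasks() {
    awk -v min="${1:-3}" '
        $4 == "org.apache.hadoop.mapred.TaskTracker:" && $5 ~ /^task_/ &&
        $7 == "reduce" && $9 == "copy" {
            id = $5
            if (last[id] == $6) {
                count[id]++          # same progress as the previous report
            } else {
                count[id] = 1        # progress changed: restart the run
                last[id] = $6
            }
        }
        END {
            for (id in count)
                if (count[id] >= min) print id, last[id], count[id]
        }
    ' | sort
}
```

  In the excerpt above each task reports roughly every 3 seconds, so something like `stalled_tasks 100 < "$TASKTRACKER_LOG"` (where TASKTRACKER_LOG is a placeholder for the actual log file) would flag attempts that have made no progress for about five minutes.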
   
  Any thoughts on these issues, or has anybody experienced problems like this?
   
  Thanks for any help...
  C G
   

       

RE: Hanging shell commands question, and strange delays in processing

Devaraj Das
>   3.  I see jobs hang sometimes, and inspection of the task
> tracker log on the master node shows the following:
>   2007-11-23 17:37:13,447 INFO
> org.apache.hadoop.mapred.TaskTracker:
> task_200711191216_0344_r_000007_0 0.16666667% reduce > copy
> (1 of 2 at 0.26 MB/s) >
> 2007-11-23 17:37:16,450 INFO
> org.apache.hadoop.mapred.TaskTracker:
> task_200711191216_0344_r_000007_0 0.16666667% reduce > copy
> (1 of 2 at 0.26 MB/s) >
> 2007-11-23 17:37:18,501 INFO
> org.apache.hadoop.mapred.TaskTracker:
> task_200711191216_0344_r_000001_0 0.16666667% reduce > copy
> (1 of 2 at 0.10 MB/s) >
>
>   repeating forever.  I've left the system running in this
> state for several hours to see if the copy will complete and
> it never does.

Some clarifying questions: Were any of the tasktrackers that ran the maps
reported as lost? Were any map tasks lost (due to progress timeout,
exception, etc.)? Was speculative execution on for the jobs that hung?
Could you please take a look at the task logs to see whether the copy is
resulting in any exception? The task logs can be accessed via the web UI for
the job.
Also, if this happens again, could you please kill a hung reduce task and
see whether it executes successfully upon re-execution?



RE: Hanging shell commands question, and strange delays in processing

Dhruba Borthakur-2
In reply to this post by C G-4
Hi CG,

When the shell hangs, can you please get the following:

1. The namenode log file
2. The namenode stack dump. This can be generated by running the command
"jstack <namenode_process_id>". You will typically find the jstack
utility in the same directory where the "java"  binary is installed.
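
The two steps of finding the namenode process ID and dumping its stack could be scripted roughly as below. This is only a sketch: it assumes the JDK's jps and jstack tools are on the PATH, that the namenode JVM is listed by jps under the class name "NameNode", and that the script runs on the namenode host as the user that owns the process. The function names and the output file name are hypothetical.

```shell
#!/bin/sh
# Read `jps` output on stdin and print the PID of the first JVM whose
# main class is listed as "NameNode" (prints nothing if there is none).
namenode_pid() {
    awk '$2 == "NameNode" { print $1; exit }'
}

# Write a timestamped stack dump of the NameNode into the given directory
# (default: current directory). Returns 1 if no NameNode JVM is found.
dump_namenode_stack() {
    out_dir="${1:-.}"
    pid=$(jps | namenode_pid)
    if [ -z "$pid" ]; then
        echo "no NameNode process found on this host" >&2
        return 1
    fi
    jstack "$pid" > "$out_dir/namenode-stack-$(date +%Y%m%d-%H%M%S).txt"
}
```

Calling dump_namenode_stack a few times, some seconds apart, while the shell command is still hung gives several dumps to compare.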

Thanks,
dhruba

   
> 1.  Hanging shell commands:  Our system is all script-driven.  The
> first thing our main driver does when it starts up is delete old
> datasets in preparation for creating new ones.  A command like this will
> hang forever about 5% of the time:
>
>               bin/hadoop dfs -rmr /import/data/20071030
   


RE: Hanging shell commands question, and strange delays in processing

C G-4
Hi Dhruba:
   
  I'll run some tests in isolation and try to get these logs for you.  I saw a hang like this last night in some testing and I decided to let it run.  About 3 hours later it finally completed...this still seems out-of-spec.  I'll try to get some clean logs for you and send them along.
   
  Thanks,
  C G
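
  For isolated test runs like these, a driver script can put an upper bound on the delete and record how long it actually took.  A minimal sketch, assuming the GNU coreutils `timeout` utility is available (it reports a timed-out command with exit status 124); the function name run_with_limit and the 300-second limit in the usage line are hypothetical choices.

```shell
#!/bin/sh
# run_with_limit SECONDS COMMAND [ARGS...]
# Hypothetical wrapper: runs COMMAND under GNU `timeout`, logs the elapsed
# wall-clock time to stderr, and returns the command's exit status
# (124 if the time limit was reached and the command was killed).
run_with_limit() {
    limit="$1"
    shift
    start=$(date +%s)
    timeout "$limit" "$@"
    status=$?
    elapsed=$(( $(date +%s) - start ))
    if [ "$status" -eq 124 ]; then
        echo "TIMED OUT after ${elapsed}s: $*" >&2
    else
        echo "exit $status after ${elapsed}s: $*" >&2
    fi
    return "$status"
}
```

  Usage would look like `run_with_limit 300 bin/hadoop dfs -rmr /import/data/20071030`.  Note that the limit only keeps the driver from blocking; the namenode log and stack dump are most useful if collected while the command is still hung, before it is killed.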


RE: Hanging shell commands question, and strange delays in processing

Dhruba Borthakur-2
Thanks CG. I will wait to see the logs.

Thanks,
dhruba
