Hadoop WordCount hanging on reduce stage

10 messages

Hadoop WordCount hanging on reduce stage

Pedro Guedes
Hi hadooping people...

I'm having trouble running the wordcount example with Hadoop... I ran it
fine with only one host, but when I add another machine to the cluster...
it falls apart! :(

I read in the mailing-list archive about someone having a similar
problem, but the proposed solution was to downgrade to 0.11.2 (from
0.12.0; I'm using 0.12.2)... is that right? A reference here:
http://www.mail-archive.com/hadoop-user@.../msg00863.html

The only difference in my case is that mine hangs around 60% of the
reduce phase... but the tasktracker for the slave node shows the same
'IOException: file .....mapx_out not created' and that's the only error
I see...

Any suggestions?

Thanks in advance...

Pedro

Re: Hadoop WordCount hanging on reduce stage

Pedro Guedes
Well, moving to 0.11.2 won't fix it... tried that!

The first interesting thing in the log is:
2007-04-02 15:45:41,960 WARN org.apache.hadoop.mapred.TaskRunner:
java.io.IOException: File
/home/ciclope/hadoop-install/hadoop-data/mapred/local/task_0001_r_000001_0/map_10.out-0
not created
    at
org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.copyOutput(ReduceTaskRunner.java:282)
    at
org.apache.hadoop.mapred.ReduceTaskRunner$MapOutputCopier.run(ReduceTaskRunner.java:243)

And the tasktracker of the slave node keeps repeating itself with:

2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
task_0001_r_000003_0 Need 12 map output(s)
2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
task_0001_r_000003_0 Got 12 known map output location(s); scheduling...
2007-04-02 15:47:03,089 INFO org.apache.hadoop.mapred.TaskRunner:
task_0001_r_000003_0 Scheduled 0 of 12 known outputs (12 slow hosts and
0 dup hosts)
2007-04-02 15:47:03,273 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:03,969 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:04,277 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:04,973 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:05,281 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:05,977 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:06,285 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000001_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >
2007-04-02 15:47:06,981 INFO org.apache.hadoop.mapred.TaskTracker:
task_0001_r_000003_0 0.14285715% reduce > copy (9 of 21 at 0.00 MB/s) >




about jobConf.set(String,Object)

wangxu-3
Can I set an object of some class other than String in a job's context?
Like calling
    job.set("test", new SomeOtherClass());
and getting it back with
    job.getObject("test");
??

I did so, but it always throws a ClassCastException.
Thanks.

Re: about jobConf.set(String,Object)

wangxu-3
I am using hadoop-0.10.1


Re: about jobConf.set(String,Object)

wangxu-3
I switched to using job.setObject(.., ...),
but job.get returns null.


Re: about jobConf.set(String,Object)

Ion Badita
In reply to this post by wangxu-3
When the job gets "serialized" it saves only the String values.
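The same failure can be reproduced with plain java.util.Properties — this is only a sketch of the mechanism, not Hadoop's actual JobConf code, but it shows why a config persisted as String key/value pairs can't carry arbitrary objects:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

// Illustrative only: a String-to-String properties table, serialized and
// reloaded the way a job config is shipped to the tasktrackers.
public class ConfRoundTrip {

    static Properties roundTrip(Properties p) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            p.store(out, null); // only String entries are legal here
            Properties reloaded = new Properties();
            reloaded.load(new ByteArrayInputStream(out.toByteArray()));
            return reloaded;
        } catch (IOException e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        conf.setProperty("text", "hello"); // a String value survives the trip
        System.out.println(roundTrip(conf).getProperty("text"));

        conf.put("obj", new java.util.ArrayList<String>()); // non-String value
        try {
            roundTrip(conf);
        } catch (ClassCastException e) {
            // Properties.store() casts every value to String, so a
            // non-String entry fails -- compare the exception wangxu saw.
            System.out.println("ClassCastException on serialization");
        }
    }
}
```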

Re: about jobConf.set(String,Object)

Senthil-7
In reply to this post by wangxu-3
I observed the same. JobConf does not work with anything besides Strings.



--
Shanmugam Senthil

Re: about jobConf.set(String,Object)

wangxu-3
In reply to this post by Ion Badita
OK, I will encode the objects to Strings then.
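One way to do that encoding — a hedged sketch in plain Java, not Hadoop API, and using the modern java.util.Base64 rather than whatever was current at the time: Java-serialize the object, Base64-encode the bytes into a String the config can carry, and reverse it on read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Base64;

// Sketch of the "encode the object to a String" workaround.
public class ObjectAsString {

    static String encode(Serializable obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj); // standard Java serialization
            }
            return Base64.getEncoder().encodeToString(bytes.toByteArray());
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    static Object decode(String s) {
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(Base64.getDecoder().decode(s)))) {
            return in.readObject();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        java.util.ArrayList<String> payload = new java.util.ArrayList<>();
        payload.add("some value");
        String asString = encode(payload); // safe for job.set("test", asString)
        Object back = decode(asString);    // job.get("test"), then decode
        System.out.println(back);
    }
}
```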


Re: Hadoop WordCount hanging on reduce stage

Pedro Guedes
In reply to this post by Pedro Guedes
Seems I was looking in the wrong log file :) ... I was looking at the
tasktracker's log when I should have been looking underneath!

It was a problem with HDFS breaking because the machines couldn't
find each other... they are configured with IPs in hadoop-site.xml, but
when the cluster is running they (somehow) try to resolve each other's
hostnames... anyone know why?

Fixed it by adding the nodes' hostnames to each other's /etc/hosts...
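For anyone hitting the same thing, the entries are just the usual /etc/hosts format, one line per cluster node, on every machine (the addresses and hostnames below are made-up examples, not from my cluster):

```
# /etc/hosts on every node in the cluster
# (example addresses/hostnames -- substitute your own)
192.168.0.10    master
192.168.0.11    slave1
```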

Pedro


Re: about jobConf.set(String,Object)

Owen O'Malley-5
In reply to this post by wangxu-3
Based on this message and similar previous ones, I'm proposing removing
the confusing get/set methods for Objects. Please comment on
http://issues.apache.org/jira/browse/HADOOP-1197.

Thanks,
    Owen