Namenode cannot accept connection from datanode

Namenode cannot accept connection from datanode

Cedric Ho
Hi all,

We were trying to set up Hadoop in our Linux environment. When we
tried to use a slow machine as the namenode (a Pentium III with
512 MB of RAM), it seemed unable to accept connections from the other
datanodes. (I can access its status over HTTP at port 50070, however.)

But it works fine on a faster machine (a Pentium 4 at 3 GHz with 3 GB of RAM).
The settings, etc. are exactly the same.

The problem seems to be on the namenode. Is it because the machine is slow?

The version we use is 0.12.3.

Any help is appreciated.


Following is the log from the abnormal namenode.

2007-05-09 18:18:46,998 INFO org.apache.hadoop.dfs.StateChange: STATE*
Network topology has 0 racks and 0 datanodes
2007-05-09 18:18:47,000 INFO org.apache.hadoop.dfs.StateChange: STATE*
UnderReplicatedBlocks has 0 blocks
2007-05-09 18:18:47,432 INFO org.mortbay.util.Credential: Checking
Resource aliases
2007-05-09 18:18:48,051 INFO org.mortbay.http.HttpServer: Version Jetty/5.1.4
2007-05-09 18:18:50,524 INFO org.mortbay.util.Container: Started
org.mortbay.jetty.servlet.WebApplicationHandler@587c94
2007-05-09 18:18:51,064 INFO org.mortbay.util.Container: Started
WebApplicationContext[/,/]
2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
HttpContext[/logs,/logs]
2007-05-09 18:18:51,065 INFO org.mortbay.util.Container: Started
HttpContext[/static,/static]
2007-05-09 18:18:51,147 INFO org.mortbay.http.SocketListener: Started
SocketListener on 0.0.0.0:50070
2007-05-09 18:18:51,148 INFO org.mortbay.util.Container: Started
org.mortbay.jetty.Server@e53108
2007-05-09 18:18:51,223 INFO org.apache.hadoop.ipc.Server: IPC Server
listener on 9000: starting
2007-05-09 18:18:51,226 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 0 on 9000: starting
2007-05-09 18:18:51,227 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 1 on 9000: starting
2007-05-09 18:18:51,228 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 2 on 9000: starting
2007-05-09 18:18:51,229 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 3 on 9000: starting
2007-05-09 18:18:51,391 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 4 on 9000: starting
2007-05-09 18:18:51,392 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 5 on 9000: starting
2007-05-09 18:18:51,393 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 6 on 9000: starting
2007-05-09 18:18:51,394 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 7 on 9000: starting
2007-05-09 18:18:51,395 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 8 on 9000: starting
2007-05-09 18:18:51,397 INFO org.apache.hadoop.ipc.Server: IPC Server
handler 9 on 9000: starting


And these are from the datanode

2007-05-09 18:35:13,263 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 1 time(s).
2007-05-09 18:35:14,266 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 2 time(s).
2007-05-09 18:35:15,270 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 3 time(s).
2007-05-09 18:35:16,274 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 4 time(s).
2007-05-09 18:35:17,279 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 5 time(s).
2007-05-09 18:35:18,283 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 6 time(s).
2007-05-09 18:35:19,288 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 7 time(s).
2007-05-09 18:35:20,293 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 8 time(s).
2007-05-09 18:35:21,295 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 9 time(s).
2007-05-09 18:35:22,298 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 10 time(s).
2007-05-09 18:35:23,304 INFO org.apache.hadoop.ipc.RPC: Server at
hadoop01.ourcompany.com/192.168.1.179:9000 not available yet, Zzzzz...
2007-05-09 18:35:24,308 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 1 time(s).
2007-05-09 18:35:25,317 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 2 time(s).
2007-05-09 18:35:26,322 INFO org.apache.hadoop.ipc.Client: Retrying
connect to server: hadoop01.ourcompany.com/192.168.1.179:9000. Already
tried 3 time(s).


Thanks,
Cedric

Re: Namenode cannot accept connection from datanode

Michael Bieniosek
I would try to debug this as a network problem - when the namenode is
running, can you connect to 192.168.1.179:9000 from the machine the datanode
is on?
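
For example, something as simple as the following from the datanode host
(using the address and port from your logs; any TCP client will do) shows
whether the port is reachable at all:

telnet 192.168.1.179 9000

If the connection is refused or just times out, it's a network or binding
problem rather than anything Hadoop-specific.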

While the namenode does use a lot of RAM as the cluster size increases, an
overloaded namenode will typically start panicking in its log messages.
This doesn't occur in your namenode logs - it doesn't appear that any
datanodes connected at all.

-Michael


Re: Namenode cannot accept connection from datanode

Cedric Ho
I performed more testing on this. While the namenode is running, I
cannot connect to 192.168.1.179:9000 from other machines, but I can
connect to it locally. It seems that the server socket binds only to
127.0.0.1:9000 and not to 192.168.1.179:9000.
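
For what it's worth, one quick way to confirm which address the listener is
bound to on a typical Linux box is something like:

netstat -tln | grep 9000

If it shows 127.0.0.1:9000 rather than 0.0.0.0:9000 or 192.168.1.179:9000,
the socket really is bound to the loopback interface only.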

I've also confirmed that there's no firewall, connection blocking,
etc. on this machine. In fact, I've written a small Java program that
opens a ServerSocket on port 9000, started as the same user on the
same machine, and I am able to connect to it from all the other machines.
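
A minimal sketch of that kind of test (not the exact program used here,
just an illustration) might look like:

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

public class PortTest {
    public static void main(String[] args) throws IOException {
        // new ServerSocket(port) binds to the wildcard address (0.0.0.0),
        // so it should be reachable from other machines as well.
        ServerSocket server = new ServerSocket(9000);
        System.out.println("Listening on " + server.getLocalSocketAddress());
        Socket client = server.accept();  // block until a client connects
        System.out.println("Connection from " + client.getRemoteSocketAddress());
        client.close();
        server.close();
    }
}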

So is there some setting that would cause the namenode to bind port
9000 only on the loopback interface?

Cedric



Re: Namenode cannot accept connection from datanode

Cedric Ho
Oh, and I also tried using 192.168.1.179 as a datanode itself, and
only this datanode connects to the namenode on the same host
successfully.


--
愛@上.Keyboard

Re: Namenode cannot accept connection from datanode

Jin Yiqing
I'm just guessing, but maybe you have the same problem I had.
You may try editing the /etc/hosts file on each machine running a datanode,
adding the server names of all the nodes to the file, such as
192.168.1.179    yourservername1
192.168.1.180    yourservername2
...

and trying again. I hope that will work.




Re: Namenode cannot accept connection from datanode

Michael Bieniosek
In reply to this post by Cedric Ho
In hadoop-site.xml on your namenode, what is the value of fs.default.name?
It should be set to the fully-qualified domain name of the host.
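
For example, using the host name that appears in the datanode logs above
(substitute your own), the entry in hadoop-site.xml would look something like:

<property>
  <name>fs.default.name</name>
  <value>hadoop01.ourcompany.com:9000</value>
</property>

If that name resolves to 127.0.0.1 on the namenode machine (or the value is
set to localhost), the IPC server can end up bound to the loopback interface
only, which would match the behaviour you're seeing.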


Re: Namenode cannot accept connection from datanode

Dennis Kubes
In reply to this post by Cedric Ho
We have run into this problem before. If you have a static address for
the machine, make sure that your hosts file points to the static
address for the namenode host name, as opposed to the 127.0.0.1 address.
It should look something like this, with the values replaced with your
own:

127.0.0.1               localhost.localdomain localhost
192.x.x.x               yourhost.yourdomain.com yourhost

Dennis Kubes


Re: Namenode cannot accept connection from datanode

Cedric Ho
This is it! The problem was solved after I made the changes you suggested.

Thanks everyone for helping =)

Cedric

