a million log lines from one job tracker startup


a million log lines from one job tracker startup

kate rhodes
running r578879.
I shut everything down, wiped out all logs, then ran bin/start-all.sh

When the jobtracker starts up I get 939,805 lines composed of some
startup messages, and then one stack trace repeated again and again
(see below). After that is the following line, and then it sits there
happily doing nothing (nothing is being run through the system):
2007-09-26 09:58:34,192 INFO org.apache.hadoop.mapred.JobTracker:
Starting RUNNING

I'm wondering why I'm seeing this repeated stack trace, and why I'm
seeing almost *a million lines' worth* of it just from one startup?



The repeated stack trace is:

2007-09-26 09:58:06,472 INFO org.apache.hadoop.mapred.JobTracker:
problem cleaning system directory:
/home/krhodes/hadoop_files/temp/krhodes/mapred/system
org.apache.hadoop.ipc.RemoteException:
org.apache.hadoop.dfs.SafeModeException: Cannot delete
/home/krhodes/hadoop_files/temp/krhodes/mapred/system. Name node is in
safe mode.
Safe mode will be turned off automatically.
        at org.apache.hadoop.dfs.FSNamesystem.deleteInternal(FSNamesystem.java:1222)
        at org.apache.hadoop.dfs.FSNamesystem.delete(FSNamesystem.java:1200)
        at org.apache.hadoop.dfs.NameNode.delete(NameNode.java:399)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:340)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:596)

        at org.apache.hadoop.ipc.Client.call(Client.java:470)
        at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:165)
        at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:585)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
        at org.apache.hadoop.dfs.$Proxy0.delete(Unknown Source)
        at org.apache.hadoop.dfs.DFSClient.delete(DFSClient.java:406)
        at org.apache.hadoop.dfs.DistributedFileSystem.delete(DistributedFileSystem.java:150)
        at org.apache.hadoop.mapred.JobTracker.<init>(JobTracker.java:690)
        at org.apache.hadoop.mapred.JobTracker.startTracker(JobTracker.java:116)
        at org.apache.hadoop.mapred.JobTracker.main(JobTracker.java:1851)



--
- kate = masukomi
http://weblog.masukomi.org/

Re: a million log lines from one job tracker startup

kate rhodes
On a related note:
433,797 log lines in the namenode log from the same startup (a shorter
version of the same stack trace, repeated).





--
- kate = masukomi
http://weblog.masukomi.org/

RE: a million log lines from one job tracker startup

Devaraj Das
This is okay. At the JobTracker startup, it connects to the namenode, and
tries to delete the configured system dir. It fails to do so since the
namenode is in safe mode. The JT keeps on retrying the delete until it
succeeds. It should eventually succeed if the hdfs starts up fine.
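
For illustration, the behaviour described above amounts to a loop of roughly
this shape (a sketch only, not the actual JobTracker source; the class and
method names are invented, though FileSystem.delete(Path) is the real call
that throws while the namenode is in safe mode):

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SystemDirCleanupSketch {
  // Keep trying to delete the mapred system dir until the namenode lets us.
  // While the namenode is in safe mode every attempt throws, and every
  // attempt logs the message plus the full remote stack trace with no pause
  // in between; that is where the million log lines come from.
  static void cleanSystemDir(FileSystem fs, Path systemDir) {
    while (true) {
      try {
        fs.delete(systemDir);
        break;
      } catch (IOException ie) {
        System.out.println("problem cleaning system directory: " + systemDir);
        ie.printStackTrace();
      }
    }
  }
}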



Re: a million log lines from one job tracker startup

Ted Dunning-3
In reply to this post by kate rhodes

It looks like you have a problem with insufficient replication or a
corrupted file.  This can happen if you are running with a low replication
count and have lost a datanode or few.  I have also seen this happen in
association with somewhat aggressive nuking of hadoop jobs or processes, or
an overfull disk (I am not sure which).  In that case, I wound up with
missing blocks for map reduce intermediate output.

The simplest, but almost always unsatisfactory, repair is to simply nuke the
contents of HDFS and reload cleanly.

It is also possible that the namenode will eventually be able to repair the
situation.

You may also be able to repair the file system piecemeal if the persistent
problems that you are experiencing have to do with files that you don't care
about.  To do this, you would use hadoop fsck / to find what the problems
really are, turn off safe mode by hand (warning, Will Robinson, DANGER), and
delete the files that are causing problems.  This is somewhat laborious.  I
think that there is a "force repair" option on fsck, but I was unable to get
that right.

If you are a real cowboy, you can simply turn off safe mode and go forward.
If the goobered files are not important to you, this can let you get some
work done.  This is a really bad idea, of course, since you are circumventing
some really important safeguards.

My own impression, having experienced this as well as having watched files
slooowwly be replicated more widely after changing the replication count for
a bunch of files, is that I would love to be able to tell the namenode to be
very aggressive about repairing replication issues.  Normally, the slow pace
used for fixing under-replication is a good thing, since it allows you to
continue with additional work while replication goes on, but there are
situations where you really want the issues resolved sooner.
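
For reference, the piecemeal repair described above boils down to commands
along these lines (a sketch; the file to delete is whatever fsck actually
reports as damaged, shown here with a made-up placeholder path):

bin/hadoop fsck /                         # list missing/corrupt blocks and the files they belong to
bin/hadoop dfsadmin -safemode leave       # turn off safe mode by hand (the DANGER step)
bin/hadoop dfs -rm /path/to/damaged/file  # delete the files fsck complained about
bin/hadoop fsck /                         # re-check that the filesystem is healthy again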




Re: a million log lines from one job tracker startup

Ted Dunning-3
In reply to this post by Devaraj Das

It is decidedly not OK in a near disk full situation.  My experience was
that HDFS filled the disk (almost) and then these logs made the situation
nearly unrecoverable.


On 9/26/07 8:46 AM, "Devaraj Das" <[hidden email]> wrote:

> This is okay. At the JobTracker startup, it connects to the namenode, and
> tries to delete the configured system dir. It fails to do so since the
> namenode is in safe mode. The JT keeps on retrying the delete until it
> succeeds. It should eventually succeed if the hdfs starts up fine.


Re: a million log lines from one job tracker startup

Owen O'Malley-4

On Sep 26, 2007, at 9:02 AM, Ted Dunning wrote:

> It is decidedly not OK in a near disk full situation.  My  
> experience was
> that HDFS filled the disk (almost) and then these logs made the  
> situation
> nearly unrecoverable.

If disk is tight, you probably should configure log4j to roll the  
logs based on size rather than the default of daily.  I just created  
the jira HADOOP-1947 to make it easier to configure the appender and  
the associated rolling strategy.
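
For example, pointing the Hadoop root logger at a size-bounded appender in
conf/log4j.properties would look roughly like this (a sketch using the stock
log4j 1.2 RollingFileAppender; the hadoop.log.dir and hadoop.log.file
variables are assumed to be the same ones the default Hadoop config defines):

# cap each log file at 10 MB and keep at most 10 rotated files
hadoop.root.logger=INFO,RFA
log4j.rootLogger=${hadoop.root.logger}
log4j.appender.RFA=org.apache.log4j.RollingFileAppender
log4j.appender.RFA.File=${hadoop.log.dir}/${hadoop.log.file}
log4j.appender.RFA.MaxFileSize=10MB
log4j.appender.RFA.MaxBackupIndex=10
log4j.appender.RFA.layout=org.apache.log4j.PatternLayout
log4j.appender.RFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n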

-- Owen

Re: a million log lines from one job tracker startup

kate rhodes
I think making that configurable is a good thing, but it still seems
like an excessive number of lines to begin with.  Can anyone explain
why this is getting called so many times? What value is there in
calling it that often in such a short span of time (this is like 2-3
seconds)?

And do we really need to be dumping these very similar stack traces to
the job tracker logs AND the namenode logs?

Also, in my specific case, I format the namenode, start up, and all is
good. Then I put about 7 GB of logs in via bin/hadoop dfs -put ...
(dfs.replication is 1) and wait for it to finish. I run stop-all.sh,
clear the logs, run start-all.sh, and voilà, I'm back to a million-plus
log lines on startup.
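
Spelled out as commands, that sequence is roughly the following (a sketch;
the local and DFS paths are made-up placeholders, and dfs.replication is
set to 1 in the site config):

bin/hadoop namenode -format              # start from a fresh dfs
bin/start-all.sh
bin/hadoop dfs -put /local/logs /logs    # about 7 GB of log files
bin/stop-all.sh
rm logs/*                                # wipe the old hadoop logs
bin/start-all.sh                         # jobtracker log balloons again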

--
- kate = masukomi
http://weblog.masukomi.org/


RE: a million log lines from one job tracker startup

Devaraj Das
In reply to this post by Ted Dunning-3
Ted, to clarify my earlier mail: what I understood was that Kate was curious
to know why thousands of lines of the delete-exception log messages appeared
at JobTracker startup.  Now, what you pointed out regarding corrupted blocks,
etc., makes sense - it might prevent the namenode from coming out of safe
mode - but the namenode goes into safe mode every time it starts up, and even
without any dfs corruption, the JobTracker would get those exceptions about
deleting mapred's system directory until the NameNode comes out of safe mode.
From the log message "Starting RUNNING", it is clear that the namenode did
come out of safe mode and entered a consistent state; otherwise the
JobTracker wouldn't have entered the RUNNING state.
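
As an aside, whether the namenode is still in safe mode can also be checked
(or waited on) from the command line; a sketch, assuming the dfsadmin
safemode options of this release behave as their names suggest:

bin/hadoop dfsadmin -safemode get     # report whether safe mode is ON or OFF
bin/hadoop dfsadmin -safemode wait    # block until the namenode leaves safe mode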



Re: a million log lines from one job tracker startup

kate rhodes
Is it me or is a million log lines a bit excessive if this is a
sequence of events that is not only possible but intended? If it is
supposed to go into safe mode every time it starts, then things
shouldn't be throwing exceptions about it doing exactly what it was
supposed to do.

Should I file a ticket about this? I've just spent the past few hours
trying to figure out what I'd misconfigured in order to generate all
this crap, only to find out it's just a horribly misleading and
verbose set of errors.

-Kate


Re: a million log lines from one job tracker startup

Owen O'Malley-4

On Sep 26, 2007, at 2:11 PM, kate rhodes wrote:

> Is it me or is a million log lines a bit excessive if this is a
> sequence of events that is not only possible but intended?

It is excessive. How fast was the retry occurring?

-- Owen

Re: a million log lines from one job tracker startup

kate rhodes
It retries as fast as it can.

To make matters worse, if you introduce a non-IOException into the
FSNamesystem.deleteInternal(...) method, it will cause an infinite
loop.

-Kate

On 9/26/07, Owen O'Malley <[hidden email]> wrote:

>
> On Sep 26, 2007, at 2:11 PM, kate rhodes wrote:
>
> > Is it me or is a million log lines a bit excessive if this is a
> > sequence of events that is not only possible but intended?
>
> It is excessive. How fast was the retry occurring?
>
> -- Owen
>


--
- kate = masukomi
http://weblog.masukomi.org/

Re: a million log lines from one job tracker startup

Doug Cutting
kate rhodes wrote:
> It retries as fast as it can.

Yes, I can see that.  It seems we should either insert a call to
'sleep(1000)' at JobTracker.java line 696, or remove that while loop
altogether, since JobTracker#startTracker() will already retry on a
one-second interval.  In the latter case, the directory creation should
be moved back to the top of JobTracker#JobTracker(), before there are
threads, etc. to clean up.
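
In code, the first option amounts to something like this (a sketch of the
shape of the change only, not an exact patch against JobTracker.java; fs,
systemDir, and LOG are assumed to be the JobTracker's existing filesystem
handle, system-directory path, and commons-logging logger):

// keep retrying the system-directory cleanup, but pause between attempts
while (true) {
  try {
    fs.delete(systemDir);        // still throws while the namenode is in safe mode
    break;
  } catch (java.io.IOException ie) {
    LOG.info("problem cleaning system directory: " + systemDir, ie);
    try {
      Thread.sleep(1000);        // the proposed one-second pause between retries
    } catch (InterruptedException ignored) {
    }
  }
}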

Owen added this loop recently, in:

   https://issues.apache.org/jira/browse/HADOOP-1819

Can you please file an issue for this in Jira?

Thanks,

Doug



Re: a million log lines from one job tracker startup

Owen O'Malley-4

On Sep 26, 2007, at 3:14 PM, Doug Cutting wrote:

> Can you please file an issue for this in Jira?

I already filed it as HADOOP-1953.

-- Owen