how to get the time of a hadoop cluster, v0.20.2


how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
hi all,

is there a way to get the current time of a hadoop cluster via the
api? in particular, getting the time from the namenode or jobtracker
would suffice.

i looked at JobClient but didn't see anything helpful.

Re: how to get the time of a hadoop cluster, v0.20.2

Niels Basjes
If you have all nodes using NTP then you can simply use the native Java SPI
to get the current system time.


On Tue, May 14, 2013 at 4:41 PM, Jane Wayne <[hidden email]> wrote:

> hi all,
>
> is there a way to get the current time of a hadoop cluster via the
> api? in particular, getting the time from the namenode or jobtracker
> would suffice.
>
> i looked at JobClient but didn't see anything helpful.
>



--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
niels,

i'm not familiar with the native java spi. spi = service provider
interface? could you let me know if this spi is part of the hadoop
api? if so, which package/class?

but yes, all nodes on the cluster are using NTP to synchronize time.
however, the server (which is not a part of the hadoop cluster)
accessing/interfacing with the hadoop cluster cannot be assumed to be
using NTP. will this still make a difference? and actually, this is
the primary reason why i need to get the date/time of the hadoop
cluster (need to check if the date/time of the hadoop cluster is in
sync with the server).



On Tue, May 14, 2013 at 11:38 AM, Niels Basjes <[hidden email]> wrote:

> If you have all nodes using NTP then you can simply use the native Java SPI
> to get the current system time.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes

Re: how to get the time of a hadoop cluster, v0.20.2

Niels Basjes
I made a typo. I meant API (instead of SPI).

Have a look at this for more information:
http://stackoverflow.com/questions/833768/java-code-for-getting-current-time


If you have a client that is not synchronized via NTP, then setting up NTP
on it should be the way to fix your issue.
Once you have that, getting the current time is easy.
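
For reference, reading the local clock from Java can be sketched as below. This is only a minimal illustration of the standard JDK call; the class name LocalClock is made up for the example, and the value it returns is the clock of the machine the JVM runs on, not that of any Hadoop node.

```java
import java.util.Date;

final class LocalClock {
    /** Milliseconds since the Unix epoch (UTC), as reported by the local OS clock. */
    static long nowMillis() {
        return System.currentTimeMillis();
    }

    public static void main(String[] args) {
        long now = nowMillis();
        // The epoch value does not depend on the machine's timezone setting.
        System.out.println("epoch millis: " + now);
        // Date.toString() renders the same instant in the JVM's default timezone.
        System.out.println("as a Date:    " + new Date(now));
    }
}
```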

Niels Basjes



On Tue, May 14, 2013 at 5:46 PM, Jane Wayne <[hidden email]> wrote:

> niels,
>
> i'm not familiar with the native java spi. spi = service provider
> interface? could you let me know if this spi is part of the hadoop
> api? if so, which package/class?
>
> but yes, all nodes on the cluster are using NTP to synchronize time.
> however, the server (which is not a part of the hadoop cluster)
> accessing/interfacing with the hadoop cluster cannot be assumed to be
> using NTP. will this still make a difference? and actually, this is
> the primary reason why i need to get the date/time of the hadoop
> cluster (need to check if the date/time of the hadoop cluster is in
> sync with the server).



--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
yes, but that gets the current time on the server, not the hadoop cluster.
i need to be able to probe the date/time of the hadoop cluster.


On Tue, May 14, 2013 at 5:09 PM, Niels Basjes <[hidden email]> wrote:

> I made a typo. I meant API (instead of SPI).
>
> Have a look at this for more information:
>
> http://stackoverflow.com/questions/833768/java-code-for-getting-current-time
>
>
> If you have a client that is not under NTP then that should be the way to
> fix your issue.
> Once you  have that getting the current time is easy.
>
> Niels Basjes
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Re: how to get the time of a hadoop cluster, v0.20.2

Niels Basjes
If you make sure that everything uses NTP then this becomes an irrelevant
distinction.


On Thu, May 16, 2013 at 4:01 PM, Jane Wayne <[hidden email]> wrote:

> yes, but that gets the current time on the server, not the hadoop cluster.
> i need to be able to probe the date/time of the hadoop cluster.



--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: how to get the time of a hadoop cluster, v0.20.2

Michael Segel
Uhm... sort of...

Niels is essentially correct, and for most of us, just starting an NTPd on a server that syncs with a government clock and then having your local servers sync to that... will be enough. However... in more detail...

Time is relative. ;-)

Ok... being a bit more serious...

There are two things you have to consider... What is meant by 'cluster time'? And what do you want to achieve?

Each machine in the cluster has its own clock. These will still have a certain amount of drift throughout the day.

So you can set up your own NTP server: you can either run NTPd and sync to a known government clock, or you can spend money and buy an atomic clock for your servers or machine room.
(See http://www.atomic-clock.galleon.eu.com/ )

Then periodically throughout the day, via cron, have the machines in your machine room sync to the local NTP server.
This way all of your machines will have the same and correct time.

So this will sync the clocks to a degree, but then drift sets in.

Of course you also need to set up a machine to sync from... my vote would be the Name node. ;-)

HTH

-Mike


On May 16, 2013, at 10:34 AM, Niels Basjes <[hidden email]> wrote:

> If you make sure that everything uses NTP then this becomes an irrelevant
> distinction.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes


Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
> What is meant by 'cluster time'?  and What you want to achieve?

let me try to clarify. i have a hadoop cluster (e.g. name node, data nodes,
job tracker, task trackers, etc...). all the nodes in this hadoop cluster
use ntp to sync time.

i have another computer (which i have referred to as a server, since it is
running tomcat), and this computer is NOT a part of the hadoop cluster (it
doesn't run any of the hadoop daemons), but does submit jobs to the hadoop
cluster via a JEE webapp interface. i need to check that the time on this
computer is in sync with the time on the hadoop cluster. when i say "check
that the time is in sync", there is a defined tolerance/threshold
difference in date/time that i am willing to accept (e.g. the date/time
should be the same down to the minute).

so, using niels' link, i can get the time on the "server" (the computer that
is running tomcat and not a part of the hadoop cluster). which solves 1/3
of the problem.
how do i get the time of the hadoop cluster? this is 1/3 of the problem.
the last 1/3 of the problem, for me, is to then take the time on the
"server", denote this as A, the time on the hadoop cluster, denote this as
B, and subtract them,

C = | A - B |

and then i want to see if C < threshold.

by "cluster time", i am assuming that the hadoop cluster (all its nodes)
somehow has a notion of "the time" (maybe i'm wrong). now, i know that the
date/time is unlikely to be exactly the same across all the hadoop nodes
down to the second or millisecond (similar to what you have stated). but, at
least, the date/time on the nodes should be the same down to the minute (i
think that's a reasonably fair condition to expect). but even if that's not
the case, that's ok, because that's not really what i'm trying to check (my
goal is not to ensure time sync, but to probe the date/time from the cluster
and compare it to the "server").

so, is there a way to programmatically (via the hadoop API) get the hadoop
cluster's date/time? or can i get the date/time via the hadoop API from
just the name node or job tracker? (preferably the latter).
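
The comparison step described above (C = | A - B |, then C < threshold) can be sketched in Java as follows. This is only an illustration under assumptions: the class and method names are made up, both times are taken as milliseconds since the epoch, and the cluster reading B is a placeholder because the thread has not established any way to obtain it.

```java
final class ClockSkewCheck {
    /** C = | A - B |, with both readings in milliseconds since the epoch. */
    static long skewMillis(long serverMillis, long clusterMillis) {
        return Math.abs(serverMillis - clusterMillis);
    }

    /** True when the skew is strictly below the accepted tolerance. */
    static boolean inSync(long serverMillis, long clusterMillis, long thresholdMillis) {
        return skewMillis(serverMillis, clusterMillis) < thresholdMillis;
    }

    public static void main(String[] args) {
        long a = System.currentTimeMillis(); // A: time on the "server"
        long b = a - 45000L;                 // B: placeholder cluster reading (assumption)
        long threshold = 60000L;             // tolerance of one minute
        System.out.println("skew ms: " + skewMillis(a, b));
        System.out.println("in sync: " + inSync(a, b, threshold));
    }
}
```

Working in epoch milliseconds keeps the subtraction independent of how either machine formats its local date/time.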




On Thu, May 16, 2013 at 12:46 PM, Michael Segel <[hidden email]> wrote:

> Uhm... sort of...
>
> Niels is essentially correct, and for most of us, just starting an
> NTPd on a server that syncs with a government clock and then your local
> servers sync to that... will be enough. However... in more detail...
>
> Time is relative. ;-)
>
> Ok... being a bit more serious...
>
> There are two things you have to consider... What is meant by 'cluster
> time'?  and What you want to achieve?
>
> Each machine in the cluster has its own clock. These will still have a
> certain amount of drift throughout the day.
>
> So you can set up your own NTP server. (You can either run NTPd and sync
> to a known government clock) or you can spend money and buy an atomic clock
> for your servers or machine room.
> (See http://www.atomic-clock.galleon.eu.com/ )
>
> Then periodically throughout the day, via cron, have the machines in your
> machine room sync to the local NTP server.
> This way all of your machines will have the same and correct time.
>
> So this will sync the clocks to a degree, but then drift sets in.
>
> Of course you also need to set up a machine to sync from... my vote would
> be the Name node. ;-)
>
> HTH
>
> -Mike

Re: how to get the time of a hadoop cluster, v0.20.2

Niels Basjes
Hi,

> i have another computer (which i have referred to as a server, since it is
> running tomcat), and this computer is NOT a part of the hadoop cluster (it
> doesn't run any of the hadoop daemons), but does submit jobs to the hadoop
> cluster via a JEE webapp interface. i need to check that the time on this
> computer is in sync with the time on the hadoop cluster. when i say "check
> that the time is in sync", there is a defined tolerance/threshold
> difference in date/time that i am willing to accept (e.g. the date/time
> should be the same down to the minute).

If you ensure (using NTP) that all your servers have the same time then you
can simply query your local server for the time and you have the correct
answer to your question.

You are searching for a solution in the Hadoop API (where this does not
exist) when the solution is present at a different level.

--
Best regards / Met vriendelijke groeten,

Niels Basjes

Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
> You are searching for a solution in the Hadoop API (where this does not
> exist)

thanks, that's all i needed to know.

cheers.



On Fri, May 17, 2013 at 9:17 AM, Niels Basjes <[hidden email]> wrote:

> Hi,
>
> > i have another computer (which i have referred to as a server, since it is
> > running tomcat), and this computer is NOT a part of the hadoop cluster (it
> > doesn't run any of the hadoop daemons), but does submit jobs to the hadoop
> > cluster via a JEE webapp interface. i need to check that the time on this
> > computer is in sync with the time on the hadoop cluster. when i say "check
> > that the time is in sync", there is a defined tolerance/threshold
> > difference in date/time that i am willing to accept (e.g. the date/time
> > should be the same down to the minute).
>
> If you ensure (using NTP) that all your servers have the same time then you
> can simply query your local server for the time and you have the correct
> answer to your question.
>
> You are searching for a solution in the Hadoop API (where this does not
> exist) when the solution is present at a different level.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Re: how to get the time of a hadoop cluster, v0.20.2

Bertrand Dechoux
In reply to this post by Niels Basjes
For hadoop, 'cluster time' is the local OS time. You might want to get the
time of the namenode machine but indeed if NTP is correctly used, the local
OS time from your server machine will be the best estimation. If you
request the time from the namenode machine, you will be penalized by the
delay of your request.
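
The request delay mentioned above can be partly compensated for by timing the round trip, which is the same idea NTP uses. The Java sketch below assumes Java 8 or later, a roughly symmetric network delay, and some way to read the remote machine's clock; that remote reading is purely hypothetical here (modelled as a LongSupplier), and the class and method names are made up for the example.

```java
import java.util.function.LongSupplier;

final class SkewEstimator {
    /**
     * Estimated offset (remote minus local) in milliseconds.
     * Assumes the remote clock was read halfway through the round trip.
     */
    static long estimateOffsetMillis(long sentAt, long remoteTime, long receivedAt) {
        long midpoint = sentAt + (receivedAt - sentAt) / 2;
        return remoteTime - midpoint;
    }

    /** Reads the local clock before and after querying the (hypothetical) remote clock. */
    static long probe(LongSupplier localClock, LongSupplier remoteClock) {
        long sentAt = localClock.getAsLong();
        long remoteTime = remoteClock.getAsLong();
        long receivedAt = localClock.getAsLong();
        return estimateOffsetMillis(sentAt, remoteTime, receivedAt);
    }
}
```

With a tolerance of about a minute, a round trip of a few milliseconds would hardly matter, so this correction only becomes relevant for much tighter thresholds.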

Regards

Bertrand


On Fri, May 17, 2013 at 3:17 PM, Niels Basjes <[hidden email]> wrote:

> Hi,
>
> > i have another computer (which i have referred to as a server, since it is
> > running tomcat), and this computer is NOT a part of the hadoop cluster (it
> > doesn't run any of the hadoop daemons), but does submit jobs to the hadoop
> > cluster via a JEE webapp interface. i need to check that the time on this
> > computer is in sync with the time on the hadoop cluster. when i say "check
> > that the time is in sync", there is a defined tolerance/threshold
> > difference in date/time that i am willing to accept (e.g. the date/time
> > should be the same down to the minute).
>
> If you ensure (using NTP) that all your servers have the same time then you
> can simply query your local server for the time and you have the correct
> answer to your question.
>
> You are searching for a solution in the Hadoop API (where this does not
> exist) when the solution is present at a different level.
>
> --
> Best regards / Met vriendelijke groeten,
>
> Niels Basjes
>

Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
"if NTP is correctly used"

that's the key statement. in several of our clusters, NTP setup is kludgy.
note that the professionals administering the cluster are different from
"us" the engineers. so, there's a lot of red tape to go through to get
anything fixed, trivial or not. we have noticed that NTP is not set up
correctly (using the default GMT timezone, for example). without explaining
all the tedious details, this mismatch of date/time (between the hadoop
cluster and the server machine) is causing some pain.
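
A note on the GMT example: if part of the mismatch is a timezone setting rather than a real clock offset, comparing instants instead of formatted local date/times would sidestep it. The Java sketch below assumes Java 8 or later, and its class and method names are made up for illustration; it shows that one instant rendered in two zones still compares as zero skew.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

final class ZoneIndependentCompare {
    /** Absolute difference between two zoned date/times, measured on the epoch timeline. */
    static long skewMillis(ZonedDateTime a, ZonedDateTime b) {
        return Math.abs(a.toInstant().toEpochMilli() - b.toInstant().toEpochMilli());
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        // Same instant, rendered with two different zone settings.
        ZonedDateTime gmtView = ZonedDateTime.ofInstant(now, ZoneOffset.UTC);
        ZonedDateTime localView = ZonedDateTime.ofInstant(now, ZoneOffset.ofHours(-4));
        System.out.println(gmtView + " vs " + localView);
        System.out.println("skew ms: " + skewMillis(gmtView, localView));
    }
}
```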

i'm not sure i agree with "the local OS time from your server machine will
be the best estimation." that doesn't make sense.

but what i want to achieve is very simple. as stated before, i just want to
ask the namenode or jobtracker, "hey, what date/time do you have?"
unfortunately for me, as niels pointed out, this query is not possible via
the hadoop api.

thanks for helping, though.

:)


On Fri, May 17, 2013 at 10:02 AM, Bertrand Dechoux <[hidden email]> wrote:

> For hadoop, 'cluster time' is the local OS time. You might want to get the
> time of the namenode machine but indeed if NTP is correctly used, the local
> OS time from your server machine will be the best estimation. If you
> request the time from the namenode machine, you will be penalized by the
> delay of your request.
>
> Regards
>
> Bertrand

Re: how to get the time of a hadoop cluster, v0.20.2

Jane Wayne
and please remember, i stated that although the hadoop cluster uses NTP,
the "server" (the machine that is not a part of the hadoop cluster) cannot
be assumed to be using NTP (and in fact, isn't).


On Fri, May 17, 2013 at 10:10 AM, Jane Wayne <[hidden email]> wrote:

> "if NTP is correctly used"
>
> that's the key statement. in several of our clusters, NTP setup is kludgy.
> note that the professionals administering the cluster are different from
> "us" the engineers. so, there's a lot of red tape to go through to get
> something trivial or not fixed. we have noticed that NTP is not setup
> correctly (using default GMT timezone, for example). without explaining all
> the tedious details, this mismatch of date/time (of the hadoop cluster to
> the server machine) is causing some pains.
>
> i'm not sure i agree with "the local OS time from your server machine will
> be the best estimation." that doesn't make sense.
>
> but what i want to achieve is very simple. as stated before, i just want
> to ask the namenode or jobtracker, "hey, what date/time do you have?"
> unfortunately for me, as niels pointed out, this query is not possible via
> the hadoop api.
>
> thanks for helping, though.
>
> :)

Re: how to get the time of a hadoop cluster, v0.20.2

Michael Segel
Then you have a problem whose solution is more about people management than technology.

All of your servers should be using NTP.  At a minimum, you have one server that gets the time from a national (government) time server, and then have all of the machines in that data center use that machine as their NTP server, or you can have all machines by default use the government server for NTP.

You can also buy your own clock server that syncs to either GPS or national time servers via a radio signal.

But you have a problem of staff that is either unwilling or unable to do their job.

You can either take a carrot or a stick approach.
I suggest maybe bribing them with a bottle of scotch. (That seems to be the current liquid lubricant that works universally these days, unless of course they don't drink...)

HTH

-Mike

On May 17, 2013, at 9:13 AM, Jane Wayne <[hidden email]> wrote:

> and please remember, i stated that although the hadoop cluster uses NTP,
> the "server" (the machine that is not a part of the hadoop cluster) cannot
> assume to be using NTP (and in fact, doesn't).