Why hadoop is written in java?

Why hadoop is written in java?

elton sky
I have always had this question but couldn't find a proper answer to it. For
system-level applications, C/C++ is usually preferred. So why is this one
written in Java?

Re: Why hadoop is written in java?

Dennis-2-3
It's easier to use Java. With C/C++, I think you would need to write about 10 times as much code as in Java.
Dennis

--- On Sun, 10/10/10, elton sky <[hidden email]> wrote:

From: elton sky <[hidden email]>
Subject: Why hadoop is written in java?
To: "common-user" <[hidden email]>
Date: Sunday, October 10, 2010, 12:40 PM

I always have this question but couldn't find proper answer for this. For
system level applications, c/c++ is preferable. But why this one using java?



     

Re: Why hadoop is written in java?

maha-2
I totally agree with Dennis. Besides, Java is more secure than C++ (e.g., there is no raw pointer manipulation and memory is managed for you).

    Maha


On Oct 9, 2010, at 9:43 PM, Dennis wrote:

> It's easier to use java. Using c/c++, you going to need write 10 times code than java.I think.
> Dennis
>
> --- On Sun, 10/10/10, elton sky <[hidden email]> wrote:
>
> From: elton sky <[hidden email]>
> Subject: Why hadoop is written in java?
> To: "common-user" <[hidden email]>
> Date: Sunday, October 10, 2010, 12:40 PM
>
> I always have this question but couldn't find proper answer for this. For
> system level applications, c/c++ is preferable. But why this one using java?
>
>
>


Re: Why hadoop is written in java?

Arvind Kalyan
In reply to this post by elton sky
On Sat, Oct 9, 2010 at 9:40 PM, elton sky <[hidden email]> wrote:

> I always have this question but couldn't find proper answer for this. For
> system level applications, c/c++ is preferable. But why this one using
> java?
>


Look at the system (software) requirements for running Hadoop:
http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs

Imagine how it would be, if it were to be written in C/C++.

While C/C++ might give you a performance improvement at run-time, it can be
a total nightmare to develop and maintain, especially if the network gets to
be heterogeneous.



--
Arvind Kalyan
http://www.linkedin.com/in/base16
h: (408) 331-7921 m: (541) 971-9225

Re: Why hadoop is written in java?

Shi Yu
I wonder how Hadoop runs with Python and other languages. Java is
easy to develop in, but it is not very efficient at numerical
computation on objects such as sparse matrices. Maybe Hadoop will get
Matlab and R extensions as well? I hope to see that happen.



On 2010-10-10 1:07, Arvind Kalyan wrote:

> On Sat, Oct 9, 2010 at 9:40 PM, elton sky<[hidden email]>  wrote:
>
>    
>> I always have this question but couldn't find proper answer for this. For
>> system level applications, c/c++ is preferable. But why this one using
>> java?
>>
>>      
>
> Look at the system (software) requirements for running Hadoop:
> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
>
> Imagine how it would be, if it were to be written in C/C++.
>
> While C/C++ might give you a performance improvement at run-time, it can be
> a total nightmare to develop and maintain. Especially if the network gets to
> be heterogeneous.
>
>
>
>    

Re: Why hadoop is written in java?

BM
In reply to this post by elton sky
On Sun, Oct 10, 2010 at 1:40 PM, elton sky <[hidden email]> wrote:
> I always have this question but couldn't find proper answer for this. For
> system level applications, c/c++ is preferable. But why this one using java?

Long story short: because C/C++ sucks big time at clustering and
development speed, especially when it comes to maintaining heterogeneity
and security. At the same time, the benefit is not very big (rather, too
small to pay attention to), since its performance advantage is still very
questionable. C++ is not so much faster than Java these days that
someone should sacrifice an entire life, locking [him/her]self in a
monastery cell for that whole Hadoop mission... :-)

--
Kind regards, BM

Things, that are stupid at the beginning, rarely ends up wisely.

Re: Why hadoop is written in java?

Ken Goodhope
In reply to this post by Shi Yu
You might want to take a look at Dumbo for use in writing hadoop jobs
with python.

On Saturday, October 9, 2010, Shi Yu <[hidden email]> wrote:

> Wondering how Hadoop running with python and other languages. Java is easy to develop, however, not very efficient to handle numerical computation with objects like sparse matrices. Maybe hadoop will have Matlab, R extensions as well? Hope to see it happens.
>
>
>
> On 2010-10-10 1:07, Arvind Kalyan wrote:
>
> On Sat, Oct 9, 2010 at 9:40 PM, elton sky<[hidden email]>  wrote:
>
>
>
> I always have this question but couldn't find proper answer for this. For
> system level applications, c/c++ is preferable. But why this one using
> java?
>
>
>
>
> Look at the system (software) requirements for running Hadoop:
> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
>
> Imagine how it would be, if it were to be written in C/C++.
>
> While C/C++ might give you a performance improvement at run-time, it can be
> a total nightmare to develop and maintain. Especially if the network gets to
> be heterogeneous.
>
>
>
>
>
>

Re: Why hadoop is written in java?

Wildan Maulana
AFAIK, Google uses C/C++ to build the Hadoop-like system that powers Google
search now ...

CMIIW

Regards,
Wildan
---
OpenThink Labs Indonesia | http://www.openthinklabs.com
Harmonizing IT, Business and Education

Negeri Pelangi | http://www.negeripelangi.com
a Pay it Forward Company

Wildan Maulana  Blog |
http://wildan.openthinklabs.com

Ecopreneur's Guide |
http://wildan.openthinklabs.com/ecopreneurs-guide-handbook/

>> +62-87884599249

Y! : hawking_123
Linkedln : http://www.linkedin.com/in/wildanmaulana
Twitter : http://twitter.com/wildanmaulana



On Mon, Oct 11, 2010 at 2:00 AM, Ken Goodhope <[hidden email]> wrote:

> You might want to take a look at Dumbo for use in writing hadoop jobs
> with python.

Re: Why hadoop is written in java?

Owen O'Malley-3
In reply to this post by elton sky
The real answer is that Hadoop was originally written to support Nutch, which is in Java. Java has mostly served us well, being reliable, having extremely powerful libraries, and being far easier to debug than C++. There are issues, of course: Java's interface to the OS is very weak, object memory overhead is high, and program startup is very slow.

-- Owen

On Oct 9, 2010, at 21:40, elton sky <[hidden email]> wrote:

> I always have this question but couldn't find proper answer for this. For
> system level applications, c/c++ is preferable. But why this one using java?

Re: Why hadoop is written in java?

Matt Tanquary
In reply to this post by Shi Yu
Please check out the Rhipe and Mahout projects. There are others as well,
but these are coming on strong, and Hadoop has many avenues for
extension through things such as Python or Matlab that you can take
advantage of. The good thing is, if you have an algorithm or
computational challenge that hasn't been met, you can solve it and
share it with the rest of us.

On Sat, Oct 9, 2010 at 11:39 PM, Shi Yu <[hidden email]> wrote:

> Wondering how Hadoop running with python and other languages. Java is easy
> to develop, however, not very efficient to handle numerical computation with
> objects like sparse matrices. Maybe hadoop will have Matlab, R extensions as
> well? Hope to see it happens.
>
>
>
> On 2010-10-10 1:07, Arvind Kalyan wrote:
>>
>> On Sat, Oct 9, 2010 at 9:40 PM, elton sky<[hidden email]>  wrote:
>>
>>
>>>
>>> I always have this question but couldn't find proper answer for this. For
>>> system level applications, c/c++ is preferable. But why this one using
>>> java?
>>>
>>>
>>
>> Look at the system (software) requirements for running Hadoop:
>>
>> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
>>
>> Imagine how it would be, if it were to be written in C/C++.
>>
>> While C/C++ might give you a performance improvement at run-time, it can
>> be
>> a total nightmare to develop and maintain. Especially if the network gets
>> to
>> be heterogeneous.
>>
>>
>>
>>
>



--
Have you thanked a teacher today? ---> http://www.liftateacher.org

Re: Why hadoop is written in java?

Shi Yu
That sounds interesting. I am interested in the prospect of using
Hadoop to solve huge-scale convex / non-convex optimization problems. I will
take a look at them. Thanks.

Shi

On 2010-10-10 16:12, Matt Tanquary wrote:

> Please check out Rhipe and Mahout projects, there are others as well,
> but these are coming on strong and Hadoop has many avenues for
> extension through things such as python or matlab that you can take
> advantage of. The good thing is, if you have an algorithm or
> computational challenge that hasn't been met, you can solve it and
> share it with the rest of us.
>
> On Sat, Oct 9, 2010 at 11:39 PM, Shi Yu<[hidden email]>  wrote:
>    
>> Wondering how Hadoop running with python and other languages. Java is easy
>> to develop, however, not very efficient to handle numerical computation with
>> objects like sparse matrices. Maybe hadoop will have Matlab, R extensions as
>> well? Hope to see it happens.
>>
>>
>>
>> On 2010-10-10 1:07, Arvind Kalyan wrote:
>>      
>>> On Sat, Oct 9, 2010 at 9:40 PM, elton sky<[hidden email]>    wrote:
>>>
>>>
>>>        
>>>> I always have this question but couldn't find proper answer for this. For
>>>> system level applications, c/c++ is preferable. But why this one using
>>>> java?
>>>>
>>>>
>>>>          
>>> Look at the system (software) requirements for running Hadoop:
>>>
>>> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
>>>
>>> Imagine how it would be, if it were to be written in C/C++.
>>>
>>> While C/C++ might give you a performance improvement at run-time, it can
>>> be
>>> a total nightmare to develop and maintain. Especially if the network gets
>>> to
>>> be heterogeneous.
>>>
>>>
>>>
>>>
>>>        
>>      
>
>
>    


--
Postdoctoral Scholar
Institute for Genomics and Systems Biology
Department of Medicine, the University of Chicago
Knapp Center for Biomedical Discovery
900 E. 57th St. Room 10148
Chicago, IL 60637, US
Tel: 773-702-6799


Re: Why hadoop is written in java?

Konstantin Boudnik
In reply to this post by Arvind Kalyan
To second your point ;-) This reminds me of the times when Sun Micro bought
GridEngine (a C app). A couple of other folks and I were developing a Distributed
Task execution Framework (written in Java on top of JINI).

Every time a new version of, eh... Windows was coming around the corner, the Grid
people were screaming. Guess how easy it was for us ;)

Cos

On Sat, Oct 09, 2010 at 11:07PM, Arvind Kalyan wrote:

> On Sat, Oct 9, 2010 at 9:40 PM, elton sky <[hidden email]> wrote:
>
> > I always have this question but couldn't find proper answer for this. For
> > system level applications, c/c++ is preferable. But why this one using
> > java?
> >
>
>
> Look at the system (software) requirements for running Hadoop:
> http://hadoop.apache.org/common/docs/current/single_node_setup.html#PreReqs
>
> Imagine how it would be, if it were to be written in C/C++.
>
> While C/C++ might give you a performance improvement at run-time, it can be
> a total nightmare to develop and maintain. Especially if the network gets to
> be heterogeneous.
>
>
>
> --
> Arvind Kalyan
> http://www.linkedin.com/in/base16
> h: (408) 331-7921 m: (541) 971-9225

Re: Why hadoop is written in java?

Steve Loughran
On 11/10/10 16:56, Konstantin Boudnik wrote:
> To second your point ;-) Reminds me of times when Sun Micro bought GridEngine
> (C-app). Me and a couple other folks were developing Distributed Task execution
> Framework (written in Java on top of JINI).
>
> Every time new version of eh... Windows was coming around the corner Grid
> people were screaming. Guess how easy it was for us ;)
>

That said, the only large scale platform people are deploying Hadoop on
is Linux, because it's the only one that other people running Hadoop are
using. This leads to a bias in bug reports, optimisations and other
deployment support. Even though Hadoop does run on other unixes, Windows
and OS/X, whoever deploys it at scale gets to find the issues. And if
there is some problem where the fix helps you but hurts linux
installations, you aren't going to get your patch in. Same for non-Sun
JVMs, which is one reason why I stopped using JRockit -the other being
Oracle stopped giving the security patches away to developers who
weren't paying the fees.

Effectively Hadoop is a Linux-only application; even there, being in
Java has some advantages:
 -no need to recompile the non-native bits for different OS releases.
 -memory management makes it way, way easier to write applications that
don't leak memory.
 -good cross-platform build, testing and logging tools make it much
easier for open source developers to play with.
 -because you can run test builds on windows, developers whose desktops
are Windows can still code and debug locally. This makes it easier to
play with hadoop.

A C/C++ app would have to commit to an OS -inevitably, Linux- and use
their build/test processes. You'd get good OS integration, at the cost
of having to do more OS integration testing, and scare off code
contributions from anyone who wasn't a C/C++ on Linux developer. And
you'd have to pick a Linux distribution to work "in".

Incidentally, Cos, I hear that Dan Templeton and Tom White were demoing
Hadoop on Grid Engine last month. Not seen the slides though.

-Steve



Re: Why hadoop is written in java?

helwr
In reply to this post by elton sky

Re: Why hadoop is written in java?

Dhruba Borthakur
I agree with others in this list that Java provides faster software
development, the IO cost in Java is practically the same as in C/C++, etc.
In short, most pieces of distributed software can be written in Java without
any performance hiccups, as long as it is only system metadata that is
handled by Java.

One problem is when data-flow has to occur in Java. Each record that is read
from the storage has to be de-serialized, uncompressed and then processed.
This processing could be very slow in Java compared to when written in other
languages, especially because of the creation/destruction of too many
objects.  It would have been nice if the map/reduce task could have been
written in C/C++, or better still, if the sorting inside the MR framework
could occur in C/C++.
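
A rough sketch of the per-record path described above (uncompress, de-serialize, process), written in Python purely for illustration. The record layout and the names encode_record, decode_record and process are assumptions made up for this sketch, not Hadoop's actual on-disk format or API. The only point is that every record passes through each step and allocates fresh objects along the way.

```python
import struct
import zlib

# Hypothetical record layout, assumed for illustration only:
# a zlib-compressed payload holding a 4-byte big-endian key length,
# the UTF-8 key bytes, and an 8-byte big-endian signed integer value.

def encode_record(key, value):
    """Serialize and compress one (key, value) record."""
    key_bytes = key.encode("utf-8")
    payload = struct.pack(">I", len(key_bytes)) + key_bytes + struct.pack(">q", value)
    return zlib.compress(payload)

def decode_record(blob):
    """Uncompress and de-serialize one record back into (key, value)."""
    payload = zlib.decompress(blob)                      # uncompress
    (key_len,) = struct.unpack_from(">I", payload, 0)    # de-serialize the key length
    key = payload[4:4 + key_len].decode("utf-8")         # new str object per record
    (value,) = struct.unpack_from(">q", payload, 4 + key_len)
    return key, value

def process(records):
    """Sum the values per key; this stands in for the user's map logic."""
    totals = {}
    for blob in records:
        key, value = decode_record(blob)
        totals[key] = totals.get(key, 0) + value
    return totals

stored = [encode_record("a", 1), encode_record("b", 5), encode_record("a", 2)]
totals = process(stored)
```

Each call to decode_record creates a new bytes object, a new string and a new tuple, which is the kind of per-record allocation the paragraph above is concerned about.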

thanks,
dhruba

On Mon, Oct 11, 2010 at 4:50 PM, helwr <[hidden email]> wrote:

>
> Check out this thread:
> https://www.quora.com/Why-was-Hadoop-written-in-Java
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>



--
Connect to me at http://www.facebook.com/dhruba

Re: Why hadoop is written in java?

Chris Dyer-2
The Java memory overhead is a quite serious problem, and a legitimate
and serious criticism of Hadoop. For MapReduce applications, it is
often (although not always) possible to improve performance by doing
more work in memory (e.g., using combiners and the like) before
emitting data. Thus, the more memory available to your application,
the more efficiently it runs. Therefore, if you have a framework that
locks up 500 MB rather than 50 MB, you systematically get less
performance out of your cluster.
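
To make the "more work in memory before emitting data" point concrete, here is a minimal sketch of in-mapper combining for a word count, in Python for brevity. The function names and the max_keys parameter are hypothetical and only stand in for whatever memory budget a real task has.

```python
from collections import Counter

def map_naive(lines):
    """Emit one (word, 1) pair per word occurrence."""
    for line in lines:
        for word in line.split():
            yield word, 1

def map_with_combining(lines, max_keys=100_000):
    """Aggregate counts in memory and flush when the buffer holds max_keys keys."""
    buffer = Counter()
    for line in lines:
        for word in line.split():
            buffer[word] += 1
            if len(buffer) >= max_keys:
                # Memory budget reached: emit the partial counts and start over.
                yield from buffer.items()
                buffer.clear()
    # Emit whatever is left at the end of the input.
    yield from buffer.items()

lines = ["to be or not to be", "to do or not to do"]
naive = list(map_naive(lines))
combined = list(map_with_combining(lines))
```

With the twelve input words above, the naive mapper emits twelve pairs while the combining mapper emits five, one per distinct word. Shrinking max_keys forces earlier flushes and more emitted records, which is how less available memory turns into more data to sort and ship.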

The second issue is that C/C++ bindings are common and widely used
from many languages, but it is not generally possible to interface
directly with Java (or Java libraries) from another language, unless
that language is also built on top of the JVM. This is very
unfortunate because many problems that would be quite naturally
expressed in MapReduce are better solved in non-JVM languages.
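
As an illustration of how readily C libraries can be reached from other languages, the sketch below calls the C standard library's strlen from Python using the standard ctypes module. The library lookup is an assumption that holds on typical Linux and macOS systems. Reaching a Java library from the same Python process would instead require starting or connecting to a JVM through some bridge.

```python
import ctypes
import ctypes.util

# Load the C standard library. Falling back to None asks the loader for the
# symbols already linked into the current process (POSIX systems only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

# Call straight into C with a byte string.
length = libc.strlen(b"hadoop")
```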

But, Java is what we have, and it works well enough for many things.

On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[hidden email]> wrote:

> I agree with others in this list that Java provides faster software
> development, the IO cost in Java is practically the same as in C/C++, etc.
> In short, most pieces of distributed software can be written in Java without
> any performance hiccups, as long as it is only system metadata that is
> handled by Java.
>
> One problem is when data-flow has to occur in Java. Each record that is read
> from the storage has to be de-serialized, uncompressed and then processed.
> This processing could be very slow in Java compared to when written in other
> languages, especially because of the creation/destruction of too many
> objects.  It would have been nice if the map/reduce task could have been
> written in C/C++, or better still, if the sorting inside the MR framework
> could occur in C/C++.
>
> thanks,
> dhruba
>
> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[hidden email]> wrote:
>
>>
>> Check out this thread:
>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba
>

Re: Why hadoop is written in java?

Steve Loughran
On 12/10/10 05:20, Chris Dyer wrote:

> The Java memory overhead is a quite serious problem, and a legitimate
> and serious criticism of Hadoop. For MapReduce applications, it is
> often (although not always) possible to improve performance by doing
> more work in memory (e.g., using combiners and the like) before
> emitting data. Thus, the more memory available to your application,
> the more efficient it runs. Therefore, if you have a framework that
> locks up 500mb rather than 50mb, you systematically get less
> performance out of your cluster.
>
> The second issue is that C/C++ bindings are common and widely used
> from many languages, but it is not generally possible to interface
> directly with Java (or Java libraries) from another language, unless
> that language is also built on top of the JVM. This is a very
> unfortunate because many problems that would be quite naturally
> expressed in MapReduce are better solved in non-JVM languages.

A few years back I went from a java project to 6 months doing something
in C/C++.

First it was like rediscovering stuff: mixins! ability to overwrite
operators! STL!

Then you start looking at the build and test process, and think "this
hasn't moved on for a while", then struggling with CppUnit to do
test-first development of COM service, setting up Cruise Control to run
a build.xml that just <execs> visual studio's build to build your app,
then you run the tests. Eventually, the tests worked.

But then there were the memory leaks, the reference counter problems, the
threading and race conditions issues, the inconsistency between windows
and linux. And the string types. Oh, so many string types. char*,
TCHAR*, LPCSTR, BSTR, etc.

In Java, you have to go out of your way for a memory leak, so if your
tests work, your code is functional and good to ship. But in C/C++, the
engineering to go from code that passes its functional tests to code
that doesn't leak memory, is thread safe and is secure is way harder.

Try representing a large graph in C++ that is shared across threads and
not have memory problems to see what I mean.

I agree, some Java independence would be nice, but I'd go higher,
towards more graph and list centric languages, not closer to the metal.

Scala support, anyone?

RE: Why hadoop is written in java?

Ricky Ho
In reply to this post by elton sky
Is it easier if we change the question to: "Why did Java people create Hadoop
before C++ people?"

I agree that for a framework like Hadoop, execution efficiency has a higher
priority than developer productivity.  And if the user can use any language to
write the map and reduce functions (as with Hadoop streaming), then we should use the
most efficient language to write the core framework.
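
For readers unfamiliar with Hadoop streaming: the framework runs arbitrary executables as the map and reduce steps, feeding records on standard input and reading tab-separated key/value lines from standard output, with the reducer's input sorted by key. A minimal word-count sketch in Python follows. The function names are hypothetical, and the last line only imitates the framework's sort locally.

```python
from itertools import groupby

def mapper(lines):
    """Turn each input line into tab-separated 'word<TAB>1' records."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts per key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Under Hadoop streaming each function would read sys.stdin and print its
# records. Here sorted() is a local stand-in for the framework's sort phase.
local_result = list(reducer(sorted(mapper(["a b a", "b a"]))))
```

Because the contract is just lines on stdin and stdout, the same two steps could be written in any language, which is the premise of the argument above.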

But again, don't forget the dynamics.  It is not about which language is the
most efficient.  It is about which language the parallel computing experts who
are willing to spend time on open source are more familiar with (or passionate
about).

Rgds,
Ricky


-----Original Message-----
From: Chris Dyer [mailto:[hidden email]]
Sent: Monday, October 11, 2010 9:20 PM
To: [hidden email]
Cc: [hidden email]
Subject: Re: Why hadoop is written in java?
 
The Java memory overhead is a quite serious problem, and a legitimate
and serious criticism of Hadoop. For MapReduce applications, it is
often (although not always) possible to improve performance by doing
more work in memory (e.g., using combiners and the like) before
emitting data. Thus, the more memory available to your application,
the more efficient it runs. Therefore, if you have a framework that
locks up 500mb rather than 50mb, you systematically get less
performance out of your cluster.
 
The second issue is that C/C++ bindings are common and widely used
from many languages, but it is not generally possible to interface
directly with Java (or Java libraries) from another language, unless
that language is also built on top of the JVM. This is a very
unfortunate because many problems that would be quite naturally
expressed in MapReduce are better solved in non-JVM languages.
 
But, Java is what we have, and it works well enough for many things.
 
On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[hidden email]> wrote:

> I agree with others in this list that Java provides faster software
> development, the IO cost in Java is practically the same as in C/C++, etc.
> In short, most pieces of distributed software can be written in Java without
> any performance hiccups, as long as it is only system metadata that is
> handled by Java.
>
> One problem is when data-flow has to occur in Java. Each record that is read
> from the storage has to be de-serialized, uncompressed and then processed.
> This processing could be very slow in Java compared to when written in other
> languages, especially because of the creation/destruction of too many
> objects.  It would have been nice if the map/reduce task could have been
> written in C/C++, or better still, if the sorting inside the MR framework
> could occur in C/C++.
>
> thanks,
> dhruba
>
> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[hidden email]> wrote:
>
>>
>> Check out this thread:
>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>> --
>> View this message in context:
>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>
>
>
>
> --
> Connect to me at http://www.facebook.com/dhruba


     

Re: Why hadoop is written in java?

Edward Capriolo
In reply to this post by Chris Dyer-2
On Tue, Oct 12, 2010 at 12:20 AM, Chris Dyer <[hidden email]> wrote:

> The Java memory overhead is a quite serious problem, and a legitimate
> and serious criticism of Hadoop. For MapReduce applications, it is
> often (although not always) possible to improve performance by doing
> more work in memory (e.g., using combiners and the like) before
> emitting data. Thus, the more memory available to your application,
> the more efficient it runs. Therefore, if you have a framework that
> locks up 500mb rather than 50mb, you systematically get less
> performance out of your cluster.
>
> The second issue is that C/C++ bindings are common and widely used
> from many languages, but it is not generally possible to interface
> directly with Java (or Java libraries) from another language, unless
> that language is also built on top of the JVM. This is a very
> unfortunate because many problems that would be quite naturally
> expressed in MapReduce are better solved in non-JVM languages.
>
> But, Java is what we have, and it works well enough for many things.
>
> On Mon, Oct 11, 2010 at 11:18 PM, Dhruba Borthakur <[hidden email]> wrote:
>> I agree with others in this list that Java provides faster software
>> development, the IO cost in Java is practically the same as in C/C++, etc.
>> In short, most pieces of distributed software can be written in Java without
>> any performance hiccups, as long as it is only system metadata that is
>> handled by Java.
>>
>> One problem is when data-flow has to occur in Java. Each record that is read
>> from the storage has to be de-serialized, uncompressed and then processed.
>> This processing could be very slow in Java compared to when written in other
>> languages, especially because of the creation/destruction of too many
>> objects.  It would have been nice if the map/reduce task could have been
>> written in C/C++, or better still, if the sorting inside the MR framework
>> could occur in C/C++.
>>
>> thanks,
>> dhruba
>>
>> On Mon, Oct 11, 2010 at 4:50 PM, helwr <[hidden email]> wrote:
>>
>>>
>>> Check out this thread:
>>> https://www.quora.com/Why-was-Hadoop-written-in-Java
>>> --
>>> View this message in context:
>>> http://lucene.472066.n3.nabble.com/Why-hadoop-is-written-in-java-tp1673148p1684291.html
>>> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>>>
>>
>>
>>
>> --
>> Connect to me at http://www.facebook.com/dhruba
>>
>

Hate to say it this way... but this is yet another "Java is slow compared to
the equivalent, non-existent C/C++ alternative" argument.
Until http://code.google.com/p/qizmt/ wins the TeraSort benchmark, or
Google open-sources Google MapReduce, that comparison stays theoretical. I am
sure that if someone coded Hadoop in assembler it would trump the theoretical
Hadoop written in C as well.

Re: Why hadoop is written in java?

michael j pan-2
In reply to this post by Ricky Ho
It would be good to recognize that Hadoop is a Java implementation of
a MapReduce framework.  There are other MapReduce framework
implementations out there, written in other languages
- for C/C++, Sector/Sphere   http://sector.sourceforge.net/
- for Python/Erlang, Disco   http://discoproject.org/
I'm sure there are others.
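
Since the point here is that MapReduce is a model with several implementations, a toy in-process version may help show what any of these frameworks has to provide: a map phase, a shuffle that groups and sorts by key, and a reduce phase. This is only a sketch in Python with hypothetical names (run_mapreduce, index_map, index_reduce). It leaves out distribution, fault tolerance and storage, and does not reflect how Hadoop, Sector/Sphere or Disco are actually built.

```python
from collections import defaultdict

def run_mapreduce(inputs, map_fn, reduce_fn):
    """Run map, shuffle (group and sort by key), then reduce, all in-process."""
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)       # shuffle: group by key
    results = []
    for out_key in sorted(groups):                  # shuffle: sort the keys
        results.extend(reduce_fn(out_key, groups[out_key]))
    return results

def index_map(doc_id, text):
    """Emit (word, doc_id) once for each distinct word in a document."""
    for word in set(text.lower().split()):
        yield word, doc_id

def index_reduce(word, doc_ids):
    """Collect the sorted list of documents that contain the word."""
    yield word, sorted(doc_ids)

docs = [("d1", "Hadoop is written in Java"), ("d2", "Sector is written in C++")]
index = run_mapreduce(docs, index_map, index_reduce)
```

Here index is a small inverted index: a list of (word, document ids) pairs ordered by word.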

To respond to Ricky (below), I doubt that Google (who wrote the
MapReduce paper), implemented their MapReduce in Java.  So the
question may be, why is Hadoop (which implements MapReduce as
described in that paper) the most popular MapReduce framework in the
wild, even though it was not the first, nor the most efficient?

Cheers
Mike


On Wed, Oct 13, 2010 at 00:57, Ricky Ho <[hidden email]> wrote:
> Is it easier if we change the question to : "Why does Java people create Hadoop
> before C++ people ?"

> I agree that for framework like Hadoop, execution efficiency is at a higher
> priority than developer productivity.  And if the user can use any language to
> write map and reduce function (like Hadoop streaming), then we should use the
> most efficient language to write the core framework.

> But again, don't forget the dynamics.  It is not about which language is the
> most efficient.  It is about within the group of parallel computing experts who
> is willing to spend time in Open source, what language are they more familiar
> with (or passionate about).