
calling C programs from Hadoop

calling C programs from Hadoop

Michael Robinson
I am new to Hadoop. I have successfully run Java programs on Hadoop, and I would now like to call C programs from Hadoop.

Thank you for your help

Michael

Re: calling C programs from Hadoop

Asif Jan
Look at Hadoop Streaming; it may be helpful to you.

asif
On May 29, 2010, at 8:31 PM, Michael Robinson wrote:

>
> I am new to Hadoop. I have successfully run java programs from Hadoop
> and I would like to call C programs from Hadoop.
>
> Thank you for your help
>
> Michael
> --
> View this message in context: http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p854833.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.

****************************************************************************************
Asif Jan
Gaia Project
SixSq Sarl / ISDC Astrophysics Data Centre & Geneva Observatory
Chemin  des Ecogia 16
CH-1290 Versoix
Switzerland
       
E-mail : [hidden email]
Tel. : +41 22 37 92198
Fax : +41 22 37 92133
****************************************************************************************





Re: calling C programs from Hadoop

Owen O'Malley-2
On Sat, May 29, 2010 at 12:52 PM, Asif Jan <[hidden email]> wrote:
> Look at Hadoop streaming, may be it is helpful to you.

There is also Pipes, which is the C++ interface to MapReduce.

-- Owen

Re: calling C programs from Hadoop

Michael Robinson
Owen,

Where can I find information about Pipes?

Thanks much

Michael

Re: calling C programs from Hadoop

Michael Robinson
In reply to this post by Asif Jan
Asif,

Thanks very much for your help. I found and downloaded Hadoop Streaming.

Michael

Re: calling C programs from Hadoop

Michael Robinson
In reply to this post by Asif Jan
Thanks for your answers.

I have read about Hadoop Streaming and I think it is great. However, what I am trying to do is run a C program that I already have, with its own data, and have Hadoop handle the scheduling and run it on multiple nodes as a distributed system.

My processing does NOT follow the map-and-reduce pattern, so I was thinking of either feeding the C program to Hadoop directly or writing a Java program that calls the C program, and letting Hadoop do its magic.

Thanks

Michael

Re: calling C programs from Hadoop

Jeff Bean
In reply to this post by Michael Robinson
Hi Michael,

How come you can't specify the C program as the mapper in streaming and just
have no reducers?

Jeff

On Sat, May 29, 2010 at 6:14 PM, Michael Robinson
<[hidden email]> wrote:

>
> Thanks for your answers.
>
> I have read "hadoop streaming" and I think it is great, however what I am
> trying to do is to run a C program that I have with its own data, and have
> hadoop do the scheduling and make it run in multiple nodes as a distributed
> system.
>
> The process I need to do does NOT do map and reduce type of process, so what
> I was thinking was either feed the C program to Hadoop or write a java
> program that would call the C program and have Hadoop do its magic.
>
> Thanks
>
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p855338.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>

Re: calling C programs from Hadoop

Brian Bockelman
In reply to this post by Michael Robinson
Uh...

So you want a batch system?  Look up PBS (Torque/Maui), SGE, or Condor.

Brian

On May 29, 2010, at 8:17 PM, Michael Robinson wrote:

>
> Thanks for your answers.
>
> I have read "hadoop streaming" and I think it is great, however what I am
> trying to do is to run a C program that I have with its own data, and have
> hadoop do the scheduling and make it run in multiple nodes as a distributed
> system.
>
> The process I need to do does NOT do map and reduce type of process, so what
> I was thinking was either feed the C program to Hadoop or write a java
> program that would call the C program and have Hadoop do its magic.
>
> Thanks
>
> Michael
> --
> View this message in context: http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p855341.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.


smime.p7s (3K) Download Attachment

Re: calling C programs from Hadoop

Michael Robinson
Hi Brian,

Yes, it is a batch process.

I am using Ubuntu Linux. Can you tell me how to open the p7s file you sent me?

I searched for a p7s viewer, and the ones I found seem to work only on Windows and Mac.

Thanks

Michael

Re: calling C programs from Hadoop

Michael Robinson
In reply to this post by Jeff Bean
Hi Jeff,

I have a C program that processes very large compressed data files, so the program has to have full control of the process. However, the input data can be broken down into chunks, and a separate (distributed) process can be run for each chunk. That is what I am doing now, but I am doing it manually.

I am looking to use a distributed system like Hadoop for this so that it controls the scheduling and all those great things I have read about Hadoop.

I was wondering whether I can have Hadoop run a batch file (.bat on Windows or .sh on Linux). I would also like to run this in virtual machines.

Thanks


Michael

Re: calling C programs from Hadoop

Jeff Bean
Hi Michael,

Why did you determine that Hadoop streaming was insufficient for you?

Jeff

On Mon, May 31, 2010 at 9:17 AM, Michael Robinson
<[hidden email]> wrote:

>
> Hi Jeff,
>
> I have a C program that processes very large data files which are
> compressed, so this program has to have full control of the process.
> However the input data can be broken down into chunks, and a separate
> (distributed) process for each chunk can be run, which is what I am doing
> now, but I am doing this manually at this time.
>
> I am looking to use a distributed system like Hadoop to do this so that it
> controls the scheduling and all those great things I have read about
> Hadoop.
>
> I was wondering if I can have Hadoop run a batch file (.bat in windows or
> .sh in linux), also I would like to run this in Virtual Machines.
>
> Thanks
>
>
> Michael
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/calling-C-programs-from-Hadoop-tp854833p858959.html
> Sent from the Hadoop lucene-users mailing list archive at Nabble.com.
>

Re: calling C programs from Hadoop

Michael Robinson
Jeff,

Reading the "Hadoop Streaming" documentation, I found the following:

"How Does Streaming Work
In the above example, both the mapper and the reducer are executables that read the input from stdin (line by line) and emit the output to stdout. The utility will create a Map/Reduce job, submit the job to an appropriate cluster, and monitor the progress of the job until it completes.
"

I am beginning to think that my understanding of map/reduce is faulty. At this time I understand that the mapper takes in data and splits it into chunks, creating lists of (<key>, <values>); it then combines this output and sends the result to the reducer.

The C program I have reads each line in the input file and searches a master file for exact and similar matches, then does computations based on how similar the results are, so there is no need to create (<key>, <values>) lists.


Thanks very much

Michael