Importing log files from various machines


Importing log files from various machines

Blargy
I am currently looking into importing all of our application log files (~100+ host machines) into HDFS. Can someone point me in the right direction or walk me through the process of how I can accomplish this? Any good reading material on this subject? Videos?

I hope I don't need to physically copy all of the log files to one target machine before importing.

Thanks

Re: Importing log files from various machines

S. Venkatesh
You could write a simple map-only job in which each map task pulls a
batch of files from the servers. You could use NLineInputFormat and
tune N based on the number of maps, the number of files, etc.
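
One way to read the suggestion above: put one line per remote log file into a manifest that serves as the job input, and let NLineInputFormat hand N lines to each map. The sketch below is an illustration in Python, not the poster's code. It only covers building the manifest and picking N. The host names, paths, and both function names are hypothetical, and the mapper that actually fetches each file is left out.

```python
import math


def build_manifest(files_by_host):
    """Return one 'host<TAB>path' line per log file, sorted for stable output."""
    lines = []
    for host in sorted(files_by_host):
        for path in sorted(files_by_host[host]):
            lines.append(f"{host}\t{path}")
    return lines


def lines_per_split(total_lines, desired_maps):
    """Pick N (manifest lines per map task) so roughly desired_maps maps run."""
    if desired_maps < 1:
        raise ValueError("desired_maps must be >= 1")
    # Round up so no line is left over; never go below one line per map.
    return max(1, math.ceil(total_lines / desired_maps))


if __name__ == "__main__":
    # Hypothetical hosts and paths, for illustration only.
    files_by_host = {
        "app01.example.com": ["/var/log/myapp/app.log.1"],
        "app02.example.com": ["/var/log/myapp/app.log.1", "/var/log/myapp/app.log.2"],
    }
    manifest = build_manifest(files_by_host)
    print("\n".join(manifest))
    print("N =", lines_per_split(len(manifest), desired_maps=2))
```

A smaller N gives more maps and therefore more parallel pulls, at the cost of more task start-up overhead.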

Venkatesh

On Tue, Jun 29, 2010 at 5:40 AM, Blargy <[hidden email]> wrote:

> I am currently looking into importing all of our application log files (~100+
> host machines) into HDFS. [...]




Re: Importing log files from various machines

Steve Loughran
S. Venkatesh wrote:
> You could write a simple map-only job with each map pulling a bunch of
> files from each of the servers. You could use a NLineInputFormat and
> tweak N based on the # of maps, # of files, etc.
>

The problem is that you can't request which physical host the work runs
on, a problem you also hit with other work-scheduling issues:

  * rebalancing data on HDDs on a single node
  * checksumming blocks (which is treated as a special case in the
    datanodes, not as jobscheduler work)
  * machine health checks

It would be nice to be able to push work out to specific nodes, more for
management than for MR work. I can do that today, and there are ways
(cron is always handy), but such work doesn't co-operate with the
jobscheduler, whereas I would like idle task trackers to pick up the
management tasks for their node.

For now, I'd set up the log upload as cron jobs that run on the machines
themselves and copy the data to the dfs: filestore, then analyse it with
MR. It's good to clean up your log dirs anyway, to prevent outages.
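
A minimal sketch of what such a cron-driven upload script might look like, in Python. It is an illustration under assumptions, not the poster's setup: the "not modified for an hour" rule for deciding that a log file is closed, the function names, and the paths are all hypothetical. It shells out to `hadoop fs -put`, so the Hadoop client is assumed to be installed and configured on the machine, and the destination directory is assumed to exist already.

```python
import os
import subprocess
import time


def closed_logs(log_dir, min_age_seconds=3600, now=None):
    """Return files in log_dir not modified for min_age_seconds, sorted by name."""
    now = time.time() if now is None else now
    result = []
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        # Skip directories and files that may still be written to.
        if os.path.isfile(path) and now - os.path.getmtime(path) >= min_age_seconds:
            result.append(path)
    return result


def upload_and_clean(paths, dest_dir, run=subprocess.call):
    """Upload each file with 'hadoop fs -put'; delete the local copy only on success."""
    uploaded = []
    for path in paths:
        status = run(["hadoop", "fs", "-put", path, dest_dir])
        if status == 0:
            os.remove(path)  # keeps the log dir from filling up
            uploaded.append(path)
    return uploaded


if __name__ == "__main__":
    # Hypothetical local and HDFS paths; adjust for your own layout.
    upload_and_clean(closed_logs("/var/log/myapp"), "/logs/incoming")
```

Deleting only after a zero exit status means a failed upload is simply retried on the next cron run.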

-steve

Re: Importing log files from various machines

Marc Limotte-2
In reply to this post by Blargy
Just use the hadoop client tools. That is, install the hadoop package and
configure it to point to your running cluster. You don't need to start any
hadoop processes on the nodes that hold your logs. Just use the command line
(hadoop dfs -put or hadoop distcp) to move the files from each application
server directly into your HDFS cluster.
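
With 100+ servers writing into one cluster, it may help to give each host its own target directory so that identically named log files do not collide. The Python sketch below builds such a path and the matching put command. The `/logs/<host>/<date>` layout and the function names are assumptions made for illustration. It uses `hadoop fs -put`, the generic form of the `hadoop dfs -put` command mentioned above. distcp is not covered, since it runs as a MapReduce job and needs a source the cluster nodes can read.

```python
import datetime
import posixpath
import socket


def hdfs_destination(base_dir, host, day):
    """Build a per-host, per-day HDFS directory such as /logs/web01/2010-06-29."""
    return posixpath.join(base_dir, host, day.isoformat())


def put_command(local_path, dest_dir):
    """Return the argument list for copying one local file into dest_dir."""
    # dest_dir is assumed to exist already on HDFS.
    return ["hadoop", "fs", "-put", local_path, dest_dir]


if __name__ == "__main__":
    # Hypothetical base directory and log path.
    dest = hdfs_destination("/logs", socket.gethostname(), datetime.date.today())
    print(" ".join(put_command("/var/log/myapp/app.log.1", dest)))
```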

Marc

On Mon, Jun 28, 2010 at 5:10 PM, Blargy <[hidden email]> wrote:

> I am currently looking into importing all of our application log files
> (~100+ host machines) into HDFS. [...]