Map job of Nutch creates a huge amount of logs (Nutch 2.3.1 + Hadoop 2.7.1 + YARN)


shubham.gupta
Hey,

I am running Nutch processes on Hadoop, with the fetcher.parse property
set to true. While the job is running, map spills are created in the
directory /home/hadoop/nodelogs/usercache/root/appcache.
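
For reference, this is roughly how I have it set in conf/nutch-site.xml
(fetcher.parse is the standard property from nutch-default.xml; it makes
the fetcher map task parse pages as it fetches them):

  <!-- nutch-site.xml: parse fetched content inside the fetcher map task -->
  <property>
    <name>fetcher.parse</name>
    <value>true</value>
  </property>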


The spills are created during the map job of the fetch phase. The files
grow to about 17 GB and occupy over 90% of the datanode's disk space, at
which point the state of the node changes to UNHEALTHY. I therefore have
to delete these files periodically so that the process keeps running
smoothly, but doing so sometimes interferes with the job and tends to
increase the completion time.
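
If I understand correctly, the node is flagged UNHEALTHY because the
NodeManager disk health checker trips once a local dir crosses its
utilization threshold (90% by default in Hadoop 2.7, if I am not
mistaken). Raising it in yarn-site.xml would only buy a little headroom,
e.g.:

  <!-- yarn-site.xml: disk utilization (percent) above which a local dir
       is marked bad and the node can turn UNHEALTHY; default is 90.0 -->
  <property>
    <name>yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage</name>
    <value>95.0</value>
  </property>
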
I have set logging of only ERROR messages and above in mapred-site.xml,
and I have changed mapred.userlog.limit.kb to 10240.
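
Concretely, these are the settings I mean (in Hadoop 2.x the userlog cap
is mapreduce.task.userlog.limit.kb; mapred.userlog.limit.kb is the
deprecated Hadoop 1.x name for the same setting):

  <!-- mapred-site.xml: only keep ERROR and above from map/reduce tasks -->
  <property>
    <name>mapreduce.map.log.level</name>
    <value>ERROR</value>
  </property>
  <property>
    <name>mapreduce.reduce.log.level</name>
    <value>ERROR</value>
  </property>
  <!-- cap each task's userlog at 10 MB -->
  <property>
    <name>mapreduce.task.userlog.limit.kb</name>
    <value>10240</value>
  </property>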

Please share your suggestions on how this can be avoided so that Nutch
keeps functioning properly.

--

Shubham Gupta

