Serious OOM while using PhantomJS on Nutch 1.13

Zoltán Zvara
Dear Community,

We are experiencing severe memory leaks with PhantomJS 1.9.8. Neighboring services on the same node are affected: a DataNode, for example, cannot even execute a "df" command because of OOM errors on the node. Each node has 128 GB of total memory, and a PhantomJS process easily consumes 80 GB before the kernel kills it. The problem is that the crawl job is co-located with other services running on YARN. These services throw OOM errors as well, resulting in cluster-wide failures.

We tried to set up the Firefox (FF) driver with Selenium 2.48.X, which is the Selenium version currently embedded in Nutch 1.13. The latest FF does not seem to be compatible with Selenium 2.48.

1. Would Selenium embedded within Nutch 1.13 work with PhantomJS 2.1.X?
2. What FF driver version would work with the embedded Selenium? And how can we get it? :-)
3. Has anyone tried the "chrome" driver? Are there any tutorials on how to set it up?