Best and most economical way of setting up a Hadoop cluster for distributed crawling

Sachin Mittal
Hi,
I have been running Nutch in local mode, and so far I have a good
understanding of how it all works.

I would now like to start distributed crawling on a public cloud
provider.

I would like to know whether fellow users have any experience setting up
Nutch for distributed crawling.

From the Nutch wiki I have some idea of what the hardware requirements should be.

I would also like to know which public cloud providers (IaaS or PaaS)
are good for setting up Hadoop clusters: ones on which the cluster is easy to
set up and manage, and which are easy on the budget.

Please let me know if you folks have any insights based on your experiences.

Thanks and Regards
Sachin