Nutch as a large scale RSS aggregator?

Email full of questions:

Do any of you use Nutch's distributed capabilities to periodically aggregate a very large number of RSS feeds?

I am still deciding whether to build an aggregator from scratch or to use the Nutch crawler/extractor on multiple nodes/machines.

If I go the Nutch route, I would prefer to make most of the calls to Nutch through the Java API rather than the command line, so that I can better schedule and control the jobs. (Some feeds have to be updated every 5 minutes, others every hour or once a day, and the RSS protocol also provides a date/time hint that should be respected.)
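The per-feed scheduling requirement can be sketched independently of Nutch. This is a minimal, hypothetical sketch: it does not use any Nutch API, the classes `FeedSchedule` and `FeedScheduler` are invented for illustration, and it assumes the feed's timing hint is the RSS 2.0 `<ttl>` element (the number of minutes a channel may be cached before refreshing). Each feed carries its own interval, and a priority queue hands out whichever feed is due next.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Comparator;
import java.util.PriorityQueue;

/** Hypothetical per-feed schedule (not part of Nutch). */
final class FeedSchedule {
    final String url;
    final Duration configuredInterval; // our own policy: 5 minutes, 1 hour, 1 day...
    Instant nextFetch;                 // when this feed should be fetched next

    FeedSchedule(String url, Duration configuredInterval, Instant firstFetch) {
        this.url = url;
        this.configuredInterval = configuredInterval;
        this.nextFetch = firstFetch;
    }

    /**
     * Reschedule after a fetch. Assumption: the feed's hint is the RSS 2.0
     * <ttl> value in minutes (null if absent); we wait the longer of our own
     * interval and the ttl so the feed's caching request is respected.
     */
    void recordFetch(Instant fetchedAt, Integer ttlMinutes) {
        Duration wait = configuredInterval;
        if (ttlMinutes != null) {
            Duration ttl = Duration.ofMinutes(ttlMinutes);
            if (ttl.compareTo(wait) > 0) {
                wait = ttl;
            }
        }
        this.nextFetch = fetchedAt.plus(wait);
    }
}

/** Hypothetical scheduler: returns feeds whose next fetch time has arrived. */
final class FeedScheduler {
    // Earliest nextFetch sits at the head of the queue.
    private final PriorityQueue<FeedSchedule> queue =
        new PriorityQueue<>(Comparator.comparing((FeedSchedule f) -> f.nextFetch));

    void add(FeedSchedule feed) {
        queue.add(feed);
    }

    /** Removes and returns the earliest-due feed if it is due at 'now', else null. */
    FeedSchedule pollDue(Instant now) {
        FeedSchedule head = queue.peek();
        if (head == null || head.nextFetch.isAfter(now)) {
            return null;
        }
        return queue.poll();
    }

    /** Put a feed back after recordFetch() has updated its nextFetch. */
    void requeue(FeedSchedule feed) {
        queue.add(feed);
    }
}
```

A feed is removed from the queue before its `nextFetch` is changed and only re-added afterwards, which keeps the heap ordering valid. Taking the longer of the two intervals means a feed that asks to be cached for an hour is not polled every five minutes.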

What do you think? Is Nutch the right tool for this, as I suspect it could be?
(I haven't found any existing open-source large-scale RSS aggregator.)