Does anyone here use Nutch's distributed crawling capabilities to periodically aggregate a very large number of RSS feeds?
I am still wondering whether I should build an aggregator from scratch or use the Nutch crawler/extractor across multiple nodes/machines.
If I go the Nutch route, I would prefer to make most of the calls to Nutch through the Java API rather than the command line, so that I can schedule and control the jobs better. (Some feeds have to be updated every 5 minutes, others every hour or once a day, and RSS itself also provides date/time information that should be respected.)
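To make the scheduling requirement concrete, here is a minimal sketch of per-feed refresh intervals kept in a priority queue ordered by next due time. It is plain Java and does not touch Nutch at all: `FeedScheduler`, `Feed`, and `pollDue` are hypothetical names made up for illustration, and the actual fetch (through Nutch or anything else) is left out.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical sketch: none of these names come from Nutch. */
public class FeedScheduler {

    /** One feed, its refresh interval, and the next time it is due. */
    static final class Feed {
        final String url;
        final Duration interval;
        Instant nextFetch;

        Feed(String url, Duration interval, Instant firstFetch) {
            this.url = url;
            this.interval = interval;
            this.nextFetch = firstFetch;
        }
    }

    // Feeds ordered by next due time, earliest first.
    private final PriorityQueue<Feed> queue =
            new PriorityQueue<>(Comparator.comparing((Feed f) -> f.nextFetch));

    void add(Feed feed) {
        queue.add(feed);
    }

    /**
     * Returns every feed due at or before {@code now} and reschedules
     * each one at {@code now + interval}.
     */
    List<Feed> pollDue(Instant now) {
        List<Feed> due = new ArrayList<>();
        while (!queue.isEmpty() && !queue.peek().nextFetch.isAfter(now)) {
            due.add(queue.poll());
        }
        // Re-insert only after draining, so a feed is returned at most once per call.
        for (Feed f : due) {
            f.nextFetch = now.plus(f.interval);
            queue.add(f);
        }
        return due;
    }
}
```

A driver loop would call `pollDue(Instant.now())` periodically and hand the returned URLs to whatever does the fetching. Timing hints supplied by the feed itself (for example the RSS 2.0 `ttl` element) are not handled here; one option would be to update a feed's interval after each fetch.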
What do you guys think? Is Nutch the right tool for this, as I think it could be?
(I haven't found any existing open source large-scale RSS aggregators.)