We do this within the current framework by having a separate Java
application that manages the crawler instances and only refers to Nutch,
which sits in a separate folder. The structure is something like:
The Crawler App has commands to create a new crawler, to start or stop a
crawler, etc. When creating a crawler, it copies the default conf settings
from the Nutch package. It also needs properties that define the
locations of Nutch, Java etc.
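A minimal sketch of the create step, to make the idea concrete. This is not
our actual application: the class name CrawlerManager, the createCrawler
method, and the one-folder-per-crawler layout are all hypothetical, and it
assumes the default settings live in a conf directory under the Nutch
folder.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.stream.Stream;

/** Hypothetical manager that keeps one folder per crawler instance. */
public class CrawlerManager {
    private final Path nutchHome;    // where the Nutch package is installed (assumed property)
    private final Path crawlersRoot; // parent folder for all crawler instances (assumed property)

    public CrawlerManager(Path nutchHome, Path crawlersRoot) {
        this.nutchHome = nutchHome;
        this.crawlersRoot = crawlersRoot;
    }

    /** Creates a new crawler folder and copies Nutch's default conf into it. */
    public Path createCrawler(String name) throws IOException {
        Path sourceConf = nutchHome.resolve("conf"); // assumed location of the defaults
        Path crawlerDir = crawlersRoot.resolve(name);
        if (Files.exists(crawlerDir)) {
            throw new IllegalStateException("Crawler already exists: " + name);
        }
        Path targetConf = crawlerDir.resolve("conf");
        Files.createDirectories(targetConf);

        // Files.walk visits a directory before its contents, so every parent
        // directory exists in the target by the time its files are copied.
        try (Stream<Path> paths = Files.walk(sourceConf)) {
            for (Path source : (Iterable<Path>) paths::iterator) {
                Path target = targetConf.resolve(sourceConf.relativize(source).toString());
                if (Files.isDirectory(source)) {
                    Files.createDirectories(target);
                } else {
                    Files.copy(source, target);
                }
            }
        }
        return crawlerDir;
    }
}
```

Each crawler then has its own copy of the settings to edit, for example
new CrawlerManager(Path.of("/opt/nutch"), Path.of("/var/crawlers"))
.createCrawler("mysite2"), where both paths are placeholders. The start and
stop commands are left out here because they depend on how Nutch is
launched in a given setup.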
This works pretty well as a skeletal starting point, but for true
enterprise use a front-end administration layer needs to sit above it.
From: Nathan Ter Bogt [mailto:[hidden email]]
Sent: 25 January 2007 01:03
To: [hidden email]
Subject: Multiple collections
Has any thought been given to allowing users to define multiple
collections? Perhaps something along the lines of
bin/nutch crawl mysite2?
I believe a lot of end users would find this extremely useful, and it would
make Nutch better suited to becoming an enterprise search solution.