Multiple collections


Nathan Ter Bogt
Has there been any thought given to allowing users to define multiple collections? Perhaps something structured like:

/conf/mysite1/*.xml
/conf/mysite2/*.xml

bin/nutch crawl mysite2?
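As a rough illustration of how the proposed layout might be resolved, here is a minimal Java sketch (Java being the language Nutch itself is written in). CollectionConf, dirFor and xmlFiles are hypothetical names, not part of Nutch: the sketch only maps a collection name such as mysite2 to conf/mysite2/ and lists the *.xml files there, rejecting names that would escape the conf folder.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): maps a collection name such as
 * "mysite2" to conf/mysite2/ and lists the *.xml files inside it.
 */
public class CollectionConf {
    private final Path confRoot;

    public CollectionConf(Path confRoot) {
        this.confRoot = confRoot;
    }

    /** Returns conf/<collection>, rejecting "", "..", "a/b" and similar names. */
    public Path dirFor(String collection) {
        Path dir = confRoot.resolve(collection).normalize();
        // The collection folder must be a direct child of confRoot.
        if (!confRoot.normalize().equals(dir.getParent())) {
            throw new IllegalArgumentException("bad collection name: " + collection);
        }
        return dir;
    }

    /** Lists the *.xml files configured for a collection, sorted by path. */
    public List<Path> xmlFiles(String collection) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream =
                 Files.newDirectoryStream(dirFor(collection), "*.xml")) {
            for (Path p : stream) {
                files.add(p);
            }
        }
        files.sort(null); // natural ordering of Path
        return files;
    }
}
```

A crawl command for mysite2 would then load only the files returned by xmlFiles("mysite2"); how those files would be fed into Nutch's configuration loading is left open here.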

I believe a lot of end users would find this extremely useful, and it would make Nutch more suitable as an enterprise search solution.

Thanks,
--
Nathan ter Bogt | Software engineer

Agileware Pty. Ltd.

RE: Multiple collections

Alan Tanaman
Nathan,

We do this within the current framework by having a separate Java
application that manages the crawler instances and only refers to
Nutch, which sits in a separate folder.  The structure is something like:

+-Nutch Package (as is)
+-Crawler App
--+--mysite1
-----+--conf
-----+--...nutch generated folders...
-----+--urls
--+--mysite2
-----+--conf
-----+--...nutch generated folders...
-----+--urls
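With a layout like this, the management application can discover its crawler instances just by scanning the folder. A minimal Java sketch, assuming a folder counts as a crawler when it contains both conf/ and urls/ (the Nutch-generated folders are ignored); CrawlerDirectory and list are hypothetical names, not Nutch APIs:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical: finds crawler instances laid out as <root>/<name>/{conf,urls}. */
public class CrawlerDirectory {
    /** Assumed rule: a folder is a crawler if it has conf/ and urls/ subfolders. */
    static boolean isCrawler(Path dir) {
        return Files.isDirectory(dir.resolve("conf"))
            && Files.isDirectory(dir.resolve("urls"));
    }

    /** Returns the names of the crawler instances under root, sorted. */
    public static List<String> list(Path root) throws IOException {
        List<String> names = new ArrayList<>();
        // Only look at subdirectories of root; plain files are skipped.
        try (DirectoryStream<Path> entries =
                 Files.newDirectoryStream(root, Files::isDirectory)) {
            for (Path d : entries) {
                if (isCrawler(d)) {
                    names.add(d.getFileName().toString());
                }
            }
        }
        names.sort(null); // alphabetical
        return names;
    }
}
```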

The Crawler App has commands to create a new crawler, start or stop a
crawler, etc.  When creating a crawler, it copies the default conf settings
from the Nutch Package.  It also needs properties that define the
locations of Nutch, Java, etc.
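The create and start commands could look roughly like the Java sketch below. It is not Alan's code: CrawlerManager, create and startCommand are hypothetical names. It assumes the bin/nutch script honours the NUTCH_CONF_DIR and JAVA_HOME environment variables (as we understand the 0.8/0.9 scripts do; check your copy) and uses the one-shot "bin/nutch crawl <urlDir> -dir <dir> -depth <n>" form, so generated data lands under <crawler>/crawl. Only regular files at the top level of the Nutch conf folder are copied.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Map;

/** Hypothetical "Crawler App" core: create crawlers and build start commands. */
public class CrawlerManager {
    private final Path nutchHome;    // the untouched Nutch package
    private final Path javaHome;     // passed to bin/nutch as JAVA_HOME
    private final Path crawlersRoot; // holds mysite1, mysite2, ...

    public CrawlerManager(Path nutchHome, Path javaHome, Path crawlersRoot) {
        this.nutchHome = nutchHome;
        this.javaHome = javaHome;
        this.crawlersRoot = crawlersRoot;
    }

    /** Creates <root>/<name>/conf (copied from Nutch) and <root>/<name>/urls. */
    public Path create(String name) throws IOException {
        Path dir = crawlersRoot.resolve(name);
        if (Files.exists(dir)) {
            throw new IOException("crawler already exists: " + name);
        }
        Path conf = Files.createDirectories(dir.resolve("conf"));
        Files.createDirectories(dir.resolve("urls"));
        // Copy top-level regular files only; subfolders, if any, are skipped.
        try (DirectoryStream<Path> files =
                 Files.newDirectoryStream(nutchHome.resolve("conf"))) {
            for (Path f : files) {
                if (Files.isRegularFile(f)) {
                    Files.copy(f, conf.resolve(f.getFileName()),
                               StandardCopyOption.COPY_ATTRIBUTES);
                }
            }
        }
        return dir;
    }

    /** Builds, but does not start, a one-shot crawl for the named crawler. */
    public ProcessBuilder startCommand(String name, int depth) {
        Path dir = crawlersRoot.resolve(name);
        ProcessBuilder pb = new ProcessBuilder(
            nutchHome.resolve("bin").resolve("nutch").toString(),
            "crawl", "urls", "-dir", "crawl", "-depth", Integer.toString(depth));
        pb.directory(dir.toFile()); // "urls" and "crawl" resolve inside the crawler
        Map<String, String> env = pb.environment();
        env.put("NUTCH_CONF_DIR", dir.resolve("conf").toString()); // assumed honoured
        env.put("JAVA_HOME", javaHome.toString());
        pb.inheritIO();
        return pb;
    }
}
```

To start a crawler the app would call startCommand(...).start() and keep the returned Process; stopping it could then be Process.destroy(). For recrawls, a manager would more likely drive the individual inject/generate/fetch/updatedb steps than the one-shot crawl command.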

This works pretty well as a skeletal starting point, but for true
enterprise use, an administration front end needs to sit on top of it.

Best regards,
Alan
_________________________
Alan Tanaman
iDNA Solutions
Tel: +44 (20) 7257 6125
Mobile: +44 (7796) 932 362
http://blog.idna-solutions.com

-----Original Message-----
From: Nathan Ter Bogt [mailto:[hidden email]]
Sent: 25 January 2007 01:03
To: [hidden email]
Subject: Multiple collections
