Nutch Crawl

Nutch Crawl

Mahajan, Vineet

Hi,

I want to call the Nutch CrawlTool from my Java class. There will be
multiple instances of my class, and each instance will crawl only one
URL at a time, independently of the others.
How should I make this call, and how can each instance use its own
regex-urlfilter.xml for filtering at run time?

Thanks,
Vineet


Re: Nutch Crawl

Lukáš Vlček

Hi,

I am not sure what your intent is. Calling CrawlTool from Java should
be easy: just call the main(String[]) method of its main class and
pass in your arguments.
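
A minimal sketch of that call, written with reflection so that it compiles and runs even without Nutch on the classpath. The class name in ASSUMED_CRAWL_CLASS is an assumption (check it against your Nutch version), and StandIn is a hypothetical placeholder used only to make the demo self-contained. With Nutch on the classpath you could equally call the tool's main method directly.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.util.Arrays;

final class CrawlLauncher {
    // ASSUMPTION: the fully qualified name of the crawl tool differs between
    // Nutch versions; verify it against the jar you actually use.
    static final String ASSUMED_CRAWL_CLASS = "org.apache.nutch.tools.CrawlTool";

    /** Finds public static void main(String[]) on the named class and calls it. */
    static void invokeMain(String className, String... args) throws Exception {
        Class<?> tool = Class.forName(className);
        Method main = tool.getMethod("main", String[].class);
        try {
            // Cast to Object so the array is passed as ONE argument
            // instead of being spread across the varargs of invoke().
            main.invoke(null, (Object) args);
        } catch (InvocationTargetException e) {
            // Surface the exception thrown by the tool itself, if any.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    /** Hypothetical stand-in for the real tool; it only records its arguments. */
    static final class StandIn {
        static String[] lastArgs;

        public static void main(String[] args) {
            lastArgs = args;
        }
    }

    public static void main(String[] args) throws Exception {
        // "urls.txt" is a hypothetical argument. Swap StandIn.class.getName()
        // for ASSUMED_CRAWL_CLASS once Nutch is on the classpath.
        invokeMain(StandIn.class.getName(), "urls.txt");
        System.out.println(Arrays.toString(StandIn.lastArgs));
    }
}
```

Note that every call made this way runs inside the same JVM, so all instances see the same classpath and therefore the same conf directory.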

As for a different setup for each Nutch instance, I think you could
have multiple installations on your server, where each installation
has its own conf directory (with its specific config files) and the
source code is shared via symbolic links.
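
One way to sketch the per-instance conf idea from Java is to start each crawl in its own JVM, with that instance's conf directory placed first on the classpath, since Nutch looks up its config files on the classpath. All paths, the shared classpath string, and the main class name below are assumptions for illustration. Also note that in the distributions I know of, the regex filter file is named regex-urlfilter.txt.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class PerInstanceCrawl {

    /**
     * Builds a "java -cp <confDir><sep><shared> <mainClass> <args...>" command.
     * The instance's conf directory goes first so its config files win.
     */
    static List<String> buildCommand(Path confDir, String sharedClasspath,
                                     String mainClass, List<String> toolArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(Paths.get(System.getProperty("java.home"), "bin", "java").toString());
        cmd.add("-cp");
        cmd.add(confDir + File.pathSeparator + sharedClasspath);
        cmd.add(mainClass);
        cmd.addAll(toolArgs);
        return cmd;
    }

    /** Starts the command as a child process and waits for its exit code. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical layout: instances/a/conf holds this instance's config
        // files; nutch/lib/* stands for the shared Nutch jars.
        List<String> cmd = buildCommand(
                Paths.get("instances", "a", "conf"),
                "nutch/lib/*",
                "org.apache.nutch.tools.CrawlTool", // ASSUMPTION: verify for your version
                Arrays.asList("urls.txt"));         // hypothetical argument
        // Only print the command here; call run(cmd) once the paths are real.
        System.out.println(String.join(" ", cmd));
    }
}
```

Separate processes cost more memory than threads, but they keep each crawl's configuration isolated from the others.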

Regards,
Lukas

On 6/9/06, Mahajan, Vineet <[hidden email]> wrote:

>
> Hi:
>
> I want to call Nutch CrawlTool from my Java Class. There will be
> multiple instances of my java class.
> Each instance will be crawling only one url at a time discretely.
> So how should I make this call and how at the run-time each instance can
> use its own regex-urlfilter.xml
> for filter.
>
>
>
> Thanks:
> Vineet