problems: crawling specific domain

riyal

Hi,

How can I crawl a specific domain only (like www.yellowpages.co.za)? What do I have to change to make this work correctly? I tried changing crawl-urlfilter.txt, but after some time Nutch started crawling outside my domain.
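As a rough illustration of the filter behaviour in question, here is a Python sketch (not Nutch code) of how an ordered list of `+`/`-` regex rules, in the style of crawl-urlfilter.txt, decides whether a URL is kept. The rule lines and the `accepts` helper are hypothetical, and the sketch assumes that the first pattern found in the URL decides the outcome and that a URL matching no rule is dropped.

```python
import re

# Hypothetical rules in the style of conf/crawl-urlfilter.txt.
# '+' accepts, '-' rejects; the first pattern found in the URL decides.
RULES = [
    ("-", r"^(file|ftp|mailto):"),                          # skip non-http schemes
    ("+", r"^http://([a-z0-9-]*\.)*yellowpages\.co\.za/"),  # stay inside the domain
    ("-", r"."),                                            # reject everything else
]

def accepts(url, rules=RULES):
    """Return True if the first rule whose pattern occurs in url is a '+' rule."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumption: a URL that matches no rule is rejected

if __name__ == "__main__":
    for u in ("http://www.yellowpages.co.za/search",
              "http://www.example.com/"):
        print(u, accepts(u))
```

Under these assumptions, a missing final `-.` line or a broader `+` line placed ahead of the domain line would be enough to let outside URLs through.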

I am using Nutch 0.9 in standalone mode (without Hadoop). Can anyone give me some idea of how to merge the indexes from different crawls into a single index?

Regards.
--mohammad monirul hoque