problems: crawling specific domain

problems: crawling specific domain

riyal

Hi,

How can I crawl a specific domain only (like www.yellowpages.co.za)? What do I have to change to make this work correctly? I tried changing crawl-urlfilter.txt, but after some time Nutch started crawling outside my domain.

I am using Nutch 0.9 in standalone mode (without Hadoop). Can anyone give me some idea of how to merge the indexes from different crawls into a single index?

Regards.
--mohammad monirul hoque


     
Re: problems: crawling specific domain

David Jashi
Have you tried this tutorial:
http://wiki.apache.org/nutch/Nutch_0%2e9_Crawl_Script_Tutorial ?
For crawling a single site, see part 4 of:
http://peterpuwang.googlepages.com/NutchGuideForDummies.htm
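
For anyone checking their filter rules before a crawl, here is a rough Python sketch of how a Nutch-style regex URL filter decides on a URL. It assumes the usual semantics: rules are tried top to bottom, the first pattern found anywhere in the URL decides, `+` accepts, `-` rejects, and a URL that matches no rule is dropped. The rule text and the names `RULES`, `parse_rules` and `accepts` are made up for illustration; this is not Nutch code, and Java and Python regex syntax only agree on simple patterns like these.

```python
import re

# Rules written in the style of crawl-urlfilter.txt (illustrative only).
RULES = r"""
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# accept yellowpages.co.za and any of its subdomains
+^http://([a-z0-9-]*\.)*yellowpages\.co\.za/
# reject everything else
-.
"""


def parse_rules(text):
    """Turn filter-file text into a list of (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments carry no rule
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in url is a '+' rule."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumed default: no matching rule means the URL is dropped


FILTER = parse_rules(RULES)

if __name__ == "__main__":
    for url in ("http://www.yellowpages.co.za/search",
                "http://www.example.com/",
                "http://www.yellowpages.co.za.example.com/"):
        print(url, accepts(url, FILTER))
```

Two things are worth checking against rules like these. The final `-.` line matters: without a trailing slash and escaped dots in the accept pattern, a look-alike host could slip through. Also, as far as I know, in Nutch 0.9 crawl-urlfilter.txt is only read by the one-step `bin/nutch crawl` command, while the separate generate/fetch/updatedb steps read regex-urlfilter.txt, so a crawl script built from the individual steps may need the same rules there.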


On Wed, Sep 3, 2008 at 8:53 AM, Mohammad Monirul Hoque
<[hidden email]> wrote:

>
> Hi,
>
> How can I crawl a specific domain only (like www.yellowpages.co.za)? What do I have to change to make this work correctly? I tried changing crawl-urlfilter.txt, but after some time Nutch started crawling outside my domain.
>
> I am using Nutch 0.9 in standalone mode (without Hadoop). Can anyone give me some idea of how to merge the indexes from different crawls into a single index?
>
> Regards.
> --mohammad monirul hoque
>
>
>



--
with best regards,
David Jashi
Web development EO,
Caucasus Online
+995(32)970368
[hidden email]

Respectfully,
David Jashi
Web Development Director
"Caucasus Online"
+995(32)970368
[hidden email]