how is crawl-urlfilter.txt taken care of?

how is crawl-urlfilter.txt taken care of?

Manoharam Reddy
I find four url-filters

automaton-urlfilter.txt
regex-urlfilter.txt
suffix-urlfilter.txt
crawl-urlfilter.txt

I can see plugins for the first 3 in the nutch-site.xml file but not for
the 4th one. So, how is crawl-urlfilter.txt picked up by Nutch?

Re: how is crawl-urlfilter.txt taken care of?

Sami Siren-2
Manoharam Reddy wrote:
> I find four url-filters
>
> automaton-urlfilter.txt
> regex-urlfilter.txt
> suffix-urlfilter.txt
> crawl-urlfilter.txt
>
> I can see plugins for the first 3 in the nutch-site.xml file but not for
> the 4th one. So, how is crawl-urlfilter.txt picked up by Nutch?

This question is more suitable for the user list.

crawl-urlfilter.txt is used by the crawl command by default (see crawl-tool.xml).

--
 Sami Siren
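
For anyone landing on this thread later, here is a rough sketch of the kind of override the reply points to. It assumes a Nutch release of this era, where crawl-tool.xml is layered on top of the normal configuration when the crawl command runs and repoints the regex URL filter at a different rules file. The property name and value below are from memory, not quoted from the thread, so check the crawl-tool.xml in your own conf/ directory before relying on them.

```xml
<!-- Sketch of a crawl-tool.xml fragment (assumed, verify against your conf/ directory). -->
<!-- The regex URL filter plugin reads its rules from the file named here, -->
<!-- so no separate plugin is needed for crawl-urlfilter.txt. -->
<configuration>
  <property>
    <name>urlfilter.regex.file</name>
    <value>crawl-urlfilter.txt</value>
  </property>
</configuration>
```

If that matches your installation, it would explain why there is no fourth plugin: crawl-urlfilter.txt is read by the same regex filter plugin that otherwise uses regex-urlfilter.txt, only under the crawl command.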