Only crawling out from pages that meet certain criteria

jthompson-2
Is there a way to prevent Nutch from fetching the outlinks of pages that I
decide are irrelevant? I make that decision while the page is being parsed,
in my parse filter. I realize that I can stop Nutch from indexing such pages,
but I believe the index is separate from the structure that determines which
new pages get fetched.

Best,
John