Follow urls with GET/Query String?

Chris Stephens-3
How do I get Nutch to follow URLs that contain a query string, such as
?blah=something at the end of the URL?  Nutch seems to ignore these, and
I didn't find any configuration option to enable them.  Does a plugin or
something similar exist for following these types of links?



Re: Follow urls with GET/Query String?

Chris Stephens-3
To answer my own question: I now realize there is a rule in
crawl-urlfilter.txt that skips URLs with query strings by default.  I
commented that rule out, and Nutch now follows those links.
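
For anyone hitting the same thing, the change looks roughly like the
fragment below.  This is a sketch that assumes the stock
crawl-urlfilter.txt shipped with Nutch; the exact comment text and
character class may differ in your version, so check your own copy before
editing.

```text
# skip URLs containing certain characters as probable queries, etc.
# (rule disabled by prefixing it with '#', so URLs with ?key=value pass)
# -[?*!@=]
```

Rules in this file are tried from top to bottom, and the first pattern that
matches decides whether the URL is kept (+) or dropped (-), so the
remaining rules still apply.  A narrower alternative, if you only want to
let query strings through, would be to keep the rule but remove ? and =
from the character class, leaving -[*!@].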

Chris Stephens wrote:
> How do I get Nutch to follow URLs that contain a query string such as
> ?blah=something at the end of the url?  Nutch seems to ignore these
> and I didn't find any configuration option to enable this.  Does a
> plugin or some such exist to facilitate following these types of links?