problem, Limiting dynamic pages with static URLs

Jon Shoberg
Some sites use relative links and the fetcher is getting confused.  See
the example below:

http://www.domain.xyz/index.php/research/academics/research/libraries/

The fetcher simply keeps following the few relative links in the returned
content, and the URI keeps growing.  It is basically the same problem as
session IDs, but not something to cleanly regex out.

Anyone see this before? Thoughts?

-j
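
One possible heuristic for this kind of loop, sketched here only as an
illustration and not taken from the thread or from Nutch itself, is to
reject any URL whose path contains the same segment more than once. The
function name has_repeated_segment and the max_repeats parameter are
hypothetical.

```python
from collections import Counter
from urllib.parse import urlsplit


def has_repeated_segment(url: str, max_repeats: int = 1) -> bool:
    """Return True if any path segment occurs more than max_repeats times.

    Hypothetical helper: a crawler could skip URLs for which this
    returns True, on the assumption that repeated segments indicate
    a relative-link loop rather than a real page.
    """
    # Split the path on "/" and ignore empty pieces from leading,
    # trailing, or doubled slashes.
    segments = [s for s in urlsplit(url).path.split("/") if s]
    counts = Counter(segments)
    return any(n > max_repeats for n in counts.values())


looping = "http://www.domain.xyz/index.php/research/academics/research/libraries/"
print(has_repeated_segment(looping))  # "research" appears twice in the path
```

Some legitimate sites do repeat a path segment, so a stricter threshold
(for example max_repeats=2) trades a slightly deeper crawl into the loop
for fewer false positives.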

Re: problem, Limiting dynamic pages with static URLs

Doug Cutting-2
Please see:

http://www.mail-archive.com/nutch-dev@.../msg00634.html

Doug

Jon Shoberg wrote:

> Some sites use relative links and the fetcher is getting confused.  See
> the example below:
>
> http://www.domain.xyz/index.php/research/academics/research/libraries/
>
> The fetcher simply keeps following the few relative links in the returned
> content, and the URI keeps growing.  It is basically the same problem as
> session IDs, but not something to cleanly regex out.
>
> Anyone see this before? Thoughts?
>
> -j