Links limit per page?


Aled Jones
Hi

Does Nutch have a limit on the number of links it will fetch per page?

I have a directory-like structure to my web pages, with each subfolder
having its own index page.  Some index pages have a lot of links, up to
about 200 in some cases.

There are some index pages where it isn't fetching all the links at the
bottom.  It will do the first 100 or so, but there's no sign of the
rest.

Any ideas?

Thanks
Aled




###########################################

This message has been scanned by F-Secure Anti-Virus for Microsoft Exchange.
For more information, connect to http://www.f-secure.com/
************************************************************************
This e-mail and any attachments are strictly confidential and intended solely for the addressee. They may contain information which is covered by legal, professional or other privilege. If you are not the intended addressee, you must not copy the e-mail or the attachments, or use them for any purpose or disclose their contents to any other person. To do so may be unlawful. If you have received this transmission in error, please notify us as soon as possible and delete the message and attachments from all places in your computer where they are stored.

Although we have scanned this e-mail and any attachments for viruses, it is your responsibility to ensure that they are actually virus free.
 


Re: Links limit per page?

Thomas Delnoij-3
There is a db.max.outlinks.per.page setting in nutch-default.xml. You should
increase the value of this setting in nutch-site.xml if you want Nutch to
fetch more outlinks per page than the default (which is 100 if I remember
correctly).

Rgrds, Thomas
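
For readers following along, the override described above might look roughly like the fragment below. This is only a sketch: the value 300 is an arbitrary illustration chosen to cover the roughly 200 links mentioned in the question, and the fragment is assumed to sit inside the existing root element of nutch-site.xml, whose name differs between Nutch releases. The description text is a paraphrase, not the wording shipped in nutch-default.xml.

```xml
<!-- Assumed to be placed inside the root element of conf/nutch-site.xml. -->
<!-- Values set here take precedence over those in nutch-default.xml.     -->
<property>
  <name>db.max.outlinks.per.page</name>
  <!-- 300 is an illustrative value, not a recommendation -->
  <value>300</value>
  <description>Maximum number of outlinks processed per page
  (paraphrased description).</description>
</property>
```

Pages parsed before the change keep their truncated outlink lists, so they would presumably need to be fetched and parsed again for the extra links to show up.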





On 3/15/06, Aled Jones <[hidden email]> wrote:

>
> Hi
>
> Does Nutch have a limit on the number of links it will fetch per page?
>
> I have a directory-like structure to my web pages, with each subfolder
> having its own index page.  Some index pages have a lot of links, up to
> about 200 in some cases.
>
> There are some index pages where it isn't fetching all the links at the
> bottom.  It will do the first 100 or so, but there's no sign of the
> rest.
>
> Any ideas?
>
> Thanks
> Aled

Re: Links limit per page?

Aled Jones
Nice one, thanks.

> -----Original Message-----
> From: TDLN [mailto:[hidden email]]
> Sent: 15 March 2006 09:21
> To: [hidden email]
> Subject: Re: Links limit per page?
>
> There is a db.max.outlinks.per.page setting in
> nutch-default.xml. You should increase the value of this
> setting in nutch-site.xml if you want Nutch to fetch more
> outlinks per page than the default (which is 100 if I
> remember correctly).
>
> Rgrds, Thomas