fetcher error


fetcher error

Kashif Khadim
Hi,

I am doing an intranet crawl but keep getting this error,
and after a few of the same errors my fetcher dies and
fetches no more.

The error is:


org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)
050619 144703 fetch of http://espn.go.com failed with:
java.lang.Exception:
org.apache.nutch.protocol.RetryLater: Exceeded
http.max.delays: retry later.


I think the main issue is "Exceeded http.max.delays:
retry later".

Thanks

Kashif.

RE: fetcher error

Howie Wang
That just means the site is not responding. You can try to
give it more time by setting http.timeout to something
larger in your nutch-default.xml.  You can also try
increasing the number of retries in the same file.
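
As a rough sketch of what those changes might look like, the fragment below raises both values. The numbers are illustrative only, http.timeout is assumed to be in milliseconds, and http.max.delays is taken from the error message as the likely "retries" setting; check the descriptions in your own nutch-default.xml before copying anything. Nutch also reads overrides from nutch-site.xml, which is usually a safer place for local changes than editing the defaults file.

```xml
<!-- Place these inside the existing root element of the config file. -->
<!-- Values are illustrative assumptions, not recommendations. -->
<property>
  <name>http.timeout</name>
  <value>30000</value>
</property>

<property>
  <name>http.max.delays</name>
  <value>10</value>
</property>
```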


RE: fetcher error

Kashif Khadim
Thanks for the help. The fetcher gets stuck on some pages when
I am doing an intranet crawl, and I have tested on many
websites.

I tried the setting you suggested before, but most of
the time the fetcher dies and I am unable to fetch
websites for my intranet crawl. It fetches a few pages
from a website and then throws the error. The problem is
not limited to one website; it happens on many of the
sites I tested.

Thanks.

Kashif


RE: fetcher error

Howie Wang
If you're crawling only a single site or a few sites at a time,
the multiple fetcher threads might be blocking each other
out. By default, the fetcher starts 10 threads, and if they all
hit the same server, 1 thread will fetch and the other 9
will wait up to a certain timeout for that first thread to finish.

To see if this is your problem, try setting the number of
threads to 1 and see if it helps.
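
One way to run that test is through the configuration file. This is a sketch that assumes the thread count is controlled by a property named fetcher.threads.fetch, as in Nutch's default configuration; if your version names it differently, use whatever nutch-default.xml lists for the number of fetcher threads.

```xml
<!-- Assumed property name; place inside the existing root element. -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>1</value>
</property>
```

If a single thread fetches cleanly, the earlier failures were probably threads queuing behind each other on one host rather than the site failing to respond.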
