sockettimeout exception

Raghavendra Prabhu
Hi

I am running a crawl using protocol-httpclient.

I get the following error:
java.io.IOException: java.net.SocketTimeoutException: Read timed out

Can someone tell me why I get this error?

After that, the crawl hangs and stays in the same state.

Rgds
Prabhu
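For context: "Read timed out" means the TCP connection to the server was established, but no data arrived within the socket's read timeout. The minimal sketch below uses only the JDK, not Nutch code, and the class name ReadTimeoutDemo is hypothetical. It reproduces the exception against a local server that accepts the connection but never sends anything.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutDemo {
    /**
     * Connects to a local server that never answers and blocks in read().
     * Returns the SocketTimeoutException message, or null if data (or EOF) arrived.
     */
    static String readWithTimeout(int timeoutMillis) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server socket is never accept()ed or written to: the OS completes
        // the TCP handshake from the backlog, but no response bytes ever arrive.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), timeoutMillis);
            // SO_TIMEOUT: the longest a single read() may block waiting for data.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read();
                return null;
            } catch (SocketTimeoutException e) {
                return e.getMessage(); // typically "Read timed out"
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readWithTimeout(200));
    }
}
```

The connection itself succeeds here; only the read fails, which is why a slow or stalled server (rather than an unreachable one) is the usual suspect for this particular message.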

Re: sockettimeout exception

Stefan Groschupf
Is the host reachable from your web browser?
Is the host blocking your IP because it sees Nutch as a DoS attack?
Is your bandwidth limited?
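These questions can be checked from the crawling machine itself. Below is a rough sketch, not Nutch code: the hypothetical HostProbe class sends a bare HTTP/1.0 HEAD request and reports how the exchange fails. As a rule of thumb (an assumption, not a guarantee), a host that is down or rejecting your IP tends to show up as a refused or timed-out connect, while a reachable but slow host, or a saturated link, shows up as a read timeout, which matches the original report.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ConnectException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class HostProbe {
    enum Result { OK, REFUSED, CONNECT_TIMEOUT, READ_TIMEOUT, CLOSED, OTHER_ERROR }

    /** Sends a bare HTTP/1.0 HEAD request and classifies how the exchange fails, if it does. */
    static Result probe(String host, int port, int timeoutMillis) {
        boolean connected = false;
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            connected = true;
            socket.setSoTimeout(timeoutMillis);
            OutputStream out = socket.getOutputStream();
            out.write(("HEAD / HTTP/1.0\r\nHost: " + host + "\r\n\r\n")
                    .getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = socket.getInputStream();
            // Any byte back means the server is answering; -1 means it hung up without a reply.
            return in.read() >= 0 ? Result.OK : Result.CLOSED;
        } catch (ConnectException e) {
            return Result.REFUSED;      // usually "Connection refused": nothing listening, or rejected
        } catch (SocketTimeoutException e) {
            // Same exception type for both phases; the flag tells them apart.
            return connected ? Result.READ_TIMEOUT : Result.CONNECT_TIMEOUT;
        } catch (IOException e) {
            return Result.OTHER_ERROR;  // DNS failure, connection reset, etc.
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println(host + ": " + probe(host, 80, 10_000));
    }
}
```

Example: java HostProbe www.example.com. Port 80 and the 10-second timeout are hard-coded assumptions of this sketch.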

(In reply to Raghavendra Prabhu, 05.02.2006 at 18:17)

Re: sockettimeout exception

Raghavendra Prabhu
Hi Stefan

My bandwidth is limited.

But I am able to crawl other links on the same host (so I guess it is not blocking me).

Is it because of protocol-httpclient? (Should I use protocol-http?)

Rgds
Prabhu


(In reply to Stefan Groschupf, 2/5/06)

Re: sockettimeout exception

Stefan Groschupf
I personally prefer protocol-http.
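Switching from protocol-httpclient to protocol-http is done through the plugin.includes property, normally overridden in conf/nutch-site.xml. The sketch below embeds an example override and checks which protocol plugin it enables. Assumptions: the rest of the plugin list is illustrative only (copy your own value from nutch-default.xml and swap just the protocol plugin); the root element name differs between Nutch versions, so the parser looks only at property elements; http.timeout (milliseconds) is included as an optional extra knob for read timeouts, not something suggested in this thread. NutchSiteCheck is a hypothetical helper.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class NutchSiteCheck {
    // Example conf/nutch-site.xml override. The plugin list is illustrative:
    // start from your own nutch-default.xml value and swap only the protocol plugin.
    static final String SITE_XML = String.join("\n",
        "<?xml version=\"1.0\"?>",
        "<configuration>",
        "  <property>",
        "    <name>plugin.includes</name>",
        "    <value>protocol-http|urlfilter-regex|parse-(text|html)|index-basic|query-(basic|site|url)</value>",
        "  </property>",
        "  <property>",
        "    <name>http.timeout</name>",
        "    <value>30000</value>",
        "  </property>",
        "</configuration>");

    /** Collects name/value pairs from every property element, whatever the root element is called. */
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new HashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse configuration", e);
        }
    }

    /** Assumption of this sketch: plugin.includes is a regex that must match the whole plugin id. */
    static boolean pluginEnabled(Map<String, String> props, String pluginId) {
        return Pattern.matches(props.getOrDefault("plugin.includes", ""), pluginId);
    }

    public static void main(String[] args) {
        Map<String, String> props = parse(SITE_XML);
        // 10000 ms fallback: matches the nutch-default.xml values I have seen; check your version.
        int timeoutMs = Integer.parseInt(props.getOrDefault("http.timeout", "10000"));
        System.out.println("http.timeout = " + timeoutMs + " ms");
        System.out.println("protocol-http enabled: " + pluginEnabled(props, "protocol-http"));
        System.out.println("protocol-httpclient enabled: " + pluginEnabled(props, "protocol-httpclient"));
    }
}
```

Because the whole id must match, "protocol-http" in the list does not also enable "protocol-httpclient".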

(In reply to Raghavendra Prabhu, 05.02.2006 at 18:26)

Re: sockettimeout exception

Raghavendra Prabhu
Hi Stefan

One more thing I am seeing is that some outlinks are not parsed properly.

I tried both HTML parsers (neko and tagsoup).

I know this may not be related to protocol-http, but is there a chance it has the same cause?

Thanks for the answer.

Rgds
Prabhu


(In reply to Stefan Groschupf, 2/5/06)

Re: sockettimeout exception

Stefan Groschupf
Maybe your page size is bigger than the configured limit. See conf/nutch-*.xml.
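To illustrate how a size limit can drop outlinks: the relevant property is http.content.limit (65536 bytes in the nutch-default.xml versions I know of, with a negative value disabling truncation). Content beyond the limit is cut off before parsing, so links near the end of a long page never reach the parser. The sketch below simulates this with a hypothetical TruncatedOutlinks class; it uses a simple regex instead of Nutch's real HTML parsers (neko or tagsoup), purely to show the effect.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TruncatedOutlinks {
    // Deliberately naive link extraction; a real crawler uses an HTML parser.
    private static final Pattern HREF =
        Pattern.compile("<a\\s+href=\"([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    /** Mimics a content limit: keep at most 'limit' bytes; a negative limit means no truncation. */
    static byte[] truncate(byte[] content, int limit) {
        return (limit < 0 || content.length <= limit) ? content : Arrays.copyOf(content, limit);
    }

    /** Returns the href values whose opening anchor tag survived intact. */
    static List<String> outlinks(byte[] content) {
        String html = new String(content, StandardCharsets.UTF_8);
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    /** Builds a page with linkCount links, each preceded by paddingBytes of filler text. */
    static String samplePage(int linkCount, int paddingBytes) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < linkCount; i++) {
            sb.append("<p>").append("x".repeat(paddingBytes)).append("</p>");
            sb.append("<a href=\"/page").append(i).append("\">link</a>");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) {
        byte[] page = samplePage(10, 1000).getBytes(StandardCharsets.UTF_8);
        System.out.println("full page: " + outlinks(page).size() + " outlinks");
        System.out.println("truncated to 4096 bytes: " + outlinks(truncate(page, 4096)).size() + " outlinks");
    }
}
```

With 10 links spaced about 1 KB apart and a 4096-byte limit, only the first 3 links survive, which is the kind of partial outlink list described above.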

(In reply to Raghavendra Prabhu, 05.02.2006 at 18:39)
