new configuration proposal in nutch-site.xml (maximum url length)


new configuration proposal in nutch-site.xml (maximum url length)

Murat Ali Bayir
Hi everybody, is there any configuration that restricts the length of URLs? I
have encountered URLs that are 21-27 KB long. These URLs
cause tasks to be killed. Is anybody considering this for the new Nutch version?

Re: new configuration proposal in nutch-site.xml (maximum url length)

Lourival Júnior
Try this one:

<property>
  <name>db.max.anchor.length</name>
  <value>800</value>
  <description>The maximum number of characters permitted in an anchor.
  </description>
</property>

You can find all properties in nutch-default.xml.

Regards

Lourival Junior

On 8/25/06, Murat Ali Bayir <[hidden email]> wrote:
>
> Hi everybody, is there any configuration that restricts the length of URLs? I
> have encountered URLs that are 21-27 KB long. These URLs
> cause tasks to be killed. Is anybody considering this for the new Nutch
> version?
>



--
Lourival Junior
Universidade Federal do Pará
Curso de Bacharelado em Sistemas de Informação
http://www.ufpa.br/cbsi
Msn: [hidden email]

Re: new configuration proposal in nutch-site.xml (maximum url length)

Stefan Groschupf
In reply to this post by Murat Ali Bayir
Hi,
could this be related to
http://issues.apache.org/jira/browse/NUTCH-233 ?

Could you send me some of these URLs? It would be good to have
a test case for such situations.
Thanks.

Stefan

On 25.08.2006 at 04:01, Murat Ali Bayir wrote:

> Hi everybody, is there any configuration that restricts the length of
> URLs? I have encountered URLs that are 21-27 KB long.
> These URLs cause tasks to be killed. Is anybody considering this for the
> new Nutch version?
>

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California
http://www.101tec.com




Re: new configuration proposal in nutch-site.xml (maximum url length)

Stefan Groschupf
In reply to this post by Lourival Júnior
I think that property limits the length of the anchor text, not the
length of a URL.
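
Since db.max.anchor.length does not cover this case, one possible workaround is to drop over-long URLs before they reach the fetch tasks. Below is only a minimal, standalone sketch of that idea, not existing Nutch code. The class name MaxUrlLengthFilter and the 2048-character limit are made up for illustration. The "return the URL to accept it, return null to reject it" convention is modeled on how Nutch URL filter plugins behave as far as I know, but this class does not implement any Nutch interface and would need to be adapted to the plugin API of your version.

```java
/**
 * Hypothetical sketch: rejects URLs longer than a configured limit.
 * Not part of Nutch; names and the default limit are assumptions.
 */
public class MaxUrlLengthFilter {

    // Maximum number of characters allowed in a URL (illustrative value only).
    private final int maxLength;

    public MaxUrlLengthFilter(int maxLength) {
        if (maxLength <= 0) {
            throw new IllegalArgumentException("maxLength must be positive");
        }
        this.maxLength = maxLength;
    }

    /**
     * Returns the URL unchanged if it is within the limit,
     * or null if it is missing or too long and should be skipped.
     */
    public String filter(String url) {
        if (url == null || url.length() > maxLength) {
            return null;
        }
        return url;
    }

    public static void main(String[] args) {
        MaxUrlLengthFilter filter = new MaxUrlLengthFilter(2048);

        // Build an artificially long URL, similar in size to the ones reported.
        StringBuilder longUrl = new StringBuilder("http://example.com/?q=");
        while (longUrl.length() < 25000) {
            longUrl.append('x');
        }

        System.out.println("short URL accepted: "
                + (filter.filter("http://example.com/") != null));
        System.out.println("long URL rejected: "
                + (filter.filter(longUrl.toString()) == null));
    }
}
```

A length check like this is cheap compared to a regular expression, which matters when the input itself is tens of kilobytes long.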

On 25.08.2006 at 04:28, Lourival Júnior wrote:

> Try this one:
>
> <property>
>  <name>db.max.anchor.length</name>
>  <value>800</value>
>  <description>The maximum number of characters permitted in an anchor.
>  </description>
> </property>
>
> You can find all properties in nutch-default.xml.
>
> Regards
>
> Lourival Junior
>
> On 8/25/06, Murat Ali Bayir <[hidden email]> wrote:
>>
>> Hi everybody, is there any configuration that restricts the length of
>> URLs? I
>> have encountered URLs that are 21-27 KB long. These URLs
>> cause tasks to be killed. Is anybody considering this for the new Nutch
>> version?
>>
>
>
>
> --
> Lourival Junior
> Universidade Federal do Pará
> Curso de Bacharelado em Sistemas de Informação
> http://www.ufpa.br/cbsi
> Msn: [hidden email]

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
101tec Inc.
Menlo Park, California
http://www.101tec.com