[jira] Created: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite


Nick Burch (Jira)
Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite
--------------------------------------------------------------------------------------------------------

         Key: NUTCH-244
         URL: http://issues.apache.org/jira/browse/NUTCH-244
     Project: Nutch
        Type: Bug

    Versions: 0.8-dev    
    Reporter: AJ Banck


Some properties like file.content.limit support using negative numbers (-1) to 'disable' a limitation.
Other properties do not support this.
I tried disabling the limit set by db.max.outlinks.per.page, but this isn't possible.
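
The convention described above, where a negative value such as -1 disables a limit, could be applied uniformly through one small helper. The following is a minimal sketch only: `LimitSketch` and `effectiveCount` are hypothetical names, not Nutch code, and this is not necessarily how the issue was eventually fixed.

```java
public final class LimitSketch {
    private LimitSketch() {}

    /**
     * Returns how many of the available items to keep under the given limit.
     * Assumption: any negative limit means "no limit", mirroring the -1
     * convention the report mentions for file.content.limit.
     */
    public static int effectiveCount(int available, int limit) {
        if (limit < 0) {
            return available; // limit disabled, keep everything
        }
        return Math.min(available, limit);
    }

    public static void main(String[] args) {
        System.out.println(effectiveCount(250, 100)); // limit applies
        System.out.println(effectiveCount(250, -1));  // limit disabled
        System.out.println(effectiveCount(40, 100));  // fewer items than the limit
    }
}
```

Routing every limit-style property through a helper like this would give all of them the same boundary behaviour.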

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-244?page=comments#action_12373393 ]

Jerome Charron commented on NUTCH-244:
--------------------------------------

While taking a quick look at this, I found something surprising in the code.
The db.max.outlinks.per.page property is used exclusively in ParseData.
In ParseData, the number of outlinks is limited in the readFields method.
Shouldn't the limit be applied directly in the ParseData constructor?


[jira] Commented: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-244?page=comments#action_12373396 ]

Andrzej Bialecki  commented on NUTCH-244:
-----------------------------------------

We don't pass the Configuration object to the constructor, so the constructor has no way to read the value of this property. The Configuration is set later, using setConf().

Also, ParseData needs to correctly read serialized instances, which were created with possibly different values of this parameter, so this piece of code has to be there anyway.

Also, note that we always write out all outlinks. This is to ensure that if you e.g. increase the parameter value in the future you can still recover as much data as possible from older segments.
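
The design described above (write every outlink, apply the limit only when reading) can be sketched as follows. This is an illustration under stated assumptions, not the actual ParseData code: `OutlinkRecordSketch`, `setMaxOutlinks`, and `roundTrip` are hypothetical names, the default of 100 is assumed, plain `java.io` streams stand in for Hadoop's serialization, and treating a negative limit as "unlimited" is an assumed convention.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class OutlinkRecordSketch {
    private final List<String> outlinks = new ArrayList<>();
    private int maxOutlinks = 100; // assumed default, set after construction

    public OutlinkRecordSketch() {}

    public OutlinkRecordSketch(List<String> links) {
        outlinks.addAll(links); // no limit here: the configured value is not known yet
    }

    /** Stands in for reading the limit from a configuration set after construction. */
    public void setMaxOutlinks(int max) {
        this.maxOutlinks = max;
    }

    public List<String> getOutlinks() {
        return outlinks;
    }

    /** Always writes all outlinks, so a later, larger limit can still recover them. */
    public void write(DataOutput out) throws IOException {
        out.writeInt(outlinks.size());
        for (String link : outlinks) {
            out.writeUTF(link);
        }
    }

    /** Applies the current limit while reading; negative means unlimited (assumption). */
    public void readFields(DataInput in) throws IOException {
        outlinks.clear();
        int total = in.readInt();
        int keep = maxOutlinks < 0 ? total : Math.min(total, maxOutlinks);
        for (int i = 0; i < total; i++) {
            String link = in.readUTF(); // consume every entry to keep the stream aligned
            if (i < keep) {
                outlinks.add(link);
            }
        }
    }

    /** Serializes the links, then reads them back under the given limit. */
    public static List<String> roundTrip(List<String> links, int max) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            new OutlinkRecordSketch(links).write(new DataOutputStream(bytes));
            OutlinkRecordSketch reader = new OutlinkRecordSketch();
            reader.setMaxOutlinks(max);
            reader.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
            return reader.getOutlinks();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> links = Arrays.asList("a", "b", "c", "d", "e");
        System.out.println(roundTrip(links, 2));  // same bytes, limit of 2
        System.out.println(roundTrip(links, -1)); // same bytes, limit disabled
    }
}
```

Because the serialized form is independent of the limit, the same stored bytes yield different outlink counts depending only on the value in effect at read time.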


[jira] Commented: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-244?page=comments#action_12373398 ]

Jerome Charron commented on NUTCH-244:
--------------------------------------

That makes perfect sense!
Thanks Andrzej.


[jira] Closed: (NUTCH-244) Inconsistent handling of property values boundaries / unable to set db.max.outlinks.per.page to infinite

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-244?page=all ]
     
Jerome Charron closed NUTCH-244:
--------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed
      Assign To: Jerome Charron

Fixed: http://svn.apache.org/viewcvs.cgi?rev=391958&view=rev
