[jira] Created: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query


[jira] Created: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query

Michael Gibney (Jira)
CommonGrams loads analysis.common.terms.file for each query
-----------------------------------------------------------

         Key: NUTCH-301
         URL: http://issues.apache.org/jira/browse/NUTCH-301
     Project: Nutch
        Type: Improvement

  Components: searcher  
    Versions: 0.8-dev    
    Reporter: Chris Schneider


The move away from static objects toward instance variables has resulted in the CommonGrams constructor parsing its analysis.common.terms.file for each query. I'm not certain how large the performance impact really is, but it seems like something you'd want to avoid doing for each query. Perhaps the solution is to keep an instance of the CommonGrams object around?

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Commented: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query

Michael Gibney (Jira)
    [ http://issues.apache.org/jira/browse/NUTCH-301?page=comments#action_12415098 ]

Jerome Charron commented on NUTCH-301:
--------------------------------------

We can store the CommonGrams instance in the Configuration, as is already done in many places in the Nutch code.




[jira] Updated: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-301?page=all ]

Stefan Groschupf updated NUTCH-301:
-----------------------------------

    Attachment: CommonGramsCacheV1.patch

Cache the COMMON_TERMS HashMap in the Configuration instance.
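
For readers without the patch at hand, the caching idea can be sketched roughly as follows. This is not the contents of CommonGramsCacheV1.patch: SketchConf, CommonTermsSketch, the cache key, and the hard-coded terms are all hypothetical stand-ins, and the real code would read analysis.common.terms.file instead of building the map inline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for a configuration object that can hold cached objects. */
class SketchConf {
    private final Map<String, Object> objects = new ConcurrentHashMap<>();

    Object getObject(String key) { return objects.get(key); }

    void setObject(String key, Object value) { objects.put(key, value); }
}

/** Hypothetical sketch: the common-terms map is loaded once per configuration. */
class CommonTermsSketch {
    // Assumed cache key; the name used by the actual patch is not shown in this thread.
    static final String CACHE_KEY = "CommonGrams.COMMON_TERMS";

    // Counts how often the (expensive) load runs, for illustration only.
    static final AtomicInteger LOADS = new AtomicInteger();

    private final Map<String, String> commonTerms;

    @SuppressWarnings("unchecked")
    CommonTermsSketch(SketchConf conf) {
        Map<String, String> cached = (Map<String, String>) conf.getObject(CACHE_KEY);
        if (cached == null) {
            // First use with this configuration: load the terms and remember them.
            // Two threads racing here may both load once; the result is the same.
            cached = loadTerms();
            conf.setObject(CACHE_KEY, cached);
        }
        this.commonTerms = cached;
    }

    /** Stands in for parsing the terms file; the entries below are made up. */
    private static Map<String, String> loadTerms() {
        LOADS.incrementAndGet();
        Map<String, String> terms = new HashMap<>();
        terms.put("the", "content");
        terms.put("of", "content");
        return terms;
    }

    boolean isCommon(String term) { return commonTerms.containsKey(term); }
}
```

With this shape, constructing a new CommonTermsSketch for each query against the same configuration reuses the cached map, so the load runs only on the first query.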




[jira] Resolved: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query

Michael Gibney (Jira)
In reply to this post by Michael Gibney (Jira)
     [ http://issues.apache.org/jira/browse/NUTCH-301?page=all ]
     
Jerome Charron resolved NUTCH-301:
----------------------------------

    Fix Version: 0.8-dev
     Resolution: Fixed

Patch applied with some minor modifications.
Thanks, Stefan.

