[jira] [Resolved] (NUTCH-1971) The crawldb.url.filters property is not present in any configuration file

Tim Allison (Jira)

     [ https://issues.apache.org/jira/browse/NUTCH-1971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sebastian Nagel resolved NUTCH-1971.
------------------------------------
    Fix Version/s:     (was: 1.17)
       Resolution: Duplicate

This was done in NUTCH-2539 for Nutch 1.15, resolving this issue. Thanks, [~betolink]!

> The crawldb.url.filters property is not present in any configuration file
> -------------------------------------------------------------------------
>
>                 Key: NUTCH-1971
>                 URL: https://issues.apache.org/jira/browse/NUTCH-1971
>             Project: Nutch
>          Issue Type: Improvement
>          Components: crawldb
>    Affects Versions: 1.9
>            Reporter: Luis Lopez
>            Priority: Major
>              Labels: configuration, crawldb, nutch-default.xml
>
> In CrawlDbFilter.java, a constant names the boolean property that controls whether the URL filters are applied:
>   public static final String URL_FILTERING = "crawldb.url.filters";
> However, that property is not present in nutch-default.xml. Currently the only way to set this value is with the -filter parameter on the command line.
> The same applies to:
> public static final String URL_NORMALIZING = "crawldb.url.normalizers";
> public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";
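
To illustrate why a property missing from the defaults file is easy to overlook, here is a minimal, self-contained sketch of a boolean lookup with a fallback. It uses java.util.Properties as a stand-in for the job configuration; the class name CrawlDbFilterConfigSketch, the getBoolean helper, and the fallback value of false are assumptions made for illustration, not code taken from Nutch.

```java
import java.util.Properties;

public class CrawlDbFilterConfigSketch {
    // Property names quoted in the issue description.
    public static final String URL_FILTERING = "crawldb.url.filters";
    public static final String URL_NORMALIZING = "crawldb.url.normalizers";
    public static final String URL_NORMALIZING_SCOPE = "crawldb.url.normalizers.scope";

    // Hypothetical helper: returns the configured boolean, or defaultValue
    // when the key is absent from the configuration.
    public static boolean getBoolean(Properties conf, String name, boolean defaultValue) {
        String value = conf.getProperty(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) {
        Properties conf = new Properties();

        // Key not set anywhere: the lookup silently falls back to the default
        // (false here, which is an assumption of this sketch).
        System.out.println(URL_FILTERING + " = " + getBoolean(conf, URL_FILTERING, false));

        // Key set explicitly, as an entry in a configuration file would do.
        conf.setProperty(URL_FILTERING, "true");
        System.out.println(URL_FILTERING + " = " + getBoolean(conf, URL_FILTERING, false));
    }
}
```

Because the lookup never fails when the key is absent, nothing tells a user that the property exists unless it is listed and documented alongside the other defaults.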
