[jira] [Commented] (NUTCH-2419) Domain blacklist URL filter does not respect command-line override for file

Tim Allison (Jira)

    [ https://issues.apache.org/jira/browse/NUTCH-2419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105413#comment-17105413 ]

Sebastian Nagel commented on NUTCH-2419:
----------------------------------------

Working on a patch. It turned out that the situation is more confusing: the configured rule file does not take precedence over the attribute file for the URL filters "domain", "domainblacklist", "prefix" and "suffix" (but not "regex" and "automaton"), for the URL normalizers "host", "slash" and "protocol", and for "parsefilter-regex".
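
For illustration only, here is a minimal sketch of the precedence one would expect, where a rule file set in the configuration (for example via a command-line override) wins over the file named in the plugin attribute. This is not the patch and does not use Nutch classes: the class RuleFileResolver, the method resolveRuleFile and the file names are hypothetical, and the property name "urlfilter.domainblacklist.file" is assumed here to be the relevant configuration key.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Nutch: picks the rule file a plugin should load.
public class RuleFileResolver {

    // Expected precedence: configured property first, then the plugin
    // attribute file, then a built-in default.
    static String resolveRuleFile(Map<String, String> conf, String propertyName,
                                  String attributeFile, String defaultFile) {
        String configured = conf.get(propertyName);
        if (configured != null && !configured.trim().isEmpty()) {
            return configured.trim();
        }
        if (attributeFile != null && !attributeFile.trim().isEmpty()) {
            return attributeFile.trim();
        }
        return defaultFile;
    }

    public static void main(String[] args) {
        // Property name is an assumption; the file names are made up.
        String property = "urlfilter.domainblacklist.file";
        Map<String, String> conf = new HashMap<>();

        // No override configured: the plugin attribute file is used.
        System.out.println(resolveRuleFile(conf, property, "plugin-blacklist.txt", "default.txt"));

        // Override configured: it takes precedence over the attribute file.
        conf.put(property, "override-blacklist.txt");
        System.out.println(resolveRuleFile(conf, property, "plugin-blacklist.txt", "default.txt"));
    }
}
```

The bug described above corresponds to the first two checks being swapped, so that the attribute file is returned even when the property is set.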

> Domain blacklist URL filter does not respect command-line override for file
> ---------------------------------------------------------------------------
>
>                 Key: NUTCH-2419
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2419
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 1.13
>            Reporter: Markus Jelsma
>            Assignee: Markus Jelsma
>            Priority: Minor
>             Fix For: 1.17
>
>         Attachments: NUTCH-2419.patch
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)