[jira] [Commented] (NUTCH-2409) Injector: complete command-line help and counters


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/NUTCH-2409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16161058#comment-16161058 ]

ASF GitHub Bot commented on NUTCH-2409:
---------------------------------------

sebastian-nagel closed pull request #215: NUTCH-2409 Injector: complete command-line help and counters
URL: https://github.com/apache/nutch/pull/215
 
 
   
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[hidden email]


> Injector: complete command-line help and counters
> -------------------------------------------------
>
>                 Key: NUTCH-2409
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2409
>             Project: Nutch
>          Issue Type: Improvement
>          Components: injector
>    Affects Versions: 1.13
>            Reporter: Sebastian Nagel
>            Priority: Trivial
>             Fix For: 1.14
>
>
> See discussion in [NUTCH-2335|https://issues.apache.org/jira/browse/NUTCH-2335?focusedCommentId=16130178&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16130178]:
> - add counters for removed items from CrawlDb:
> {noformat}
> Injector: Total urls removed from CrawlDb by filters: 2
> Injector: Total urls with status gone removed from CrawlDb (db.update.purge.404): 0
> {noformat}
> - add {{-Ddb.update.purge.404=true}} to command-line help:
> {noformat}
> Usage: Injector [-D...] <crawldb> <url_dir> [-overwrite|-update] [-noFilter] [-noNormalize] [-filterNormalizeAll]
> ...
>  -D...          set or overwrite configuration property (property=value)
>  -Ddb.update.purge.404=true
>                 remove URLs with status gone (404) from CrawlDb
> {noformat}
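
The two counters proposed above can be illustrated with a minimal, self-contained sketch. It is not Nutch's actual Injector code: the class `InjectCounterSketch`, the `Status` enum, the `Result` holder, and the `update` method are hypothetical names chosen for illustration, the CrawlDb is modeled as a plain in-memory map, and the order of the two checks (filter first, then status gone) is an assumption. Only the log message wording and the property name `db.update.purge.404` come from the issue description.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical illustration only; not the Nutch Injector implementation.
class InjectCounterSketch {

    // Simplified stand-in for CrawlDb entry status.
    enum Status { FETCHED, GONE }

    // Holds the surviving entries and the two counters from the issue.
    static final class Result {
        final Map<String, Status> kept = new LinkedHashMap<>();
        long removedByFilters = 0;
        long removedGone = 0;
    }

    // Walks the existing entries, drops those rejected by the URL filter,
    // and (only if purge404 is set) drops entries with status GONE.
    // Each removal increments the matching counter.
    static Result update(Map<String, Status> crawlDb,
                         Predicate<String> urlFilter,
                         boolean purge404) {
        Result result = new Result();
        for (Map.Entry<String, Status> entry : crawlDb.entrySet()) {
            if (!urlFilter.test(entry.getKey())) {
                result.removedByFilters++;
                continue;
            }
            if (purge404 && entry.getValue() == Status.GONE) {
                result.removedGone++;
                continue;
            }
            result.kept.put(entry.getKey(), entry.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Status> crawlDb = new LinkedHashMap<>();
        crawlDb.put("http://example.org/", Status.FETCHED);
        crawlDb.put("http://example.org/missing", Status.GONE);
        crawlDb.put("ftp://example.org/file", Status.FETCHED);

        // Assumed filter: accept only URLs starting with "http".
        Result r = update(crawlDb, url -> url.startsWith("http"), true);

        System.out.println("Injector: Total urls removed from CrawlDb by filters: "
            + r.removedByFilters);
        System.out.println("Injector: Total urls with status gone removed from CrawlDb "
            + "(db.update.purge.404): " + r.removedGone);
    }
}
```

With `purge404` set to `false`, the entry with status gone stays in the map and the second counter remains 0, which matches the sample output quoted in the issue where the property is not enabled.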



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)