Nutch - Dev

This forum is an archive for the mailing list dev@nutch.apache.org. Messages posted here will be sent to this mailing list.
If you'd like to contribute to Nutch, please subscribe to the Nutch developer mailing list.
Topics (17982)
Replies Last Post Views
[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2483) Remove/replace indirect dependencies to org.json by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2392) Get same pages multiple times if URL contains relative path by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Reopened] (NUTCH-2392) Get same pages multiple times if URL contains relative path by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2212) Decrease memory consumption by tuning stack size by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2212) Decrease memory consumption by tuning stack size by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Reopened] (NUTCH-2212) Decrease memory consumption by tuning stack size by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Reopened] (NUTCH-2212) Decrease memory consumption by tuning stack size by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Updated] (NUTCH-2212) Decrease memory consumption by tuning stack size by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2392) Get same pages multiple times if URL contains relative path by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Reopened] (NUTCH-2392) Get same pages multiple times if URL contains relative path by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Updated] (NUTCH-2392) Get same pages multiple times if URL contains relative path by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Updated] (NUTCH-2185) protocol-soda-consumer plugin by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Updated] (NUTCH-2353) Create seed file with metadata using the REST API by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2353) Create seed file with metadata using the REST API by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2483) Remove/replace indirect dependencies to org.json by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2483) Remove/replace indirect dependencies to org.json by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2034) CrawlDB filtered documents counter. by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2354) Upgrade Hadoop dependencies to 2.7.4 by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2362) Upgrade MaxMind GeoIP version in index-geoip by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2380) indexer-elastic version upgrade to 5.3.0 by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2480) Upgrade crawler-commons dependency to 0.9 by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2439) Upgrade to Apache Tika 1.17 by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2478) // is not a valid base URL by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2216) db.ignore.*.links to optionally follow internal redirects by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2322) URL not available for Jexl operations by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2477) Refactor *Checker classes to use base class for common code by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2035) Regex filter using case sensitive rules. by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2474) CrawlDbReader -stats fails with ClassCastException by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2365) HTTP Redirects to SubDomains don't get crawled if db.ignore.external.links.mode == byDomain by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2295) Nutch master docker container broken by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2370) FileDumper: save JSON mapping file -> URL by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2295) Nutch master docker container broken by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Commented] (NUTCH-2295) Nutch master docker container broken by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] [Resolved] (NUTCH-2216) db.ignore.*.links to optionally follow internal redirects by JIRA jira@apache.org
0
by JIRA jira@apache.org