Nutch - Dev

This forum is an archive for the mailing list dev@nutch.apache.org. Messages posted here will be sent to this mailing list.
If you'd like to contribute to Nutch, please subscribe to the Nutch developer mailing list.
Topics (21405)
Topic / Replies / Last post by
[jira] Created: (NUTCH-750) HtmlParser plugin - page title extraction by Parth (Jira)
3
by Parth (Jira)
[jira] Created: (NUTCH-664) Possibility to update already stored documents. by Parth (Jira)
7
by Parth (Jira)
[jira] Updated: (NUTCH-310) Review Log Levels by Parth (Jira)
0
by Parth (Jira)
[jira] Created: (NUTCH-763) Separate configuration files from resources to be included in the job file by Parth (Jira)
1
by Parth (Jira)
[jira] Created: (NUTCH-577) Use explicit tika-config.xml file to enable mime magic detection to be turned on and off by Parth (Jira)
2
by Parth (Jira)
[jira] Updated: (NUTCH-309) Uses commons logging Code Guards by Parth (Jira)
0
by Parth (Jira)
[jira] Updated: (NUTCH-249) black- white list url filtering by Parth (Jira)
0
by Parth (Jira)
1.1 release? by Mattmann, Chris A (3...
3
by Mattmann, Chris A (3...
[jira] Created: (NUTCH-706) Url regex normalizer by Parth (Jira)
4
by Parth (Jira)
[jira] Created: (NUTCH-779) Mechanism for passing metadata from parse to crawldb by Parth (Jira)
11
by Parth (Jira)
[jira] Created: (NUTCH-714) Need a SFTP and SCP Protocol Handler by Parth (Jira)
5
by Parth (Jira)
[jira] Created: (NUTCH-785) Fetcher : copy metadata from origin URL when redirecting + call scfilters.initialScore on newly created URL by Parth (Jira)
4
by Parth (Jira)
[jira] Created: (NUTCH-784) CrawlDBScanner by Parth (Jira)
5
by Parth (Jira)
[jira] Created: (NUTCH-800) Generator builds a URL list that is not encoded by Parth (Jira)
2
by Parth (Jira)
[jira] Created: (NUTCH-783) IndexerChecker Utilty by Parth (Jira)
3
by Parth (Jira)
[jira] Created: (NUTCH-806) Merge CrawlDBScanner with CrawlDBReader by Parth (Jira)
0
by Parth (Jira)
[jira] Issue Comment Edited: (NUTCH-224) Nutch doesn't handle Korean text at all by Parth (Jira)
0
by Parth (Jira)
[jira] Commented: (NUTCH-224) Nutch doesn't handle Korean text at all by Parth (Jira)
0
by Parth (Jira)
[jira] Created: (NUTCH-805) Unable to resolve the url-blah-blah, skipping by Parth (Jira)
0
by Parth (Jira)
Will Nutch move to HBase 0.20 by work only
1
by Julien Nioche-4
[jira] Created: (NUTCH-804) CrawlDatum.statNames can be modified by Parth (Jira)
0
by Parth (Jira)
[jira] Created: (NUTCH-776) Configurable queue depth by Parth (Jira)
3
by Parth (Jira)
[DISCUSS] Nutch as a top level project (TLP)? by Andrzej Białecki-2
4
by Sami Siren-3
[jira] Created: (NUTCH-740) Configuration option to override default language for fetched pages. by Parth (Jira)
9
by Parth (Jira)
[jira] Created: (NUTCH-762) Alternative Generator which can generate several segments in one parse of the crawlDB by Parth (Jira)
21
by Parth (Jira)
[Nutch Wiki] Update of "Support" by Christopher Bader by Apache Wiki
0
by Apache Wiki
[Nutch Wiki] Update of "FAQ" by Ankit Dangi by Apache Wiki
0
by Apache Wiki
[jira] Created: (NUTCH-803) Upgrade Hadoop to 0.20.2 by Parth (Jira)
2
by Parth (Jira)
[jira] Created: (NUTCH-787) Upgrade Lucene to 3.0.0. by Parth (Jira)
15
by Parth (Jira)
[jira] Commented: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?" by Parth (Jira)
0
by Parth (Jira)
[jira] Assigned: (NUTCH-797) parse-tika is not properly constructing URLs when the target begins with a "?" by Parth (Jira)
0
by Parth (Jira)
[jira] Created: (NUTCH-693) Add configurable option for treating nofollow behaviour. by Parth (Jira)
7
by Parth (Jira)
[jira] Created: (NUTCH-796) Zero results problems difficult to troubleshoot due to lack of logging by Parth (Jira)
3
by Parth (Jira)
[jira] Created: (NUTCH-780) Nutch crawler did not read configuration files by Parth (Jira)
11
by Parth (Jira)
[jira] Created: (NUTCH-795) Add ability to maintain nofollow attribute in linkdb by Parth (Jira)
2
by Parth (Jira)