Nutch - Dev

This forum is an archive for the mailing list dev@nutch.apache.org. Messages posted here will be sent to this mailing list.
If you'd like to contribute to Nutch, please subscribe to the Nutch developer mailing list.
Topics (20421)
Topic / Replies / Last Post
[jira] Closed: (NUTCH-159) Specify temp/working directory for crawl by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Commented: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Resolved: (NUTCH-152) TaskRunner io pipes are not setDaemon(true), cleanup and exception errors are incomplete, max heap too small by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Resolved: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Commented: (NUTCH-186) mapred-default.xml is over ridden by nutch-site.xml by JIRA jira@apache.org
0
by JIRA jira@apache.org
Need pointers regarding accessing crawled data/customizing policy for crawl. by Manoj Bist
1
by Andrzej Białecki-2
Help: parsing pdf files by Krishnamohan Meduri
1
by Martin Kuen
[jira] Created: (NUTCH-584) urls missing from fetchlist by JIRA jira@apache.org
6
by JIRA jira@apache.org
[jira] Created: (NUTCH-534) SegmentMerger: add -normalize option by JIRA jira@apache.org
7
by JIRA jira@apache.org
[jira] Created: (NUTCH-597) Fetcher2 - java.lang.NullPointerException when host does not exist and fetcher.threads.per.host.by.ip is set to true causes threads to finish. by JIRA jira@apache.org
4
by JIRA jira@apache.org
[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice by JIRA jira@apache.org
0
by JIRA jira@apache.org
[jira] Commented: (NUTCH-363) Fetcher normalizes everything at least twice by JIRA jira@apache.org
0
by JIRA jira@apache.org
Serious bug in Generator / FreeGenerator by Andrzej Białecki-2
0
by Andrzej Białecki-2
[jira] Created: (NUTCH-528) CrawlDbReader: add some new stats + dump into a csv format by JIRA jira@apache.org
9
by JIRA jira@apache.org
[jira] Commented: (NUTCH-368) Message queueing system by JIRA jira@apache.org
0
by JIRA jira@apache.org
setting number of reduce outputs problem by viz-2-3
1
by Andrzej Białecki-2
Plugins? by Bryan Bishop
1
by Bryan Bishop
[jira] Created: (NUTCH-600) Nutch index problem by JIRA jira@apache.org
1
by JIRA jira@apache.org
nutch and future by tigger .
1
by Dennis Kubes-2
Build failed in Hudson: Nutch-Nightly #319 by hudson-6
4
by hudson-6
Problems with Hadoop Log4J on Nutch 0.8.1 by Jesiel Trevisan
0
by Jesiel Trevisan
[jira] Created: (NUTCH-599) nutch crawl and index problem by JIRA jira@apache.org
3
by JIRA jira@apache.org
Tika 0.1-incubating released by chrismattmann
0
by chrismattmann
[jira] Created: (NUTCH-559) NTLM, Basic and Digest Authentication schemes for web/proxy server by JIRA jira@apache.org
18
by JIRA jira@apache.org
[jira] Created: (NUTCH-561) HttpClient plugin does not work with NTLM authentication by JIRA jira@apache.org
1
by JIRA jira@apache.org
[jira] Created: (NUTCH-560) protocol-httpclient reading more bytes than http.content.limit by JIRA jira@apache.org
2
by JIRA jira@apache.org
[jira] Created: (NUTCH-539) HttpClient plugin does not work with BasicAuthentication by JIRA jira@apache.org
4
by JIRA jira@apache.org
[jira] Created: (NUTCH-481) http.content.limit is broken in the protocol-httpclient plugin by JIRA jira@apache.org
2
by JIRA jira@apache.org
Build failed in Hudson: Nutch-Nightly #316 by hudson-6
1
by hudson-6
Student contributions by fmccown
3
by fmccown
Build failed in Hudson: Nutch-Nightly #311 by hudson-6
4
by hudson-6
Build failed in Hudson: Nutch-Nightly #307 by hudson-6
2
by hudson-6
nutch internet crawling help by NIDHI MALIK
0
by NIDHI MALIK
Enable Nutch to search for local file system by Torontoer
0
by Torontoer
scoring algorithm by Lirida Kercelli
0
by Lirida Kercelli