Nutch - Dev

This forum is an archive for the mailing list dev@nutch.apache.org. Messages posted here will be sent to this mailing list.
If you'd like to contribute to Nutch, please subscribe to the Nutch developer mailing list.
Topics (21405)
Replies Last Post Views
nutch-default.xml configuration by Lourival Júnior
7
by Nuther
how to manipulate with MapWritable metaData in CrawlDatum structure by Feng Ji
2
by Stefan Groschupf-2
Nutch logging questions by Jérôme Charron
1
by Doug Cutting
Adding new urls in WebDB by Lourival Júnior
4
by Lourival Júnior
[jira] Created: (NUTCH-305) Update crawl and url filter lists to exclude jpeg|JPEG|bmp|BMP by Nick Burch (Jira)
1
by Nick Burch (Jira)
anchor text modifications by Brian Higgins-4
1
by Andrzej Białecki-2
How do I use Nutch to merge multiple webdb? by Nutch开发邮件
1
by Dennis Kubes
[jira] Created: (NUTCH-304) Change JIRA email address for nutch issues from apache incubator by Nick Burch (Jira)
0
by Nick Burch (Jira)
resolving IP in... by Stefan Groschupf-2
6
by Dennis Kubes
[jira] Created: (NUTCH-301) CommonGrams loads analysis.common.terms.file for each query by Nick Burch (Jira)
3
by Nick Burch (Jira)
a little deterrent by khz-2
0
by khz-2
[jira] Created: (NUTCH-275) Fetcher not parsing XHTML-pages at all by Nick Burch (Jira)
7
by Nick Burch (Jira)
Status of language plugin by T. Kuro Kurosaka
1
by Jérôme Charron
[jira] Created: (NUTCH-294) Topic-maps of related searchwords by Nick Burch (Jira)
5
by Nick Burch (Jira)
classloading problem hadoop .3.1 by Stefan Groschupf-2
0
by Stefan Groschupf-2
Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml by Otis Gospodnetic-2-2
5
by Andrzej Białecki-2
wildcard / regular expression searches by Björn Wilmsmann
0
by Björn Wilmsmann
[jira] Commented: (NUTCH-48) "Did you mean" query enhancement/refignment feature request by Nick Burch (Jira)
0
by Nick Burch (Jira)
Re: svn commit: r411943 - in /lucene/nutch/trunk/lib: commons-logging-1.0.4.jar hadoop-0.2.1.jar hadoop-0.3.1.jar log4j-1.2.13.jar by Jérôme Charron
3
by Doug Cutting
summary by Anton Potekhin
3
by Anton Potekhin
[jira] Created: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host by Nick Burch (Jira)
4
by Nick Burch (Jira)
[jira] Created: (NUTCH-201) add support for subcollections by Nick Burch (Jira)
3
by Nick Burch (Jira)
search engine spam detector by Stefan Groschupf-2
4
by Andrzej Białecki-2
parse OutOfMemoryError? by uygaryuzsuren
0
by uygaryuzsuren
[jira] Created: (NUTCH-299) Bittorrent Parser by Nick Burch (Jira)
3
by Nick Burch (Jira)
RobotRuleSet by Stefan Groschupf-2
0
by Stefan Groschupf-2
[jira] Created: (NUTCH-297) sandbox svn folder by Nick Burch (Jira)
0
by Nick Burch (Jira)
[jira] Created: (NUTCH-296) Image Search by Nick Burch (Jira)
1
by Nick Burch (Jira)
[jira] Created: (NUTCH-295) More description for fetcher.threads.fetch property by Nick Burch (Jira)
1
by Nick Burch (Jira)
[jira] Created: (NUTCH-290) parse-pdf: Garbage (?) indexed when text-extraction now allowed by Nick Burch (Jira)
8
by Nick Burch (Jira)
[jira] Created: (NUTCH-286) Handling common error-pages as 404 by Nick Burch (Jira)
3
by Nick Burch (Jira)
[jira] Created: (NUTCH-282) Showing too few results on a page (Paging not correct) by Nick Burch (Jira)
3
by Nick Burch (Jira)
[jira] Created: (NUTCH-274) Empty row in/at end of URL-list results in error by Nick Burch (Jira)
2
by Nick Burch (Jira)
[jira] Created: (NUTCH-291) OpenSearchServlet should return "date" as well as "lastModified" by Nick Burch (Jira)
3
by Nick Burch (Jira)
[jira] Created: (NUTCH-281) cached.jsp: base-href needs to be outside comments by Nick Burch (Jira)
2
by Nick Burch (Jira)