Nutch - Dev

This forum is an archive for the mailing list dev@nutch.apache.org. Messages posted here will be sent to this mailing list.
If you'd like to contribute to Nutch, please subscribe to the Nutch developer mailing list.
Topics (21573)
Replies Last Post Views
[jira] Created: (NUTCH-247) robot parser to restrict. by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
[jira] Created: (NUTCH-310) Review Log Levels by Clark Perkins (Jira)
2
by Clark Perkins (Jira)
[jira] Created: (NUTCH-262) Summary excerpts and highlights problems by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
[jira] Created: (NUTCH-251) Administration GUI by Clark Perkins (Jira)
10
by Clark Perkins (Jira)
[jira] Created: (NUTCH-74) French Analyzer Plugin by Clark Perkins (Jira)
7
by Clark Perkins (Jira)
[jira] Created: (NUTCH-86) LanguageIdentifier API enhancements by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
[jira] Created: (NUTCH-246) segment size is never as big as topN or crawlDB size in a distributed deployement by Clark Perkins (Jira)
8
by Clark Perkins (Jira)
Limiting Results By Domain by Robert Sanford
0
by Robert Sanford
Scanning the database by Robert Sanford
1
by Stefan Neufeind
Indexing href attribute in links. by Robert Sanford
0
by Robert Sanford
Library for extracting text content from binaries by Jukka Zitting
4
by Jukka Zitting
Why was "prune" removed in 0.8? by Stefan Neufeind
1
by Andrzej Białecki-2
segread vs. readseg by Stefan Groschupf-2
4
by Stefan Groschupf-2
[jira] Created: (NUTCH-324) db.score.link.internal and db.score.link.external are ignored by Clark Perkins (Jira)
2
by Clark Perkins (Jira)
[jira] Created: (NUTCH-167) Observation of <META NAME="ROBOTS" CONTENT="NOARCHIVE"> directive by Clark Perkins (Jira)
2
by Clark Perkins (Jira)
[jira] Created: (NUTCH-329) CrawlDbReader processTopNJob does not set jobNames by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
result comparison tool? by Stefan Groschupf-2
1
by kkrugler
tests failing by Sami Siren-2
1
by Stefan Groschupf-2
[Fwd: Re: [jira] Commented: (NUTCH-271) Meta-data per URL/site/section] by Stefan Neufeind
0
by Stefan Neufeind
[jira] Created: (NUTCH-328) commons-cli-2.0-SNAPSHOT.jar provided with nutch is not compatible with jdk 1.4 by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
[jira] Created: (NUTCH-327) bin/nutch setting of log path problems on cygwin by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
Changing javac.version to 1.5? by Greg Kim
1
by Andrzej Białecki-2
[jira] Created: (NUTCH-326) WordExtractor throws java.util.NoSuchElementException on some documents by Clark Perkins (Jira)
0
by Clark Perkins (Jira)
multiple query filters by Chris Stephens-3
0
by Chris Stephens-3
Distributed Matrix Computering on Hadoop by Jack.Tang
0
by Jack.Tang
log when blocked by robots.txt by Stefan Groschupf-2
1
by Piotr Kosiorowski
[jira] Created: (NUTCH-271) Meta-data per URL/site/section by Clark Perkins (Jira)
7
by Stefan Neufeind
nutch-extensionpoints not in plugin.includes by Stefan Groschupf-2
2
by Stefan Groschupf-2
[jira] Created: (NUTCH-319) OPICScoringFilter should use logging API instead of printStackTrace by Clark Perkins (Jira)
1
by Clark Perkins (Jira)
Webcrawler by Brian M.B. Keaney
0
by Brian M.B. Keaney
[jira] Created: (NUTCH-321) Scoring API deficiency by Clark Perkins (Jira)
2
by Clark Perkins (Jira)
[jira] Created: (NUTCH-323) CrawlDatum.set just reference a mapWritable of a other object but not copy it. by Clark Perkins (Jira)
2
by Clark Perkins (Jira)
[jira] Created: (NUTCH-293) support for Crawl-delay in Robots.txt by Clark Perkins (Jira)
9
by Clark Perkins (Jira)
[jira] Created: (NUTCH-173) PerHost Crawling Policy ( crawl.ignore.external.links ) by Clark Perkins (Jira)
9
by Clark Perkins (Jira)
error in recommended plugin example by Chris Stephens-3
0
by Chris Stephens-3