adding [-numFetchers numFetchers] to crawl


Brian Tingle
How do I set the number of map tasks when I run a command like the following?

  hadoop jar nutch-1.0.job org.apache.nutch.crawl.Crawl

I am thinking of trying the change below. Is there any reason not to
do it? Or is Crawl meant to be more of a demo, in which case I should
write a script or my own crawler class instead?

> diff Crawl.java.orig Crawl.java
53c53
<         ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N]");
---
>         ("Usage: Crawl <urlDir> [-dir d] [-threads n] [-depth i] [-topN N] [-numFetchers]");
65a66
>     int numFetchers = -1;
78a80,82
>       } else if ("-numFetchers".equals(args[i])) {
>           numFetchers = Integer.parseInt(args[i+1]);
>           i++;
116c120
<       Path segment = generator.generate(crawlDb, segments, -1, topN, System
---
>       Path segment = generator.generate(crawlDb, segments, numFetchers, topN, System
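
For illustration, the option handling added in the diff can be tried out in isolation. The following is only a minimal sketch, not the actual Nutch code: the class name CrawlArgsSketch, the Options holder, and the parse helper are hypothetical, every other Crawl option is left out, and any argument that is not -numFetchers is simply treated as the URL directory. It keeps the diff's default of -1, on the assumption that the generator treats -1 as "choose the number of fetch lists yourself", and it adds a bounds check so that a trailing -numFetchers with no value fails with a clear message.

```java
public class CrawlArgsSketch {

    // Hypothetical holder for the parsed options; not part of Nutch.
    static final class Options {
        String urlDir;
        int numFetchers = -1; // same default as in the diff
    }

    // Parses only the URL directory and the proposed -numFetchers option.
    static Options parse(String[] args) {
        Options opts = new Options();
        for (int i = 0; i < args.length; i++) {
            if ("-numFetchers".equals(args[i])) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("-numFetchers requires a value");
                }
                opts.numFetchers = Integer.parseInt(args[i + 1]);
                i++; // skip the value that was just consumed
            } else if (args[i] != null) {
                opts.urlDir = args[i];
            }
        }
        return opts;
    }

    public static void main(String[] args) {
        Options opts = parse(args);
        System.out.println("urlDir=" + opts.urlDir + " numFetchers=" + opts.numFetchers);
    }
}
```

Running it as "java CrawlArgsSketch urls -numFetchers 4" parses the value 4, which is what the patched Crawl would then pass to generator.generate in place of the hard-coded -1.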