Hi all,
I recently configured nutch-GORA on my cassandra DB. My colleague referred me to the below link, which is awesome.
http://sujitpal.blogspot.in/2012/01/exploring-nutch-gora-with-cassandra.html

I followed the steps in the blog as is. The problem I am having is, the first time, everything goes well - inject, generate, fetch, and parse. But when I iterate, nutch fetch does not fetch the data. As a result, my solr index only has 10 records (from the first successful run of course), and is not picking the data from the subsequent runs.

Results from my nutch fetch (After iterating) -

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855266-1107256220
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855266-1107256220
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
-finishing thread FetcherThread1, activeThreads=0
-finishing thread FetcherThread2, activeThreads=0
-finishing thread FetcherThread3, activeThreads=0
-finishing thread FetcherThread4, activeThreads=0
-finishing thread FetcherThread5, activeThreads=0
-finishing thread FetcherThread6, activeThreads=0
-finishing thread FetcherThread7, activeThreads=0
-finishing thread FetcherThread8, activeThreads=0
-finishing thread FetcherThread9, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

*************************************
vs the author of the above blog -

sujit@cyclone:local$ bin/nutch fetch 1325709400-776802111
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1325709400-776802111
Using queue mode : byHost
Fetcher: threads: 10
fetching http://www.parathyroid.com/parathyroid.htm
QueueFeeder finished: total 47 records. Hit by time limit :0
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=46
fetching http://www.parathyroid.com/Parathyroid-Surgeon.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=45
fetching http://www.parathyroid.com/paratiroide/index.html
fetching http://www.parathyroid.com/diagnosis.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=43
fetching http://www.parathyroid.com/parathyroid-adenoma.htm
fetching http://www.parathyroid.com/age.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=41
fetching http://www.parathyroid.com/FHH.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=40
fetching http://www.parathyroid.com/treatment-surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=39
fetching http://www.parathyroid.com/who's_eligible.htm
fetching http://www.parathyroid.com/parathyroid-disease.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=37
fetching http://www.parathyroid.com/FAQ.htm
fetching http://www.parathyroid.com/finding-parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=35
fetching http://www.parathyroid.com/hyperparathyroidism-diagnosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=34
fetching http://www.parathyroid.com/index.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=33
fetching http://www.parathyroid.com/parathyroid-pictures.htm
fetching http://www.parathyroid.com/Parathyroid-Surgeon-Map.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=31
fetching http://www.parathyroid.com/mini-surgery.htm
fetching http://www.parathyroid.com/about-Parathyroid.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=29
fetching http://www.parathyroid.com/disclaimer.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=28
fetching http://www.parathyroid.com/parathyroid-function.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=27
fetching http://www.parathyroid.com/paratiroide
fetching http://www.parathyroid.com/low-vitamin-d.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=25
fetching http://www.parathyroid.com/parathyroid-symptoms-cartoon.htm
fetching http://www.parathyroid.com/sestamibi.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=23
fetching http://www.parathyroid.com/osteoporosis.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=22
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=22
fetching http://www.parathyroid.com/surgery_cure_rates.htm
fetching http://www.parathyroid.com/low-calcium.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=20
fetching http://www.parathyroid.com/Sensipar-high-calcium.htm
fetching http://www.parathyroid.com/Dr.Norman.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=18
fetching http://www.parathyroid.com/parathyroid-anatomy.htm
fetching http://www.parathyroid.com/parathyroid-surgery.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=16
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=16
fetching http://www.parathyroid.com/hypoparathyroidism.htm
fetching http://www.parathyroid.com/endocrinology.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=14
fetching http://www.parathyroid.com/parathyroid-cancer.htm
fetching http://www.parathyroid.com/testimonials.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=12
fetching http://www.parathyroid.com/hyperparathyroidism-videos.htm
fetching http://www.parathyroid.com/high-calcium.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=10
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=10
fetching http://www.parathyroid.com/osteoporosis2.htm
fetching http://www.parathyroid.com/MEN-Syndrome.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=8
fetching http://www.parathyroid.com/causes.htm
fetching http://www.parathyroid.com/MIRP-Surgery.htm
-activeThreads=10, spinWaiting=10, fetchQueues= 1, fetchQueues.totalSize=6
fetching http://www.parathyroid.com/Re-Operation.htm
fetching http://www.parathyroid.com/pregnancy.htm
-activeThreads=10, spinWaiting=9, fetchQueues= 1, fetchQueues.totalSize=4
* queue: http://www.parathyroid.com

I am thinking somewhere, depth needs to be specified? - If yes, where?
I followed all the steps in the blog, and don't see a single error in my log file. My seed list directory is in - /home/andrew/nutch/

andrew@andrew-ubuntu:pwd
/home/andrew/nutch/
andrew@andrew-ubuntu:~/nutch$ ls -ltr
total 20
drwxrwxr-x 5 pooja pooja 4096 2012-02-19 19:38 workspace
drwxrwxr-x 3 pooja pooja 4096 2012-02-19 21:23 install
drwxrwxr-x 13 pooja pooja 4096 2012-02-20 08:06 gora
drwxrwxr-x 9 pooja pooja 4096 2012-02-20 09:21 branch
drwxrwxr-x 2 pooja pooja 4096 2012-02-21 12:05 web_seeds

andrew@andrew-ubuntu:~/nutch$ cd web_seeds/

andrew@andrew-ubuntu:~/nutch/web_seeds$ ls -ltr
total 4
-rwxr-xr-x 1 andrew andrew 19 2012-02-21 11:03 nutch.txt

andrew@andrew-ubuntu:~/nutch/web_seeds$ cat *
http://www.cnn.com

For your reference, I have also pasted below the nutch inject, generate, fetch, and parse from my first run.

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch inject /home/andrew/nutch/web_seeds
InjectorJob: starting
InjectorJob: urlDir: /home/andrew/nutch/web_seeds
InjectorJob: finished

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1329855121-1496717092

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329855121-1496717092
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329855121-1496717092
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://www.cnn.com/
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread2, activeThreads=1
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread5, activeThreads=1
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch parse 1329855121-1496717092
ParserJob: starting
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: batchId: 1329855121-1496717092
Parsing http://www.cnn.com/
ParserJob: success

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch updatedb
DbUpdaterJob: starting
DbUpdaterJob: done
|
Any suggestions?
|
|
In reply to this post by apachenutch
Hi apachenutch,
I am the author of the blog post...thanks for the kind words...

Did you miss the updatedb by any chance? This takes the outlinks from the parsed pages and adds them back to the fetch list so generate can then make these available for fetching...

So
initial cycle: inject, generate, fetch, parse, updatedb
next cycle: generate, fetch, parse, updatedb
...
finally: solrindex

-sujit

On Feb 21, 2012, at 12:32 PM, apachenutch wrote:
> I followed the steps in the blog as is. The problem I am having is, the
> first time, everything goes well - inject, generate, fetch, and parse. But
> when I iterate, nutch fetch does not fetch the data.
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Please-help-Nutch-fetch-command-not-fetching-data-tp3764751p3764751.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
|
updatedb was done, after inject, generate, fetch, and parse.
I tried iterating after doing the update.
|
Hi apachenutch,
Something of a wild guess here. Given that you are using the same seed file as I am, I would have expected to see a single URL in the index at the end of the first iteration, not 10. The only time I have observed similar behavior was when the fetcher truncated the file because of the http.content.limit setting. You may want to set it to -1 and see if the problem gets fixed. You can verify whether this is needed by looking at the cnt column for the seed URL and checking that the contents of the page are the same as what you get from a view-source of the seed URL page in your browser.

Also, to answer your original question, the depth is the iteration number. Each time you go deeper and deeper because you are putting the outlinks generated from the previous call back into the fetch list and fetching/parsing them. You can of course script it and specify a depth parameter that controls the number of iterations...

-sujit

On Feb 21, 2012, at 2:16 PM, apachenutch wrote:
> Update DB was done, after inject, generate, fetch and parse.
> Tried iterating after doing the update.
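The scripted loop with a depth parameter mentioned above could look roughly like the following. This is only a minimal sketch, not the script from the blog: the names NUTCH, extract_batch_id, crawl_round and crawl are made up for illustration, and it assumes that generate prints a "generated batch id:" line in the form shown in the logs in this thread.

```shell
#!/bin/sh
# Sketch: one inject, then DEPTH rounds of generate/fetch/parse/updatedb.
# NUTCH is an illustrative variable (path to the nutch launcher), not a Nutch setting.
NUTCH="${NUTCH:-bin/nutch}"

# Read generator output on stdin and print the batch id, relying on the
# "GeneratorJob: generated batch id: <id>" line seen in the logs.
extract_batch_id() {
  sed -n 's/.*generated batch id: *//p' | tail -n 1
}

# Run a single round. Fails if no batch id could be read from generate.
# Note: generate's own output is consumed by the pipe and not displayed.
crawl_round() {
  local batch_id
  batch_id="$("$NUTCH" generate | extract_batch_id)"
  [ -n "$batch_id" ] || return 1
  "$NUTCH" fetch "$batch_id" &&
    "$NUTCH" parse "$batch_id" &&
    "$NUTCH" updatedb
}

# crawl <seed_dir> <depth>: depth is simply the number of rounds.
crawl() {
  local seed_dir depth i
  seed_dir="$1"
  depth="$2"
  "$NUTCH" inject "$seed_dir" || return 1
  i=1
  while [ "$i" -le "$depth" ]; do
    echo "=== round $i of $depth ==="
    crawl_round || { echo "round $i failed, stopping" >&2; break; }
    i=$((i + 1))
  done
}

# Only start a crawl when called with both arguments.
if [ "$#" -ge 2 ]; then
  crawl "$1" "$2"
fi
```

Saved under a hypothetical name such as crawl.sh in runtime/local, it would be run as `sh crawl.sh /home/andrew/nutch/web_seeds 3`, followed by solrindex once the rounds are done. Each round calls updatedb before the next generate, which is the step that puts the parsed outlinks back into the fetch list.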
|
Thank you. I changed the value but no luck. (Changed in runtime/local/conf - nutch-default.xml)
<property>
<name>http.content.limit</name>
<value>-1</value>
<description>The length limit for downloaded content using the http

Output
--------------------
andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch inject ../../../web_seeds
InjectorJob: starting
InjectorJob: urlDir: ../../../web_seeds
InjectorJob: finished

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1329930779-110515839

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329930779-110515839
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329930779-110515839
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://www.q1a.com/
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread2, activeThreads=1
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread5, activeThreads=1
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-activeThreads=1, spinWaiting=0, fetchQueues= 1, fetchQueues.totalSize=0
-finishing thread FetcherThread0, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch parse 1329930779-110515839
ParserJob: starting
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: batchId: 1329930779-110515839
Parsing http://www.q1a.com/
Skipping http://www.q1a.com/q1a; different batch id - Why does it say skipping here?
ParserJob: success

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch updatedb
DbUpdaterJob: starting
DbUpdaterJob: done

************************ The first iteration ****************************

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch generate
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: true
GeneratorJob: done
GeneratorJob: generated batch id: 1329930901-1268252438

andrew@andrew-ubuntu:~/nutch/branch/runtime/local$ bin/nutch fetch 1329930901-1268252438
FetcherJob: starting
FetcherJob : timelimit set for : -1
FetcherJob: threads: 10
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob: batchId: 1329930901-1268252438
Using queue mode : byHost
Fetcher: threads: 10
QueueFeeder finished: total 1 records. Hit by time limit :0
fetching http://www.q1a.com/q1a
-finishing thread FetcherThread1, activeThreads=1
-finishing thread FetcherThread2, activeThreads=1
-finishing thread FetcherThread3, activeThreads=1
-finishing thread FetcherThread4, activeThreads=1
-finishing thread FetcherThread5, activeThreads=1
-finishing thread FetcherThread6, activeThreads=1
-finishing thread FetcherThread7, activeThreads=1
-finishing thread FetcherThread8, activeThreads=1
-finishing thread FetcherThread9, activeThreads=1
-finishing thread FetcherThread0, activeThreads=0
-activeThreads=0, spinWaiting=0, fetchQueues= 0, fetchQueues.totalSize=0
-activeThreads=0
FetcherJob: done

I stopped here, since it's not doing what it is supposed to. Please suggest.
