Nutch - User
I am wondering if anybody else has been having problems with the script at:
I am doing a simple crawl like this:
bin/nutch crawl url1 -dir crawl1 -depth 2
bin/nutch crawl url2 -dir crawl2 -depth 2
# cwd is /nutch/search; mergecrawl requires absolute paths
bin/mergecrawl /nutch/search/merged /nutch/search/crawl1 /nutch/search/crawl2
The individual crawl results were fine, but the merged result was not.
I suspect the problem lies in the final index merge stage, since
if I manually reindex with:
bin/nutch index merged/indexes merged/crawldb merged/linkdb
then the merged crawl works perfectly fine (i.e. it is searchable via the Nutch searcher).
How would one go about debugging this? Is there any way to read the
index, similar to how readdb reads the crawldb?
Many thanks in advance