Should the data size in version 0.8 be much smaller than in version 0.7?
I am running a few fetch cycles on Nutch 0.8, and I notice that the data
size is much smaller than the data size I got in version 0.7 (running the
same cycles at about the same time from different machines): about 5G after the
third cycle, starting with about 72000 URLs.
All the processes ended successfully and everything seems to be fine, but I am
afraid that I'm missing something.
Each cycle includes:
updatedb crawldb segments/..
generate crawldb segments
The configuration in nutch-site.xml is: