I have already crawled a million pages, and now I would like to read them
from the crawled local files instead of fetching them from the Internet again. I changed the
http plugin to do so, but it is quite slow: it took 10 minutes to
read and parse only 400 files/pages (I was running the "fetch" command). At
this rate, reading 1 million will take over 400 hours, which is more than half a month. I used
4 threads on a 4 CPU box. Using more threads, like 8, made it even slower.
Is this speed reasonable? And what can I do to improve it?
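
For reference, here is a rough sketch of how I got the estimate above. It assumes the throughput stays constant at the measured rate, which may not hold over a full run; the function name is just for illustration.

```python
# Back-of-the-envelope extrapolation from the measured run.
# Assumption: throughput stays constant at the measured rate.
measured_pages = 400        # pages read and parsed in the test run
measured_minutes = 10       # wall-clock time of the test run
total_pages = 1_000_000     # pages already crawled


def estimate_hours(total, pages, minutes):
    """Hours needed for `total` pages at a rate of `pages` per `minutes`."""
    return total / pages * minutes / 60


hours = estimate_hours(total_pages, measured_pages, measured_minutes)
days = hours / 24
print(f"{hours:.0f} hours, about {days:.1f} days")
```

That works out to roughly 417 hours, or a bit over 17 days.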