Time of Reading Local Files

Jane Zhen

Hi Folks,

I crawled a million pages earlier, and now I would like to read them from
the crawled local files instead of fetching them from the Internet again. I
changed the http plugin to do so, but it is quite slow: it took 10 minutes to
read and parse (I was running the "fetch" command) only 400 files/pages. At
that rate (40 pages per minute), reading 1 million pages would take more than
400 hours, or roughly half a month. I used 4 threads on a 4-CPU box. Using
more threads, such as 8, made it even slower.
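The estimate above can be checked with a few lines of Java that use only the figures from this post (400 pages in 10 minutes, 1,000,000 pages in total). The class and method names are made up for illustration.

```java
// Back-of-the-envelope estimate using the figures from the post:
// 400 pages fetched and parsed in 10 minutes, 1,000,000 pages to process.
public class FetchEstimate {
    // Observed throughput in pages per minute.
    static double pagesPerMinute(long pages, double minutes) {
        return pages / minutes;
    }

    // Hours needed to process totalPages at the given rate.
    static double hoursFor(long totalPages, double pagesPerMinute) {
        return totalPages / pagesPerMinute / 60.0;
    }

    public static void main(String[] args) {
        double rate = pagesPerMinute(400, 10.0);   // 40 pages per minute
        double hours = hoursFor(1_000_000L, rate); // about 416.7 hours
        System.out.printf("rate = %.1f pages/min%n", rate);
        System.out.printf("total = %.1f hours (%.1f days)%n", hours, hours / 24.0);
    }
}
```

Running it prints a rate of 40.0 pages/min and a total of about 416.7 hours (17.4 days), which matches "more than 400 hours, or roughly half a month".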

Is this speed reasonable, and what can I do to improve it?
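One way to tell whether disk reads alone explain 40 pages per minute is to time raw file reads outside Nutch. The following is a minimal sketch, not Nutch code: the class `LocalReadBenchmark` and its command-line arguments (a directory of crawled files and a thread count) are hypothetical, and it measures only plain reads with `java.nio.file.Files`, not the modified http plugin or the parse step.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicLong;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical benchmark: reads every regular file under a directory
// with a fixed-size thread pool and reports files per second.
public class LocalReadBenchmark {
    // Reads all regular files under root using `threads` workers;
    // returns the number of files read.
    static long readAll(Path root, int threads) throws Exception {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicLong count = new AtomicLong();
        AtomicLong bytes = new AtomicLong();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (Path p : files) {
                futures.add(pool.submit(() -> {
                    try {
                        bytes.addAndGet(Files.readAllBytes(p).length);
                        count.incrementAndGet();
                    } catch (IOException e) {
                        throw new UncheckedIOException(e);
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for each read and propagate any failure
            }
        } finally {
            pool.shutdown();
        }
        return count.get();
    }

    public static void main(String[] args) throws Exception {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        int threads = args.length > 1 ? Integer.parseInt(args[1]) : 4;
        long start = System.nanoTime();
        long n = readAll(root, threads);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("read %d files in %.2f s with %d threads (%.1f files/s)%n",
                n, secs, threads, secs > 0 ? n / secs : 0.0);
    }
}
```

For example, compare `java LocalReadBenchmark /path/to/crawled/files 4` with the same command using 8 threads (the path is a placeholder). If raw reads run far faster than 40 files per minute, the time is probably going into the plugin or the parse step rather than into the disk.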
