measure crawl rate of crawled website from nutch

srinir
Is there a way to get statistics on how many requests Nutch sent to a
website in a given day or hour? I would like to measure the crawl rate.

Here are the options I have tried so far (using a dump I created from the
crawldb):

- use the "tstamp" field in the index, aggregate it, and count the entries
for every unique date/hour
- filter the crawldb by modified date (restricted to the date being
analyzed) and then aggregate again by date/hour (to make sure we count
every status, not just db_fetched).
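
For reference, the date/hour aggregation described in both options can be
sketched roughly as follows. This is only an illustration, not something
Nutch provides: the function name requests_per_hour, the (url, timestamp)
record layout, the ISO 8601 timestamp format, and the sample rows are all
assumptions, and the real dump would first have to be converted into such
pairs.

```python
from collections import Counter
from datetime import datetime
from urllib.parse import urlparse


def requests_per_hour(records, host):
    """Count records for one host, bucketed by date and hour.

    `records` is an iterable of (url, timestamp) pairs, where the
    timestamp is an ISO 8601 string. This layout is an assumption;
    it is not the native crawldb dump format.
    """
    counts = Counter()
    for url, stamp in records:
        # Skip URLs that belong to other hosts.
        if urlparse(url).hostname != host:
            continue
        fetched = datetime.fromisoformat(stamp)
        # Truncate to the hour so all fetches in that hour share a key.
        counts[fetched.strftime("%Y-%m-%d %H:00")] += 1
    return counts


# Made-up sample rows, only to show the expected input shape.
sample = [
    ("http://example.com/a", "2015-03-01T10:05:00"),
    ("http://example.com/b", "2015-03-01T10:45:00"),
    ("http://example.com/c", "2015-03-01T11:02:00"),
    ("http://other.org/x", "2015-03-01T10:30:00"),
]

if __name__ == "__main__":
    for bucket, n in sorted(requests_per_hour(sample, "example.com").items()):
        print(bucket, n)
```

Changing the strftime pattern to "%Y-%m-%d" would give per-day counts
instead of per-hour counts.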

Thanks
Srini