Is there a way to measure (with some sort of stats) how many requests Nutch
sent to a website in one day or one hour? I would like to measure the
crawl rate.
Here are the options I have tried so far (using a dump I created from the crawldb):
- use the "tstamp" field in the index and aggregate it and count by every
- filter the crawldb by modified date (restricted to the date being analyzed)
and then aggregate again by date/hour (to make sure we don't count only
db_fetched entries, but every other status as well).
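
To make the second option concrete, here is a rough sketch of the per-hour aggregation over a text dump of the crawldb. It rests on assumptions I have not confirmed: that the dump contains lines shaped like "Modified time: Thu Jan 01 10:15:00 UTC 2015", that the dump has already been restricted to the one site being analyzed, and that the time zone token can be ignored. The function name count_by_hour and the field default are hypothetical, and whether "Modified time" really reflects when the request was sent is exactly the part I am unsure about.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed line shape in the crawldb text dump (not verified against every
# Nutch version): "<Field name>: Thu Jan 01 10:15:00 UTC 2015"
_TS_LINE = re.compile(
    r"^(?P<field>[A-Za-z ]+):\s+\w{3} (?P<mon>\w{3}) (?P<day>\d{1,2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}) \S+ (?P<year>\d{4})\s*$"
)


def count_by_hour(lines, field="Modified time"):
    """Count dump records per hour, keyed as 'YYYY-MM-DD HH'.

    Every record carrying the chosen timestamp field is counted,
    regardless of its status, so the result is not limited to db_fetched.
    The time zone token is ignored (assumption: all entries share one zone).
    """
    counts = Counter()
    for line in lines:
        m = _TS_LINE.match(line.strip())
        if not m or m.group("field") != field:
            continue
        ts = datetime.strptime(
            f"{m['mon']} {m['day']} {m['time']} {m['year']}",
            "%b %d %H:%M:%S %Y",
        )
        counts[ts.strftime("%Y-%m-%d %H")] += 1
    return counts
```

Passing an open file object for the dump (any iterable of lines works) and then iterating over sorted(counts.items()) would give the per-hour totals; summing the hours of one date gives the per-day figure.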