Re: Nutch robot hitting our Web servers

CatOs Mandros
Hi Ric,

The timing of the requests seems reasonable: more than 5 seconds between each GET. Is that too much for your machines? Does it respect robots.txt?
My guess is that the links are being parsed badly, most likely because they are malformed. Could you identify where the links come from?
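
To narrow down which requested paths trigger the "illegal characters in path" warnings, something like the following rough sketch might help. It assumes you have already pulled the request paths out of your logs into a list (the log format is not known here), and the set of rejected characters is my assumption about what .NET path validation refuses on Windows. The function name is made up for illustration.

```python
from urllib.parse import unquote

# Assumption: characters that .NET path validation rejects on Windows
# (double quote, angle brackets, pipe, and ASCII control characters).
INVALID_PATH_CHARS = set('"<>|') | {chr(c) for c in range(32)}


def suspicious_paths(paths):
    """Return (raw_path, offending_chars) for every path that contains
    an invalid character once percent-encoding is decoded."""
    hits = []
    for raw in paths:
        decoded = unquote(raw)
        bad = sorted(set(decoded) & INVALID_PATH_CHARS)
        if bad:
            hits.append((raw, bad))
    return hits
```

Paths that show up here and look like fragments of HTML (for example an encoded `<a href=`) would point to markup on a page being picked up as a link.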

Cheers,

2010/12/13 Doğacan Güney <[hidden email]>
Hi,

Thank you for the email. Can you provide some more information? For example,
how many requests does the bot make per second, and does it respect robots.txt?
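
Both questions can be checked offline from the logs. The following is only a sketch: it assumes the request timestamps have already been parsed into `datetime` objects and that you have the text of your own robots.txt at hand. The helper names are hypothetical; `RobotFileParser` is from the Python standard library.

```python
from urllib.robotparser import RobotFileParser


def min_gap_seconds(timestamps):
    """Smallest interval, in seconds, between consecutive requests.
    Returns None when there are fewer than two timestamps."""
    ordered = sorted(timestamps)
    gaps = [(b - a).total_seconds() for a, b in zip(ordered, ordered[1:])]
    return min(gaps) if gaps else None


def is_allowed(robots_txt, user_agent, url):
    """Check a logged URL against the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

If `is_allowed` returns False for URLs the bot actually fetched, it is not honoring robots.txt. The user-agent string to pass in is whatever the bot reports in your logs.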

On Mon, Dec 13, 2010 at 11:28, Chrislip, Ric <[hidden email]> wrote:
> For several days now a Nutch robot from IP 174.36.195.29 has been hitting
> our run-time Web servers.  I noticed because our event logs are showing many
> ASP.NET warnings about "illegal characters in path".
>
> Your Web page at http://nutch.apache.org/bot.htm says that you would "like
> to hear about any bad behavior."
>
> I have attached today's log entries from that IP address on one of our
> servers.
>
> Ric Chrislip
> Senior Programmer/Analyst, E-mail Administrator
> Clark Hall 111
> Hartwick College
> Oneonta, New York, USA
> 607-431-4189
>



--
Doğacan Güney