Authentication / Content-type


Thushara Wijeratna
Hi,

I used nutch-0.7.1 to index an intranet. It is a really great tool,
thanks for developing it! I had to hack something quick for
authentication (somehow I couldn't get the crawler to accept the
http.auth.basic.user settings etc.). I also found an issue where parsing
an HTML page returned the error "Content type is xml not html". It turns
out that the header is sometimes sent as "Content-Type" instead of
"Content-type", so the lookup for "Content-type" comes back null.
So I hacked the toContent method in HttpResponse.java like this:

 

            // Header name capitalisation varies, so try both spellings.
            String contentType = getHeader("Content-type");
            if (contentType == null) {
                contentType = getHeader("Content-Type");
            }
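
A more general fix might be to make the header lookup ignore case altogether, since HTTP header field names are case-insensitive. Here is a minimal sketch of that idea, assuming a Java version with generics. The class HeaderLookup and its addHeader method are made-up names for illustration and are not part of Nutch; only the getHeader name mirrors the call above.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper (not the Nutch API): stores HTTP headers so that
// lookups ignore the capitalisation of the header name.
class HeaderLookup {
    // A TreeMap ordered by String.CASE_INSENSITIVE_ORDER treats
    // "Content-type" and "Content-Type" as the same key.
    private final Map<String, String> headers =
        new TreeMap<>(String.CASE_INSENSITIVE_ORDER);

    // Records a header; a later value for the same name replaces the earlier one.
    void addHeader(String name, String value) {
        headers.put(name, value);
    }

    // Returns the header value, or null if the header was never added.
    String getHeader(String name) {
        return headers.get(name);
    }
}
```

With something like this, getHeader("Content-type") would find the value however the name was capitalised, so the second lookup would not be needed.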

Just thought I'd share this with you all.

Thanks,

Thushara