Authentication / Content-type


Thushara Wijeratna

I used nutch-0.7.1 to index an intranet. It is a really great tool,
thanks for developing it! I had to hack together something quick for
authentication (somehow I couldn't get the crawler to accept the
http.auth.basic.user etc. settings). I also found an issue where parsing
an HTML page returned the error "Content type is xml not html". It turns
out that the header name is sometimes sent as "Content-Type" instead of
"Content-type", so the lookup for "Content-type" finds nothing.
So I hacked the toContent method like this:


            String contentType = getHeader("Content-type");

            // Fall back to the other capitalization if the first lookup finds nothing.
            if (contentType == null) {
                        contentType = getHeader("Content-Type");
            }

Just thought I'd share this with you all.