[Nutch Wiki] Update of "HttpAuthenticationSchemes" by susam

Apache Wiki
Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Nutch Wiki" for change notification.

The following page has been changed by susam:
http://wiki.apache.org/nutch/HttpAuthenticationSchemes

The comment on the change is:
changed introduction

------------------------------------------------------------------------------
  == Introduction ==
- 'protocol-httpclient' is a protocol plugin which supports retrieving documents via the HTTP 1.0, HTTP 1.1 and HTTPS protocols, optionally with Basic, Digest and NTLM authentication schemes for web server as well as proxy server. This feature cannot do POST-based authentication that depends on cookies. More information on this can be found at: HttpPostAuthentication
+ This is a feature in Nutch, developed by Susam Pal, that allows the crawler to authenticate itself to websites requiring NTLM, Basic or Digest authentication. This feature cannot do POST-based authentication that depends on cookies. More information on this can be found at: HttpPostAuthentication
 
  == Necessity ==
  There were two plugins already present, viz. 'protocol-http' and 'protocol-httpclient'. However, 'protocol-http' did not support HTTP 1.1, HTTPS, or the NTLM, Basic and Digest authentication schemes. 'protocol-httpclient' supported HTTPS and had code for NTLM authentication, but the NTLM authentication didn't work due to a bug. Some portions of 'protocol-httpclient' were rewritten to solve these problems, to add features such as authentication support for proxy servers, and to provide better inline documentation for the properties used to configure authentication.