[jira] Created: (NUTCH-208) http: proxy exception list:


[jira] Created: (NUTCH-208) http: proxy exception list:

JIRA jira@apache.org
http: proxy exception list:
----------------------------

         Key: NUTCH-208
         URL: http://issues.apache.org/jira/browse/NUTCH-208
     Project: Nutch
        Type: New Feature
  Components: fetcher  
    Versions: 0.8-dev    
    Reporter: Matthias Günter
    Priority: Minor


I suggest adding a parameter to nutch-default.xml that lets users define a proxy exception list.

<property>
  <name>http.proxy.exception.list</name>
  <value></value>
  <description>URLs and hosts that don't use the proxy (e.g. intranets)</description>
</property>


This is useful when crawling a mix of intranet and internet sites from behind a firewall. A preliminary patch showing the changes is attached to this request. We will test it and update it if necessary. This also reflects how web browsers behave: most of them provide a proxy exception list.
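
To illustrate the idea, here is a minimal sketch of how a fetcher could decide whether to bypass the proxy for a given URL. It is not taken from the attached patch: the class name ProxyExceptionList, the method useProxy, and the assumption that the property value is a comma-separated list of host names are all hypothetical.

```java
import java.net.URI;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical helper; not part of Nutch or of the attached patch. */
final class ProxyExceptionList {
    private final Set<String> hosts = new HashSet<>();

    /** Assumes the property value is a comma-separated list of host names. */
    ProxyExceptionList(String propertyValue) {
        if (propertyValue == null) {
            return; // no exceptions configured
        }
        for (String entry : propertyValue.split(",")) {
            String host = entry.trim().toLowerCase(Locale.ROOT);
            if (!host.isEmpty()) {
                hosts.add(host);
            }
        }
    }

    /** Returns true when a request for this URL should go through the proxy. */
    boolean useProxy(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return true; // host could not be determined, fall back to the proxy
        }
        return !hosts.contains(host.toLowerCase(Locale.ROOT));
    }
}
```

With the value "intranet.example, wiki.local", a request to http://intranet.example/ would be sent directly, while any other host would still go through the proxy. Exact host matching is the simplest choice here; wildcard or domain-suffix matching would need additional logic.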



--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


[jira] Updated: (NUTCH-208) http: proxy exception list:

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/NUTCH-208?page=all ]

Matthias Günter updated NUTCH-208:
----------------------------------

    Attachment: patch.txt

A preliminary patch.

> http: proxy exception list:
> ---------------------------
>
>          Key: NUTCH-208
>          URL: http://issues.apache.org/jira/browse/NUTCH-208
>      Project: Nutch
>         Type: New Feature
>   Components: fetcher
>     Versions: 0.8-dev
>     Reporter: Matthias Günter
>     Priority: Minor
>  Attachments: patch.txt, patch.txt


[jira] Updated: (NUTCH-208) http: proxy exception list:

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/NUTCH-208?page=all ]

Matthias Günter updated NUTCH-208:
----------------------------------

    Attachment: patch.txt

A preliminary patch.



[jira] Updated: (NUTCH-208) http: proxy exception list:

JIRA jira@apache.org
     [ http://issues.apache.org/jira/browse/NUTCH-208?page=all ]

Renaud Richardet updated NUTCH-208:
-----------------------------------

    Attachment: proxy_exception_list-0.8.diff

I updated the patch to 0.8 and corrected a small typo (if (!"".equals(input[i].trim())) {). The proxy exception list feature works well. You can test it using any proxy, e.g. tinyproxy (http://wiki.apache.org/nutch/SetupProxyForNutch).
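
For context, a guard of that form is typically used to skip blank entries when splitting a configured list. The sketch below is only an illustration: the class ExceptionEntries, the method parse, and the comma-separated format are assumptions, and only the condition itself comes from the patch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the guard; not taken from the patch. */
final class ExceptionEntries {
    /** Splits a comma-separated value (an assumed format) and drops blank entries. */
    static List<String> parse(String value) {
        List<String> hosts = new ArrayList<>();
        String[] input = value.split(",");
        for (int i = 0; i < input.length; i++) {
            // Skip entries that are empty once surrounding whitespace is removed.
            if (!"".equals(input[i].trim())) {
                hosts.add(input[i].trim());
            }
        }
        return hosts;
    }
}
```

Without the guard, a value such as "intranet.example, ,wiki.local" would produce a whitespace-only entry, and an empty property value would produce a single empty-string entry.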

> http: proxy exception list:
> ---------------------------
>
>                 Key: NUTCH-208
>                 URL: http://issues.apache.org/jira/browse/NUTCH-208
>             Project: Nutch
>          Issue Type: New Feature
>          Components: fetcher
>    Affects Versions: 0.8
>            Reporter: Matthias Günter
>            Priority: Minor
>         Attachments: patch.txt, patch.txt, proxy_exception_list-0.8.diff


[jira] Commented: (NUTCH-208) http: proxy exception list:

JIRA jira@apache.org
    [ http://issues.apache.org/jira/browse/NUTCH-208?page=comments#action_12433175 ]
           
Sami Siren commented on NUTCH-208:
----------------------------------

This looks like a good addition to Nutch. A couple of comments:
- The added comments in HttpResponse should be removed.
- Any chance of adding some JUnit tests along with this?


