[jira] [Created] (NUTCH-2657) Protocol-http to store HTTP response header with "\r\n"


Chris Mattmann (Jira)
Sebastian Nagel created NUTCH-2657:
--------------------------------------

             Summary: Protocol-http to store HTTP response header with "\r\n"
                 Key: NUTCH-2657
                 URL: https://issues.apache.org/jira/browse/NUTCH-2657
             Project: Nutch
          Issue Type: Improvement
          Components: protocol
    Affects Versions: 1.15
            Reporter: Sebastian Nagel
             Fix For: 1.16


The plugins protocol-http and protocol-okhttp can store the HTTP request and/or response headers in the response metadata. However, they are inconsistent about which line break ("\r\n" or "\n") separates header lines and about whether the header block ends with a second line break. Both plugins store request headers with "\r\n" and two trailing "\r\n", while protocol-http stores response headers with "\n" and only a single trailing line break. This is difficult to handle when the headers must be stored uniformly (I've created such a [nasty bug writing WARC files|https://github.com/commoncrawl/nutch/issues/5]).
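To illustrate the uniform format described above, here is a minimal sketch in Java, the language Nutch is written in. It is not the actual NUTCH-2657 change: `HeaderNormalizer` and `normalize` are hypothetical names. The sketch assumes the stored headers are a single string whose lines are separated by either "\n" or "\r\n". It rewrites that string so every line ends with "\r\n" and the block ends with an empty line, which matches how request headers are already stored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical helper (not part of Nutch): rewrites a block of HTTP header
 * lines so that every line ends with "\r\n" and the block is terminated by an
 * empty line, i.e. it ends with "\r\n\r\n".
 */
class HeaderNormalizer {

  static String normalize(String rawHeaders) {
    // Split on CRLF or a bare LF; the -1 limit keeps trailing empty strings.
    List<String> lines = new ArrayList<>(Arrays.asList(rawHeaders.split("\r?\n", -1)));

    // Drop any trailing empty lines; the terminator is re-added uniformly below.
    while (!lines.isEmpty() && lines.get(lines.size() - 1).isEmpty()) {
      lines.remove(lines.size() - 1);
    }

    StringBuilder sb = new StringBuilder();
    for (String line : lines) {
      sb.append(line).append("\r\n");
    }
    // Empty line marking the end of the header block.
    sb.append("\r\n");
    return sb.toString();
  }

  public static void main(String[] args) {
    // Response headers as protocol-http stores them: "\n" and one trailing break.
    String fromProtocolHttp = "HTTP/1.1 200 OK\nContent-Type: text/html\n";
    String normalized = normalize(fromProtocolHttp);
    // Escape CR and LF so the line breaks are visible in the output.
    System.out.println(normalized.replace("\r", "\\r").replace("\n", "\\n"));
  }
}
```

Running `main` prints `HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n`. Input that is already in the "\r\n" form passes through unchanged. Normalizing when the headers are read is only a workaround. As the issue title says, the cleaner fix is for protocol-http to store the response headers with "\r\n" in the first place.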



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

Re: [jira] [Created] (NUTCH-2657) Protocol-http to store HTTP response header with "\r\n"

Shaharia Azam
unsubscribe

Regards,
Shaharia Azam
Preview Technologies

Phone: +88 09611 738 439
          + 88 02 913 8532


---- On Mon, 15 Oct 2018 20:28:00 +0600 Sebastian Nagel (JIRA) <[hidden email]> wrote ----



Re: [jira] [Created] (NUTCH-2657) Protocol-http to store HTTP response header with "\r\n"

Sebastian Nagel
Hi,

Please follow the instructions given on
  http://nutch.apache.org/mailing_lists.html

In short: send a mail to [hidden email]

Best,
Sebastian


On 10/15/2018 04:39 PM, Shaharia Azam wrote:

> unsubscribe
>
> Regards,
> Shaharia Azam
> Preview Technologies
>
> Phone: +88 09611 738 439
>           + 88 02 913 8532
> URL: https://www.previewtechs.com