double curl calls in post.sh?

double curl calls in post.sh?

Chris Hostetter-3

am i smoking crack or is post.sh mistakenly sending every doc twice in a
row? ...

for f in $FILES; do
  echo Posting file $f to $URL
  curl $URL --data-binary @$f
  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
  echo
done


...is there any reason not to delete that first execution of curl?



-Hoss

Re: double curl calls in post.sh?

Bill Au
Looks like that was added for the UTF-8 example.  But setting the
content-type/charset should work for all the other examples too,
right?  So I don't see any reason for not deleting the first curl.

Bill

On 9/17/06, Chris Hostetter <[hidden email]> wrote:

>
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
>
> for f in $FILES; do
>   echo Posting file $f to $URL
>   curl $URL --data-binary @$f
>   curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
>   echo
> done
>
>
> ...is there any reason not to delete that first execution of curl?
>
>
>
> -Hoss
>
>

Re: double curl calls in post.sh?

Yonik Seeley-2
In reply to this post by Chris Hostetter-3
On 9/17/06, Chris Hostetter <[hidden email]> wrote:
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...

Heh... must have been a cut-n-paste bug.  I just removed it.
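
For reference, a minimal sketch of what the loop might look like with the first curl removed. The `post_files` wrapper and the default `URL` value are assumptions for illustration, not the actual contents of post.sh:

```shell
#!/bin/sh
# Sketch only: the loop from post.sh with the duplicate curl call removed.
# The default URL and the post_files function name are assumptions.
URL="${URL:-http://localhost:8983/solr/update}"

post_files() {
  for f in "$@"; do
    echo "Posting file $f to $URL"
    # One request per file, sent with an explicit content type and charset.
    curl "$URL" --data-binary "@$f" -H 'Content-type:text/xml; charset=utf-8'
    echo
  done
}
```

It would be called as `post_files *.xml`, so each file is posted exactly once.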

-Yonik

Re: double curl calls in post.sh?

Walter Underwood, Netflix
In reply to this post by Chris Hostetter-3
Also, do not use text/xml. Even with a charset parameter. In a correct
implementation, that will override the XML declaration of charset.
With text/xml, the charset parameter must be correct. When it is
omitted, the content MUST be interpreted as US-ASCII (yuk).

Instead, use a media type of application/xml, so that the server
is allowed to sniff the content to discover the character encoding.
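
As a rough sketch of that suggestion, the curl call could send application/xml with no charset parameter. The `post_xml` helper name is hypothetical, and whether the encoding is then detected correctly depends on how the receiving server reads the request body:

```shell
# Sketch only: post one file as application/xml, with no charset parameter,
# leaving encoding detection to the XML declaration or BOM in the content.
# post_xml is a hypothetical helper: $1 is the update URL, $2 is the file.
post_xml() {
  curl "$1" --data-binary "@$2" -H 'Content-Type: application/xml'
}
```

Inside the post.sh loop this would be used as `post_xml "$URL" "$f"`.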

For the gory details, see RFC 3023:

  http://www.ietf.org/rfc/rfc3023.txt

wunder
==
Walter Underwood
Search Guru, Netflix

On 9/17/06 1:00 PM, "Chris Hostetter" <[hidden email]> wrote:

>
> am i smoking crack or is post.sh mistakenly sending every doc twice in a
> row? ...
>
> for f in $FILES; do
>   echo Posting file $f to $URL
>   curl $URL --data-binary @$f
>   curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
>   echo
> done
>
>
> ...is there any reason not to delete that first execution of curl?
>
>
>
> -Hoss
>


Re: double curl calls in post.sh?

Yonik Seeley-2
On 9/18/06, Walter Underwood <[hidden email]> wrote:
> Instead, use a media type of application/xml, so that the server
> is allowed to sniff the content to discover the character encoding.

Cool!  Do you know what servlet containers currently implement this "sniffing"?

-Yonik

Re: double curl calls in post.sh?

Walter Underwood, Netflix
On 9/18/06 10:10 AM, "Yonik Seeley" <[hidden email]> wrote:
> On 9/18/06, Walter Underwood <[hidden email]> wrote:
>> Instead, use a media type of application/xml, so that the server
>> is allowed to sniff the content to discover the character encoding.
>
> Cool!  Do you know what servlet containers currently implement this
> "sniffing"?

XML parsers already do this correctly. They look at the XML declaration
for the encoding, and if that isn't there, they look for a BOM or
UTF-8 content, as described in the (non-normative) appendix to the
XML spec.

  http://www.w3.org/TR/REC-xml/#sec-guessing
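
To make the detection order concrete, here is a much-simplified shell sketch of the idea: check for a BOM first, then for an encoding in the XML declaration, and otherwise fall back to UTF-8. The `sniff_xml_encoding` name is made up for illustration, the sketch ignores UTF-32 and UTF-16 content without a BOM, and a real XML parser does considerably more than this:

```shell
# Sketch only: guess the character encoding of an XML file.
# sniff_xml_encoding is a hypothetical helper; $1 is the file to inspect.
sniff_xml_encoding() {
  # Hex dump of the first four bytes, with whitespace removed.
  bom=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
  case "$bom" in
    efbbbf*) echo "UTF-8";    return ;;  # UTF-8 byte order mark
    feff*)   echo "UTF-16BE"; return ;;  # big-endian UTF-16 BOM
    fffe*)   echo "UTF-16LE"; return ;;  # little-endian UTF-16 BOM (UTF-32LE not handled)
  esac
  # No BOM: look for encoding="..." in an XML declaration on the first line.
  enc=$(head -n 1 "$1" |
    sed -n 's/^<?xml[^>]*encoding=["'\'']\([A-Za-z0-9._-]*\)["'\''].*/\1/p')
  # With neither a BOM nor a declared encoding, XML defaults to UTF-8.
  echo "${enc:-UTF-8}"
}
```

For example, `sniff_xml_encoding doc.xml` prints the guessed encoding name for doc.xml.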

The servlet container needs to hand the raw bytes to the parser,
which should be normal behavior for application/*.

wunder
--
Walter Underwood
Search Guru, Netflix


Re: double curl calls in post.sh?

Yonik Seeley-2
On 9/18/06, Walter Underwood <[hidden email]> wrote:
> XML parsers already do this correctly.

Ah, I thought that maybe the servlet container itself could do that
when one requests a Reader.  Using a byte-oriented InputStream and
passing that to the parser would work, but would require some little
changes to Solr.

-Yonik