[jira] Created: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org
SolrJ GET operation does not send correct encoding
--------------------------------------------------

                 Key: SOLR-1959
                 URL: https://issues.apache.org/jira/browse/SOLR-1959
             Project: Solr
          Issue Type: Bug
          Components: clients - java
    Affects Versions: 1.4.1, Next
            Reporter: Lance Norskog


The SolrJ query operation fails to set the character encoding when doing a GET. It works when doing a POST.

The problem is that URLs are urlencoded with UTF-8 but the Content-type: header is not set. I tested it with "Content-Type: text/plain; charset=utf-8" and that worked. Without that header, the encoding defaults to ISO 8859-1.

The result is that SolrJ queries fail for any search containing a character above 127. The workaround is to use a POST query instead of a GET. I have not checked other code paths for the same problem. So, change:
{code}
// server is a CommonsHttpSolrServer instance; query() is an instance method
QueryResponse qr = server.query(query);
{code}
to:
{code}
QueryResponse qr = server.query(query, SolrRequest.METHOD.POST);
{code}
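Since, per this report, only queries containing a character above 127 are affected, one option is to switch to POST just for those and keep plain-ASCII queries on GET. A minimal sketch of that check in plain Java; the class and method names are hypothetical and not part of SolrJ:

```java
/** Hypothetical helper: decides whether a query string needs the POST workaround. */
final class QueryMethodChooser {
    private QueryMethodChooser() {}

    /** True if any UTF-16 code unit in q is above 127, i.e. q is not plain ASCII. */
    static boolean needsPost(String q) {
        return q.chars().anyMatch(c -> c > 127);
    }

    // Assumed usage with SolrJ 1.4 (server is a CommonsHttpSolrServer):
    // SolrRequest.METHOD method = needsPost(q)
    //     ? SolrRequest.METHOD.POST : SolrRequest.METHOD.GET;
    // QueryResponse qr = server.query(new SolrQuery(q), method);
}
```

Surrogate pairs (characters outside the Basic Multilingual Plane) are also above 127, so checking UTF-16 code units is enough for this yes/no decision.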
One quirk of this behavior is that hand-editing a query URL to include an ISO 8859-1 character (like an umlaut) works in a browser, but the same search fails in a SolrJ request. Searching also works correctly from the admin/index.jsp and admin/form.jsp pages, because they set the content type in the FORM declaration.
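The umlaut case shows why the two charsets disagree: the same character percent-encodes to different bytes. A small JDK-only sketch, assuming Java 10 or later for the `Charset` overload of `URLEncoder.encode`; the class name is illustrative:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Illustrative only: one character percent-encoded under two charsets. */
final class UmlautEscapes {
    private UmlautEscapes() {}

    /** Escapes produced when the client encodes as UTF-8. */
    static String asUtf8(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    /** Escapes a client would produce if it encoded as ISO 8859-1 instead. */
    static String asLatin1(String s) {
        return URLEncoder.encode(s, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String umlaut = "\u00fc"; // ü, U+00FC
        System.out.println(asUtf8(umlaut));   // %C3%BC: two bytes, two escapes
        System.out.println(asLatin1(umlaut)); // %FC: one byte, one escape
    }
}
```

A server that decodes %C3%BC as ISO 8859-1 sees two characters rather than one ü, which is the kind of mismatch described above.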


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Commented: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879912#action_12879912 ]

Lance Norskog commented on SOLR-1959:
-------------------------------------

Using POST as a workaround means that query strings will not show in an Apache server log.
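The reason is where the parameters travel. With GET they are part of the request line, which access logs typically record; with POST the request line holds only the path and the parameters go in the body. The sketch below builds both requests by hand to show the difference. It is illustrative only (assuming Java 10 or later), not how SolrJ itself assembles requests, and the names are hypothetical:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: where query parameters end up in a GET versus a POST request. */
final class RequestShapes {
    private RequestShapes() {}

    /** One form parameter, percent-encoded as UTF-8 (so the result is pure ASCII). */
    static String param(String name, String value) {
        return name + "=" + URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    /** GET: the parameters are part of the request line. */
    static String getRequest(String host, String path, String params) {
        return "GET " + path + "?" + params + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n\r\n";
    }

    /** POST: the request line carries only the path; the parameters are the body. */
    static String postRequest(String host, String path, String params) {
        return "POST " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Content-Type: application/x-www-form-urlencoded; charset=UTF-8\r\n"
            // Assumes params came from param(): ASCII only, so chars == bytes.
            + "Content-Length: " + params.length() + "\r\n\r\n"
            + params;
    }

    /** The first line of a request, which is what a typical access log shows. */
    static String requestLine(String request) {
        return request.substring(0, request.indexOf("\r\n"));
    }
}
```

For `q=%CE%A3`, the GET request line is `GET /solr/select?q=%CE%A3 HTTP/1.1`, while the POST request line is just `POST /solr/select HTTP/1.1`.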

[jira] Issue Comment Edited: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879912#action_12879912 ]

Lance Norskog edited comment on SOLR-1959 at 6/17/10 7:59 PM:
--------------------------------------------------------------

Using POST as a workaround means that query strings will not show in a servlet engine log.

      was (Author: lancenorskog):
    Using POST as a workaround means that query strings will not show in an Apache server log.
 

[jira] Updated: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog updated SOLR-1959:
--------------------------------

    Attachment: SOLR-1959.patch

This patch applies against both tags/release-1.4.0 and trunk, so this bit of code appears unchanged between those versions.

[jira] Commented: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881073#action_12881073 ]

Lance Norskog commented on SOLR-1959:
-------------------------------------

Demonstrating this bug is rather difficult with text editors that mishandle character encodings.
This test uses the Greek capital letter sigma, Unicode character U+03A3, defined here:
[http://en.wikipedia.org/wiki/Greek_alphabet#Greek_and_Coptic]

Index this file with the solr/example/exampledocs/post.sh script:
{code:title=sigma.xml|borderStyle=solid}
<add>
<doc>
  <field name="id">SP2514N</field>
  <field name="name">A greek letter: &#x03A3; should be a sigma</field>
</doc>
</add>
{code}
Do a search with this command:
{code}
curl "http://localhost:8983/solr/select?q=%ce%a3&indent=on"
{code}
(Yes, it's CE and not 03: %CE%A3 is the two-byte UTF-8 encoding of U+03A3.)
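Where the CE A3 bytes come from can be checked with the JDK alone. This sketch (the class name is illustrative) percent-encodes every UTF-8 byte of a string:

```java
import java.nio.charset.StandardCharsets;

/** Illustrative: the percent-escapes for a string's UTF-8 bytes. */
final class Utf8Escapes {
    private Utf8Escapes() {}

    /** Percent-encodes every UTF-8 byte of s, including ASCII ones. */
    static String escapeAll(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            // Mask to 0..255 so bytes >= 0x80 are not formatted as negative numbers.
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+03A3 is 011 1010 0011 in binary. UTF-8 splits those 11 bits as
        // 01110 | 100011 into 110_01110 10_100011, i.e. 0xCE 0xA3.
        System.out.println(escapeAll("\u03A3")); // %CE%A3
    }
}
```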
Without the patch, run the same search via SolrJ:
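{code:title=search code snippet|borderStyle=solid}
import java.net.URLDecoder;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

// "%ce%a3" decodes to the single character U+03A3 (Greek capital sigma)
String queryString = URLDecoder.decode("%ce%a3", "UTF-8");
CommonsHttpSolrServer server =
  new CommonsHttpSolrServer("http://localhost:8983/solr");
SolrQuery query = new SolrQuery();
query.setQuery(queryString);
QueryResponse qr = server.query(query, SolrRequest.METHOD.GET);
{code}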
This search fails, because the HTTP server decodes the %xx escapes as ISO-8859-1.
Now change GET to POST. The code works, because POST explicitly sets UTF-8.
This patch applies the same UTF-8 default to GET queries.
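The decoding step described above can be reproduced without a server: the same escapes yield one character under UTF-8 and two under ISO-8859-1. A JDK-only sketch, assuming Java 10 or later for the `Charset` overload of `URLDecoder.decode`; the class name is illustrative:

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

/** Illustrative: decoding the same percent-escapes under two charsets. */
final class DecodeMismatch {
    private DecodeMismatch() {}

    /** How a server decoding URLs as UTF-8 would read the escapes. */
    static String asUtf8(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.UTF_8);
    }

    /** How a server defaulting to ISO-8859-1 would read them: one char per byte. */
    static String asLatin1(String escaped) {
        return URLDecoder.decode(escaped, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String escaped = "%ce%a3"; // the query string from the curl example
        System.out.println(asUtf8(escaped).length());   // 1: U+03A3
        System.out.println(asLatin1(escaped).length()); // 2: U+00CE U+00A3, shown as "Î£"
    }
}
```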

As I said, seeing the right characters at every stage is tricky. A TCP/IP monitor makes this much easier to track; I used Apache's tcpmon.

[jira] Commented: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881190#action_12881190 ]

Yonik Seeley commented on SOLR-1959:
------------------------------------

bq. The problem is that URLs are urlencoded with UTF-8 but the Content-type: header is not set.

Content-Type does not apply to a GET. The URL in a GET is strictly defined to be percent-encoded UTF-8 bytes. For historical reasons, Tomcat defaults to Latin-1 (ISO-8859-1), and this needs to be changed in the server config.

http://www.ietf.org/rfc/rfc3986.txt
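On Tomcat, the server-side change is the `URIEncoding="UTF-8"` attribute on the HTTP `<Connector>` in `conf/server.xml`. Where that configuration cannot be changed, a parameter that was already decoded as Latin-1 can be recovered, because ISO-8859-1 maps each byte 0x00 to 0xFF to exactly one character and back. A sketch of that stopgap; the helper is hypothetical and not part of Solr:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stopgap for a container that decoded a UTF-8 URL as ISO-8859-1. */
final class Latin1Repair {
    private Latin1Repair() {}

    /**
     * Re-encoding as ISO-8859-1 recovers the original bytes exactly (one char per
     * byte); those bytes are then decoded as the UTF-8 the client actually sent.
     * Only valid for strings that really were mis-decoded as Latin-1: applying it
     * to correctly decoded non-ASCII text corrupts that text.
     */
    static String repair(String decodedAsLatin1) {
        byte[] original = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }
}
```

Fixing the connector configuration is preferable, since the repair has to be applied to every parameter and must never run twice on the same value.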


[jira] Closed: (SOLR-1959) SolrJ GET operation does not send correct encoding

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-1959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Lance Norskog closed SOLR-1959.
-------------------------------

    Resolution: Fixed

Not a bug.

That'll teach me to believe my lying eyes.
