[jira] Created: (SOLR-2034) javabin should use UTF-8, not modified UTF-8


[jira] Created: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
javabin should use UTF-8, not modified UTF-8
--------------------------------------------

                 Key: SOLR-2034
                 URL: https://issues.apache.org/jira/browse/SOLR-2034
             Project: Solr
          Issue Type: Bug
            Reporter: Robert Muir
         Attachments: SOLR-2034.patch

for better interoperability, javabin should use standard UTF-8 instead of modified UTF-8 (http://www.unicode.org/reports/tr26/)


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2034:
------------------------------

    Attachment: SOLR-2034.patch


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898391#action_12898391 ]

Hoss Man commented on SOLR-2034:
--------------------------------

+1 to the patch

My only concern is upgrade compatibility -- it would be preferable if people upgrading either Solr or their SolrJ client (but not both at the exact same moment) would still have a functioning system.

As i recall, the BinaryResponseWriter / Parser use a version param and version metadata in the response (just like the XmlResponseWriter) to indicate the codec version requested and the codec version returned -- this seems like the kind of thing that should probably warrant a new codec impl with a new version number.

that said: i didn't follow the details of the binary response writer / parser / codec implementation very closely, so i have no idea how hard it will be to make it all work smoothly for people: if it's a pain in the ass then i'm totally fine with saying that SolrJ 3.x can't talk to Solr 1.x (and vice versa) ... but we should still probably update the binary code version info to make it clear there is a difference


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12898405#action_12898405 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

Hoss, thanks. I agree with regard to backwards compat; unfortunately it's not immediately obvious to me how to implement the versioning (seamless, like you said, would be preferable).

the only thing i see is the version in the response parser, but i will play some and see if i can do it in a versioned way (any more pointers would be very helpful).

ultimately the goal would be to make it easier for non-java clients to implement this protocol. although the wiki says only the java client implements this, i found an issue for the .NET client here: http://code.google.com/p/solrnet/issues/detail?id=71

I took a look at the github source code (http://github.com/mausch/SolrNet/blob/javabin/SolrNet/Impl/JavaBinCodec.cs) and was a little concerned to see writeChars implemented with Encoding.UTF8.GetBytes, which produces standard UTF-8 rather than the modified UTF-8 that javabin currently uses... I know it's likely a work in progress etc, but I think it illustrates the benefits of standard UTF-8.
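To make the concern concrete, here is a minimal Java sketch of what can happen when a client sends standard UTF-8 to a reader that expects modified UTF-8. It uses `DataInputStream.readUTF` as a stand-in for a modified UTF-8 decoder; that is an assumption for illustration only, since javabin's own string reader is not shown in this thread, and the class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr or SolrNet.
class StandardUtf8IntoModifiedReader {

    /**
     * Encodes s as standard UTF-8 (what a non-Java client would naturally emit),
     * then feeds it to a modified UTF-8 reader. Returns true only if the reader
     * accepts the bytes and reproduces the original string.
     */
    static boolean modifiedReaderAccepts(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeShort(utf8.length); // readUTF expects a 2-byte length prefix
            out.write(utf8);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
            return in.readUTF().equals(s);
        } catch (UTFDataFormatException e) {
            return false; // the reader rejected the byte sequence
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII and other BMP text is encoded identically, so it passes.
        System.out.println(modifiedReaderAccepts("Solr"));
        System.out.println(modifiedReaderAccepts("caf\u00e9"));
        // U+1D50A lies outside the BMP: standard UTF-8 uses a 4-byte sequence,
        // which a modified UTF-8 reader does not understand.
        System.out.println(modifiedReaderAccepts("a\uD835\uDD0Ab"));
    }
}
```

A client written this way would appear to work until the first supplementary character shows up, which is the kind of mistake a standard encoding on the wire avoids.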




[jira] Updated: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2034:
------------------------------

    Attachment: SOLR-2034.patch

I bumped the BinaryResponseParser version (only version i can find here).

it's not really obvious to me if this is actually written over the wire / how to conditionalize modified UTF-8 based on it, and it seems risky.

I think it's best to just go to UTF-8 and never look back (but if someone knows how to support modified UTF-8 when version=1, that's great, I just don't have the heart)


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12899730#action_12899730 ]

Ryan McKinley commented on SOLR-2034:
-------------------------------------

I don't think adding many hoops for back compatibility is worth the trouble.  Note that this does not mean people cannot use SolrJ to talk across different versions -- they may have to use XML, though.


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900076#action_12900076 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

Thanks Ryan, I'll wait a few days before committing to see if there are any objections.

If there aren't, i'll update wiki / CHANGES loud and clear that the binary format has changed.



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900423#action_12900423 ]

Hoss Man commented on SOLR-2034:
--------------------------------

bq. I don't think adding many hoops for back compatibility is worth the trouble. Note that that does not mean people can not use solrj to talk across different versions - they may have to use xml though....

Agreed, my chief concern is what happens when someone tries to use SolrJ 1.4 to talk to Solr 3.1 w/javabin (or vice versa).

A) If they get an error: great, i'm totally fine with that -- we just document that they should use XML in this case.

B) If the commands succeed, but the string data is _always_ corrupted, that's not ideal -- but not totally horrible since the problem should be immediately obvious and they should have read the documentation and known not to do that.

C) if the commands succeed, but the string data is _sometimes_ corrupted (as i recall, not every character is different in UTF8 vs Java's modified UTF8, correct?) then that seems really bad ... people may start using javabin to update their index and not notice for quite some time that big hard to identify chunks of their data are corrupted.

as long as someone sanity checks that the situation is either #A or #B before committing, i'm totally cool with it ... but #C scares the bejesus out of me.

(i'll try to run some tests myself in the next few days if no one else gets a chance)
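On the question in #C of which characters actually differ: the sketch below compares the two encodings for a few sample strings. It uses `DataOutputStream.writeUTF` (minus its 2-byte length prefix) as the modified UTF-8 encoder, which is an assumption for illustration rather than javabin's own code, and the class name is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class, not part of Solr.
class Utf8Differences {

    /** Modified UTF-8 bytes for s, with writeUTF's 2-byte length prefix removed. */
    static byte[] modifiedUtf8(String s) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            out.writeUTF(s);
            out.flush();
            byte[] framed = bos.toByteArray();
            return Arrays.copyOfRange(framed, 2, framed.length);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** True if standard and modified UTF-8 produce the same bytes for s. */
    static boolean sameEncoding(String s) {
        return Arrays.equals(modifiedUtf8(s), s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) {
        System.out.println("ASCII:         " + sameEncoding("Solr"));
        System.out.println("Latin-1:       " + sameEncoding("caf\u00e9"));
        System.out.println("CJK (BMP):     " + sameEncoding("\u65e5\u672c\u8a9e"));
        // U+0000 is one zero byte in standard UTF-8 but two bytes (C0 80) in modified UTF-8.
        System.out.println("NUL:           " + sameEncoding("\u0000"));
        // Supplementary characters are 4 bytes in standard UTF-8 but
        // 2 x 3 bytes (one sequence per surrogate) in modified UTF-8.
        System.out.println("Supplementary: " + sameEncoding("\uD835\uDD0A"));
    }
}
```

Only U+0000 and characters outside the BMP are encoded differently, so a mismatched client and server could exchange most text without any visible problem. That is the "sometimes corrupted" case unless something else rejects the exchange first.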



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900432#action_12900432 ]

Yonik Seeley commented on SOLR-2034:
------------------------------------

Seems OK.  I think modified UTF-8 was originally used so that the string chars could be directly written to the output stream instead of to a temp buffer.  But copying to a temp buffer first shouldn't have that much overhead.

JavaBinCodec.VERSION should be bumped.
It is serialized and verified when decoding, and currently an exception is thrown if it does not match the current version.
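A minimal sketch of the version-byte idea described above: the writer puts a version byte first and the reader refuses anything that does not match. This is not the actual JavaBinCodec code; the class name, the version value, and the single-string payload are all assumptions for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of a version-checked codec, not Solr's JavaBinCodec.
class VersionedCodecSketch {

    // Assumed value; bumped whenever the wire format changes incompatibly.
    static final byte VERSION = 2;

    /** Writes the version byte followed by the payload as standard UTF-8. */
    static byte[] marshal(String payload) {
        byte[] body = payload.getBytes(StandardCharsets.UTF_8);
        byte[] framed = new byte[body.length + 1];
        framed[0] = VERSION;
        System.arraycopy(body, 0, framed, 1, body.length);
        return framed;
    }

    /** Verifies the version byte before decoding, failing fast on a mismatch. */
    static String unmarshal(byte[] framed) {
        if (framed.length == 0 || framed[0] != VERSION) {
            throw new RuntimeException("Invalid version: expected " + VERSION
                + (framed.length == 0 ? ", got no data" : ", got " + framed[0]));
        }
        return new String(framed, 1, framed.length - 1, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] current = marshal("hello");
        System.out.println(unmarshal(current));

        byte[] fromOldWriter = current.clone();
        fromOldWriter[0] = 1; // simulate data written with the previous format version
        try {
            unmarshal(fromOldWriter);
        } catch (RuntimeException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Failing on the first byte means a version mismatch is reported as an error up front rather than surfacing later as corrupted strings.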


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900442#action_12900442 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

Hoss man, I hear your concerns but i don't understand how we can address any of this.

This is really one of the problems of modified UTF-8, and really my big concern with using it (that clients will be wrongly written, see my example above). It's not really possible or reasonable to address it at the javabin layer: it needs to be done at a higher protocol level.

of course, if we can figure this out, then maybe it would be easy to provide back compat too, but i didn't see any obvious places in the code where any versioning is written over the wire.



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12900443#action_12900443 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

{quote}
JavaBinCodec.VERSION should be bumped.
It is serialized and verified when decoding, and currently an exception is thrown if it does not match the current version.
{quote}

Ahhh, I totally missed that version byte. So did I bump the wrong version in the patch (BinaryResponseParser's)? I'll fix



[jira] Updated: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2034:
------------------------------

    Attachment: SOLR-2034.patch

OK, i bumped the byte version as Yonik suggested, and tried to use an old client.

Here's the exception:

{noformat}
java.lang.RuntimeException: Invalid version or the data in not in 'javabin' format
        at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:99)
        at org.apache.solr.client.solrj.impl.BinaryResponseParser.processResponse(BinaryResponseParser.java:39)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:477)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:89)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:118)
{noformat}



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901204#action_12901204 ]

Hoss Man commented on SOLR-2034:
--------------------------------

beautiful.

+1 commit


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12901225#action_12901225 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

thanks hoss, i'll commit tomorrow if there are no objections.


[jira] Updated: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir updated SOLR-2034:
------------------------------

    Attachment: SOLR-2034.patch

well, i object to my own patch, because i think it would suck to have solrj depend on the lucene jar.

here's a modified version with its own utf-8 conversion and no bytesref/unicodeutil
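For readers wondering what a dependency-free conversion can look like, here is a rough sketch of encoding a Java string (UTF-16) to standard UTF-8 by hand into a temporary buffer. It is not the code from the attached patch; the class and method names are hypothetical, and replacing unpaired surrogates with '?' is a choice made for this sketch.

```java
import java.util.Arrays;

// Hypothetical sketch, not the conversion code from the SOLR-2034 patch.
class Utf8EncoderSketch {

    /** Encodes UTF-16 input as standard UTF-8, combining surrogate pairs. */
    static byte[] encode(CharSequence s) {
        // Worst case is 3 bytes per UTF-16 code unit (a surrogate pair is 2 units -> 4 bytes).
        byte[] buf = new byte[s.length() * 3];
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char ch = s.charAt(i);
            int cp = ch;
            if (Character.isHighSurrogate(ch) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                cp = Character.toCodePoint(ch, s.charAt(++i)); // consume the pair
            } else if (Character.isSurrogate(ch)) {
                cp = '?'; // unpaired surrogate: replaced in this sketch
            }
            if (cp < 0x80) {
                buf[n++] = (byte) cp;
            } else if (cp < 0x800) {
                buf[n++] = (byte) (0xC0 | (cp >> 6));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            } else if (cp < 0x10000) {
                buf[n++] = (byte) (0xE0 | (cp >> 12));
                buf[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            } else {
                buf[n++] = (byte) (0xF0 | (cp >> 18));
                buf[n++] = (byte) (0x80 | ((cp >> 12) & 0x3F));
                buf[n++] = (byte) (0x80 | ((cp >> 6) & 0x3F));
                buf[n++] = (byte) (0x80 | (cp & 0x3F));
            }
        }
        return Arrays.copyOf(buf, n);
    }

    public static void main(String[] args) {
        byte[] bytes = encode("a\uD835\uDD0Ab");
        System.out.println(bytes.length + " bytes");
    }
}
```

Unlike modified UTF-8, U+0000 is written as a single zero byte and a surrogate pair becomes one 4-byte sequence, at the cost of the temporary buffer mentioned earlier in the thread.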



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12902744#action_12902744 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

if no one objects to the latest patch, i'd like to commit in a day or two.



[jira] Resolved: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

     [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Robert Muir resolved SOLR-2034.
-------------------------------

      Assignee: Robert Muir
    Resolution: Fixed

Committed revision 990180 (trunk), 990183 (3x), and updated the 'javabin' page on the wiki.

I tried to make the change easy to understand in CHANGES/wiki, if you have improvements to
the wording please do not hesitate.



[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903940#action_12903940 ]

Noble Paul commented on SOLR-2034:
----------------------------------

Is interoperability the issue? The question is: is there any other client using the javabin format? Or is it just to be standards compliant?


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903964#action_12903964 ]

Robert Muir commented on SOLR-2034:
-----------------------------------

Noble, please see my comment: https://issues.apache.org/jira/browse/SOLR-2034?focusedCommentId=12898405&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12898405

that's an example of another client trying to implement javabin, and wrongly implementing the modified UTF-8 conversion... but there doesn't need to be any justification for not using modified UTF-8 over the wire, it's just wrong.


[jira] Commented: (SOLR-2034) javabin should use UTF-8, not modified UTF-8

Sebastian Nagel (Jira)
In reply to this post by Sebastian Nagel (Jira)

    [ https://issues.apache.org/jira/browse/SOLR-2034?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12903974#action_12903974 ]

Noble Paul commented on SOLR-2034:
----------------------------------

ok. I was wondering if we are planning to implement javabin in any other languages
