solr utf 16 ?

solr utf 16 ?

brian beard-2
Are there any plans to make Solr UTF-16 compliant in the future?
If so, is that a short-term or a long-term goal?

Re: solr utf 16 ?

kkrugler
>Are there any plans to make Solr UTF-16 compliant in the future?
>If so, is that a short-term or a long-term goal?

I'm curious what you mean by "UTF-16 compliant". Do you mean being
able to handle UTF-16 encoded XML?

Thanks,

-- Ken
--
Ken Krugler
Krugle, Inc.
+1 530-210-6378
"Find Code, Find Answers"
Re: solr utf 16 ?

brian beard-2
Yes. I'm assuming that if a document to be added to the index contains
UTF-16 encoded data, Solr would not be able to handle it?

>I'm curious what you mean by "UTF-16 compliant". Do you mean being able to
>handle UTF-16 encoded XML?

Re: solr utf 16 ?

kkrugler
>>I'm curious what you mean by "UTF-16 compliant". Do you mean being
>>able to handle UTF-16 encoded XML?
>>
>Yes. I'm assuming that if a document to be added to the index contains
>UTF-16 encoded data, Solr would not be able to handle it?
>
I've never tried sending anything but UTF-8 to Solr, so I can't
comment on what issues you'll run into.

But based on my experience to date, I'd strongly suggest converting
it to UTF-8 before you post it to Solr.

-- Ken
Re: solr utf 16 ?

Mike Klaas
In reply to this post by brian beard-2
On 4/23/07, brian beard <[hidden email]> wrote:
> Yes. I'm assuming that if a document to be added to the index contains
> UTF-16 encoded data, Solr would not be able to handle it?

I believe that handling arbitrary encodings is on the list of future
enhancements, but I couldn't give you a timeline.

For the time being, consider that
 1. UTF-8 is the "lingua franca" of XML document encoding
 2. it is very easy to convert UTF-16 data to UTF-8 yourself (it would
be a 3-4 line Python command-line filter, for instance).

-Mike
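
The filter mentioned above was not posted in the thread. What follows is only a minimal sketch of what such a converter might look like, assuming the input is a whole UTF-16 document that fits in memory. The names utf16_to_utf8 and main are made up for illustration. Rewriting the XML declaration is an extra step added here so that the declared encoding matches the converted bytes.

```python
import re
import sys


def utf16_to_utf8(raw: bytes) -> bytes:
    """Decode UTF-16 bytes and re-encode them as UTF-8 (hypothetical helper)."""
    # The "utf-16" codec honors a leading byte-order mark and strips it.
    # Without a BOM it assumes the platform's native byte order.
    text = raw.decode("utf-16")
    # If the document starts with an XML declaration that names an encoding,
    # update it so it no longer claims UTF-16.
    text = re.sub(
        r'^(<\?xml[^>]*?encoding=)(["\'])[^"\']*\2',
        r"\g<1>\g<2>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")


def main() -> None:
    # Filter mode: read UTF-16 bytes from stdin, write UTF-8 bytes to stdout.
    sys.stdout.buffer.write(utf16_to_utf8(sys.stdin.buffer.read()))
```

Saved as a script that calls main() at the bottom (the file name to_utf8.py is just an example), it could be run as "python to_utf8.py < docs-utf16.xml > docs-utf8.xml" before posting the result to Solr.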
Re: solr utf 16 ?

brian beard-2
Thanks for all the comments. The conversion seems like a good alternative.


Re: solr utf 16 ?

Walter Underwood, Netflix
In reply to this post by Mike Klaas
UTF-16 support should not require any changes to the XML parsing.
The XML specification requires all XML parsers to support both UTF-8
and UTF-16. The real change is implementing RFC 3023 (XML Media Types)
so that the encoding can be specified over HTTP.

wunder
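
To make the RFC 3023 point concrete: the encoding travels in the charset parameter of the HTTP Content-Type header. The sketch below only builds such a request and does not send it. The function name build_update_request and the URL are assumptions for illustration, and the thread does not confirm that Solr honors the charset parameter.

```python
import urllib.request

# Assumed endpoint for illustration only; adjust to the actual Solr install.
SOLR_UPDATE_URL = "http://localhost:8983/solr/update"


def build_update_request(xml_text: str, charset: str = "utf-16",
                         url: str = SOLR_UPDATE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST whose Content-Type names the charset."""
    # Encode the body with the same charset the header advertises.
    # Python's "utf-16" codec prepends a byte-order mark.
    body = xml_text.encode(charset)
    headers = {"Content-Type": "text/xml; charset=" + charset}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


req = build_update_request('<add><doc><field name="id">1</field></doc></add>')
print(req.get_method(), req.full_url)
print(req.get_header("Content-type"))
```

A server that implements RFC 3023 would decode the body according to that header. Until then, passing charset="utf-8" (or converting beforehand) remains the safer route suggested elsewhere in this thread.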
