Querying international characters

Querying international characters

Scott Leonard-2
I have a mirror of the entire dmoz content in a Solr index. International
characters seem to be loaded and returned in queries just fine, but queries
that _contain_ international characters return no results for known
matching patterns.

Is there a filter class I need to be using for international character
support?  Any other gotchas in supporting these characters in Solr?

.scott


SV: Querying international characters

AE-4
Hi,

If you haven't done so already, I think you need to enable UTF-8 support in your servlet container (Tomcat, Jetty, etc.) for queries sent from web browsers. Have a look at:

http://wiki.apache.org/tomcat/Tomcat/UTF-8

Regards


Re: SV: Querying international characters

Scott Leonard-2
I'm actually using Resin here.


On 1/29/07 3:33 PM, "Antonio Eggberg" <[hidden email]> wrote:

> Hi :
>
> If you haven't done so.. I think you need to enable UTF-8 support in your
> tomcat/jetty etc.. for queries from web browsers.. have a look
>
> http://wiki.apache.org/tomcat/Tomcat/UTF-8
>
> Regards


Re: Querying international characters

Chris Hostetter-3
In reply to this post by Scott Leonard-2

: I have a mirror of the entire dmoz content in a Solr index. International
: characters seem to be loaded and returned in queries just fine, but queries
: that _contain_ international characters return no results for known
: matching patterns.
:
: Is there a filter class I need to be using for international character
: support?  Any other gotchas in supporting these characters in Solr?

There are a few things that might be going on...

1) At the moment, Solr only really plays nicely with UTF-8, so if you
are dealing with another charset, that may be the origin of the issue.

2) The HTTP requests you are sending may not be encoding the characters
properly ... what does your query URL look like?

Using the example schema, and searching for "LATIN SMALL LETTER E WITH
ACUTE" my URL looks like this...

http://localhost:8983/solr/select/?q=%C3%A9&version=2.2&start=0&rows=10&indent=on

and correctly finds the doc with id UTF8TEST

3) You may be using an Analyzer/TokenFilter that is stripping or replacing
your characters during analysis. Try using /solr/admin/analysis.jsp to see
what is getting indexed in each field when you put in your international
characters, and what tokens your query-time analyzer produces for your
input.
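
To make points 1 and 2 concrete, here is a rough Python sketch (not part of Solr itself) of how the same character ends up as different bytes in the query URL depending on the charset the client uses. The helper name solr_select_url is made up for illustration; the base URL and parameters are simply the ones from the example URL above.

```python
from urllib.parse import urlencode

# Base URL taken from the example query above; adjust for your own setup.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def solr_select_url(query, encoding="utf-8"):
    """Hypothetical helper: build a select URL, percent-encoding the
    query string using the given charset."""
    params = {
        "q": query,
        "version": "2.2",
        "start": 0,
        "rows": 10,
        "indent": "on",
    }
    return SOLR_SELECT + "?" + urlencode(params, encoding=encoding)


E_ACUTE = "\u00e9"  # LATIN SMALL LETTER E WITH ACUTE

# UTF-8 encodes the character as two bytes: C3 A9.
utf8_url = solr_select_url(E_ACUTE)

# Latin-1 (ISO-8859-1) encodes it as a single byte: E9.
latin1_url = solr_select_url(E_ACUTE, encoding="latin-1")

print(utf8_url)
print(latin1_url)
```

If the URL your client actually sends contains q=%E9 rather than q=%C3%A9, the request is probably being encoded in Latin-1 rather than UTF-8, which would be consistent with the symptoms described.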


-Hoss