Trouble handling Unit symbol


Rajinimaski
Hi,

We have data containing symbols such as µ.


Indexed data: Dose:"0 µL"
Language type: "English"


When it is searched as Dose:"0 µL", the number of documents matched is 0.


Observed q value: <str name="q">S257:"0 ÂµL/injection"</str>




*Is there a way to handle such cases?*

Thanks & Regards,
Rajani

Re: Trouble handling Unit symbol

Paul Libbrecht-4
Rajani,

you need to look at the analysis tool in the Solr admin UI, or even Luke, to see what is happening to your text.

paul


On 30 March 2012 at 10:01, Rajani Maski wrote:


Re: Trouble handling Unit symbol

Chris Hostetter-3
In reply to this post by Rajinimaski

: We have data having such symbols like :  µ
: Indexed data has  -    Dose:"0 µL"
: Now , when  it is searched as  - Dose:"0 µL"
        ...
: Query Q value observed  : <str name="q">S257:"0 ÂµL/injection"</str>

First off: your "when searched as" example does not match your observed
"Query Q" value (i.e. the field being queried, and the extra "/injection"
text at the end), suggesting that you may have cut and pasted something you
didn't mean to, so take the rest of this advice with a grain of salt.

If I ignore your "when it is searched as" example and focus entirely on
what you say you've indexed the data as, and the q value you are using (in
what looks like the echoParams output), then the first thing that jumps out
at me is that your servlet container (or perhaps your web browser, if
that's where you tested this) does not seem to be dealing with the Unicode
correctly. Although I see a "µ" in the first three lines I quoted above
(UTF-8: 0xC2 0xB5), in your observed value I'm seeing it preceded by a "Â"
(UTF-8: 0xC3 0x82) ... suggesting that perhaps the "µ" did not get URL
encoded properly when the request was made to your servlet container?
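The byte arithmetic above can be checked with a few lines of Python. This is a minimal sketch, assuming the culprit is some component that decodes UTF-8 bytes as ISO-8859-1 (Latin-1), which is only one possible cause; it shows how the two UTF-8 bytes of "µ" become the two characters "Âµ".

```python
# Sketch: reproduce the "Âµ" symptom by decoding UTF-8 bytes with the
# wrong charset (ISO-8859-1 / Latin-1).

micro = "\u00b5"  # MICRO SIGN, the "µ" in the indexed data

utf8_bytes = micro.encode("utf-8")
print(utf8_bytes.hex(" "))  # c2 b5

# A component that assumes ISO-8859-1 maps each byte to its own character.
misread = utf8_bytes.decode("latin-1")
print(misread)  # Âµ

# The stray first character is U+00C2 ("Â"), whose UTF-8 form is C3 82.
print(misread[0].encode("utf-8").hex(" "))  # c3 82
```

Decoding the same bytes as UTF-8 instead gives back the single character "µ", which is why the charset each layer assumes matters.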

In particular, you might want to take a look at...

https://wiki.apache.org/solr/FAQ#Why_don.27t_International_Characters_Work.3F
http://wiki.apache.org/solr/SolrTomcat#URI_Charset_Config
The example/exampledocs/test_utf8.sh script included with solr
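As a rough illustration of the URI charset issue those pages describe, the Python sketch below percent-encodes the query from this thread the way a browser or HTTP client would, then decodes it once as UTF-8 and once as ISO-8859-1. It assumes the container falls back to ISO-8859-1 for request URIs (Tomcat versions of that era did so unless the connector's URIEncoding attribute was set); it does not talk to Solr at all.

```python
from urllib.parse import quote, unquote

query = "BODY:\u00b5"  # the query from this thread, BODY:µ

# Clients percent-encode the UTF-8 bytes of the query string.
encoded = quote(query, safe="")
print(encoded)  # BODY%3A%C2%B5

# A container that decodes the URI as UTF-8 recovers the original query...
print(unquote(encoded, encoding="utf-8"))  # BODY:µ

# ...while one that decodes it as ISO-8859-1 produces the "Âµ" seen above.
print(unquote(encoded, encoding="latin-1"))  # BODY:Âµ
```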




-Hoss

Re: Trouble handling Unit symbol

Rajinimaski
Thank you for the reply.



On Sat, Mar 31, 2012 at 3:38 AM, Chris Hostetter <[hidden email]> wrote:


Re: Trouble handling Unit symbol

Rajinimaski
Hi All,

I tried indexing with UTF-8 encoding, but the issue is still not fixed.
Please see my inputs below.

*Indexed XML:*
<?xml version="1.0" encoding="UTF-8" ?>
<add>
  <doc>
    <field name="ID">0.1000000</field>
    <field name="BODY">µ</field>
  </doc>
</add>

*Search query:* BODY:µ

numFound: 0 (no results returned).

*What could be the reason for this? How should I write the search query so
that the above document is found?*
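Before looking at the index, it may be worth ruling out the file itself. The sketch below uses only the Python standard library, and the check is a suggestion rather than something proposed in this thread: it parses the update XML from this post once saved as real UTF-8 and once saved as Latin-1 while still declaring UTF-8. In the second case the lone 0xB5 byte is not valid UTF-8, so a strict parser such as ElementTree rejects the file.

```python
import xml.etree.ElementTree as ET

# The update document from this post.
xml_text = """<?xml version="1.0" encoding="UTF-8" ?>
<add>
  <doc>
    <field name="ID">0.1000000</field>
    <field name="BODY">\u00b5</field>
  </doc>
</add>"""


def body_value(raw: bytes) -> str:
    """Parse an <add> document and return the text of its BODY field."""
    root = ET.fromstring(raw)
    return root.find("./doc/field[@name='BODY']").text


# Saved as real UTF-8: the parser sees U+00B5 MICRO SIGN.
good = xml_text.encode("utf-8")
print(hex(ord(body_value(good))))  # 0xb5

# Saved as Latin-1 while still declaring UTF-8: the parser refuses it.
bad = xml_text.encode("latin-1")
try:
    body_value(bad)
except ET.ParseError as err:
    print("file is not valid UTF-8:", err)
```

If the real file passes this check, the encoding of the XML is probably not the problem, and the query side (or the field's analysis) is the next place to look.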


Thanks & Regards
Rajani



2012/4/2 Rajani Maski <[hidden email]>


Re: Trouble handling Unit symbol

Erick Erickson
Please review:
http://wiki.apache.org/solr/UsingMailingLists

Especially the bit about adding &debugQuery=on
and showing the results. You're asking people
to guess at solutions without providing much
in the way of context.

You might try looking at your index with Luke to
see what's actually in it, or perhaps at the
TermsComponent.
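Following that advice, the two requests might be built like this. It is a sketch: the base URL http://localhost:8983/solr and the /terms handler are assumptions (the example solrconfig.xml defines such a handler, but your setup may differ), while debugQuery, echoParams, terms, terms.fl and terms.prefix are standard Solr request parameters. urlencode percent-encodes "µ" as its UTF-8 bytes, %C2%B5, which is what a correctly configured container expects.

```python
from urllib.parse import urlencode

# Hypothetical base URL; adjust host, port and core for your setup.
SOLR = "http://localhost:8983/solr"

# Search request with debugging on; echoParams=all also echoes back the
# q value exactly as Solr decoded it, which exposes charset problems.
select_url = SOLR + "/select?" + urlencode({
    "q": "BODY:\u00b5",
    "debugQuery": "on",
    "echoParams": "all",
})

# TermsComponent request listing indexed BODY terms that start with "µ".
# Assumes a /terms handler is configured, as in the example solrconfig.xml.
terms_url = SOLR + "/terms?" + urlencode({
    "terms": "true",
    "terms.fl": "BODY",
    "terms.prefix": "\u00b5",
})

print(select_url)
print(terms_url)
```

If the terms request lists "µ" for BODY but the debug output shows a different parsed query, the problem is on the request side rather than in the index.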


Best
Erick

On Fri, Apr 13, 2012 at 2:29 AM, Rajani Maski <[hidden email]> wrote:


Re: Trouble handling Unit symbol

Rajinimaski
Fine. Thank you. I will look at it.


On Fri, Apr 13, 2012 at 5:21 PM, Erick Erickson <[hidden email]> wrote:
