TextField case sensitivity

TextField case sensitivity

Xuesong Luo
I ran into a problem when searching on a TextField. When I pass q=William or
q=WILLiam, Solr finds the records whose default search field value
is William, but when I pass q=WilliAm, Solr does not return anything.
I searched the archive; Yonik mentioned that LowerCaseFilterFactory
doesn't work for wildcard queries because the QueryParser does not invoke
analysis for partial words, which makes sense. But in my case it's a
whole word. Does anyone know why it's not working? My schema info is below.

Thanks
Xuesong

<fieldtype name="text" class="solr.TextField"
positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>        
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
</fieldtype>

Re: TextField case sensitivity

Yonik Seeley-2
On 6/7/07, Xuesong Luo <[hidden email]> wrote:
> I ran into a problem when searching on a TextField. When I pass q=William or
> q=WILLiam, Solr finds the records whose default search field value
> is William, but when I pass q=WilliAm, Solr does not return anything.

Sounds like WordDelimiterFilter is still being used for your fieldType.
After you changed the fieldType for "text", did you restart Solr and
re-index your collection?

-Yonik


Re: TextField case sensitivity

Ryan McKinley
In reply to this post by Xuesong Luo
Have you taken a look at the output from admin/analysis?
http://localhost:8983/solr/admin/analysis.jsp?highlight=on

This lets you see which tokens are generated for index and query. From your
description, I suspect that the generated tokens are actually:
  willi am

Also, if you want the same analyzer for indexing and query, just define one:

<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>



RE: TextField case sensitivity

Xuesong Luo
In reply to this post by Xuesong Luo
I do have WordDelimiterFilter defined in the schema; I didn't include it in
my original email because I thought it didn't matter. It seems it
does. It looks like WilliAm is treated as two words, which is why it
didn't find a match.
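
For illustration, a fieldtype along the following lines would reproduce the behaviour. This is only a sketch: the actual schema was never posted in this thread, and the WordDelimiterFilterFactory attribute values shown are assumptions modelled on the example schema that ships with Solr.

```xml
<!-- Hypothetical reconstruction; the real schema was not posted. -->
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Splits tokens at delimiters and at lower-to-upper case changes,
         so "WilliAm" becomes "Willi" + "Am" before lowercasing. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- Same split at query time, but without re-joining the parts. -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>
```

With a chain like this, the indexed value William stays a single token (william), while the query WilliAm produces willi and am, neither of which is in the index. That q=WILLiam still matches is consistent with an upper-to-lower change not being treated as a word boundary.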

Thanks
Xuesong

RE: TextField case sensitivity

Xuesong Luo
In reply to this post by Xuesong Luo
Ryan, you are right, that's the problem. WilliAm is treated as two words
by the WordDelimiterFilterFactory.

Thanks
Xuesong

Re: TextField case sensitivity

Mike Klaas

On 7-Jun-07, at 1:04 PM, Xuesong Luo wrote:

> Ryan, you are right, that's the problem. WilliAm is treated as two words
> by the WordDelimiterFilterFactory.

I have found this behaviour a little too aggressive for my needs, so I
added an option to disable it. The patch is here:
http://issues.apache.org/jira/browse/SOLR-257

I'll probably commit it in a day or so, at which point it will be
part of the Solr nightly build.
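
A minimal sketch of how such an option might be used once the patch is available. The attribute name splitOnCaseChange is an assumption (it is not stated in this thread), so check SOLR-257 for the final name and default before relying on it; the other attribute values are illustrative as well.

```xml
<!-- Sketch only: splitOnCaseChange is assumed to be the attribute added by SOLR-257. -->
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- With case-change splitting disabled, "WilliAm" should remain one token
       and be lowercased to "william" by the next filter. -->
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="0" catenateNumbers="0" catenateAll="0"
          splitOnCaseChange="0"/>
  <filter class="solr.LowerCaseFilterFactory"/>
</analyzer>
```

As with any analyzer change, Solr would need a restart and the collection would need re-indexing before the new behaviour shows up in search results.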

-Mike