Issues with alphanumeric search terms


convoyer
Hi

My Solr deployment gives correct results for normal search terms like "john".
But when I search for "john55" or "55", it returns all the search terms, including those that contain neither "john" nor "55".
Below is the field type defined for this field.

<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    </analyzer>
</fieldType>

Are there any other tokenizers or filters that need to be set for alphanumeric or numeric search?


Re: Issues with alphanumeric search terms

Erick Erickson
Hmmm, what does debugQuery=on show?

And did you mean documents here?
<< it will return all the search terms>>

Best
Erick

On Thu, Dec 3, 2009 at 11:40 AM, con <[hidden email]> wrote:

>
> Hi
>
> My solr deployment is giving correct results for normal search terms like
> "john".
> But when i search with "john55" or "55" it will return all the search
> terms,
> including those which neither contains john nor 55.
> Below is the fieldtype defined for this field.
>
> <fieldType name="mytype" class="solr.TextField">
>    <analyzer type="index">
>        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>    </analyzer>
>    <analyzer type="query">
>        <tokenizer class="solr.LowerCaseTokenizerFactory"/>
>    </analyzer>
> </fieldType>
>
> Is there any other tokenizers or filters need to be set for
> alphanumeric/Number search?
>
>
> --
> View this message in context:
> http://old.nabble.com/Issues-with-alphanumeric-search-terms-tp26629048p26629048.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
>

Re: Issues with alphanumeric search terms

convoyer
Yes, I meant all the indexed documents.

With debugQuery=on, I got the following result:

<response>

<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>

<lst name="params">
<str name="debugQuery">on</str>
<str name="indent">on</str>
<str name="start">0</str>
<str name="q">(phone:650 AND rowtype:contacts)</str>
<str name="wt">xml</str>
<str name="rows">1</str>
<str name="version">2.2</str>
</lst>
</lst>

<result name="response" numFound="104" start="0">

<doc>
<str name="ADDRESS">  </str> 
<str name="CITY">  </str> 
<str name="COUNTRY">  </str>
<date name="CREATEDTIME">2009-09-22T06:50:36.943Z</date> 
<str name="NAME">Adam</str> 
<str name="email">adam@abc.com</str>
<str name="firstname">Adam</str>
<str name="lastname">smith</str>
<str name="locale">en_US</str> 
<str name="phone">  </str>
<str name="rowtype">contacts</str> 
</doc>
</result>

<lst name="debug">
<str name="rawquerystring">(phone:650 AND rowtype:contacts)</str>
<str name="querystring">(phone:650 AND rowtype:contacts)</str>
<str name="parsedquery">+rowtype:contacts</str>
<str name="parsedquery_toString">+rowtype:contacts</str>

<lst name="explain">

<str name="1030422en_US">

0.99043053 = (MATCH) fieldWeight(rowtype:contacts in 0), product of:
  1.0 = tf(termFreq(rowtype:contacts)=1)
  0.99043053 = idf(docFreq=104, maxDocs=104)
  1.0 = fieldNorm(field=rowtype, doc=0)
</str>
</lst>
<str name="QParser">LuceneQParser</str>

<lst name="timing">
<double name="time">1.0</double>

<lst name="prepare">
<double name="time">0.0</double>

<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">0.0</double>
</lst>
</lst>

<lst name="process">
<double name="time">1.0</double>

<lst name="org.apache.solr.handler.component.QueryComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.FacetComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.MoreLikeThisComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.HighlightComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.StatsComponent">
<double name="time">0.0</double>
</lst>

<lst name="org.apache.solr.handler.component.DebugComponent">
<double name="time">1.0</double>
</lst>
</lst>
</lst>
</lst>
</response>




Re: Issues with alphanumeric search terms

Erick Erickson
Hmmmm, I don't think you want LowerCaseTokenizerFactory.

From:
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LowerCaseTokenizerFactory

Creates org.apache.lucene.analysis.LowerCaseTokenizer.

Creates tokens by lowercasing all letters and dropping non-letters.
Example: "I can't" ==> "i", "can", "t"


also see:
http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/analysis/LowerCaseTokenizer.html

This seems consistent with this part of your debug query:
<str name="rawquerystring">(phone:650 AND rowtype:contacts)</str>
<str name="querystring">(phone:650 AND rowtype:contacts)</str>
<str name="parsedquery">+rowtype:contacts</str>
<str name="parsedquery_toString">+rowtype:contacts</str>

Note that the number portion of your original query is
completely missing from the parsed query...

How do you want your input tokenized? Maybe you
want a WhitespaceTokenizer and a LowerCase *filter*?
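
A minimal sketch of that suggestion, assuming the field should be split on whitespace only and matched case-insensitively. The type name "mytype" is taken from the original post; the rest is an untested assumption rather than a confirmed fix:

```xml
<fieldType name="mytype" class="solr.TextField">
    <!-- An analyzer without a type attribute is used for both indexing and querying -->
    <analyzer>
        <!-- Splits on whitespace only, so digits are kept and "john55" stays one token -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Lowercases each token without dropping non-letters -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

With a chain like this, a query such as phone:650 should keep its numeric term instead of having it dropped from the parsed query.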

HTH
Erick



On Thu, Dec 3, 2009 at 2:05 PM, con <[hidden email]> wrote:

>
> Yes. I meant all the indexed documents.
>

Re: Issues with alphanumeric search terms

convoyer
I have added
        <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" />
to both the index and query analyzers, but I am still getting the same behaviour.

Is there anything else that I am missing?




Re: Issues with alphanumeric search terms

iorixxx
> I have added
>     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" />
> to both index and query but still getting same behaviour.
>
> Is there any other that i am missing?
>

Did you restart Tomcat and re-index? Why not use StandardTokenizerFactory?
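
For reference, a minimal sketch of what a StandardTokenizerFactory-based type could look like. The type name "mytype" comes from the original post; pairing the tokenizer with LowerCaseFilterFactory is an assumption about the desired case-insensitive matching, not something confirmed in this thread:

```xml
<fieldType name="mytype" class="solr.TextField">
    <analyzer type="index">
        <!-- StandardTokenizer keeps digits, unlike LowerCaseTokenizer -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Lowercasing happens in a filter, after tokenization -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <!-- Same chain at query time so query terms match the indexed terms -->
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```

Documents that are already indexed only pick up the new analysis after a re-index.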




Re: Issues with alphanumeric search terms

Erick Erickson
As Ahmet says, you need to re-index.

Nothing about WordDelimiterFilterFactory alters case, as far as I can tell from
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

Are you applying this in addition to the LowerCaseTokenizerFactory? In which
case it's too late. The numbers have already been stripped by the tokenizer before any filter runs...
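
A sketch of an ordering in which the filter still sees the digits, assuming a whitespace tokenizer is acceptable for this field. catenateAll="1" is from the earlier post; the other attribute values and the reuse of the name "mytype" are illustrative assumptions:

```xml
<fieldType name="mytype" class="solr.TextField">
    <analyzer>
        <!-- The tokenizer runs first, so it must not discard digits -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <!-- Splits tokens at letter/digit boundaries; catenateAll also emits the joined form -->
        <filter class="solr.WordDelimiterFilterFactory"
                generateWordParts="1" generateNumberParts="1" catenateAll="1"/>
        <!-- Case folding is done last, in a filter -->
        <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
</fieldType>
```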

Please get a copy of Luke and examine your index to see what actually
gets indexed; it'll give you a *much* better idea of what the various
analyzers actually put in your index.

Best
Erick

On Fri, Dec 4, 2009 at 6:57 AM, AHMET ARSLAN <[hidden email]> wrote:

> > I have added
> >     <filter class="solr.WordDelimiterFilterFactory" catenateAll="1" />
> > to both index and query but still getting same behaviour.
> >
> > Is there any other that i am missing?
>
> Did you re-start tomcat and re-index? Why not use StandardTokenizerFactory?