Solr creates whitespace in dismax query

Johann Höchtl
I have a fieldtype with the following definition:

     <fieldType name="text_kstem" class="solr.TextField" positionIncrementGap="100">
       <analyzer type="index">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false" />
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="solr.SynonymFilterFactory" synonyms="openthesaurus.txt" ignoreCase="true" expand="true"/>
         <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
         <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
       </analyzer>
       <analyzer type="query">
         <tokenizer class="solr.WhitespaceTokenizerFactory"/>
         <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
         <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"/>
         <filter class="solr.LowerCaseFilterFactory"/>
         <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
       </analyzer>
     </fieldType>

I have the value "blume2000.de" in a field with the fieldtype above. If I issue the query select?q=blume2000&qt=dismax (yes, the field in question is searched by the dismax handler), the result is empty. Only when I enter the query select?q=blume+2000&qt=dismax do I get the result I want.

So I used debugQuery=true to find out what's wrong. The interesting thing is that the rawquerystring is still correct, but the parsedquery is:
+DisjunctionMaxQuery((name:"blume 2000" | teaser:"blume 2000")) DisjunctionMaxQuery((teaser:"blume 2000"~3 | name:"blume 2000"~3))

Now I have to ask: where does the whitespace come from, and why isn't the document matched?

If I analyze the query on the admin analysis page with Field (name): name, Field value (Index): blume2000.de and Field value (Query): blume2000.de, it works...

Has anybody already had that problem?



Re: Solr creates whitespace in dismax query

MitchK
Johann,

Try removing the WordDelimiterFilter from the query analyzer of your fieldType.
If the WordDelimiterFilter in your index analyzer is well configured, it will find everything you want.
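
As a rough sketch of what this suggestion would look like, here is the query analyzer from the original post with only the WordDelimiterFilterFactory line taken out. Nothing else is changed, and this is an illustration rather than a tested configuration. Whether the query then matches depends on the index analyzer emitting a token identical to the unsplit query term, which the thread does not confirm.

```xml
<!-- Sketch only: query analyzer from the original post,
     with the WordDelimiterFilterFactory line removed as suggested. -->
<analyzer type="query">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  <!-- WordDelimiterFilterFactory removed here: a term such as "blume2000"
       is no longer split into "blume" and "2000" at query time -->
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="com.lucidimagination.solrworks.analysis.LucidKStemFilterFactory" protected="protwords.txt"/>
</analyzer>
```
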

Does this solve the problem?

Kind regards,
- Mitch

Re: Solr creates whitespace in dismax query

Johann Höchtl
No, it didn't solve the problem, but I found a different solution. I created
a second field in schema.xml and copy the content into it. This field is
analyzed with the KeywordTokenizerFactory.
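
A minimal sketch of such a setup in schema.xml might look like the following. The names text_exact and name_exact are hypothetical (the thread does not give the actual names), and the source field name is taken from the parsed query in the original post.

```xml
<!-- Hypothetical names: text_exact and name_exact do not appear in the thread. -->
<fieldType name="text_exact" class="solr.TextField">
  <analyzer>
    <!-- KeywordTokenizerFactory emits the whole field value as one token -->
    <tokenizer class="solr.KeywordTokenizerFactory"/>
  </analyzer>
</fieldType>

<!-- Second field that receives a copy of the content of "name" -->
<field name="name_exact" type="text_exact" indexed="true" stored="false"/>

<copyField source="name" dest="name_exact"/>
```

For the dismax handler to search the copied field, it would presumably also have to be listed in the handler's qf parameter, and the documents would need to be re-indexed after the schema change.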

Thanks,
Johann

On 24.08.2010 at 21:53, MitchK wrote:

> Johann,
>
> try to remove the wordDelimiterFilter from the query-analyzer of your
> fieldType.
> If your index-analyzer-wordDelimiterFilter is well configured, it will find
> everything you want.
>
> Does this solve the problem?
>
> Kind regards,
> - Mitch
>    

Re: Solr creates whitespace in dismax query

Erick Erickson
KeywordTokenizerFactory interprets the entire input as a single token, so
this could be a problem for you. For instance, the text:
bloom2000.de is some text
will get indexed as a single token. Searches on "some" or "text" won't match.
This may be what you're looking for, but...

I really think Mitch pointed you in the right direction.
WordDelimiterFilterFactory was probably part of your problem. The stemmer
might have done interesting things too.

Also, if you didn't re-index after changing your schema, you might have had
trouble too.

The admin/analysis page can help you a lot in these situations.

Best
Erick

On Tue, Aug 31, 2010 at 6:34 AM, Johann Höchtl <[hidden email]> wrote:

> No, it didn't solve the problem, but I found a different solution. I created a
> second field in schema.xml and copy the content into it. This field is
> analyzed with the KeywordTokenizerFactory.
>
> Thanks,
> Johann