Query related question


Query related question

iboppana
Hi All,

When I query for a phrase, say Tiger woods, and sort the results by score, I notice that the results are mixed up: the first 5 results match Tiger woods, the next 2 match either tiger/tigers or wood/woods, and the 2 after that match tiger woods again.

How do I make sure that when searching for terms like the above, I get all the results matching the whole search term first, followed by those matching only individual tokens like tiger or woods?

My text fieldType is defined as follows:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>



Thanks
 Indrani

RE: Query related question

Jonathan Rochkind
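One way to do it would be to use the dismax request handler at query time, with a pf parameter on the same field(s) as your qf parameter, but with a big boost on the pf. The pf (phrase fields) parameter adds score to documents in which the query terms appear together as a phrase. See http://wiki.apache.org/solr/DisMaxRequestHandler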
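
As a rough sketch of that suggestion, a dismax handler in solrconfig.xml might look something like the following. The handler name `/phrase-boost`, the field name `body`, and the boost value of 10 are assumptions for illustration only; none of them come from the original poster's configuration.

```xml
<!-- Sketch for solrconfig.xml; handler name, field name and boost are illustrative assumptions -->
<requestHandler name="/phrase-boost" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse the q parameter with the dismax query parser -->
    <str name="defType">dismax</str>
    <!-- qf: field(s) searched for the individual terms (tiger, woods) -->
    <str name="qf">body</str>
    <!-- pf: same field, boosted when the whole query matches as a phrase -->
    <str name="pf">body^10</str>
  </lst>
</requestHandler>
```

The pf boost raises the score of phrase matches; it does not strictly guarantee that every phrase match sorts above every single-term match, so the boost value would need tuning against real data.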

I'm not sure why you're getting matches for "tigers" and "woods" on "tiger woods" though; your example has the EnglishPorterFilterFactory commented out, if you had that actually in there that would explain it but as it is, I'm not sure what does. Your synonyms file? That seems odd.

If you WERE using stemming, but wanted un-stemmed results to rank higher, one way to do it would be to actually use two different solr fields, one stemmed and one not stemmed. And then again use dismax, and boost the un-stemmed field higher, in either both qf and pf, or just pf.
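
A minimal schema.xml sketch of the two-field approach, assuming hypothetical names throughout (`body`, `body_stemmed`, `text_stemmed`). For brevity the stemmed type uses a single analyzer for both index and query time and omits the query-time synonym filter from the original type.

```xml
<!-- Sketch for schema.xml; all field and type names here are hypothetical -->

<!-- Inside <types>: similar to the existing "text" type, plus the Porter stemmer -->
<fieldType name="text_stemmed" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Inside <fields>: one unstemmed field and one stemmed copy of it -->
<field name="body" type="text" indexed="true" stored="true"/>
<field name="body_stemmed" type="text_stemmed" indexed="true" stored="false"/>

<!-- After </fields>: index the same source text into both fields -->
<copyField source="body" dest="body_stemmed"/>
```

The dismax parameters could then be something like qf=body^2 body_stemmed and pf=body^10 body_stemmed^2, with the actual boost values chosen by experiment.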

Jonathan
________________________________________
From: iboppana [[hidden email]]
Sent: Tuesday, June 01, 2010 10:45 PM
To: [hidden email]
Subject: Query related question

Hi All,

When I query for a word say Tiger woods, and sort results by score... i do
notice that the results are mixed up i.e first 5 results match Tiger woods
the next 2 match either tiger/tigers or wood/woods
the next 2 after that i notice again match tiger woods.

How do i make sure that when searching for words like above i get all the
results matching whole search term first, followed by individual tokens like
tiger, woods later.

My text fieldtype defined as follows

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>
    </fieldType>



Thanks
 Indrani

Re: Query related question

Chris Hostetter-3
: When I query for a word say Tiger woods, and sort results by score... i do
: notice that the results are mixed up i.e first 5 results match Tiger woods
: the next 2 match either tiger/tigers or wood/woods
: the next 2 after that i notice again match tiger woods.
:
: How do i make sure that when searching for words like above i get all the
: results matching whole search term first, followed by individual tokens like
: tiger, woods later.

For starters, you have to make sense of why exactly those docs are scoring
that way -- this is what the param debugQuery=true is for -- look at the
score explanations and see why those docs are scoring lower.
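
debugQuery=true is normally just appended to the request URL. For repeated tuning sessions, one option is a development-only handler that turns it on by default; this is a sketch, and the handler name `/debug-search` is a hypothetical example.

```xml
<!-- Sketch for solrconfig.xml; the handler name is a hypothetical example -->
<requestHandler name="/debug-search" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- adds a debug section with per-document score explanations to each response -->
    <str name="debugQuery">true</str>
    <!-- return the score alongside the stored fields for easier comparison -->
    <str name="fl">*,score</str>
  </lst>
</requestHandler>
```

The explanation for each document breaks its score down into factors such as term frequency and fieldNorm, which is what you would compare between a phrase match that ranks low and a single-term match that ranks high.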

My guess is that it's because of fieldNorms (i.e. longer documents score
lower with the same number of matches), but it could also be a term
frequency factor (some documents contain "tiger" so many times that they
score high even without "woods"). You have to understand why your docs
score the way they do before you can come up with a general plan for how
to change the scoring to better meet your goals.



-Hoss