DisMax request handler doesn't work with stopwords?

DisMax request handler doesn't work with stopwords?

Casey Durfee
 
It appears that if your search terms include stopwords and you use the DisMax request handler, you get no results whereas the same search with the standard request handler does give you results.  Is this a bug or by design?
 
Thanks,
 
--Casey
 

Re: DisMax request handler doesn't work with stopwords?

Chris Hostetter-3

: It appears that if your search terms include stopwords and you use the
: DisMax request handler, you get no results whereas the same search with
: the standard request handler does give you results.  Is this a bug or by
: design?

dismax works just fine with stop words ... can you give a specific
example url?  what does the query toString look like when you use
debugQuery?




-Hoss


Re: DisMax request handler doesn't work with stopwords?

Casey Durfee
Sure thing.  I downloaded the latest version of Solr, started up the example server, and indexed the ipod_other.xml file.  The following URLs give a result:
 
http://localhost:8983/solr/select/?q=ipod 
http://localhost:8983/solr/select/?q=the+ipod 
http://localhost:8983/solr/select/?q=ipod&qt=dismax 
The following URL does not:
http://localhost:8983/solr/select/?q=the+ipod&qt=dismax 
 
the toString in the last case is:
 
+(((cat:the^1.4 | id:the^10.0)~0.01 (text:ipod^0.5 | cat:ipod^1.4 | features:ipod | name:ipod^1.2 | sku:ipod^1.5 | manu:ipod^1.1 | id:ipod^10.0)~0.01)~2) (text:ipod^0.2 | manu:ipod^1.4 | name:ipod^1.5 | manu_exact:the ipod^1.9 | features:ipod^1.1)~0.01 (org.apache.solr.search.function.OrdFieldSource:ord(poplarity))^0.5 (org.apache.solr.search.function.ReciprocalFloatFunction:1000.0/(1.0*float(rord(price))+1000.0))^0.3
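For anyone reproducing this, the toString above comes from adding debugQuery=true to the request. A minimal sketch of building such a URL, assuming the example server address used in this post (the helper name `debug_url` is made up for illustration):

```python
from urllib.parse import urlencode

# Address of the example server used in this thread.
BASE_URL = "http://localhost:8983/solr/select/"


def debug_url(query, handler=None):
    """Build a select URL that asks Solr to include debug output.

    `handler` is the value for the qt parameter (for example "dismax");
    leave it as None to use the standard request handler.
    """
    params = {"q": query}
    if handler:
        params["qt"] = handler
    params["debugQuery"] = "true"
    return BASE_URL + "?" + urlencode(params)


url = debug_url("the ipod", "dismax")
print(url)
```

Opening the printed URL in a browser should show the parsed query in the debug section of the response.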
 



Re: DisMax request handler doesn't work with stopwords?

Mike Klaas
In reply to this post by Casey Durfee
On 7-Jun-07, at 1:41 PM, Casey Durfee wrote:

> It appears that if your search terms include stopwords and you use  
> the DisMax request handler, you get no results whereas the same  
> search with the standard request handler does give you results.  Is  
> this a bug or by design?

There is a subtlety with stopwords and dismax.  Imagine a search for
"what's in python", using a typical analyzer with stopwords for
fields such as title, inlinks, and rawText, but a stricter analyzer,
one that removes no stopwords, for fields such as url.

For that search, the weight function

title^1.2 inlinks^1.4 rawText^1.0

produces the following parsed query string:

+(
   (
    (rawText:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (rawText:python | inlinks:python^1.4 | title:python^1.2)~0.01
   )~2
  )
  (rawText:"what python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01

while the same query with a weight function of

title^1.2 inlinks^1.4 rawText^1.0 url^1.0

produces this query string:

+(
   (
    (rawText:what | url:what | inlinks:what^1.4 | title:what^1.2)~0.01
    (url:in)~0.01
    (rawText:python | url:python | inlinks:python^1.4 | title:python^1.2)~0.01
   )~3
  )
  (rawText:"what python"~5 | url:"what in python"~5 | inlinks:"what python"~5^1.4 | title:"what python"~5^1.2)~0.01

Note that the latter includes the clause (url:in)~0.01 on its own. This
interacts poorly with a high mm (minimum number of clauses that must
match) setting in dismax, because it effectively requires 'in' to appear
in the url field, which was probably not the intent of the query.
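The effect can be reproduced outside Solr with a toy model. The sketch below is not Solr code: the stopword list, the two analyzers, the sample document, and the simplified query "what in python" are all assumptions made for illustration. It only mimics the rule that dismax builds one clause per query word, and that a word disappears only if every field's analyzer drops it.

```python
STOPWORDS = {"in", "the"}  # hypothetical stopword list


def analyze_with_stopwords(word):
    """Lowercase the word and drop it if it is a stopword."""
    token = word.lower()
    return [] if token in STOPWORDS else [token]


def analyze_keep_all(word):
    """Lowercase only; nothing is removed."""
    return [word.lower()]


def build_clauses(query, analyzers):
    """Build one clause per query word, as a dict of field -> token.

    A word yields no clause only when every field's analyzer drops it.
    """
    clauses = []
    for word in query.split():
        clause = {}
        for field, analyze in analyzers.items():
            tokens = analyze(word)
            if tokens:
                clause[field] = tokens[0]
        if clause:
            clauses.append(clause)
    return clauses


def matches(doc, clauses, mm):
    """True if at least mm clauses match; a clause matches on any one field."""
    hits = sum(
        any(token in doc.get(field, set()) for field, token in clause.items())
        for clause in clauses
    )
    return hits >= mm


text_fields = {
    "title": analyze_with_stopwords,
    "inlinks": analyze_with_stopwords,
    "rawText": analyze_with_stopwords,
}
all_fields = dict(text_fields, url=analyze_keep_all)

# Hypothetical indexed document: field -> set of indexed tokens.
doc = {
    "title": {"what", "python"},
    "rawText": {"what", "python"},
    "url": {"python", "org"},
}

clauses_text = build_clauses("what in python", text_fields)
clauses_all = build_clauses("what in python", all_fields)

# Require every clause to match, which is what a high mm amounts to here.
found_text = matches(doc, clauses_text, mm=len(clauses_text))
found_all = matches(doc, clauses_all, mm=len(clauses_all))
print(len(clauses_text), found_text)
print(len(clauses_all), found_all)
```

With only the stopword-filtered fields there are two clauses and the document matches. Adding url introduces a third clause that can only be satisfied by url:in, so the same document no longer matches when all clauses are required.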

-Mike

Re: DisMax request handler doesn't work with stopwords?

Casey Durfee
Thank you!  That makes sense.
 
--Casey


Re: DisMax request handler doesn't work with stopwords?

Dan Davis
In reply to this post by Casey Durfee
I'm having the same issue. We are using DisMax with a stopword list. Our customers are typing in "model ipod", so we added "model" to the stopword list and tested with the standard handler. That works fine, but it does not work with dismax (MM = 3<-1 5<-2 6<90%). When I comment out MM, it works. Do you have any recommendations on how to deal with this issue without doing away with MM? (MM does help with a lot of phrase queries.)
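To make the MM value above concrete, here is a rough sketch of how a spec like "3<-1 5<-2 6<90%" turns a clause count into a required-match count. It follows my reading of the documented mm semantics and is not Solr's own implementation; the function names are made up for illustration.

```python
def _apply_simple(expr, n):
    """Apply one unconditional mm expression ('2', '-1', '90%', '-25%') to n clauses."""
    if expr.endswith("%"):
        percent = int(expr[:-1])
        # Round toward zero, so 90% of 7 clauses becomes 6.
        amount = abs(n * percent) // 100
        if percent < 0:
            amount = -amount
    else:
        amount = int(expr)
    # A negative amount means "all but this many".
    required = n + amount if amount < 0 else amount
    return max(0, min(n, required))


def min_should_match(spec, n):
    """Return how many of n optional clauses must match under the given mm spec."""
    required = n  # at or below the first bound, every clause is required
    for part in spec.split():
        if "<" not in part:
            return _apply_simple(part, n)
        bound, expr = part.split("<", 1)
        if n <= int(bound):
            break
        required = _apply_simple(expr, n)
    return required


SPEC = "3<-1 5<-2 6<90%"
for count in (2, 4, 6, 10):
    print(count, min_should_match(SPEC, count))
```

Under this reading a two-word query such as "model ipod" requires both clauses to match, so a clause for "model" that survives in even one field must be satisfied by that field.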

Thanks,

Dan

 

Re: DisMax request handler doesn't work with stopwords?

Chris Hostetter-3

: I'm having the same issues. We are using Dismax, with a stopword list.
: Currently we are having customers typing in "model ipod", we added model to
: the stopwords list and tested with the standard handler..works fine, but not
: with dismax (MM = 3<-1 5<-2 6<90%). When i comment out MM, it
: works. Do you have any recommendations on how to deal with this issue,
: without doing away with MM (MM does help with alot of phrase queries).

are you sure your problem isn't the same as Casey's?  that you are using
dismax across a field which doesn't treat model as a stopword?

can you provide the query toString info from debugQuery=true so we can see
exactly how dismax is parsing your request?



-Hoss


Re: DisMax request handler doesn't work with stopwords?

Dan Davis
You're absolutely right. I missed one field that did not have solr.StopFilterFactory applied. I must have missed that while reading the post yesterday. Anyway, I made sure that every field dismax searches across has the stopword filter applied, and now everything works great!
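In case it helps anyone else, a check like this can be scripted. The sketch below is only an illustration: the inline schema is a made-up fragment, it assumes field types are declared as `fieldType` elements (older schemas may spell the tag differently), and it treats a type as stopword-aware if any of its analyzers contains solr.StopFilterFactory.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema fragment, not taken from a real installation.
SCHEMA_XML = """
<schema name="example" version="1.1">
  <types>
    <fieldType name="text" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>
    <fieldType name="string" class="solr.StrField"/>
  </types>
  <fields>
    <field name="name" type="text"/>
    <field name="features" type="text"/>
    <field name="sku" type="string"/>
  </fields>
</schema>
"""


def fields_without_stop_filter(schema_xml, qf_fields):
    """Return the qf fields whose field type has no StopFilterFactory.

    Fields that are not declared in the schema are reported as well.
    """
    root = ET.fromstring(schema_xml)
    types_with_stop = set()
    for field_type in root.iter("fieldType"):
        for flt in field_type.iter("filter"):
            if flt.get("class") == "solr.StopFilterFactory":
                types_with_stop.add(field_type.get("name"))
    field_types = {f.get("name"): f.get("type") for f in root.iter("field")}
    return [name for name in qf_fields if field_types.get(name) not in types_with_stop]


suspects = fields_without_stop_filter(SCHEMA_XML, ["name", "features", "sku"])
print(suspects)
```

Any field it reports is a candidate for the stray single-field clause described earlier in the thread.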

Thanks Hoss!

Dan
