Oddness with Phrase Query

Simon Wistow
I have a document with the title "Here, there be dragons" and a body.

When I search for

q  = Here, there be dragons
qf = title^2.0 body^0.8
qt = dismax

Which is parsed as

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here
dragon"^2.0)~0.01) ()

I get the document as the first hit, which is what I'd expect.

However, if I change the query to

q  = "Here, there be dragons"

(with quotes)

which is parsed as

+DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here
dragon"^2.0)~0.01) ()

then I don't get the document at all, which is not what I'd expect.

I've tried modifying the phrase slop but still don't get any results
back.

Am I doing something wrong - do I have to have an untokenized copy of
fields lying around?

Thanks,

Simon


Re: Oddness with Phrase Query

hossman

Several things about your message don't make sense...

1) the field names listed in your "qf" don't match up to the field names
in the generated query.toString() ... suggesting that they come from
different examples

2) the query.toString() output from each of your queries is identical,
and yet you claim both the input q strings and the results were different.

3) the query.toString() you provided for the "unquoted" example doesn't
make sense -- there's no reason the dismax parser would have generated a
query structure like that unless your query string was already quoted.

...all of this suggests a major disconnect between what you actually tried
and what you pasted in your email ... probably just the result of
trying different things and getting confused about the results and then
mixing them up when asking your question.

can you try testing the various options again, and please include the
actual, raw URLs along with the cut/paste output from debugQuery (we
probably don't need the explain section)

It would also be helpful to see the schema.xml section for the
fields/fieldtypes you are using.


: When I search for
:
: q  = Here, there be dragons
: qf = title^2.0 body^0.8
: qt = dismax
:
: Which is parsed as
:
: +DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here
: dragon"^2.0)~0.01) ()
:
: I get the document as the first hit, which is what I'd expect.
:
: However, if I change the query to
:
: q  = "Here, there be dragons"
:
: (with quotes)
:
: which is parsed as
:
: +DisjunctionMaxQuery((content:"here dragon"^0.8 | title:"here
: dragon"^2.0)~0.01) ()
:
: then I don't get the document at all, which is not what I'd expect.



-Hoss

Re: Oddness with Phrase Query

Simon Wistow
On Tue, Nov 17, 2009 at 11:09:38AM -0800, Chris Hostetter said:
>
> Several things about your message don't make sense...

Hmm, sorry - a byproduct of building up the mail over time I think.

The query

?q="Here there be dragons"
&fl=id,title,score
&debugQuery=on
&qt=dismax
&qf=title

gets echoed as

<lst name="params">
 <str name="qf">title</str>
 <str name="fl">id,title,score</str>
 <str name="debugQuery">on</str>
 <str name="q">"Here there be dragons"</str>
 <str name="qt">dismax</str>
</lst>

and gets parsed as

+DisjunctionMaxQuery((title:"here dragon")~0.01) ()

and gets no results.


Whereas

?q=Here there be dragons
&fl=id,title,score
&debugQuery=on
&qt=dismax
&qf=title

gets echoed as

<lst name="params">
<str name="debugQuery">on</str>
<str name="fl">id,title,score</str>
<str name="q">Here, there be dragons</str>
<str name="qf">title</str>
<str name="qt">dismax</str>
</lst>

and parsed as

+((DisjunctionMaxQuery((title:here)~0.01)
DisjunctionMaxQuery((title:dragon)~0.01))~2) ()

Gets one result

<doc>
<float name="score">6.3863463</float>
<str name="id">20980889</str>
<str name="title">Zelazny, Roger - Here There Be Dragons</str>
</doc>


It looks like it might be related to

SOLR-879: Enable position increments in the query parser and fix the
          example schema to enable position increments for the stop
          filter in both the index and query analyzers to fix the bug
          with phrase queries with stopwords. (yonik)

http://issues.apache.org/jira/browse/SOLR-879

Although I added enablePositionIncrements="true" to

<filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>

inside the <analyzer type="query"> for <fieldType name="text"> in the
schema, that didn't fix it. I presume this means that I have to reindex
everything (although the StopFilterFactory in <analyzer type="index">
already had it).



Re: Oddness with Phrase Query

hossman

: ?q="Here there be dragons"
: &qt=dismax
: &qf=title
        ...
: +DisjunctionMaxQuery((title:"here dragon")~0.01) ()

...the quotes cause the entire string to be passed to the analyzer for
the title field and the resulting Tokens are used to construct a phrase
query.

: ?q=Here there be dragons
: &qt=dismax
: &qf=title
        ...
: +((DisjunctionMaxQuery((title:here)~0.01)
: DisjunctionMaxQuery((title:dragon)~0.01))~2) ()

...the lack of quotes just results in two term queries; both terms must
match, but they can be anywhere in the string.
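
For illustration, the difference between the two cases can be sketched in
a few lines of Python. This is a toy model, not the real dismax parser:
the stop list, the crude plural stripping, and the analyze/parse names are
all assumptions made up for the example.

```python
import re

# Assumption: a tiny stand-in for stopwords.txt.
STOP_WORDS = {"there", "be"}

def analyze(text):
    """Lowercase, drop stop words, and strip a trailing 's' (crude stemming)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [w[:-1] if w.endswith("s") else w
            for w in words if w not in STOP_WORDS]

def parse(q):
    """Split q into quoted chunks and bare words, then analyze each chunk."""
    clauses = []
    for quoted, bare in re.findall(r'"([^"]*)"|(\S+)', q):
        tokens = analyze(quoted or bare)
        if not tokens:
            continue  # a chunk made only of stop words produces no clause
        if quoted:
            # The whole quoted string goes through the analyzer at once,
            # and the surviving tokens form a single phrase clause.
            clauses.append(("phrase", tokens))
        else:
            # Each bare word is analyzed alone and becomes a term clause.
            clauses.extend(("term", t) for t in tokens)
    return clauses

print(parse('"Here there be dragons"'))
print(parse('Here there be dragons'))
```

The quoted input yields one phrase clause over "here" and "dragon", while
the unquoted input yields two independent term clauses.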

: It looks like it might be related to
        ...
: http://issues.apache.org/jira/browse/SOLR-879
:
: Although I added enablePositionIncrements="true" to
:
: <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
:
: in to the <analyzer type="query"> for <fieldType name="text"> in the
: schema which didn't fix it - I presume this means that I have to reindex
: everything (although the StopFilterFactory in <analyzer type="index">
: already had it).

...hmm, you shouldn't have to reindex everything.  are you sure you
restarted solr after making the enablePositionIncrements="true" change to
the query analyzer?

what do the offsets look like when you go to analysis.jsp and paste in that
sentence?

the other thing to consider: you can increase the slop value on that
phrase query (to allow looser matching) using the "qs" param (query slop)
... that could help in this situation (stop words getting stripped out of
the query) as well as other situations (e.g.: what if the user just types
"here be dragons" -- with or without stop words)



-Hoss

Re: Oddness with Phrase Query

Simon Wistow
On Mon, Nov 23, 2009 at 12:10:42PM -0800, Chris Hostetter said:
> ...hmm, you shouldn't have to reindex everything.  are you sure you
> restarted solr after making the enablePositionIncrements="true" change to
> the query analyzer?

Yup - definitely restarted
 
> what do the offsets look like when you go to analysis.jsp and paste in that
> sentence?

org.apache.solr.analysis.StopFilterFactory
{words=stopwords.txt, ignoreCase=true, enablePositionIncrements=true}

term position:    1     4
term text:        Here  Dragons
term type:        word  word
source start,end  0,4   14,21
payload
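
To show why those positions matter, here is a rough Python model of the
behaviour. It assumes a stop list of just "there" and "be", a crude
stemmer, and a phrase of exactly two in-order terms; analyze and
required_slop are hypothetical names, not Solr or Lucene APIs.

```python
import re

# Assumption: a tiny stand-in for stopwords.txt.
STOP_WORDS = {"there", "be"}

def analyze(text, position_increments):
    """Return (term, position) pairs, 1-based like analysis.jsp.

    With position_increments=True a removed stop word still consumes a
    position, leaving a gap; with False the gap is closed up.
    """
    tokens = []
    pos = 0
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        pos += 1
        if word in STOP_WORDS:
            if not position_increments:
                pos -= 1  # pretend the stop word was never there
            continue
        term = word[:-1] if word.endswith("s") else word  # crude stemming
        tokens.append((term, pos))
    return tokens

def required_slop(index_tokens, query_tokens):
    """Slop needed for a two-term, in-order phrase to match (simplified)."""
    index_gap = index_tokens[1][1] - index_tokens[0][1]
    query_gap = query_tokens[1][1] - query_tokens[0][1]
    return abs(index_gap - query_gap)

indexed = analyze("Here There Be Dragons", position_increments=True)
with_gaps = analyze("Here there be dragons", position_increments=True)
collapsed = analyze("Here there be dragons", position_increments=False)

print(indexed)                            # positions 1 and 4, as above
print(required_slop(indexed, with_gaps))  # gaps agree
print(required_slop(indexed, collapsed))  # gap lost on the query side
```

In this model, if the gap survives on both the index and query side the
phrase matches with no slop, whereas a query whose gap has been collapsed
needs a slop of 2 to match this title.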


> the other thing to consider: you can increase the slop value on that
> phrase query (to allow looser matching) using the "qs" param (query slop)
> ... that could help in this situation (stop words getting stripped out of
> the query) as well as other situations (e.g.: what if the user just types
> "here be dragons" -- with or without stop words)

After fiddling with the position increments stuff I upped the query
slop to 2, which now seems to provide better results, but I'm worried
about that affecting relevancy elsewhere (which I presume is the reason
why it's not the default value).

If that's the case - is it worth writing something for my app so that if
it detects a phrase query with lots of stop words it ups the phrase
slop?
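
One possible shape for such an app-side check, as a minimal sketch in
Python. The stop list here is an assumption and would have to mirror the
real stopwords.txt; phrase_slop and dismax_params are hypothetical
helpers. Only the q, qt, qf and qs parameters come from this thread.

```python
import re
from urllib.parse import urlencode

# Assumption: must be kept in sync with the stopwords.txt used by Solr.
STOP_WORDS = {"there", "be"}

def phrase_slop(q, base_slop=0):
    """Suggest a qs value: base_slop plus the largest number of stop words
    sitting between the first and last kept word of any quoted phrase."""
    slop = base_slop
    for phrase in re.findall(r'"([^"]*)"', q):
        words = re.findall(r"[a-z0-9]+", phrase.lower())
        kept = [i for i, w in enumerate(words) if w not in STOP_WORDS]
        if len(kept) < 2:
            continue  # nothing left to form a multi-term phrase
        interior = (kept[-1] - kept[0] + 1) - len(kept)
        slop = max(slop, base_slop + interior)
    return slop

def dismax_params(q, qf="title"):
    """Build the query string, raising qs only when the phrase needs it."""
    return urlencode({"q": q, "qt": "dismax", "qf": qf, "qs": phrase_slop(q)})

print(dismax_params('"Here there be dragons"'))
print(dismax_params('Here there be dragons'))
```

This keeps the slop at its base value for ordinary queries and only
loosens it for quoted phrases that lose words to the stop filter, which
limits the effect on relevancy elsewhere.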

Either way it seems to be working now - thanks for all the help,

Simon