eDismax pf2 and pf3

Jan Høydahl / Cominvent
Testing pf2 and pf3. I thought that when using pf2=myfield, and q=foo bar, you would get a phrase query "foo bar", but you don't, unless there are at least 3 terms in the query. Is this intentional? I think of "pf2" as boosting any two words in the query, even if there are only two words. The offending code is:

    if (null == fields || fields.isEmpty() ||
        null == clauses || clauses.size() <= shingleSize )
      return;
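
To make the effect of that guard concrete, here is a minimal sketch, not the Solr source. The class name, method name, and the `guardAsQuoted` flag are hypothetical, and the `<` variant is only one plausible correction, not necessarily what was eventually committed. It builds phrases from runs of `shingleSize` adjacent clauses and shows that the `<=` comparison yields nothing for a two-term query when `shingleSize` is 2:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Solr code.
final class ShingleGuardSketch {

    // Returns quoted phrases made of `shingleSize` adjacent clauses.
    // guardAsQuoted == true  mimics the `<=` comparison quoted in the post.
    // guardAsQuoted == false uses `<`, an assumed correction.
    static List<String> shingles(List<String> clauses, int shingleSize, boolean guardAsQuoted) {
        List<String> phrases = new ArrayList<>();
        if (clauses == null || shingleSize < 1) {
            return phrases;
        }
        boolean skip = guardAsQuoted
                ? clauses.size() <= shingleSize
                : clauses.size() < shingleSize;
        if (skip) {
            return phrases;
        }
        // Slide a window of shingleSize clauses across the query.
        for (int i = 0; i + shingleSize <= clauses.size(); i++) {
            phrases.add("\"" + String.join(" ", clauses.subList(i, i + shingleSize)) + "\"");
        }
        return phrases;
    }

    public static void main(String[] args) {
        List<String> q = Arrays.asList("foo", "bar");
        System.out.println(shingles(q, 2, true));   // prints []
        System.out.println(shingles(q, 2, false));  // prints ["foo bar"]
    }
}
```

With three or more clauses both variants return the same phrases, which would explain why the behaviour only shows up on two-term queries.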

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: eDismax pf2 and pf3

Yonik Seeley-2-2
On Wed, Apr 11, 2012 at 7:40 PM, Jan Høydahl <[hidden email]> wrote:
> Testing pf2 and pf3. I thought that when using pf2=myfield, and q=foo bar, you would get a phrase query "foo bar", but you don't, unless there are at least 3 terms in the query. Is this intentional?

Nope.

> I think of "pf2" as boosting any two words in the query, even if there are only two words.

Correct.

> The offending code is:
>
>    if (null == fields || fields.isEmpty() ||
>        null == clauses || clauses.size() <= shingleSize )
>      return;

Correct.  Looks like a bug probably introduced during a refactor
(since I don't recall using the "shingle" terminology).

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10


Re: eDismax pf2 and pf3

Jan Høydahl / Cominvent
SOLR-3352

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 12. apr. 2012, at 02:10, Yonik Seeley wrote:

> On Wed, Apr 11, 2012 at 7:40 PM, Jan Høydahl <[hidden email]> wrote:
>> Testing pf2 and pf3. I thought that when using pf2=myfield, and q=foo bar, you would get a phrase query "foo bar", but you don't, unless there are at least 3 terms in the query. Is this intentional?
>
> Nope.
>
>> I think of "pf2" as boosting any two words in the query, even if there are only two words.
>
> Correct.
>
>> The offending code is:
>>
>>    if (null == fields || fields.isEmpty() ||
>>        null == clauses || clauses.size() <= shingleSize )
>>      return;
>
> Correct.  Looks like a bug probably introduced during a refactor
> (since I don't recall using the "shingle" terminology).
>
> -Yonik
> lucenerevolution.com - Lucene/Solr Open Source Search Conference.
> Boston May 7-10



Re: eDismax pf2 and pf3

Chris Hostetter-3
In reply to this post by Yonik Seeley-2-2

: > Testing pf2 and pf3. I thought that when using pf2=myfield, and q=foo
: bar, you would get a phrase query "foo bar", but you don't, unless there
: are at least 3 terms in the query. Is this intentional?
:
: Nope.
:
: > I think of "pf2" as boosting any two words in the query, even if there
: are only two words.
:
: Correct.

-0 ... getting "double the boosting" on the original query if you use
both pf and pf2 smells weird to me in a way i can't fully describe, but i
can certainly understand how the consistency would at least be easier to
understand.

: Correct.  Looks like a bug probably introduced during a refactor
: (since I don't recall using the "shingle" terminology).

FWIW: i did the refactoring of that method and introduced those variables,
but the same logic is in the original SOLR-1553 patch...

+        Map<String,Float> pf = phraseFields;
+        if (normalClauses.size() >= 2 && pf.size() > 0) {
+          StringBuilder sb = new StringBuilder();
+          for (int i=0; i<normalClauses.size()-1; i++) {
...
+        pf = phraseFields3;
+        if (normalClauses.size() >= 3 && pf.size() > 0) {
+          StringBuilder sb = new StringBuilder();
+          for (int i=0; i<normalClauses.size()-2; i++) {

...so it wasn't a bug introduced later, it was written out that way
explicitly in the beginning for some reason.


-Hoss


Re: eDismax pf2 and pf3

Yonik Seeley-2-2
On Wed, Apr 11, 2012 at 10:24 PM, Chris Hostetter
<[hidden email]> wrote:

>
> : > Testing pf2 and pf3. I thought that when using pf2=myfield, and q=foo
> : bar, you would get a phrase query "foo bar", but you don't, unless there
> : are at least 3 terms in the query. Is this intentional?
> :
> : Nope.
> :
> : > I think of "pf2" as boosting any two words in the query, even if there
> : are only two words.
> :
> : Correct.
>
> -0 ... getting "double the boosting" on the original query if you use
> both pf and pf2 smells weird to me in a way i can't fully describe, but i
> can certainly understand how the consistency would at least be easier to
> understand.

And if "pf2" is the only pf parameter?
I don't know what the right behavior is if multiple "pf" parameters
are used, but it certainly seems like you should always get phrase
boosting if possible if you are using only one parameter, and that
should be the common case.
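
The "double boosting" concern can be illustrated with a small sketch. It assumes that pf boosts the whole query as a single phrase and pf2 boosts each pair of adjacent terms; the class and method names are hypothetical, and field names, boosts and slop are left out. Under those assumptions the two parameters produce the identical phrase only when the query has exactly two terms:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; names are hypothetical and this is not Solr code.
final class PhraseBoostOverlapSketch {

    // pf-style (assumed): the whole query as one quoted phrase.
    static String wholePhrase(List<String> terms) {
        return "\"" + String.join(" ", terms) + "\"";
    }

    // pf2-style (assumed): one quoted phrase per pair of adjacent terms.
    static List<String> bigrams(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            out.add("\"" + terms.get(i) + " " + terms.get(i + 1) + "\"");
        }
        return out;
    }

    // True when the whole-query phrase is also one of the bigram phrases,
    // i.e. the same phrase would be boosted by both parameters.
    static boolean overlaps(List<String> terms) {
        return bigrams(terms).contains(wholePhrase(terms));
    }

    public static void main(String[] args) {
        System.out.println(overlaps(Arrays.asList("foo", "bar")));        // prints true
        System.out.println(overlaps(Arrays.asList("foo", "bar", "baz"))); // prints false
    }
}
```

If pf2 is the only phrase parameter in the request, there is no whole-query phrase to overlap with, which is the single-parameter case described above.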


> : Correct.  Looks like a bug probably introduced during a refactor
> : (since I don't recall using the "shingle" terminology).
>
> FWIW: i did the refactoring of that method and introduced those variables,
> but the same logic is in the original SOLR-1553 patch...
>
> +        Map<String,Float> pf = phraseFields;
> +        if (normalClauses.size() >= 2 && pf.size() > 0) {
> +          StringBuilder sb = new StringBuilder();
> +          for (int i=0; i<normalClauses.size()-1; i++) {
> ...
> +        pf = phraseFields3;
> +        if (normalClauses.size() >= 3 && pf.size() > 0) {
> +          StringBuilder sb = new StringBuilder();
> +          for (int i=0; i<normalClauses.size()-2; i++) {
>
> ...so it wasn't a bug introduced later, it was written out that way
> explicitly in the beginning for some reason.


Just glancing at it quickly... but it seems like the original code
quoted above would add phrases if there were 2 terms (keeping in mind
that "pf" in the original patch was eventually changed to "pf2".)
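
As a rough check of that reading, the sketch below mirrors only the two conditions visible in the quoted patch (`size() >= n` and `i < size() - (n - 1)`). The loop body is elided in the quote, so the body here is a hypothetical stand-in, the `pf.size() > 0` check is omitted, and the class and method names are assumptions:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; the loop body is a stand-in for the elided patch code.
final class OriginalPatchLoopSketch {

    // n == 2 corresponds to `size() >= 2` with `i < size() - 1`,
    // n == 3 corresponds to `size() >= 3` with `i < size() - 2`.
    static List<String> phrases(List<String> normalClauses, int n) {
        List<String> out = new ArrayList<>();
        if (normalClauses.size() >= n) {
            for (int i = 0; i < normalClauses.size() - (n - 1); i++) {
                // Hypothetical body: join n adjacent clauses into one phrase.
                out.add(String.join(" ", normalClauses.subList(i, i + n)));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> q = Arrays.asList("foo", "bar");
        System.out.println(phrases(q, 2)); // prints [foo bar]
        System.out.println(phrases(q, 3)); // prints []
    }
}
```

With exactly two clauses the `>= 2` branch runs its loop once, so a two-word phrase is produced, whereas the refactored `clauses.size() <= shingleSize` guard returns early for the same input.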

-Yonik


Re: eDismax pf2 and pf3

Chris Hostetter-3

: Just glancing at it quickly... but it seems like the original code
: quoted above would add phrases if there were 2 terms (keeping in mind
: that "pf" in the original patch was eventually changed to "pf2".)

BAH!!!! ... you are absolutely correct ...

apparently i made the same mistake *twice* ... once when refactoring it,
and once yesterday when reading it to see if i screwed up the refactoring.

-Hoss
