Is it possible to do near terms without using phrase slop in query parser syntax?

Daniel Einspanjer
I've got a field that indexes people's names. The field is
multivalued, and I'm using Solr with a positionIncrementGap of 100.

I've found that trying to specify a near query using something like:
actor_name_mv:"Foster, Jody"~2
matches "Foster, Jody" with a tf score of 1, but it matches "Jody
Foster" with a tf score of 0.577. The phraseFreq in the first case is 1
and the phraseFreq in the second is 1/3.
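
The two numbers above can be reproduced with a small arithmetic sketch. It rests on three assumptions:

- Lucene 2.x DefaultSimilarity computes sloppyFreq(distance) = 1 / (distance + 1) and tf(freq) = sqrt(freq).
- The analyzer drops the comma, so Foster is at query position 0 and Jody at query position 1.
- When each document position is shifted by its query position, the terms in "Jody Foster" end up 2 apart.

That spread of 2 gives a phraseFreq of 1/3 and a tf of sqrt(1/3), about 0.577. PhraseTfDemo is a hypothetical standalone class, not Lucene code.

```java
// Simplified model (assumption) of Lucene 2.x DefaultSimilarity phrase scoring:
// sloppyFreq(distance) = 1 / (distance + 1) and tf(freq) = sqrt(freq).
class PhraseTfDemo {

    // Weight of one sloppy phrase match at the given distance.
    static float sloppyFreq(int distance) {
        return 1.0f / (distance + 1);
    }

    // Default tf curve: square root of the (possibly fractional) frequency.
    static float tf(float freq) {
        return (float) Math.sqrt(freq);
    }

    public static void main(String[] args) {
        // "Foster, Jody" in the doc: terms in query order, distance 0.
        float exact = tf(sloppyFreq(0));
        // "Jody Foster": shifted positions are 2 apart, still within ~2.
        float swapped = tf(sloppyFreq(2));
        System.out.println("exact=" + exact + " swapped=" + swapped);
    }
}
```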

For this particular case, I would like the scores for these two cases
to be the same. Is that possible?

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: Is it possible to do near terms without using phrase slop in query parser syntax?

Chris Hostetter-3

As I recall, phraseFreq is passed to the tf(float) function of your
Similarity. You can get different behavior between the tf() for phrase
queries and simple term queries by having different tf implementations:
tf(float) is used for phrases, and tf(int) is used for terms. By default,
tf(int) calls tf(float).

So you could make tf(float) round up to the nearest int, and then
fractional phraseFreqs should score the same as exact phraseFreqs. Do
some testing of cases where the phrase matches more than once in the same
field, though; it may not be what you expect. I believe the
phraseFreqs are summed before calling tf(), so there is no way to tell the
difference between 1 exact match with a phraseFreq of 1 and 2 sloppy
matches each with a phraseFreq of 0.5.
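
Hoss's rounding idea can be sketched as a standalone class so it runs without Lucene. RoundingTf is a hypothetical name. In a real Lucene 2.x setup, the same body would presumably go in a DefaultSimilarity subclass that overrides tf(float). The second comparison in main illustrates the caveat about summed phraseFreqs.

```java
// Standalone sketch (assumption): round fractional phrase frequencies up
// before applying the default sqrt curve. In Lucene this logic would live
// in a Similarity's tf(float) override; here it is plain Java.
class RoundingTf {

    // Phrase path: round the summed phraseFreq up, then take the sqrt.
    static float tf(float freq) {
        return (float) Math.sqrt(Math.ceil(freq));
    }

    // Term path: integer counts pass through unchanged by the rounding.
    static float tf(int freq) {
        return tf((float) freq);
    }

    public static void main(String[] args) {
        float exact = tf(1.0f);       // "Foster, Jody", phraseFreq 1
        float sloppy = tf(1.0f / 3);  // "Jody Foster" under ~2, phraseFreq 1/3
        // Caveat: phraseFreqs are summed before tf() sees them, so two
        // sloppy matches of 0.5 each look exactly like one exact match.
        float twoHalves = tf(0.5f + 0.5f);
        System.out.println(exact == sloppy);    // true
        System.out.println(exact == twoHalves); // true
    }
}
```

Rounding only affects the fractional values produced by sloppy matches. A term that really occurs 3 times still gets sqrt(3).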


-Hoss


Re: Is it possible to do near terms without using phrase slop in query parser syntax?

Doron Cohen
Chris Hostetter <[hidden email]> wrote on 29/05/2007 12:51:38:

Yes, they are summed before calling tf(). It would perhaps be
better to override Similarity.sloppyFreq(int) to return 1
(when searching those queries); this would mean that the
degree of sloppiness is ignored. It would not be symmetric,
though: for the query "A B"~3, it would score these docs the
same: "A B"; "B A"; "A X B"; "B X A". However, it would
match "A X Y Z B" but not "B Z Y X A". In other words, this
would not be equivalent to a SpanNearQuery with inOrder = false.
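
Doron's asymmetry example can be checked with a simplified model of two-term sloppy matching. This is an assumption: the model ignores repeated terms and multiple occurrences. Each term's document position is shifted by its position in the query, and the distance is the spread of the shifted values. With a slop of 3 and sloppyFreq fixed at 1, the first five docs each get a phraseFreq of 1. "B Z Y X A" has a distance of 5, so it does not match. ConstantSloppyFreq is a hypothetical name.

```java
import java.util.List;

// Simplified model (assumption) of matching the two-term sloppy phrase "A B"~slop.
class ConstantSloppyFreq {

    // Distance for query "A B": shift each doc position by its query position.
    static int distance(int posA, int posB) {
        int offA = posA - 0; // A is at query position 0
        int offB = posB - 1; // B is at query position 1
        return Math.abs(offA - offB);
    }

    // Doron's suggestion: every valid distance gets the same weight.
    static float sloppyFreq(int distance) {
        return 1.0f;
    }

    // phraseFreq for one doc, or 0 when the distance exceeds the slop.
    static float phraseFreq(List<String> doc, int slop) {
        int d = distance(doc.indexOf("A"), doc.indexOf("B"));
        return d <= slop ? sloppyFreq(d) : 0.0f;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("A", "B"), List.of("B", "A"),
            List.of("A", "X", "B"), List.of("B", "X", "A"),
            List.of("A", "X", "Y", "Z", "B"), List.of("B", "Z", "Y", "X", "A"));
        for (List<String> doc : docs) {
            System.out.println(doc + " -> " + phraseFreq(doc, 3));
        }
    }
}
```

In this model, "B X A" sits at distance 3, exactly at the slop limit. The reversed five-term doc overshoots because reversal costs 2 extra positions on top of the gap.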

Doron


Re: Is it possible to do near terms without using phrase slop in query parser syntax?

Daniel Einspanjer
Thank you both for the assistance.
I ended up going the tf(float) override route rather than sloppyFreq.
I want to keep the ability to specify how far apart something is
allowed to be, but from what I understood of Doron's response, I might
lose that if I overrode sloppyFreq.

Because my application is a matching tool rather than a searching
tool, it is okay for a term or phrase that matches multiple times to
have the same score as a single match.  Multiple matches don't mean
anything good in my application.

On 5/30/07, Doron Cohen <[hidden email]> wrote:

Re: Is it possible to do near terms without using phrase slop in query parser syntax?

Doron Cohen
"Daniel Einspanjer" <[hidden email]> wrote on 30/05/2007 11:20:51:

> I want to keep the ability to specify how far apart something is
> allowed to be, but from what I understood of Doron's response, I might
> lose that if I overrode sloppyFreq.

Just to clarify: sloppyFreq is invoked only for valid distances
(those within the slop). So overriding it to return a constant would
only stop rewarding a shorter distance over a longer (but still valid) distance.


Re: Is it possible to do near terms without using phrase slop in query parser syntax?

Daniel Einspanjer
Hrm. So maybe I need to go back and take another look at that. I think
I might still need to be overriding tf() to handle the other issue I
mentioned in the thread I replied to just before this about Caesar
appearing multiple times in one title.
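
The goal here is that neither slop distance nor repeated occurrences (such as Caesar appearing twice in a title) should change the score. One possible shape for that pairs a constant sloppyFreq with a binary tf. This is a hypothetical sketch in plain Java, not the override Daniel actually used (the thread doesn't show it). MatchingSimilaritySketch is an invented name. In Lucene 2.x these bodies would presumably go in sloppyFreq(int) and tf(float) overrides on a DefaultSimilarity subclass.

```java
// Hypothetical sketch for a matching (not searching) use case:
// any within-slop match counts fully, and repeats do not add score.
class MatchingSimilaritySketch {

    // Every valid phrase match contributes a full 1.0 to phraseFreq.
    static float sloppyFreq(int distance) {
        return 1.0f;
    }

    // Binary tf: present or absent, regardless of the (summed) frequency.
    static float tf(float freq) {
        return freq > 0 ? 1.0f : 0.0f;
    }

    // Term path routes through the float version, as Hoss describes for the default.
    static float tf(int freq) {
        return tf((float) freq);
    }

    public static void main(String[] args) {
        // "Caesar" twice in one title vs. once.
        System.out.println(tf(2) == tf(1)); // true
        // Two phrase matches (distances 0 and 2) vs. a single exact match.
        System.out.println(tf(sloppyFreq(0) + sloppyFreq(2)) == tf(sloppyFreq(0))); // true
    }
}
```

The trade-off is the one Doron points out: with a constant sloppyFreq, a tighter match no longer outranks a looser one within the slop.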

On 5/30/07, Doron Cohen <[hidden email]> wrote:
