Somewhat complex scoring/boosting

Ravis-3
Hi Folks,

I have a somewhat complex scoring/boosting requirement.

Say I have 3 text fields A, B, C and a numeric field called D.
Say my query is "testrank".

Scoring should be based on the following:

Query matches
1. A, B and C, with the highest value of D (highest boost/rank)
2. A and B, with the highest value of D (2nd highest)
3. A and C, with the highest value of D (3rd highest)
4. B and C, with the highest value of D (4th highest)
5. B, with the highest value of D (5th highest)
6. C, with the highest value of D (6th highest)

i) If I use a standard query, it will be a boosted query, something like
this:

query = (A:testrank AND B:testrank AND C:testrank)^10
     OR (A:testrank AND B:testrank)^9
     OR (A:testrank AND C:testrank)^8
     OR (B:testrank AND C:testrank)^7
     OR (A:testrank)^6
     OR (B:testrank)^5
     OR (C:testrank)^4
sort = by score (primary), field D (secondary)

Also, I do need to override Similarity so that tf, idf, etc. don't
interfere and all docs score purely on the boost values I have
specified. That way the secondary sort can be effective.

This will be a poor query so I would like to avoid it.
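
To make the behaviour of the query in (i) concrete, here is a minimal sketch in plain Python (a model, not the Lucene API). It assumes Similarity has been overridden so that every matching clause contributes exactly its boost and the OR'ed clauses are summed; the documents and their D values are made up for illustration.

```python
# Hypothetical model of approach (i); plain Python, not the Lucene API.
# Assumption: each matching clause contributes exactly its boost
# (tf, idf, norms and coord neutralised) and OR'ed clauses are summed.

CLAUSES = [
    ({"A", "B", "C"}, 10),
    ({"A", "B"}, 9),
    ({"A", "C"}, 8),
    ({"B", "C"}, 7),
    ({"A"}, 6),
    ({"B"}, 5),
    ({"C"}, 4),
]

def score(matched_fields):
    """Sum the boosts of every clause whose required fields all matched."""
    return sum(boost for fields, boost in CLAUSES if fields <= matched_fields)

def rank(docs):
    """Sort by score (primary) and D (secondary), both descending."""
    return sorted(docs, key=lambda d: (score(d["fields"]), d["D"]), reverse=True)

# Made-up documents: which fields matched "testrank", and the value of D.
docs = [
    {"id": "d1", "fields": {"C"}, "D": 50},
    {"id": "d2", "fields": {"A", "B"}, "D": 3},
    {"id": "d3", "fields": {"A", "B", "C"}, "D": 1},
    {"id": "d4", "fields": {"A", "B"}, "D": 9},
]

print([d["id"] for d in rank(docs)])
```

Note that under these assumptions a document matching all three fields satisfies all seven clauses, so it scores the sum of the boosts rather than 10. The relative order of the tiers is still the intended one with these particular boost values.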

ii) I have never used DisjunctionMaxQuery (or Solr qt=DisMax), and at first
glance it appeared to be just what I need (with tiebreaker = 0). However, it
is not. If I understand it correctly, #1, #2 and #3 will score equally
because it scores based only on the highest boost (which is A for #1, #2
and #3). This will not work.
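
The concern in (ii) can be checked with a small model. DisjunctionMaxQuery is documented as scoring a document with the maximum sub-query score plus the tiebreaker times the scores of the other matching sub-queries. The sketch below reproduces only that formula in plain Python; the per-field boosts (6, 5, 4) are borrowed from the single-field clauses in (i) and are an assumption, as is the idea that each matching field scores exactly its boost.

```python
# Plain-Python model of the DisjunctionMaxQuery formula, not the Lucene API.

def dismax(sub_scores, tie_breaker=0.0):
    """Return max + tie_breaker * (sum of the remaining sub-scores)."""
    if not sub_scores:
        return 0.0
    best = max(sub_scores)
    return best + tie_breaker * (sum(sub_scores) - best)

# Assumed per-field boosts; each matching field scores exactly its boost.
BOOSTS = {"A": 6.0, "B": 5.0, "C": 4.0}

def doc_score(matched_fields, tie_breaker=0.0):
    """Score a document given the set of fields that matched the term."""
    return dismax([BOOSTS[f] for f in matched_fields], tie_breaker)

for fields in ({"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}):
    print(sorted(fields), doc_score(fields))
```

With tiebreaker = 0 every combination that includes A collapses to A's boost, which is the tie described above.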

iii) I am wondering whether I have to write a custom Query (custom score)
like DisjunctionMaxQuery that scores based on the sum of the matching fields
instead of just taking the highest, or whether I could override the scoring
of DisjunctionMaxQuery so that it takes the sum of the scores from its
sub-queries.
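
As a rough illustration of the "sum of matching fields" idea in (iii), the sketch below sums one weight per matching field and ranks every field combination. It is plain Python, not Lucene code, and the weights (A=4, B=3, C=2) are hypothetical values picked so that the sums come out in the requested order. As far as the documented DisjunctionMaxQuery formula goes, a tiebreaker of 1.0 also reduces to a plain sum, so a custom Query may not be needed for this part.

```python
from itertools import combinations

# Hypothetical per-field weights; each matching field contributes its weight.
WEIGHTS = {"A": 4, "B": 3, "C": 2}

def sum_score(matched_fields):
    """Score a document as the sum of the weights of its matching fields."""
    return sum(WEIGHTS[f] for f in matched_fields)

# Every non-empty combination of the three fields.
combos = [frozenset(c) for r in (1, 2, 3) for c in combinations("ABC", r)]
ranked = sorted(combos, key=sum_score, reverse=True)

for fields in ranked:
    print("".join(sorted(fields)), sum_score(fields))
```

All seven sums are distinct with these weights, so documents in the same tier tie exactly and a secondary sort on D can order them.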

If anyone has a clever suggestion, I would really appreciate it.

Thanks,
Ravi

Re: Somewhat complex scoring/boosting

Grant Ingersoll-2

On Sep 5, 2008, at 6:27 PM, Ravindra Sharma wrote:

> Hi Folks,
>
> I have somewhat complex scoring/boosting requirement.
>
> Say I have 3 text fields A, B, C and a Numeric field called D.
> Say My query is "testrank".
>
> Scoring should be based on following:
>
> Query matches
> 1. text fields A, B and C, & Highest value of D (highest boost/rank)
> 2. A and B, & Highest value of D (2nd highest)
> 3. A and C, & Highest value of D (3rd highest)
> 4. B and C, & Highest value of D (4th highest)
> 5. B, & Highest value of D (5th highest)
> 6. C, & Highest value of D (6th highest)
>
> i). If I use the standard query, it will be query (with boost)
> something like this:
>
> query = (A:testrank AND B:testrank AND C:testrank)^10 OR (A:testrank
> AND B:testrank)^9 OR (A:testrank AND C:testrank)^8 OR (B:testrank AND
> C:testrank)^7 OR (A:testrank)^6 OR (B:testrank)^5 OR (C:testrank)^4
> sort = by Score (primary), Field D (Secondary)
>
> Also, I do need to override Similarity such that tf, idf etc doesn't
> interfere; and all docs should score purely based on boost values, I
> have specified. That way secondary sort can be effective.
>
> This will be a poor query so I would like to avoid it.

Why is it poor? I admit, I'm not fully following what you are trying
to do. Perhaps taking a step back and letting us know the bigger
picture you want to solve will help. For example, how did you come up
with the need for the scoring algorithm above? Is this research, or are
you trying to factor in PageRank or something like that?


-Grant

Re: Somewhat complex scoring/boosting

hossman

: > query = (A:testrank AND B:testrank AND C:testrank)^10 OR (A:testrank AND
: > B:testrank)^9 OR (A:testrank AND C:testrank)^8 OR (B:testrank AND
: > C:testrank)^7 OR (A:testrank)^6 OR (B:testrank)^5 OR (C:testrank)^4
: > sort = by Score (primary), Field D (Secondary)
: >
: > Also, I do need to override Similarity such that tf, idf etc doesn't
: > interfere; and all docs should score purely based on boost values, I have
: > specified. That way secondary sort can be effective.

the way you've phrased your question leads me to believe you haven't
checked out function queries ... factoring one based on field D into your
query and then just sorting straight up on score should get you fairly
close to what you want (probably without needing to muck with your
Similarity class)
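
A minimal sketch of the idea of factoring D into the score, in plain Python rather than actual function query syntax. The assumptions are loud ones: text scores are integer-valued (as in the boost-only scheme from the original post), D is non-negative, and the maximum D is known, so D can be squeezed into a bonus below 1 that only separates documents with equal text scores. The function name and the sample documents are hypothetical.

```python
def combined_score(text_score, d_value, d_max):
    """Add a D-based bonus in [0, 1) to the text score.

    Assumes text scores differ by at least 1 between tiers and
    0 <= d_value <= d_max, so the bonus never reorders tiers.
    """
    bonus = d_value / (d_max + 1.0)
    return text_score + bonus

# Made-up documents: (id, text score, value of D).
docs = [
    ("d1", 4, 50),
    ("d2", 20, 3),
    ("d3", 49, 1),
    ("d4", 20, 9),
]

d_max = max(d for _, _, d in docs)
ranked = sorted(
    docs,
    key=lambda doc: combined_score(doc[1], doc[2], d_max),
    reverse=True,
)
print([doc_id for doc_id, _, _ in ranked])
```

Sorting on this single combined value gives the same order as sorting on text score first and D second, which is why a plain sort on score may be enough.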


-Hoss