Boosting by calculated distance buckets

sraav
I ran into a roadblock with a use case where I need to boost on ranges of distances calculated at query time. The distance is not stored in the document; it is calculated from the latitude/longitude values the user enters.

1. Is it required that all the boost parameters be searchable, or can we boost on dynamic values that are calculated at query time?
2. Is there a way to boost on geodist() within a specific range? For example, boost all cars listed within 20-50 km of the search zip code by 100, and give a boost of 85 to all cars listed within 51-80 km of the search zip code.

Please share your feedback, and let me know if there are any other options I could try.

Re: Boosting by calculated distance buckets

David Smiley
Hello,
You can totally boost by calculations that happen on-the-fly on a per-document basis when you search.  These are called function queries in Solr.
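As a rough illustration of that per-document calculation: geodist() returns the distance between each document's point and the query point, and with no arguments it reads them from the sfield and pt request parameters. The sketch below is a plain-Java haversine model of that distance, not Solr's code; the 6371 km Earth radius and the sample coordinates are assumptions.

```java
// Plain-Java model of a great-circle (haversine) distance, similar in spirit
// to what Solr's geodist() computes per document at query time.
// Assumption: a mean Earth radius of 6371 km; Solr's exact constant may differ slightly.
class GeoDistModel {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Distance in km between the query point (user lat/lon) and a document's point.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Hypothetical query point and document location, half a degree of latitude apart.
        double d = haversineKm(40.0, -75.0, 40.5, -75.0);
        System.out.printf("distance = %.2f km%n", d);
    }
}
```

This distance is what the boosting functions discussed in this thread would be fed for each matching document.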

For your specific example, a solution that doesn't involve writing a custom so-called ValueSource in Java would likely mean calculating the distance multiple times per document, once for each range. Instead, I suggest a continuous function, like the reciprocal of the distance. See the definition of the formula here: https://cwiki.apache.org/confluence/display/solr/Function+Queries#FunctionQueries-AvailableFunctions  For 'm' provide 1.0. For 'a' and 'b' I suggest using the same value, set to roughly 1/10th of the distance to the perimeter of the region of relevant interest; perhaps 1/10th of, say, 200 km. You will of course fiddle with this to your liking. Assuming you use edismax, you could multiply the natural score by something like:
&boost=recip(geodist(),1,20,20)    
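For intuition about that boost, here is a plain-Java model of the recip(x,m,a,b) = a/(m*x+b) formula from the reference guide, evaluated with the suggested m=1, a=b=20. It only models the arithmetic; in Solr the x argument would be geodist() for each document, and the class name is hypothetical.

```java
// Plain-Java model of Solr's recip(x,m,a,b) = a / (m*x + b),
// applied to a distance in km as in &boost=recip(geodist(),1,20,20).
class RecipModel {
    static double recip(double x, double m, double a, double b) {
        return a / (m * x + b);
    }

    public static void main(String[] args) {
        // Multiplier applied to the natural score at a few sample distances.
        for (double km : new double[] {0, 20, 60, 180}) {
            System.out.printf("%6.1f km -> x%.3f%n", km, recip(km, 1, 20, 20));
        }
    }
}
```

Because a equals b, the multiplier is 1.0 at the query point, 0.5 at 20 km, and keeps falling smoothly with distance rather than stepping between buckets.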

~ David Smiley
Freelance Apache Lucene/Solr Search Consultant/Developer
http://www.linkedin.com/in/davidwsmiley

Re: Boosting by calculated distance buckets

sraav
This post has NOT been accepted by the mailing list yet.
David,

Thank you for your prompt response; I truly appreciate it. Also, my post was not accepted the first two times, so I am posting it one final time.

In my case I want to remove the dependency on the relevancy score and have Solr sort using only the boost values I pass to each function. Here is a quick example of how I got that to work with non-geo fields that are stored in the document rather than calculated dynamically. I am using edismax, of course.

I was able to remove the dependency on the score for the result set and drive the sort by the boost in the query below. For example, if "document1" matches the listed date, it gets a boost of 5. If the same document also matches both owner AND product, it gets an additional boost of 5, for a total boost of 10. From what I have seen, this turns off, or at least negates, the effect of the Solr score. There was a query norm that affected the boost, but it seemed to be a constant of around 0.70345 most of the time, for any fq I used.

bq = {!func}sum(if(query({!v='datelisted:[2015-01-22T00:00:00.000Z TO *]'}),5,0),if(and(query({!v='owner:*BRAVE*'}),query({!v='PRODUCT:*SWORD*'})),5,0))
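To check the arithmetic of that bq, the sketch below models it in plain Java, treating each query(...) clause as simply matching or not (in Solr, query() returns the subquery's score, and if() treats any non-zero value as true). The class, method, and parameter names are hypothetical.

```java
// Plain-Java model of the bq function above:
// sum(if(dateMatch,5,0), if(and(ownerMatch,productMatch),5,0))
// Each boolean stands in for whether the corresponding query(...) clause matched.
class StaticBoostModel {
    static int staticBoost(boolean dateMatch, boolean ownerMatch, boolean productMatch) {
        int dateBoost = dateMatch ? 5 : 0;                            // if(query(datelisted...),5,0)
        int ownerProductBoost = (ownerMatch && productMatch) ? 5 : 0; // if(and(owner, product),5,0)
        return dateBoost + ownerProductBoost;                         // sum(...)
    }

    public static void main(String[] args) {
        System.out.println("document1 (all clauses match): " + staticBoost(true, true, true));
        System.out.println("date only: " + staticBoost(true, false, false));
    }
}
```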

What I am trying to do is add a distance-based boosting function that feeds into the function and boost value above.

For example, if "document1" falls in the 0-20 km range, I would like to add a boost of 50, making the final boost value 60. If it falls in the 20-40 km range, I would like to add a boost of 40, and so on.

Is there a way to do this? Please let me know if I can clarify the use case I am trying to solve. Thank you, David.

Thanks,
Raav

Re: Boosting by calculated distance buckets

David Smiley
Raav,

You may need to actually subscribe to the solr-user list. Nabble doesn't seem to be working too well.
P.S. I'm on vacation this week, so I can't be very responsive.

First of all, it's not clear you actually want to *boost* (since you seem not to care about the relevancy score); it seems you want to *sort* based on a function query. So simply sort by the function query instead of using the 'bq' param.
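A minimal sketch of what that could look like as request parameters, assembled in plain Java: Solr accepts a function query in the sort parameter followed by asc or desc. The field name location, the point 40.0,-75.0, and the class and method names are hypothetical, and q=*:* stands in for whatever query you actually run.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Builds a hypothetical Solr query string that sorts by a function query
// instead of adding it as a bq boost.
class SortByFunctionParams {
    static String buildQueryString(String sortFunction) {
        Map<String, String> params = new LinkedHashMap<>(); // keeps insertion order
        params.put("q", "*:*");
        params.put("sfield", "location");            // hypothetical spatial field read by geodist()
        params.put("pt", "40.0,-75.0");              // hypothetical user-entered lat,lon
        params.put("sort", sortFunction + " desc");  // highest function value first
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            joiner.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildQueryString("sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0))"));
    }
}
```

Sorting this way ranks purely by the function value, which matches the goal of ignoring the relevancy score.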

Have you read about geodist() in the Solr Reference Guide? It returns the spatial distance. With that and other function queries like map(), you could do something like sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)) and put that into your main function query. I purposely nested the map ranges (0-40 km and 0-20 km) so that I didn't have to deal with double-counting a document sitting exactly on an edge. The only thing I don't like about this is that the distance is calculated as many times as you reference the function, and that's slow. So you may want to write your own function query (internally called a ValueSource), which is relatively easy to do in Solr.
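The bucket arithmetic of that expression can be checked with a plain-Java model of map(x,min,max,target,default), which returns target when min <= x <= max and default otherwise. This models the formula only; it takes the distance as an argument, whereas in Solr each map() clause evaluates geodist() again. The class name is hypothetical.

```java
// Plain-Java model of sum(map(geodist(),0,40,40,0),map(geodist(),0,20,10,0)).
class MapBucketModel {
    // Models Solr's map(x,min,max,target,default): target if min <= x <= max, else default.
    static double map(double x, double min, double max, double target, double dflt) {
        return (x >= min && x <= max) ? target : dflt;
    }

    // 0-20 km -> 40 + 10 = 50; over 20 up to 40 km -> 40; beyond 40 km -> 0.
    static double distanceBoost(double km) {
        return map(km, 0, 40, 40, 0) + map(km, 0, 20, 10, 0);
    }

    public static void main(String[] args) {
        for (double km : new double[] {5, 20, 30, 40, 55}) {
            System.out.printf("%5.1f km -> %.0f%n", km, distanceBoost(km));
        }
    }
}
```

For a document that also earns the full static boost of 10, a distance of 5 km would total 60 and 30 km would total 50, matching the buckets Raav described.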

~ David

Re: Boosting by calculated distance buckets

sraav
David,

I just subscribed to the solr-user list; let's see if that allows me to post this.

I will write a custom ValueSource. I tried the map function you suggested; it works, but performance is not great.

I will try referencing the function query in a sort instead of bq; maybe it will perform better. Either way, a ValueSource seems like the better bet overall.
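One way to picture the logic such a custom function might compute: evaluate the distance once per document, then look it up in a small table of buckets instead of repeating geodist() inside several map() calls. This is a plain-Java sketch only; it does not use Lucene's ValueSource API, the class name is hypothetical, and the bucket bounds and boosts (taken from the 0-20 km and 20-40 km example in this thread) are illustrative.

```java
// Plain-Java sketch (not Lucene's ValueSource API) of a distance-bucket function
// that takes a distance computed once and looks up the boost in a table.
class DistanceBucketLookup {
    // Inclusive upper bounds in km, checked in ascending order; illustrative values.
    static final double[] UPPER_KM = {20.0, 40.0};
    // Boost for each bucket, parallel to UPPER_KM; illustrative values.
    static final double[] BOOST = {50.0, 40.0};

    // distanceKm is assumed to be computed once per document by the caller.
    static double bucketBoost(double distanceKm) {
        for (int i = 0; i < UPPER_KM.length; i++) {
            if (distanceKm <= UPPER_KM[i]) {
                return BOOST[i];
            }
        }
        return 0.0; // farther than the last bucket: no distance boost
    }

    public static void main(String[] args) {
        for (double km : new double[] {5, 20, 30, 40, 55}) {
            System.out.printf("%5.1f km -> %.0f%n", km, bucketBoost(km));
        }
    }
}
```

Adding a bucket then means adding one entry to each array rather than another geodist() call.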

Thank you so much!
Raav