sorting on aggregate averages

Umar Shah
Hi,
I need to return a list of results sorted on an
average of ranks computed from aggregates.
The query would be something like:
q=product:p1+product:p2+product:p3; sort score desc
To explain: suppose I have documents with the fields Product, Manufacturer, and Rank,
and I want to return the top manufacturers across products p1, p2, p3 with the
highest average rank on these products.

One way is to store the search results and then group them, compute
the average, and sort the result. Can it be done from Lucene/Solr itself? If
so, how?


umar

Re: sorting on aggregate averages

hossman

: I need to return a list of results sorted on an
: average of ranks computed from aggregates.
: The query would be something like:
: q=product:p1+product:p2+product:p3; sort score desc
: To explain: suppose I have documents with the fields Product, Manufacturer, and Rank,
: and I want to return the top manufacturers across products p1, p2, p3 with the
: highest average rank on these products.

The topic of generating statistics on facet constraints has come up before,
but nothing for doing that is provided out of the box at the moment.

While basic stats like the min/mean/median/stddev/max of a numeric facet
field (in the context of a q/fq) would be relatively straightforward to
add to Solr's built-in simple facet support, more complex types of statistics
(like what you describe) would be difficult to implement in a way that
would be generally reusable through simple query params. However, it
would probably be fairly straightforward to implement domain-specific stats
like this directly in a custom plugin.

The new SearchComponents framework available in the trunk would probably
be an easy way to do this, although it's not very well documented at the
moment.  If you look at the existing FacetComponent, however, and see how it
generates facet counts, extending it to know about your specific
fields and generate the type of stats you want should be possible.
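
To make the domain-specific statistic concrete, here is a minimal plain-Java sketch of the "group by manufacturer, average the rank, sort descending" step. It deliberately uses no Solr classes: RankedDoc and ManufacturerRanking are hypothetical names invented for illustration, and a real component would read the Manufacturer and Rank values from the matched documents instead of from an in-memory list.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one matched document; not a Solr class.
class RankedDoc {
    final String product;
    final String manufacturer;
    final double rank;

    RankedDoc(String product, String manufacturer, double rank) {
        this.product = product;
        this.manufacturer = manufacturer;
        this.rank = rank;
    }
}

class ManufacturerRanking {
    // Returns (manufacturer, average rank) pairs, highest average first.
    static List<Map.Entry<String, Double>> topByAverageRank(List<RankedDoc> matched) {
        // Per manufacturer: [0] = sum of ranks, [1] = number of documents.
        Map<String, double[]> sumAndCount = new LinkedHashMap<>();
        for (RankedDoc d : matched) {
            double[] acc = sumAndCount.computeIfAbsent(d.manufacturer, k -> new double[2]);
            acc[0] += d.rank;
            acc[1] += 1;
        }
        List<Map.Entry<String, Double>> result = new ArrayList<>();
        for (Map.Entry<String, double[]> e : sumAndCount.entrySet()) {
            double average = e.getValue()[0] / e.getValue()[1];
            result.add(new AbstractMap.SimpleEntry<>(e.getKey(), average));
        }
        // Sort by average rank, descending.
        result.sort((a, b) -> Double.compare(b.getValue(), a.getValue()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the documents matching p1, p2, p3.
        List<RankedDoc> matched = Arrays.asList(
                new RankedDoc("p1", "acme", 4.0),
                new RankedDoc("p2", "acme", 2.0),
                new RankedDoc("p1", "globex", 5.0),
                new RankedDoc("p3", "globex", 4.0));
        System.out.println(topByAverageRank(matched));
    }
}
```

The sketch assumes a higher Rank value is better and that every matched document carries a rank; manufacturers missing from some products are simply averaged over the documents they do have.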




-Hoss


Re: sorting on aggregate averages

Umar Shah
Hi,

It took me some time, but I implemented the required function by developing a
custom plugin for our specific example. However, now I have another issue:

I am computing a sorted rank list and returning a slice (for pagination), but
I have to recompute the result for each request. The results for the actual q
and fq parameters would be cached, but not the sorted list, which I would like
to cache and reuse on subsequent requests.

I might have a look at the caching as well. Any suggestions in this regard?

thanks.
-umar


In reply to Chris Hostetter's message of Wed, Mar 19, 2008 at 2:59 AM.

Re: sorting on aggregate averages

hossman
: I am computing a sorted rank list and returning a slice (for pagination), but
: I have to recompute the result for each request. The results for the actual q
: and fq parameters would be cached, but not the sorted list, which I would like
: to cache and reuse on subsequent requests.
:
: I might have a look at the caching as well. Any suggestions in this regard?

Take a look at "User/Generic Caches" here...

        http://wiki.apache.org/solr/SolrCaching

Your custom handler/component can use SolrIndexSearcher.getCache to see if
a cache with a specific name has been defined; if it has, you can do the
normal get/put operations on it. The cache will worry about eviction of
items if it's full (the only Impl that comes with Solr is an LRUCache, but
you could write your own if you want), and SolrCore will worry about
giving you a new cache instance when a new reader is opened.  If you
implement a CacheRegenerator (and configure it for this cache), then you
can put whatever custom code you want in it for autowarming entries in
the new cache based on the keys/values of the old cache (e.g., warm all the
keys, warm the "first" N keys, or warm all the keys whose values indicate
they were expensive to compute).
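
As a rough illustration of the eviction and autowarming ideas, here is a plain-Java sketch that does not use Solr's cache classes. SimpleLruCache and CacheWarming are hypothetical names; the LRU behavior comes from the JDK's LinkedHashMap in access order, and "first N keys" is interpreted here, as an assumption, to mean the N most recently used keys. A real CacheRegenerator would recompute each value against the new searcher; the recompute function below stands in for that step.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical size-bounded LRU cache; a stand-in, not Solr's LRUCache.
class SimpleLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxSize;

    SimpleLruCache(int maxSize) {
        super(16, 0.75f, true); // true = iterate in access order, least recent first
        this.maxSize = maxSize;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Evict the least recently used entry once the cache is over capacity.
        return size() > maxSize;
    }
}

class CacheWarming {
    // Recomputes the values of the n most recently used keys of the old cache
    // and stores them in the new cache.
    static <K, V> void warmMostRecent(SimpleLruCache<K, V> oldCache,
                                      SimpleLruCache<K, V> newCache,
                                      int n,
                                      Function<K, V> recompute) {
        // Copying the key set does not count as an access, so order is preserved.
        List<K> keys = new ArrayList<>(oldCache.keySet());
        int from = Math.max(0, keys.size() - Math.max(n, 0));
        for (K key : keys.subList(from, keys.size())) {
            newCache.put(key, recompute.apply(key));
        }
    }
}
```

Warming only the most recently used keys keeps the cost of opening a new reader bounded, at the price of cold misses for everything else.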

(Just make sure your custom handler/component can function OK even if the
cache doesn't exist, or if there are cache misses even when you don't
expect them -- it is, after all, just a cache; good code should be able to
function (slowly) without it if it's turned off.)
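
A minimal sketch of that defensive pattern, applied to the paginated sorted list from the question. It is plain Java with a java.util.Map standing in for the user cache; SortedListPager, the cache key, and the compute supplier are hypothetical names for illustration. In a real component the cache would come from SolrIndexSearcher.getCache and may be absent when no such cache is configured, which is the case the null check models.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

class SortedListPager {
    // Returns rows [start, start + rows) of the sorted list for `key`.
    // `cache` may be null (no cache configured); the code then just recomputes.
    static List<String> page(Map<String, List<String>> cache,
                             String key,
                             Supplier<List<String>> compute,
                             int start,
                             int rows) {
        List<String> sorted = (cache != null) ? cache.get(key) : null;
        if (sorted == null) {
            // Cache miss or no cache at all: fall back to the slow path.
            sorted = compute.get();
            if (cache != null) {
                cache.put(key, sorted);
            }
        }
        // Clamp the requested window so out-of-range pages yield an empty list.
        int from = Math.min(Math.max(start, 0), sorted.size());
        int to = (int) Math.min((long) from + Math.max(rows, 0), sorted.size());
        return new ArrayList<>(sorted.subList(from, to));
    }
}
```

The key should capture everything that affects the sorted list (for example the q and fq values), so that different requests never share an entry by accident.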

-Hoss


Re: sorting on aggregate averages

Umar Shah
Thanks!
I'll have a look at that.



In reply to Chris Hostetter's message of Wed, Apr 2, 2008 at 6:25 AM.