solr.search.function

solr.search.function

Umar Shah
Hi,

I am looking into implementing an aggregate average function over documents
and need some help with it.

The problem is that I have documents containing manufacturer, product, and
rating (m,p,r), and I want to find the top manufacturers for a set of
products (p1,p2,...), typically around 10 to 20 products. So I need to
compute the average rating for each manufacturer over these products
(p1,p2,...) and sort by this average.

The solr.search.function package provides some features, but they cannot be
used out of the box for this.

Can someone guide me on how to achieve this?

So for a query like (p:p1+p:p2+...),
find the top 10 documents with (m,avg(r)).

Can I extend the ProductFloatFunction class in some form to return the average?

-umar

Re: solr.search.function

hossman

: I am looking into implementing an aggregate average function over documents
: and need some help with it.

First off: please don't repost the same email with a different subject
(on either solr list) just because you don't receive a reply in the first
24 hours.  The Solr community is very healthy and active and willing to
help, but sometimes people get busy and not every question gets addressed
right away (if you check the archives though, typically every question
gets answered eventually -- the few threads that have 0 replies are
usually reposts -- but you have to be a little patient)

Second: this sounds exactly like the question you asked a few days ago...

http://www.nabble.com/sorting-on-aggregate-averages-to16095991.html

...did you look into the way the FacetComponent and SimpleFacets work as i
suggested?  what you are asking is really much more related to faceting
than to the Function queries (function queries are designed to give you
one value per document)

Third: something i didn't really catch the first time you asked this
question was how few documents you expected to deal with per request....

: The problem is that I have documents containing manufacturer, product, and
: rating (m,p,r), and I want to find the top manufacturers for a set of
: products (p1,p2,...), typically around 10 to 20 products. So I need to
: compute the average rating for each manufacturer over these products
: (p1,p2,...) and sort by this average.

If you are only going to query for 10-20 (or even 100) documents, then
you'll have at most 10-20 (or 100) manufacturers and ratings.  You could
iterate over these and compute the average directly ... this would be a
lot easier and simpler to implement than trying to leverage the faceting
code (or the FunctionQuery code ... like i said it really wasn't designed
for anything like this)
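
As a rough illustration of that direct approach, here is a minimal sketch in Python. It is not Solr code: it assumes the matching documents have already been fetched into a list of dicts, and the keys `m` and `r` as well as the function name `top_manufacturers` are hypothetical.

```python
from collections import defaultdict

def top_manufacturers(docs, limit=10):
    """Average the rating per manufacturer and return the best `limit`."""
    totals = defaultdict(lambda: [0.0, 0])  # m -> [sum of ratings, count]
    for doc in docs:
        entry = totals[doc["m"]]
        entry[0] += doc["r"]
        entry[1] += 1
    averages = [(m, s / n) for m, (s, n) in totals.items()]
    # Highest average first; ties broken by manufacturer id for stable output.
    averages.sort(key=lambda pair: (-pair[1], pair[0]))
    return averages[:limit]

# Hypothetical documents, as if already returned for a query on p1 and p2.
docs = [
    {"m": "m1", "p": "p1", "r": 4.0},
    {"m": "m1", "p": "p2", "r": 2.0},
    {"m": "m2", "p": "p1", "r": 5.0},
]
print(top_manufacturers(docs))
```

With only a few hundred matching documents per request, a single pass like this should be cheap compared to the query itself.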


-Hoss

Re: solr.search.function

Umar Shah
On 3/21/08, Chris Hostetter <[hidden email]> wrote:

> : I am looking into implementing an aggregate average function over documents
> : and need some help with it.
>
> First off: please don't repost the same email with a different subject
> (on either solr list) just because you don't receive a reply in the first
> 24 hours.  The Solr community is very healthy and active and willing to
> help, but sometimes people get busy and not every question gets addressed
> right away (if you check the archives though, typically every question
> gets answered eventually -- the few threads that have 0 replies are
> usually reposts -- but you have to be a little patient)


Sorry for the repost on the dev mailing list. I thought I might have to
post on the dev list for this kind of question.

> Second: this sounds exactly like the question you asked a few days ago...
>
> http://www.nabble.com/sorting-on-aggregate-averages-to16095991.html
>
> ...did you look into the way the FacetComponent and SimpleFacets work as i
> suggested?  what you are asking is really much more related to faceting
> than to the Function queries (function queries are designed to give you
> one value per document)
>
> Third: something i didn't really catch the first time you asked this
> question was how few documents you expected to deal with per request....
>
> : The problem is that I have documents containing manufacturer, product, and
> : rating (m,p,r), and I want to find the top manufacturers for a set of
> : products (p1,p2,...), typically around 10 to 20 products. So I need to
> : compute the average rating for each manufacturer over these products
> : (p1,p2,...) and sort by this average.
>
> If you are only going to query for 10-20 (or even 100) documents, then
> you'll have at most 10-20 (or 100) manufacturers and ratings.  You could
> iterate over these and compute the average directly ... this would be a
> lot easier and simpler to implement than trying to leverage the faceting
> code (or the FunctionQuery code ... like i said it really wasn't designed
> for anything like this)


Given the document schema mpr:
mpr(
MID:INT,
PID:INT,
Rating:FLOAT
)

with these assumptions:
The cardinality of M is on the order of 10^3
The cardinality of P is on the order of 10^2
    ... so the total number of records is on the order of 10^5

the SQL equivalent would be something like:

SELECT MID, AVG(Rating) as Average FROM mpr
    WHERE PID in (p1[,p2,...])
    GROUP BY MID
    ORDER BY Average DESC LIMIT 0, 10;

I would also need to boost the values based on PID (some products carry
more weight than others), effectively computing a weighted average.
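
To make the weighted variant concrete, here is a minimal sketch in Python of what the query above would compute once per-product weights are added. It assumes the weighted average for a manufacturer is sum(weight * rating) / sum(weight) over the selected products; the function name and the `weights` mapping (PID to weight, doubling as the list of selected PIDs) are hypothetical.

```python
from collections import defaultdict

def weighted_top_manufacturers(rows, weights, limit=10):
    """rows: (mid, pid, rating) tuples; weights: pid -> weight.

    Only products listed in `weights` are considered, which plays the
    role of the WHERE PID IN (...) clause.
    """
    num = defaultdict(float)  # mid -> sum of weight * rating
    den = defaultdict(float)  # mid -> sum of weights
    for mid, pid, rating in rows:
        if pid not in weights:
            continue
        num[mid] += weights[pid] * rating
        den[mid] += weights[pid]
    # Skip manufacturers whose selected products all have zero weight.
    averages = [(mid, num[mid] / den[mid]) for mid in num if den[mid] > 0]
    # ORDER BY Average DESC, with MID as a tie-breaker, then LIMIT.
    averages.sort(key=lambda pair: (-pair[1], pair[0]))
    return averages[:limit]

# Hypothetical sample data: product 10 counts three times as much as 11.
rows = [(1, 10, 4.0), (1, 11, 2.0), (2, 10, 3.0), (2, 12, 5.0)]
weights = {10: 3.0, 11: 1.0}
print(weighted_top_manufacturers(rows, weights))
```

Setting every weight to 1.0 reduces this to the plain AVG(Rating) of the SQL query.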


To handle these queries I am planning to develop a custom request handler
plugin, in as generic a form as possible so that it is useful in general.


Re: solr.search.function

hossman

: SELECT MID, AVG(Rating) as Average FROM mpr
:     WHERE PID in (p1[,p2,...])
:     GROUP BY MID
:     ORDER BY Average DESC LIMIT 0, 10;
:
: I would also need to boost the values based on PID (some products carry
: more weight than others), effectively computing a weighted average.

: To handle these queries I am planning to develop a custom request handler
: plugin, in as generic a form as possible so that it is useful in general.

ok .. but i'm not really sure what you're asking at this point ... as i
said: the FunctionQuery code isn't really going to help you here .. the
Faceting Code is more akin to what you are asking about.

alternately: just because your database is structured around one record
for each (MID, PID, Rating) triple doesn't mean your *documents* need to
be structured that way ... instead you can have one document per product
and precompute the average before indexing them (that's the theory behind
building an index, you precompute/denormalize/invert information for
faster lookup later)
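
A minimal sketch of that precompute step, in Python, might look like the following. It only shows collapsing the (MID, PID, Rating) triples into one record per product before indexing; the field names `id`, `avg_rating` and `rating_count` and the function name are hypothetical, and sending the records to Solr is left out.

```python
from collections import defaultdict

def build_product_docs(rows):
    """Collapse (mid, pid, rating) triples into one document per product."""
    ratings_by_pid = defaultdict(list)
    for _mid, pid, rating in rows:
        ratings_by_pid[pid].append(rating)
    return [
        {
            "id": pid,
            "avg_rating": sum(ratings) / len(ratings),  # precomputed average
            "rating_count": len(ratings),
        }
        for pid, ratings in sorted(ratings_by_pid.items())
    ]

# Hypothetical rows as they might come out of the database.
rows = [(1, 10, 4.0), (2, 10, 2.0), (1, 11, 5.0)]
product_docs = build_product_docs(rows)
print(product_docs)
```

The same grouping could be keyed on a different field if a different document shape suits the queries better.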



-Hoss