facet optimizing

Gunther, Andrew

Any suggestions on how to optimize the loading of facets?  My index is
roughly 35,000 and I am asking Solr to return six facet fields on
every query.  On large result sets, searching is zippy with the facet
params set to false, but when they are set to true and facet fields are
designated, results take some time to load.  I've tried adjusting
some/all of the Common Cache Configuration Parameters in the config but
haven't gotten any better result times.  Any suggestions?


Thanks,


Andrew


 


Re: facet optimizing

Erik Hatcher
How many unique values do you have for those six fields?  And are those
fields multiValued or not?  Single-valued facets are much faster (though
not realistic in my domain).  Lots of values per field do not good
facets make.

        Erik



RE: facet optimizing

Gunther, Andrew
Yes, almost all terms are multi-valued, which I can't avoid.
Since the data is coming from a library catalogue, I am translating a
subject field to make a subject facet.  That facet alone is the biggest,
hovering near 39k unique values.  If I remove this facet.field, things
return faster.  So am I to assume that this particular field is bogging
down operations, and that there are no other optimization options
besides cutting down this field?

Thanks!

-Andrew



Re: facet optimizing

Andrew Nagy-2
Gunther, Andrew wrote:
> Yes, almost all terms are multi-valued, which I can't avoid.
> Since the data is coming from a library catalogue, I am translating a
> subject field to make a subject facet.  That facet alone is the biggest,
> hovering near 39k unique values.  If I remove this facet.field, things
> return faster.  So am I to assume that this particular field is bogging
> down operations, and that there are no other optimization options
> besides cutting down this field?

Andrew, I haven't yet found a successful way to implement Solr's
faceting for library catalog data, so I developed my own system: for
every query, I first fetch 20 records.  Let's say the query finds 1000
records and returns the first 20.  Then I make a second query returning
all 1000 records and build my own facets based on those 1000 records.
It's a bit faster than using Solr's faceting system, but, as you said,
for large result sets it still takes a bit of time.  I implemented it
using AJAX so it doesn't slow down the loading of the page.

I'd be curious if anyone has been able to find a better way using Solr's
faceting system.

Andrew
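
The core of the second pass Andrew describes is just a client-side tally
over the returned records.  A minimal sketch in Java (fetching and
parsing the Solr response is elided, and the document representation and
the field name "subject" are illustrative assumptions):

    import java.util.*;

    // Minimal sketch of the client-side tally: given all matching records
    // (already fetched and parsed, which is elided here), count the values
    // of one multi-valued field.
    public class ClientSideFacets {
        static Map<String, Integer> countFacet(
                List<Map<String, List<String>>> docs, String field) {
            Map<String, Integer> counts = new HashMap<String, Integer>();
            for (Map<String, List<String>> doc : docs) {
                List<String> values = doc.get(field);
                if (values == null) continue;   // doc has no value for this facet
                for (String v : values) {
                    Integer c = counts.get(v);
                    counts.put(v, c == null ? 1 : c + 1);
                }
            }
            return counts;
        }
    }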

Re: facet optimizing

Mike Klaas
In reply to this post by Gunther, Andrew
On 2/7/07, Gunther, Andrew <[hidden email]> wrote:
> Yes, almost all terms are multi-valued, which I can't avoid.
> Since the data is coming from a library catalogue, I am translating a
> subject field to make a subject facet.  That facet alone is the biggest,
> hovering near 39k unique values.  If I remove this facet.field, things
> return faster.  So am I to assume that this particular field is bogging
> down operations, and that there are no other optimization options
> besides cutting down this field?

Well, the applicable optimizations probably will be related to how you
use the results.  Surely you are not displaying 39,000 facet counts to
the user?

If you are only displaying the top subjects, one solution is to
collect more documents than you need, enumerate the subjects of the
results, and only facet on that subset.

This could be built into Solr eventually (some sort of sampling bitset
intersection size).

-Mike
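
A sketch of what Mike's subset faceting could look like from the client
side: sample the subjects from the top results, then ask Solr to count
just those values via facet.query parameters.  The field name, host, and
port are illustrative assumptions:

    import java.io.UnsupportedEncodingException;
    import java.net.URLEncoder;
    import java.util.List;

    // Build a request that counts only the sampled subjects, one facet.query
    // per candidate value, instead of faceting over every value in the field.
    public class SubsetFacetRequest {
        static String buildUrl(String baseQuery, List<String> sampledSubjects)
                throws UnsupportedEncodingException {
            StringBuilder url =
                new StringBuilder("http://localhost:8983/solr/select?");
            url.append("q=").append(URLEncoder.encode(baseQuery, "UTF-8"));
            url.append("&facet=true&rows=0");
            for (String s : sampledSubjects) {
                String fq = "subject:\"" + s + "\"";
                url.append("&facet.query=")
                   .append(URLEncoder.encode(fq, "UTF-8"));
            }
            return url.toString();
        }
    }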

Re: facet optimizing

Chris Hostetter-3
In reply to this post by Andrew Nagy-2

: Andrew, I haven't yet found a successful way to implement Solr's
: faceting for library catalog data, so I developed my own system ...

Just to clarify: the "out of the box" faceting support Solr has at the
moment is very deliberately referred to as "SimpleFacets" ... it's intended
to solve simple problems where you want facets based on all of the values
in a field, or on specific hardcoded queries.  It was primarily written
as a demonstration of what is possible when writing a custom
SolrRequestHandler.

When you start talking about really large data sets, with an extremely
large volume of unique field values for fields you want to facet on, then
"generic" solutions stop being very feasible, and you have to start looking
at solutions more tailored to your dataset.  At CNET, when dealing with
product data, we don't make any attempt to use the SimpleFacets support
Solr provides to facet on things like Manufacturer or Operating System,
because enumerating through every manufacturer in the catalog on every
query would be too expensive -- instead we have structured metadata that
drives the logic: only compute the constraint counts for this subset of
manufacturers when looking at the Desktops category, only look at the
Operating System facet when in these categories, etc.  Rules like these
need to be defined based on your user experience, and it can be easy to
build them using the metadata in your index -- but they really
need to be precomputed, not calculated on the fly every time.

For something like a library system, where you might want to facet on
Author but have way too many authors to be practical, a system that either
requires a category to be picked first (allowing you to constrain the list
of authors you need to worry about) or precomputes the top 1000 authors
for displaying initially (when the user hasn't provided any other
constraints) is an example of the type of thing a RequestHandler Solr
plugin might do -- but the logic involved would probably be domain
specific.
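
A toy sketch of the kind of metadata-driven rule table described above.
The category names, field names, and shape of the data are all
illustrative assumptions, not CNET's actual implementation:

    import java.util.*;

    // Precomputed rules: for each category, only these facet fields get
    // constraint counts.  Everything here is invented for illustration.
    public class FacetRules {
        static final Map<String, List<String>> FACETS_BY_CATEGORY =
            new HashMap<String, List<String>>();
        static {
            FACETS_BY_CATEGORY.put("Desktops",
                Arrays.asList("manufacturer", "operatingSystem"));
            FACETS_BY_CATEGORY.put("Digital Cameras",
                Arrays.asList("manufacturer", "resolution"));
        }

        // Unknown categories get no facets, rather than an expensive full
        // enumeration of every value in every field.
        static List<String> facetsFor(String category) {
            List<String> fields = FACETS_BY_CATEGORY.get(category);
            return fields != null ? fields : Collections.<String>emptyList();
        }
    }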



-Hoss


Re: facet optimizing

Yonik Seeley-2
In reply to this post by Gunther, Andrew
On 2/7/07, Gunther, Andrew <[hidden email]> wrote:
> Any suggestions on how to optimize the loading of facets?  My index is
> roughly 35,000

35,000 documents?  That's not that big.

> and I am asking Solr to return six facet fields on
> every query.  On large result sets, searching is zippy with the facet
> params set to false, but when they are set to true and facet fields are
> designated, results take some time to load.  I've tried adjusting
> some/all of the Common Cache Configuration Parameters in the config but
> haven't gotten any better result times.  Any suggestions?

The first time you do these facet requests, they should be slower.
After that, they should be somewhat faster.

Solr relies on the filter cache for faceting, and if it's not big
enough you're going to get a near 0% hit rate.  Check the statistics
page and make sure there aren't any evictions after you do a query
with facets.  If there are, make the cache larger.
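
For reference, the filter cache Yonik mentions is configured in
solrconfig.xml.  A sketch with illustrative sizes (the size generally
needs to exceed the total number of unique values across the fields you
facet on, plus your ordinary filters):

    <!-- solrconfig.xml sketch; sizes are illustrative, not recommendations -->
    <filterCache
      class="solr.LRUCache"
      size="50000"
      initialSize="50000"
      autowarmCount="10000"/>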

-Yonik

Re: facet optimizing

Erik Hatcher

On Feb 7, 2007, at 4:42 PM, Yonik Seeley wrote:
> Solr relies on the filter cache for faceting, and if it's not big
> enough you're going to get a near 0% hit rate.  Check the statistics
> page and make sure there aren't any evictions after you do a query
> with facets.  If there are, make the cache larger.

Yonik - thanks!  I was too deep into other things to worry about the
slowness of massive multiValued facets, mainly because I was going to
use the mess of all those nasty values we have in typical library data
to push back and have it cleaned up.  But I just adjusted my filter
cache settings and my responses went from 2000+ ms to 85 ms!  Now it
takes longer to render the pie charts than it does to get the results
back :)

        Erik


Re: facet optimizing

Ryan McKinley
In reply to this post by Yonik Seeley-2
Are there any simple automatic tests we can run to see which fields
would support fast faceting?

Is it just that the cache size needs to be bigger than the number of
distinct values for a field?

If so, it would be nice to add an /admin page that lists each field,
the distinct value count, and a green/red box showing whether the current
configuration will facet quickly.  This could be a good place to
suggest a good configuration for the data in your index.

ryan

Re: facet optimizing

Chris Hostetter-3

: Is it just that the cache size needs to be bigger than the number of
: distinct values for a field?

Basically yes, but the cache is going to be used for all filters -- not
just those for a single facet (so your cache might be big enough that
faceting on fieldA or fieldB alone is fine, but if you facet on both
you'll get performance pains).
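
As a rough worked example (the 39k figure is from this thread; the rest
is illustrative): faceting on a 39,000-value subject field plus five
other fields of about 500 values each needs room for roughly
39,000 + 5 * 500 = 41,500 cached filters, on top of any ordinary query
filters, before evictions stop.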

: If so, it would be nice to add an /admin page that lists each field,
: the distinct value count, and a green/red box showing whether the current
: configuration will facet quickly.  This could be a good place to
: suggest a good configuration for the data in your index.

That reminds me of an idea I had way back when for a Solr sanity-checker
tool that would inspect your schema, index, and logs of sample queries
and point out things that didn't seem to make any sense, i.e.: String
fields that don't omitNorms, sorting on a tokenized field, or
non-sortable int/float fields that were multiValued but didn't seem like
they needed to be (because every doc only had one value), etc.



-Hoss


RE: facet optimizing

pbinkley
In reply to this post by Chris Hostetter-3
In the library subject heading context, I wonder if a layered approach
would bring performance into the acceptable range. Since Library of
Congress Subject Headings break into standard parts, you could have
first-tier facets representing the main heading, second-tier facets with
the main heading and first subdivision, etc. So to extract the subject
headings from a given result set, you'd first test all the first-tier
facets like "Body, Human", then where warranted test the associated
second-tier facets like "Body, Human--Social aspects".  If the
first-tier facets represent a small enough subset of the set of subject
headings as a whole, that might be enough to reduce the total number of
facet tests.
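
At index time, producing those tiers from an LCSH heading is mostly
string splitting.  A minimal sketch, assuming the headings use the usual
"--" subdivision delimiter and leaving the per-tier field naming open:

    import java.util.ArrayList;
    import java.util.List;

    // Sketch: expand one LCSH heading into cumulative tiers, e.g.
    // "Body, Human--Social aspects" ->
    //   ["Body, Human", "Body, Human--Social aspects"]
    // Tier i would then be indexed into its own facet field.
    public class LcshTiers {
        static List<String> tiers(String heading) {
            String[] parts = heading.split("--");
            List<String> result = new ArrayList<String>();
            StringBuilder prefix = new StringBuilder();
            for (int i = 0; i < parts.length; i++) {
                if (i > 0) prefix.append("--");
                prefix.append(parts[i].trim());
                result.add(prefix.toString());
            }
            return result;
        }
    }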

I'm told by our metadata librarian, by the way, that there are 280,000
subject headings defined in LCSH at the moment (including
cross-references), so that provides a rough upper limit on distinct
values...

Peter



RE: facet optimizing

Chris Hostetter-3

: headings from a given result set, you'd first test all the first-tier
: facets like "Body, Human", then where warranted test the associated
: second-tier facets like "Body, Human--Social aspects".  If the
: first-tier facets represent a small enough subset of the set of subject
: headings as a whole, that might be enough to reduce the total number of
: facet tests.

That's exactly the type of thing I'm suggesting -- the trick would be to
size your caches so that the first-tier constraints are pretty much
always cached and the popular second-tier constraints are usually cached
-- but once you get down to the second or third tier, the number of
possible constraints under any one branch is small enough that even if
they aren't cached, you can compute their counts in a reasonable amount
of time.

A really cache-conscious RequestHandler could even use the non-caching
SolrIndexSearcher methods if it knew it was dealing with a really
low-tier constraint (although at that point, spending the time to
implement a Cache implementation with an approximate LFU replacement
strategy instead of LRU would probably be a more robust use of
engineering resources).


-Hoss


Re: facet optimizing

rubdabadub
In reply to this post by Chris Hostetter-3
Hi:

> When you start talking about really large data sets, with an extremely
> large volume of unique field values for fields you want to facet on, then
> "generic" solutions stop being very feasible, and you have to start looking
> at solutions more tailored to your dataset.  At CNET, when dealing with
> product data, we don't make any attempt to use the SimpleFacets support
> Solr provides to facet on things like Manufacturer or Operating System,
> because enumerating through every manufacturer in the catalog on every
> query would be too expensive -- instead we have structured metadata that
> drives the logic: only compute the constraint counts for this subset of
> manufacturers when looking at the Desktops category, only look at the
> Operating System facet when in these categories, etc.  Rules like these
> need to be defined based on your user experience, and it can be easy to
> build them using the metadata in your index -- but they really
> need to be precomputed, not calculated on the fly every time.

Sounds interesting.  Could you please provide an example of how one
would go about doing a precomputed query?

> For something like a library system, where you might want to facet on
> Author but have way too many authors to be practical, a system that either
> requires a category to be picked first (allowing you to constrain the list
> of authors you need to worry about) or precomputes the top 1000 authors
> for displaying initially (when the user hasn't provided any other
> constraints) is an example of the type of thing a RequestHandler Solr
> plugin might do -- but the logic involved would probably be domain
> specific.

Specifically, how would one do this without getting any user
constraints?  I thought facets needed to have user constraints.

I would appreciate your feedback.

Thanks


Re: facet optimizing

Yonik Seeley-2
In reply to this post by pbinkley
On 2/7/07, Binkley, Peter <[hidden email]> wrote:
> In the library subject heading context, I wonder if a layered approach
> would bring performance into the acceptable range. Since Library of
> Congress Subject Headings break into standard parts, you could have
> first-tier facets representing the main heading, second-tier facets with
> the main heading and first subdivision, etc. So to extract the subject
> headings from a given result set, you'd first test all the first-tier
> facets like "Body, Human", then where warranted test the associated
> second-tier facets like "Body, Human--Social aspects".

Yes... we've had discussions about hierarchical facets in the past,
but more focused on organization/presentation than performance:
http://www.nabble.com/Hierarchical-Facets--tf2560327.html#a7135353

Which got me thinking... if we could use hierarchical facets to speed
up faceting, then we should also be able to use the same type of
strategy for non-hierarchical facets!

We could create a facet tree, where the set at each parent node is the
union of its child sets.  This should allow one to zoom in more quickly
on where the higher counts are concentrated, without necessitating
storing all the facets.

One could control the space/time tradeoff with the branching factor of
the tree.
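
A minimal sketch of that structure, using java.util.BitSet for the doc
sets to keep it self-contained (Solr's own DocSet types would be the
natural fit in practice):

    import java.util.ArrayList;
    import java.util.BitSet;
    import java.util.List;

    // Sketch of a facet tree: each leaf holds the doc set for one facet
    // value, and each parent holds the union of its children's sets, so a
    // low count at a parent rules out every value underneath it.
    class FacetNode {
        final String label;   // facet value at a leaf, synthetic at inner nodes
        final BitSet docs;    // this node's doc set (union of children's)
        final List<FacetNode> children = new ArrayList<FacetNode>();

        FacetNode(String label, BitSet docs) {
            this.label = label;
            this.docs = docs;
        }

        static FacetNode parentOf(List<FacetNode> kids) {
            BitSet union = new BitSet();
            for (FacetNode k : kids) union.or(k.docs);  // parent = union of kids
            FacetNode p = new FacetNode("(internal)", union);
            p.children.addAll(kids);
            return p;
        }

        // Count of result docs under this node: an upper bound on any
        // child's count.
        int count(BitSet baseResults) {
            BitSet tmp = (BitSet) docs.clone();
            tmp.and(baseResults);
            return tmp.cardinality();
        }
    }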

-Yonik

Re: facet optimizing

Erik Hatcher
Yonik - I like the way you think!!!!

     Yeah!

It's turtles (err, trees) all the way down.

        Erik
        /me Pulling the Algorithms book off my shelf so I can vaguely
        follow along.




Re: facet optimizing

Yonik Seeley-2
On 2/7/07, Erik Hatcher <[hidden email]> wrote:
> Yonik - I like the way you think!!!!
>
>      Yeah!
>
> It's turtles (err, trees) all the way down.

Heh...
I'm still thinking/brainstorming about it... it only helps if you can
effectively prune, though.
Each node in the tree could also keep the max docfreq of any of its
children.  If either the facet count or the max docfreq for a node is
less than the minimum facet count in your priority queue, then you don't
have to expand that node.

One other piece of info you can glean: if your branching factor is 10
and the facet count for a node is 430, then you know that you are
guaranteed a count of at least ceil(430/10) = 43 at the next level.  I'm
not sure how one could use that info, though (beyond just picking the
highest node anyway).

The facet counts at nodes could be used to direct where you expand
first, but they can't really be used to prune the way the docfreq can.
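
Continuing the BitSet sketch above, the prune test itself is tiny; both
numbers bound any descendant leaf's facet count from above:

    import java.util.BitSet;

    // Standalone sketch of the prune test.  nodeDocs is the union of the
    // doc sets of every term under a tree node; maxChildDf is the largest
    // docfreq of any of those terms.  If either bound falls below the
    // weakest entry in the top-N priority queue, nothing under this node
    // can make the cut.
    public class FacetPruner {
        static boolean canPrune(BitSet nodeDocs, int maxChildDf,
                                BitSet baseResults, int minCountInQueue) {
            if (maxChildDf < minCountInQueue) return true;  // cheap bound first
            BitSet tmp = (BitSet) nodeDocs.clone();
            tmp.and(baseResults);                           // node's facet count
            return tmp.cardinality() < minCountInQueue;
        }
    }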

Pruning won't work well when the base docset size is small... but if
it's small enough, we can use other strategies for that.  It also
won't work well for fields where higher docfreqs are rare... author
*might* be a case of that.

If we are making our own hierarchy, we can order it to be more effective.
For instance, the node containing terms 1-10 doesn't have to be a
sibling of the node with terms 11-20... the hard part is figuring out
how we want to arrange them.

At the leaves... perhaps putting nodes with high maxDf together?
That might make it more likely you could prune off other nodes
quicker.  It seems like one would want to minimize set overlap (i.e.
minimize the total cardinality of all the sets at any one level),
but that doesn't seem easy to do in a timely manner.

So, off the top of my head right now, I guess maybe just sort the
leaves by maxDf and build the rest of the tree on top of that.

Thinking all this stuff up from scratch seems like the hard way...
Does anyone know how other people have implemented this stuff?

-Yonik

RE: facet optimizing

pbinkley
Yonik wrote:

> Thinking all this stuff up from scratch seems like the hard way...
> Does anyone know how other people have implemented this stuff?



It's not really what Yonik was asking for, but on the semantic front,
one thing that might help is OCLC's FAST project (Faceted Application of
Subject Terminology). Our metadata librarian Susan Dahl just sent me
some info about it:

http://www.oclc.org/research/projects/fast/default.htm

It's an attempt to simplify LCSH syntax without losing the rich
vocabulary of LCSH. The purpose is to make it easier for non-cataloguers
to assign subject terms to web resources; but it also includes a "Focus
on making use of LCSH as a post-coordinate system in an online
environment."

In the Solr context, it would (I think) involve processing the existing
LCSH headings into FAST headings before indexing, and the benefit would
presumably be simpler and more consistent faceting terms. It's not clear
to me yet exactly how this would work: there's a database of FAST
authority records, but I'm not sure whether we (well, to be precise,
Erik) could make effective use of the FAST schema without access to the
database. But this might be a good context for the kind of "push back"
that Erik talked about. Those who push back against library cataloguing
standards can find themselves wearing concrete shoes pretty quickly, if
they don't have some solid backup.

Some relevant publications:

http://www.ifla.org/V/iflaj/ij-4-2003.pdf
http://www.ifla.org/IV/ifla69/papers/010e-ONeill_Mai-Chan.pdf
http://www.unifi.it/universita/biblioteche/ac/relazioni/dean_eng.pdf

Peter

Peter Binkley
Digital Initiatives Technology Librarian
Information Technology Services
4-30 Cameron Library
University of Alberta Libraries
Edmonton, Alberta
Canada T6G 2J8
Phone: (780) 492-3743
Fax: (780) 492-9243
e-mail: [hidden email]


Re: facet optimizing

Yonik Seeley-2
In reply to this post by Yonik Seeley-2
A little more brainstorming on this...
pruning by df is going to be one of the most important features
here... so a variation (or optimization) would be to keep a list of
the highest terms by df, and then build the facet tree excluding those
top terms.  That should lower the dfs in the tree nodes and allow more
pruning.

-Yonik



Re: facet optimizing

Erik Hatcher
And to add some fuel to this fire, I'm seeing in the data I'm processing
(the first 100k of UVa MARC records) that the facets are sparse with
documents.  There are a lot of documents that simply don't have a
subject genre on them, for example... like almost 50%.  Maybe the data
will get cleaner as I load.  I figured I'd mention it, since this kind
of sparseness in facet data is likely typical.

        Erik




Re: facet optimizing

Chris Hostetter-3
In reply to this post by rubdabadub

: > query would be too expensive -- instead we have structured metadata that
: > drives the logic: only compute the constraint counts for this subset of
: > manufacturers when looking at the Desktops category, only look at the
: > Operating System facet when in these categories, etc.  Rules like these
: > need to be defined based on your user experience, and it can be easy to
: > build them using the metadata in your index -- but they really
: > need to be precomputed, not calculated on the fly every time.
:
: Sounds interesting.  Could you please provide an example of how one
: would go about doing a precomputed query?

It's domain specific ... we know certain manufacturers only make
products in certain categories, so we only check the counts for those
manufacturers when we are searching in those categories.

: > For something like a library system, where you might want to facet on
: > Author but have way too many authors to be practical, a system that either
: > requires a category to be picked first (allowing you to constrain the list
: > of authors you need to worry about) or precomputes the top 1000 authors
: > for displaying initially (when the user hasn't provided any other
: > constraints) is an example of the type of thing a RequestHandler Solr
: > plugin might do -- but the logic involved would probably be domain
: > specific.
:
: Specifically, how would one do this without getting any user
: constraints?  I thought facets needed to have user constraints.

Your base query can be anything, even a query that matches all
documents -- but that's a minor technical issue.  My point is about the
user experience: you can precompute popular authors to offer the user
before they specify any search criteria, or compute facet counts for
when the user has given you a simple search phrase -- but iterating over
every possible author isn't feasible until some other constraint has
been provided that reduces the number of authors you have to deal with
(like telling you the first initial of the author)
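
One concrete way to do that precomputation is a match-all query run
offline, with the output cached rather than recomputed per request.  The
host and the author field name are illustrative; the parameters are the
standard SimpleFacets ones:

    http://localhost:8983/solr/select?q=*:*&rows=0&facet=true&facet.field=author&facet.limit=1000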



-Hoss
