Re: 2D Facet


Evgeniy Strokin
Chris, I'm very interested in implementing generic multidimensional faceting. I'm not an expert in Solr, but I'm very good with Java, so I need a little more direction, if you don't mind. I promise to share my code, and if you're OK with it, you are welcome to use it.
So, let's say I have the parameter facet.field=STATE. Taking 3D faceting as an example, I'll need 2 more facet fields related to the first one. Should we do something like this:
facet.field=STATE&f.STATE.facet.matrix=NAME&f.STATE.facet.matrix=INCOME
Or could we have something like this:
facet.matrix=STATE,NAME,INCOME
Which would you suggest?
Also, where in Solr could I find something similar to use as an example? Where should all this logic be placed?
 
Thank you
Gene


----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: Solr User <[hidden email]>
Sent: Thursday, January 17, 2008 1:12:32 AM
Subject: Re: 2D Facet

:
: Hello, is it possible to do this in one query? I have a query which returns
: 1000 documents with names and addresses. I can run a facet on the state field
: and see how many addresses I have in each state. But I also need to see
: how many families live in each state. So as a result I need a matrix with
: states on top and last names on the right. After my first query, knowing
: which states I have, I can run a query for each state using facet field
: Last_Name. But I guess this is not an efficient way. Is it possible to
: get this in one query? Or maybe some other way?

If you set rows=0 on all of those queries, it won't be horribly inefficient
... the DocSets for each state and last name should wind up in the
filterCache, so most of the queries will just be simple DocSet
intersections, with only the HTTP overhead (which should be fairly minor
if you use persistent connections).
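To make the per-state approach concrete, here is a minimal Java sketch that builds one request per state found by the first query. It assumes the fields are called STATE and Last_Name and that the requests go to the standard /select handler; rows, facet, facet.field and fq are ordinary Solr request parameters, while the class and method names are purely illustrative.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Sketch: one rows=0 facet request per state found by the first query.
// Field names (STATE, Last_Name) and the class name are illustrative assumptions.
final class PerStateFacetQueries {

    // URL-encode a single parameter value.
    static String enc(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    // Re-run the user's query once per state: fq restricts it to that state,
    // rows=0 skips returning documents, and facet.field asks for last-name counts.
    static List<String> buildQueries(String userQuery, List<String> states,
                                     String stateField, String lastNameField) {
        List<String> queries = new ArrayList<>();
        for (String state : states) {
            queries.add("q=" + enc(userQuery)
                    + "&rows=0"
                    + "&facet=true"
                    + "&facet.field=" + enc(lastNameField)
                    + "&fq=" + enc(stateField + ":" + state));
        }
        return queries;
    }

    public static void main(String[] args) {
        for (String qs : buildQueries("*:*", List.of("NY", "CA"), "STATE", "Last_Name")) {
            System.out.println("/select?" + qs);
        }
    }
}
```

Each string would be appended to /select? and sent over a persistent connection; the Last_Name facet counts from all the per-state responses together form the state/last-name matrix.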

The idea of generic multidimensional faceting is actually pretty
interesting ... it could be done fairly simply. Imagine if, for every
facet.field=foo param, Solr checked for f.foo.facet.matrix params, and
once the top facet.limit terms were found for field "foo", it then
computed the top facet counts for each f.foo.facet.matrix field
with an implicit fq=foo:term.

that would be pretty cool.
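A minimal in-memory sketch of that proposed f.foo.facet.matrix behaviour, assuming documents are simple field-to-value maps rather than a Lucene index: it finds the top facet.limit terms of the main field and then, for each one, applies the implicit fq=foo:term and computes the top terms of every matrix field. The class and method names are hypothetical; nothing here is Solr API.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// In-memory sketch of the proposed f.foo.facet.matrix idea. Documents are plain
// field -> value maps standing in for an index; names are hypothetical, not Solr API.
final class MatrixFacetSketch {

    // Count the values of `field` over `docs`; keep the top `limit` terms,
    // highest count first (ties broken alphabetically).
    static LinkedHashMap<String, Integer> topTerms(List<Map<String, String>> docs,
                                                   String field, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (Map<String, String> doc : docs) {
            String value = doc.get(field);
            if (value != null) {
                counts.merge(value, 1, Integer::sum);
            }
        }
        LinkedHashMap<String, Integer> top = new LinkedHashMap<>();
        counts.entrySet().stream()
              .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                      .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
              .limit(limit)
              .forEach(e -> top.put(e.getKey(), e.getValue()));
        return top;
    }

    // For each top term of mainField, apply the implicit fq=mainField:term and
    // compute the top terms of every matrix field within those documents.
    static Map<String, Map<String, LinkedHashMap<String, Integer>>> matrix(
            List<Map<String, String>> docs, String mainField,
            List<String> matrixFields, int limit) {
        Map<String, Map<String, LinkedHashMap<String, Integer>>> result = new LinkedHashMap<>();
        for (String term : topTerms(docs, mainField, limit).keySet()) {
            List<Map<String, String>> filtered = docs.stream()
                    .filter(doc -> term.equals(doc.get(mainField)))
                    .collect(Collectors.toList());
            Map<String, LinkedHashMap<String, Integer>> perField = new LinkedHashMap<>();
            for (String field : matrixFields) {
                perField.put(field, topTerms(filtered, field, limit));
            }
            result.put(term, perField);
        }
        return result;
    }
}
```

For example, matrix(docs, "STATE", List.of("NAME"), 10) would return, for each of the top 10 states, the top 10 names within that state.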


-Hoss

Re: 2D Facet

hossman

: say I have the parameter facet.field=STATE. Taking 3D faceting as an
: example, I'll need 2 more facet fields related to the first one.
: Should we do something like this:
: facet.field=STATE&f.STATE.facet.matrix=NAME&f.STATE.facet.matrix=INCOME
: Or could we have something like this:
: facet.matrix=STATE,NAME,INCOME
: Which would you suggest?

It's not something I've thought about too hard, but I was thinking along
the lines of the first example. So STATE is the main facet for the matrix,
and the other facets are identified as values of the f.STATE.facet.matrix
param ("matrix" isn't really the best word; it's more like a tree of
facet values ... for each of the top N values in the "main" facet, you
also get the top N values of the other facets listed).

That way you could have multiple facet trees, and a single facet could
be part of more than one tree; it just couldn't be the main facet of
more than one tree. For example, imagine we want to facet cars...

  facet.limit=10 &
  facet.field=STATE &
  facet.field=MODEL & f.MODEL.facet.tree=COLOR & f.MODEL.facet.tree=YEAR &
  facet.field=TYPE  & f.TYPE.facet.tree=COLOR & f.TYPE.facet.tree=STATE

...that would give you completely independent facet counts for STATE,
MODEL, and TYPE, but it would also tell you what the top 10 COLORs and
YEARs are for each of the top 10 MODELs, and what the top 10 COLORs and
STATEs are for each TYPE of car (even if not enough cars are in that state
to show up in the main STATE facet).

...honestly: any permutation you want is possible; it's just a question of
how to express it cleanly as key=val pairs so it's easy to
express over HTTP.
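One way the key=val form from the cars example could be grouped into trees is sketched below in Java. It assumes the f.X.facet.tree naming floated in this thread (not an existing Solr parameter), and, as an extra assumption of this sketch, it only attaches children to fields that are also listed as facet.field; the class name is hypothetical.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: group the proposed params into trees (main facet -> child facets).
// "f.X.facet.tree" is the naming floated in this thread, not an existing Solr param;
// the class name is hypothetical.
final class FacetTreeParams {

    private static final String PREFIX = "f.";
    private static final String SUFFIX = ".facet.tree";

    static Map<String, List<String>> parse(String queryString) {
        Map<String, List<String>> trees = new LinkedHashMap<>(); // keeps facet.field order
        List<String[]> treeParams = new ArrayList<>();          // {mainField, childField}
        for (String pair : queryString.split("&")) {
            String p = pair.trim();
            int eq = p.indexOf('=');
            if (eq < 0) {
                continue; // ignore malformed pairs
            }
            String key = URLDecoder.decode(p.substring(0, eq).trim(), StandardCharsets.UTF_8);
            String val = URLDecoder.decode(p.substring(eq + 1).trim(), StandardCharsets.UTF_8);
            if (key.equals("facet.field")) {
                trees.putIfAbsent(val, new ArrayList<>());
            } else if (key.startsWith(PREFIX) && key.endsWith(SUFFIX)
                    && key.length() > PREFIX.length() + SUFFIX.length()) {
                String main = key.substring(PREFIX.length(), key.length() - SUFFIX.length());
                treeParams.add(new String[] {main, val});
            }
        }
        // A field heads at most one tree (it is a map key) but may appear as a
        // child in several. Children of fields not listed in facet.field are dropped.
        for (String[] tp : treeParams) {
            List<String> children = trees.get(tp[0]);
            if (children != null) {
                children.add(tp[1]);
            }
        }
        return trees;
    }
}
```

Keying the result by the main field is what enforces "a facet can be in several trees but head only one": STATE can show up as a child under TYPE while still being its own independent facet.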

: Also, where in Solr could I find something similar to use as an
: example? Where should all this logic be placed?

The logic could go in a custom RequestHandler or a custom Component ... if
you look at the FacetComponent class in the nightly builds of Solr, you can
see how the current Simple faceting code is handled ... the underlying
methods (for getting counts using DocSet intersections) can still be
reused; you just need to pass them additional "filter" DocSets from the
"main" facet.
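A rough sketch of that reuse, with java.util.BitSet standing in for Solr's DocSet (document ids as bit positions): the same counting routine serves plain faceting when the extra filter is null, and tree faceting when the DocSet of a main-facet term is passed as the extra filter. The method names are hypothetical and do not correspond to actual FacetComponent or SimpleFacets methods.

```java
import java.util.BitSet;
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch: one intersection-based counting routine reused for tree facets.
// java.util.BitSet stands in for Solr's DocSet; names here are hypothetical.
final class FilteredFacetCounts {

    // Size of (base AND termDocs AND extraFilter); a null extraFilter means
    // "no extra filter", i.e. ordinary faceting over the base result set.
    static int count(BitSet base, BitSet termDocs, BitSet extraFilter) {
        BitSet tmp = (BitSet) base.clone(); // never mutate the cached sets
        tmp.and(termDocs);
        if (extraFilter != null) {
            tmp.and(extraFilter);
        }
        return tmp.cardinality();
    }

    // Count every term of one field; for a tree facet, pass the DocSet of
    // the main-facet term (e.g. STATE:NY) as extraFilter.
    static Map<String, Integer> facet(BitSet base, Map<String, BitSet> termDocsByTerm,
                                      BitSet extraFilter) {
        Map<String, Integer> out = new LinkedHashMap<>();
        for (Map.Entry<String, BitSet> e : termDocsByTerm.entrySet()) {
            out.put(e.getKey(), count(base, e.getValue(), extraFilter));
        }
        return out;
    }

    // Convenience: build a BitSet from document ids.
    static BitSet bits(int... ids) {
        BitSet b = new BitSet();
        for (int id : ids) {
            b.set(id);
        }
        return b;
    }
}
```

Cloning before intersecting matters here because, in the real case, the term DocSets would come out of the filterCache and must stay unchanged for later requests.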




-Hoss