Hierarchical Facets?

classic Classic list List threaded Threaded
13 messages Options
Reply | Threaded
Open this post in threaded view
|

Hierarchical Facets?

David Legg
I've just discovered Solr and realized its potential!  My main interest
is in Solr's fledgling support for faceted search.

Are there any ideas on how to support hierarchical facets?

Take a facet like 'Location' for example.  This can be thought of as
hierarchical because you could let the user select a path through the
hierarchy like: -

  United States> Massachusetts> Boston


I suppose you could use the CNET approach of storing a Meta document for
each facet and then define three facets called Country, State and City
and set things up so that if the Country facet is selected the State
facet is displayed.  This sounds workable (but long winded) and you'd
have the extra problem that someone selecting 'London' from the City
facet may get documents from both Canada and England etc.

David Legg

Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

Yonik Seeley-2
On 11/2/06, David Legg <[hidden email]> wrote:
> I've just discovered Solr and realized its potential!

Welcome!

> My main interest
> is in Solr's fledgling support for faceted search.
>
> Are there any ideas on how to support hierarchical facets?

I've thought of some optimizations and general approach (not yet
implemented) when it's a strict hierarchy.

Store the full tag path as a single value w/ a special separator:
 "United States/Massachusetts/Boston"

That allows one to use the FieldCache (which maps a document back to a
single field value... can only be used for single-valued non-tokenized
fields).

The existing facet.field functionality could be used to return counts
of unique values, and the Solr client (front-end) could sum everything
under Massachusetts to get a count for Massachusetts itself, etc.

An alternative would be to add the concept of a strict facet hierarchy
into Solr, and it could do the summing itself (useful if there are too
many leaves to return them all to the client).

For narrowing search results (specifying filter queries via fq), one
could use prefix queries for higher level categories:
fq=location:United States/Massachusetts/*


So in short, I think it's already doable with what's there, but could
be made more efficient.

-Yonik
Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

Chris Hostetter-3

: > My main interest
: > is in Solr's fledgling support for faceted search.
: >
: > Are there any ideas on how to support hierarchical facets?

there are two different things that fall under the heading of
"hierarchical facets", and they can be very differnet...

The first usage is when the word "hierarchy" refers largely to the UI of
the facet, and not of the data itself.  An example of this would be a
facet on "date", where you show each decade as a possible constraint
w/count and if/when the user picks a decade, then you should the
individual years as constraints, and if/when they pick a year you should
them months, etc...  untill you get to the granularity that makes sense.
Another example of this would be a facet on a "Name" field, where you
start by showing them constratints and coutns on the first letter of a
name, and one they pick that you should them all of hte two letter combos,
and then the 3 letter combos, etc...

The second usage is when the documents themselves can be organized into a
hierarchical taxonomy (or perhaps multiple differnet taxonomies) and you
want to expose that information as a facet ...  Nabble.com's "Narrow
Search Results" right nav fits this model.


It's not allways a clear cut line though ... you might actually think of
your data being organized in a hierarcy of decades, which have years as
sub-categories which have months as sub-categories ... but most people
wouldn't do that.  I don't know anyone who would think that it acctually
makes sense to categorize people in a taxonomy based on the first N leters
of their name.  Yonik's Location example however, is a good one where the
lines are very blurry.  It might make sense to think of it in the first
usage case, where the UI of the facets is presented in such a way that
we only should the State level constraints once a Country level constraint
is picked, etc...  But other people might prefer a UI driven by the second
usage, where a biz search for "Dunkin Donuts" lists the first five
constraints in the Location field as...
     United States/Massachusetts/Boston (103)
     United States/Massachusetts/Cambridge (92)
     United States/Massachusetts/Brookline (84)
     United States/Colorado (77)
     United States/Massachusetts/Newton (55)

...because there are just so damn many Dunkin Donuts in Massachusetts, we
go down to the citi level, but for Colorado we just show at the state
granularity


The first type of "hierarchical facet" is obviously a lot easier then the
second -- largely because the second can't typcially be done using simpel
Term comparisons ... and you need some carefully choosen logic for
deciding when to be granular, and when to be general.

: An alternative would be to add the concept of a strict facet hierarchy
: into Solr, and it could do the summing itself (useful if there are too
: many leaves to return them all to the client).

FYI: what we found when building the "Category" Facet in the left nav of
this page...
  http://shopper.search.com/search?q=compactflash
...was that sorting on the strict count wasn't very good from a usability
perspective, because the very general categories in the taxonomy tended to
contain so many products they allways sorted to the top, even if they
weren't particularaly relevent (ie: even if only half of the Digital
Cameras on the market use compact flash cards, that may still be more then
the total number of products in the entire "flash memory" category).  We
wound up computing a custom score for each category based on a function of
the number of matching results in that category, the total number of items in
that category, and a few other metrics.

It may be hard to roll out a truely "generic" way of sorting hierarchical
counts that would be usefull for people.


-Hoss

Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

David Legg
In reply to this post by Yonik Seeley-2
Hi Yonik,
>> Are there any ideas on how to support hierarchical facets?
> I've thought of some optimizations and general approach (not yet
> implemented) when it's a strict hierarchy.
Thanks for your comments.

Of the two approaches you mention (storing the full tag path or creating
a strict facet hierarchy concept in Solr) I think the latter is the
tidier approach.
> So in short, I think it's already doable with what's there, but could
> be made more efficient.
I think a powerful facet supporting framework will be a very sought
after feature in the next few years.  The question is whether Solr is
the right structure to develop it on.  I'm torn because I believe that
complex facets will require some kind of ontological support rather than
an inverted index structure like Lucene.  The problem is most
ontological code is relatively slow and difficult to update in a
scalable way.

Thanks for adding the Jira note (SOLR-64).

- David.
Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

David Legg
In reply to this post by Chris Hostetter-3
Chris,

> : > Are there any ideas on how to support hierarchical facets?
>
> It may be hard to roll out a truely "generic" way of sorting hierarchical
> counts that would be usefull for people.
>  

Thanks for the examples.  I see what you mean about the difficulty in
creating a generic facet framework.  It would be interesting to know if
any work has been published which enumerates the design patterns
involved.  I found Marti Hearst's research paper "Design Recommendations
for Hierarchical Faceted Search" very useful...
http://flamenco.berkeley.edu/papers/faceted-workshop06.pdf

I hadn't considered the problem of a simple facet like a date
potentially being presented as a set of pseudo facets like decade, year,
month etc.  That would require the query server to further split the
counts depending on how it was to be presented in the user interface.

Sure, sites like epicurious.com with their fixed number of simple facets
can be catered for in Solr.  Future sites, based on thousands of facets,
some hierarchical, some Tag-like and some numerical would struggle I think!

- David


Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

Chris Hostetter-3

: creating a generic facet framework.  It would be interesting to know if
: any work has been published which enumerates the design patterns
: involved.  I found Marti Hearst's research paper "Design Recommendations
: for Hierarchical Faceted Search" very useful...
: http://flamenco.berkeley.edu/papers/faceted-workshop06.pdf

You know ... i keep hearing about flamenco, one of these days i should
really take it for  test drive and see what's under the hood.

Thanks for hte link to the paper ... I'm much of an academic, so I tend to
forget that there are people out there doing hardcore research into
these things.

: month etc.  That would require the query server to further split the
: counts depending on how it was to be presented in the user interface.

yeah ... for non trivial collections, the seperation between UI and
functionality is a lot blurrier when dealing with facets because computing
*everything* the UI *might* wnat to expose can be prohibitive.

: Sure, sites like epicurious.com with their fixed number of simple facets
: can be catered for in Solr.  Future sites, based on thousands of facets,
: some hierarchical, some Tag-like and some numerical would struggle I think!

I don't know about that ... CNET has hundreds of thousands of products,
with tens of thousands facet constraints ... and Solr seems to be doing ok
for us :)



-Hoss

Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

David Legg

> I don't know about that ... CNET has hundreds of thousands of products,
> with tens of thousands facet constraints ... and Solr seems to be doing ok
> for us :)
>  

Touché... I'm impressed :-)

Still... I've got to write an application that will allow users to
assign a detailed subject classification to documents.  I've just
received a copy of the Bliss Bibliographic Classification.  Like
Flamenco, this scheme is also often mentioned in academic papers a lot
because it is an instance of faceted classification rather like
Ranganathan's colon classification scheme.  I've been meaning to read it
for a while to see what all the fuss is about... should keep me quiet
for a while.

- David
Reply | Threaded
Open this post in threaded view
|

Re: Hierarchical Facets?

Chris Hostetter-3
In reply to this post by Chris Hostetter-3

: You know ... i keep hearing about flamenco, one of these days i should
: really take it for  test drive and see what's under the hood.

i still haven't looked under the hood, but i did puruse the owners
manual...

        http://flamenco.berkeley.edu/data.html

...flamenco requires that all of your facets and all of your data be
imported at teh same time.  and while it's not neccessary for you to
specify that an documents mpas to each level of a hierarchyical facet term
(ie: you can say this doc maps to "Boston" you don't have to say it maps
to "US>MA>Boston") you do have to specify that Boston is a child of MA
which is a child of US when loading the data -- it doesn't look like there
is anyway to redefine the hierarchy on the fly, so it ammounts to the same
amount of precomputed information as the approach Yonik described earlier.

From what I can tell in the online Demos, Flamenco also doesn't seem to
attempt anything special regarding weighting differnet depths of the
hierarchy based on the current set of results -- for each hierarchical
facet, the nodes at the highest level that have matches are displayed.
(NOTE: i'm not sure if that's a limitation of the system, or an attempt at
making the UI consistent) ... ths is the area where i think hierarchical
facets are particulararly tricky, programaticly guess what the most
usefull subset of the hierarchy to display would be based on the
constraints the user has already applied.


From what I can tell: functionality could be added to Solr to do
everything Flamenco does right now and still support updates -- all that's
really needed is something to maintain the Facet configurations (order of
fields, pretty lables, and Hierarchy of terms) in memory.  Hell, if
someone was so inclined, they could write a "flamenco importer" to parse
the .tsv files Flamenco currently uses.


-Hoss

Reply | Threaded
Open this post in threaded view
|

Sorting facets

Walter Lewis-2
In reply to this post by David Legg
My apologies if this has been answered before but I couldn't see it in
the FAQ, tutorial or wiki or the solr-user mail archives.

The explanation for sorting the documents returned by a search is quite
straight foward. I'm currently seeing the facets arriving in a more
random order.

Is the expectation that the code processing the result will apply its
own sorts to the facet lists?  or is there some other option I'm
overlooking?

Walter Lewis
Reply | Threaded
Open this post in threaded view
|

Re: Sorting facets

Yonik Seeley-2
On 11/4/06, Walter Lewis <[hidden email]> wrote:

> My apologies if this has been answered before but I couldn't see it in
> the FAQ, tutorial or wiki or the solr-user mail archives.
>
> The explanation for sorting the documents returned by a search is quite
> straight foward. I'm currently seeing the facets arriving in a more
> random order.
>
> Is the expectation that the code processing the result will apply its
> own sorts to the facet lists?  or is there some other option I'm
> overlooking?

For facet.field, If you don't put a limit on the number of facets (via
facet.limit or f.field.facet.limit) they won't be sorted.

-Yonik
Reply | Threaded
Open this post in threaded view
|

Re: Sorting facets

Walter Lewis-2
Yonik Seeley wrote:
> For facet.field, If you don't put a limit on the number of facets (via
> facet.limit or f.field.facet.limit) they won't be sorted.
... which sort is by number of matches descending, and for many probably
the single most useful.

Is there any way to make that sort alphabetical? I have a tag cloud
style display in mind.

Walter
Reply | Threaded
Open this post in threaded view
|

Re: Sorting facets

Yonik Seeley-2
On 11/4/06, Walter Lewis <[hidden email]> wrote:
> Yonik Seeley wrote:
> > For facet.field, If you don't put a limit on the number of facets (via
> > facet.limit or f.field.facet.limit) they won't be sorted.
> ... which sort is by number of matches descending, and for many probably
> the single most useful.
>
> Is there any way to make that sort alphabetical? I have a tag cloud
> style display in mind.

Not currently.  Is there any benefit of doing so in the server rather
than the client?

-Yonik
Reply | Threaded
Open this post in threaded view
|

Re: Sorting facets

Walter Lewis-2
Yonik Seeley wrote:
> On 11/4/06, Walter Lewis <[hidden email]> wrote:
>> Is there any way to make that sort alphabetical? I have a tag cloud
>> style display in mind.
> Not currently.  Is there any benefit of doing so in the server rather
> than the client?
Only that I don't have to write the code. :)

Walter