Picking Facet Fields by Frequency-in-Results

Chris Harris-2
One task when designing a facet-based UI is deciding which fields to
facet on and display facets for. One possibility that I hope to
explore is to determine which fields to facet on dynamically, based on
the search results. In particular, I hypothesize that, for a somewhat
heterogeneous index (heterogeneous in terms of which fields a given
record might contain), the following rule might be helpful: Facet
on a given field to the extent that it is frequently set in the
documents matching the user's search.

For example, let's say my results look like this:

Doc A:
  f1: foo
  f2: bar
  f3: <N/A>
  f4: <N/A>

Doc B:
  f1: foo2
  f2: <N/A>
  f3: <N/A>
  f4: <N/A>

Doc C:
  f1: foo3
  f2: quiz
  f3: <N/A>
  f4: buzz

Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing

The field usage information for these documents could be summarized like this:

field f1: Set in 4 docs
field f2: Set in 3 docs
field f3: Set in 1 doc
field f4: Set in 2 docs

If I were choosing facet fields based on the above rule, I would
definitely want to display facets for field f1, since it occurs in all
documents.  If I had room for another facet in the UI, I would facet
f2. If I wanted another one, I'd go with f4, since it's more popular
than f3. I probably would ignore f3 in any case, because it's set for
only one document.
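
A minimal Python sketch of the rule described above, using the four example documents. The function names (field_usage, pick_facet_fields) and the min_docs cutoff are illustrative assumptions, not anything provided by Solr.

```python
from collections import Counter

# The four example documents; a field that is "<N/A>" is simply absent.
docs = {
    "A": {"f1": "foo", "f2": "bar"},
    "B": {"f1": "foo2"},
    "C": {"f1": "foo3", "f2": "quiz", "f4": "buzz"},
    "D": {"f1": "foo4", "f2": "question", "f3": "bam", "f4": "bing"},
}


def field_usage(documents):
    """Count, for each field name, how many documents set it."""
    counts = Counter()
    for fields in documents.values():
        counts.update(name for name, value in fields.items() if value is not None)
    return counts


def pick_facet_fields(counts, max_fields, min_docs=2):
    """Return up to max_fields field names, most frequently set first.

    Fields set in fewer than min_docs documents are ignored (hypothetical cutoff).
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, n in ranked if n >= min_docs][:max_fields]


usage = field_usage(docs)
chosen = pick_facet_fields(usage, max_fields=3)
```

With room for three facets this selects f1, f2 and f4, and drops f3 because only one document sets it.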

Has anyone implemented such a scheme with Solr? Any success? (The
closest thing I can find is
http://wiki.apache.org/solr/ComplexFacetingBrainstorming, which tries
to pick which facets to display based not on frequency but based more
on a ruleset.)

As far as implementation, the most straightforward approach (which
wouldn't involve modifying Solr) would apparently be to add a new
multi-valued "indexedfields" field to each document, which would note
which fields actually have a value for each document. So when I pass
data to Solr at indexing time, it will look something like this
(except of course it will be in valid Solr XML, rather than this
schematic):

Doc A:
  f1: foo
  f2: bar
  indexedfields: f1, f2

Doc B:
  f1: foo2
  indexedfields: f1

Doc C:
  f1: foo3
  f2: quiz
  f4: buzz
  indexedfields: f1, f2, f4

Doc D:
  f1: foo4
  f2: question
  f3: bam
  f4: bing
  indexedfields: f1, f2, f3, f4
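
As a rough illustration of the indexing step, the following Python sketch adds the marker field on the client side and serializes a document in Solr's XML update format (add/doc/field elements). The helper names are hypothetical, and it assumes "indexedfields" is declared as a multi-valued field in the schema.

```python
import xml.etree.ElementTree as ET


def with_indexed_fields(doc, marker_field="indexedfields"):
    """Return a copy of doc with a multi-valued field listing its populated fields."""
    populated = sorted(
        name for name, value in doc.items() if value not in (None, "", [])
    )
    enriched = dict(doc)
    enriched[marker_field] = populated
    return enriched


def to_solr_add_xml(docs):
    """Serialize documents as an <add> message; list values become repeated fields."""
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, list) else [value]
            for v in values:
                field_el = ET.SubElement(doc_el, "field", name=name)
                field_el.text = str(v)
    return ET.tostring(add, encoding="unicode")


# Doc C from the schematic above.
doc_c = with_indexed_fields({"f1": "foo3", "f2": "quiz", "f4": "buzz"})
xml_payload = to_solr_add_xml([doc_c])
```

The resulting payload would then be posted to the update handler as usual; that step is left out here.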

Then to choose which facets to display, I call

http://myserver/solr/search?q=myquery&facet=true&facet.field=indexedfields&facet.sort=true

and use the frequency information from this query to determine which
fields to display in the faceting UI. (To get the actual facet
information for those fields, I would query Solr a second time.)
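
The two-query flow could be sketched in Python roughly as follows. It assumes the first response was requested as JSON with Solr's default flat encoding, where each facet is an alternating list of value and count; the function names and the min_docs cutoff are made up for illustration. The sketch sorts on the client, so it does not rely on facet.sort.

```python
from urllib.parse import urlencode


def parse_flat_facet(flat):
    """Turn ["f1", 4, "f2", 3, ...] into [("f1", 4), ("f2", 3), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def second_query_params(query, response, marker="indexedfields",
                        max_fields=3, min_docs=2):
    """Build the query string for the second request from the first response."""
    flat = response["facet_counts"]["facet_fields"][marker]
    ranked = sorted(parse_flat_facet(flat), key=lambda kv: -kv[1])
    chosen = [name for name, n in ranked if n >= min_docs][:max_fields]
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", name) for name in chosen]
    return urlencode(params)


# Hand-written stand-in for the first response, matching the example documents.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1],
        }
    }
}
qs = second_query_params("myquery", sample_response)
```

The query string in qs would be appended to the search URL for the second request.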

Are there any alternatives that would be easier or more efficient?

Thanks,
Chris

Re: Picking Facet Fields by Frequency-in-Results

Avlesh Singh
I understand the general need here. And just extending what you suggested
(indexing the fields themselves inside a multiValued field), you can perform
a query like this -
/search?q=myquery&facet=true&facet.field=indexedfields&facet.field=field1&facet.field=field2...&facet.sort=true

You'll get facets for all the fields (passed as multiple facet.field
params), including the one that gives you field frequency. You can do all
sorts of post-processing on this data to achieve the desired result.
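
A rough Python sketch of this single-request variant: ask for every candidate facet plus the marker field at once, then keep only the most frequently set fields when rendering. The candidate list, function names, and sample response are illustrative assumptions, and the response is again assumed to use the flat value/count JSON encoding.

```python
from urllib.parse import urlencode


def single_query_params(query, candidates, marker="indexedfields"):
    """One request that facets on the marker field and on every candidate field."""
    params = [("q", query), ("facet", "true"), ("facet.field", marker)]
    params += [("facet.field", name) for name in candidates]
    return urlencode(params)


def facets_to_display(response, marker="indexedfields", max_fields=2):
    """Keep facet data only for the fields set most often in the result set."""
    fields = response["facet_counts"]["facet_fields"]
    flat = fields[marker]
    usage = sorted(zip(flat[0::2], flat[1::2]), key=lambda kv: -kv[1])
    keep = [name for name, _ in usage[:max_fields] if name in fields]
    return {name: fields[name] for name in keep}


qs_single = single_query_params("myquery", ["f1", "f2", "f3", "f4"])

# Hand-written stand-in for the response to that request.
sample = {
    "facet_counts": {
        "facet_fields": {
            "indexedfields": ["f1", 4, "f2", 3, "f4", 2, "f3", 1],
            "f1": ["foo", 1, "foo2", 1, "foo3", 1, "foo4", 1],
            "f2": ["bar", 1, "quiz", 1, "question", 1],
            "f3": ["bam", 1],
            "f4": ["buzz", 1, "bing", 1],
        }
    }
}
shown = facets_to_display(sample)
```

This trades one round trip for extra faceting work on the server, since facets are computed for fields that may end up hidden.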

Hope this helps.

Cheers
Avlesh

On Tue, Aug 4, 2009 at 2:20 AM, Chris Harris <[hidden email]> wrote:

> [...]

Re: Picking Facet Fields by Frequency-in-Results

Erik Hatcher
And further on this, if you want a field automatically added to each  
document with the list of its field names, check out http://issues.apache.org/jira/browse/SOLR-1280

        Erik



On Aug 4, 2009, at 1:01 AM, Avlesh Singh wrote:

> [...]


Re: Picking Facet Fields by Frequency-in-Results

hossman
In reply to this post by Chris Harris-2

: the search results. In particular, I hypothesize that, for a somewhat
: heterogeneous index (heterogeneous in terms of which fields a given
: record might contain), that the following rule might be helpful: Facet
: on a given field to the extent that it is frequently set in the
: documents matching the user's search.

if you go down this route, i suspect you'll find it more interesting to
sort the fields based on which facets have the highest average facet count
... consider the (fairly typical) case of fieldX which is the same for
every doc in the result set, or fieldY which is unique for every doc in
the result set -- neither of those will make very useful facets for your
user (fieldY might, but only if you have some external info about how to
sort the field constraints so the "best" values are shown to the user
to help them drill down to the one doc they'd be most interested in)
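
One possible reading of this suggestion, sketched in Python: rank fields by the mean count over their non-zero constraints, and explicitly skip the two degenerate cases (a single value shared by every doc, or a distinct value per doc). The skip rules and names are assumptions layered on top of the idea; the input uses the same flat value/count lists as Solr's default JSON facet output.

```python
def rank_by_average_count(facet_fields):
    """Rank fields by average facet count, dropping degenerate fields.

    facet_fields maps a field name to a flat list like ["red", 3, "blue", 1].
    """
    ranked = []
    for name, flat in facet_fields.items():
        counts = [c for c in flat[1::2] if c > 0]
        if len(counts) < 2:
            # fieldX case: zero or one constraint, nothing to drill into.
            continue
        if all(c == 1 for c in counts):
            # fieldY case: every value is unique to a single document.
            continue
        ranked.append((name, sum(counts) / len(counts)))
    ranked.sort(key=lambda kv: -kv[1])
    return ranked


# Hypothetical facet data for a result set of four documents.
facets = {
    "fieldX": ["same", 4],
    "fieldY": ["a", 1, "b", 1, "c", 1, "d", 1],
    "color": ["red", 3, "blue", 1],
    "size": ["s", 2, "m", 1, "l", 1],
}
ranking = rank_by_average_count(facets)
```

Here color (average 2.0) ranks ahead of size, while fieldX and fieldY are dropped.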

my hunch: even if you implemented a system like this, it would probably
only be useful in extremely generic usecases where you have absolutely no
idea what data is in your index and you want a useful "exploration" UI.  
if you have even a little bit of information about your schema and your
users, you can probably do a much better job at guessing which fields to
facet on first -- and if those fields have very few non-zero constraints
for a given query, it's easy to skip them and show the next one in your
list.
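
The simpler alternative could look something like this in Python: walk a hand-maintained priority list and skip any field with too few non-zero constraints for the current query. The priority list, thresholds, and field names are hypothetical, and the facet data is assumed to be in the flat value/count form.

```python
def pick_from_priority(priority, facet_fields, min_constraints=2, max_fields=3):
    """Pick facet fields in a fixed preferred order, skipping sparse ones."""
    chosen = []
    for name in priority:
        flat = facet_fields.get(name, [])
        nonzero = sum(1 for count in flat[1::2] if count > 0)
        if nonzero >= min_constraints:
            chosen.append(name)
        if len(chosen) == max_fields:
            break
    return chosen


# Hypothetical schema knowledge: the order in which facets are preferred.
priority = ["category", "brand", "color", "size"]
facets = {
    "category": ["books", 7, "music", 2],
    "brand": ["acme", 9, "other", 0],  # only one non-zero constraint
    "color": ["red", 3, "blue", 1],
    "size": ["s", 2, "m", 1, "l", 1],
}
picked = pick_from_priority(priority, facets, max_fields=2)
```

With this data brand is skipped for the current query, so category and color are shown.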

the utility of a facet option in your UI tends to depend a lot more on who
your users are than it does on what your index is.



-Hoss