Re: [CONF] Apache Solr Reference Guide > Faceting

David Smiley
Mikhail:
That feature is used _internally_ by the facet module, particularly for facet refinement.  I am not sure we should advertise this feature... I guess it's fine?  Admittedly I've used it years ago when I wanted to facet on a bunch of pre-selected values.

On Tue, Aug 30, 2016 at 4:20 AM Mikhail Khludnev (Confluence) <[hidden email]> wrote:
Mikhail Khludnev edited a page
 
Change comment: i found facet.field={!terms=a,b,c}foo support in code, but haven't found any mentions in doc, nor even tests. I checked that it works, and decided to opt wiki. wdy?
Faceting

As described in the section Overview of Searching in Solr, faceting is the arrangement of search results into categories based on indexed terms. Searchers are presented with the indexed terms, along with numerical counts of how many matching documents were found for each term. Faceting makes it easy for users to explore search results, narrowing in on exactly the results they are looking for.

General Parameters

The table below summarizes the general parameters for controlling faceting.

Parameter

Description

facet

If set to true, enables faceting.

facet.query

Specifies a Lucene query to generate a facet count.

These parameters are described in the sections below.

The facet Parameter

If set to "true," this parameter enables facet counts in the query response. If set to "false," or to a blank or missing value, this parameter disables faceting. None of the other parameters listed below will have any effect unless this parameter is set to "true." The default value is blank.

The facet.query Parameter

This parameter allows you to specify an arbitrary query in the Lucene default syntax to generate a facet count. By default, Solr's faceting feature automatically determines the unique terms for a field and returns a count for each of those terms. Using facet.query, you can override this default behavior and select exactly which terms or expressions you would like to see counted. In a typical implementation of faceting, you will specify a number of facet.query parameters. This parameter can be particularly useful for numeric-range-based facets or prefix-based facets.

You can set the facet.query parameter multiple times to indicate that multiple queries should be used as separate facet constraints.

To use facet queries in a syntax other than the default syntax, prefix the facet query with the name of the query notation. For example, to use the hypothetical myfunc query parser, you could set the facet.query parameter like so:

facet.query={!myfunc}name~fred
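
As a rough illustration of setting facet.query multiple times, the sketch below assembles a request query string with Python's standard urllib.parse module. The price field and the band boundaries are assumptions made up for this example; they do not come from this page.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical price bands; each one becomes its own facet.query constraint.
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.query", "price:[* TO 100}"),
    ("facet.query", "price:[100 TO 500}"),
    ("facet.query", "price:[500 TO *]"),
]

# A list of pairs (rather than a dict) lets the same key appear several times.
query_string = urlencode(params)
print(query_string)

# Round trip: the three facet queries survive URL encoding unchanged.
print(parse_qs(query_string)["facet.query"])
```

The resulting string would be appended to a select URL for whatever collection you are querying.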

Field-Value Faceting Parameters

Several parameters can be used to trigger faceting based on the indexed terms in a field.

When using these parameters, it is important to remember that "term" is a very specific concept in Lucene: it relates to the literal field/value pairs that are indexed after any analysis occurs. For text fields that include stemming, lowercasing, or word splitting, the resulting terms may not be what you expect. If you want Solr to perform both analysis (for searching) and faceting on the full literal strings, use the copyField directive in your Schema to create two versions of the field: one Text and one String. Make sure both are indexed="true". (For more information about the copyField directive, see Documents, Fields, and Schema Design.)

The table below summarizes Solr's field value faceting parameters.

 

Parameter

Description

facet.field

Identifies a field to be treated as a facet.

facet.prefix

Limits the terms used for faceting to those that begin with the specified prefix.

facet.contains

Limits the terms used for faceting to those that contain the specified substring.

facet.contains.ignoreCase

If facet.contains is used, ignore case when searching for the specified substring.

facet.sort

Controls how faceted results are sorted.

facet.limit

Controls how many constraints should be returned for each facet.

facet.offset

Specifies an offset into the facet results at which to begin displaying facets.

facet.mincount

Specifies the minimum counts required for a facet field to be included in the response.

facet.missing

Controls whether Solr should compute a count of all matching results which have no value for the field, in addition to the term-based constraints of a facet field.

facet.method

Selects the algorithm or method Solr should use when faceting a field.

facet.enum.cache.minDf

(Advanced) Specifies the minimum document frequency (the number of documents matching a term) for which the filterCache should be used when determining the constraint count for that term.

facet.overrequest.count

(Advanced) A number of documents, beyond the effective facet.limit, to request from each shard in a distributed search.

facet.overrequest.ratio

(Advanced) A multiplier of the effective facet.limit to request from each shard in a distributed search.

facet.threads

(Advanced) Controls parallel execution of field faceting

These parameters are described in the sections below.

The facet.field Parameter

The facet.field parameter identifies a field that should be treated as a facet. It iterates over each Term in the field and generates a facet count using that Term as the constraint. This parameter can be specified multiple times in a query to select multiple facet fields.

Note

If you do not set this parameter to at least one field in the schema, none of the other parameters described in this section will have any effect.

The facet.prefix Parameter

The facet.prefix parameter limits the terms on which to facet to those starting with the given string prefix. This does not limit the query in any way, only the facets that would be returned in response to the query.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.prefix.
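
The per-field syntax can be sketched as a small helper that builds the f.<fieldname>.facet.<name> key. The helper name per_field is hypothetical, the cat and manu fields are borrowed from the techproducts example used later on this page, and the prefix values are made up.

```python
from urllib.parse import urlencode

def per_field(field, name, value):
    """Return a (key, value) pair using the f.<fieldname>.facet.<name> syntax."""
    return (f"f.{field}.facet.{name}", str(value))

params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "cat"),
    ("facet.field", "manu"),
    ("facet.prefix", "a"),             # general setting for facet fields
    per_field("manu", "prefix", "b"),  # setting for the manu field only
]

query_string = urlencode(params)
print(query_string)
```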

The facet.contains Parameter

The facet.contains parameter limits the terms on which to facet to those containing the given substring. This does not limit the query in any way, only the facets that would be returned in response to the query.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.contains.

The facet.contains.ignoreCase Parameter

If facet.contains is used, the facet.contains.ignoreCase parameter causes case to be ignored when matching the given substring against candidate facet terms.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.contains.ignoreCase.
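
The combined effect of facet.prefix, facet.contains, and facet.contains.ignoreCase on the candidate terms can be modeled roughly as below. This is only a sketch of the behavior described above, not Solr's implementation; the function name and the sample terms are invented for illustration.

```python
def filter_terms(terms, prefix=None, contains=None, ignore_case=False):
    """Keep terms that start with `prefix` and contain `contains`.

    `ignore_case` affects only the substring match, mirroring how
    facet.contains.ignoreCase is described as applying to facet.contains.
    """
    kept = []
    for term in terms:
        if prefix is not None and not term.startswith(prefix):
            continue
        if contains is not None:
            haystack, needle = term, contains
            if ignore_case:
                haystack, needle = term.lower(), contains.lower()
            if needle not in haystack:
                continue
        kept.append(term)
    return kept

terms = ["Apple", "apple pie", "Pineapple", "grape"]
print(filter_terms(terms, contains="apple"))                    # ['apple pie', 'Pineapple']
print(filter_terms(terms, contains="apple", ignore_case=True))  # ['Apple', 'apple pie', 'Pineapple']
print(filter_terms(terms, prefix="app", contains="pie"))        # ['apple pie']
```

As with the real parameters, the filtering changes only which terms are reported, not which documents match the query.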

The facet.sort Parameter

This parameter determines the ordering of the facet field constraints.

facet.sort Setting

Results

count

Sort the constraints by count (highest count first).

index

Return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ASCII range, this will be alphabetically sorted.

The default is count if facet.limit is greater than 0, otherwise, the default is index.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.sort.

The facet.limit Parameter

This parameter specifies the maximum number of constraint counts (essentially, the number of facets for a field that are returned) that should be returned for the facet fields. A negative value means that Solr will return an unlimited number of constraint counts.

The default value is 100.

This parameter can be specified on a per-field basis to apply a distinct limit to each field with the syntax of f.<fieldname>.facet.limit.

The facet.offset Parameter

The facet.offset parameter indicates an offset into the list of constraints to allow paging.

The default value is 0.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.offset.

The facet.mincount Parameter

The facet.mincount parameter specifies the minimum counts required for a facet field to be included in the response. If a field's counts are below the minimum, the field's facet is not returned.

The default value is 0.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.mincount.
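
To see how facet.sort, facet.limit, facet.offset, and facet.mincount interact, the sketch below applies them to an in-memory table of counts. It is a simplified model, not Solr's code: the function name is hypothetical, breaking count ties by term order is an assumption, and the sample counts are borrowed from the cat facet in the pivot example later on this page plus one made-up entry (printer).

```python
def select_constraints(counts, sort="count", limit=100, offset=0, mincount=0):
    """Filter by mincount, order by count or by term, then page with offset/limit."""
    items = [(term, n) for term, n in counts.items() if n >= mincount]
    if sort == "count":
        # Highest count first; ties fall back to term order (an assumption).
        items.sort(key=lambda pair: (-pair[1], pair[0]))
    else:
        # "index" order: lexicographic by term.
        items.sort(key=lambda pair: pair[0])
    if limit < 0:
        # A negative limit means no cap on the number of constraints.
        return items[offset:]
    return items[offset:offset + limit]

counts = {"electronics": 14, "currency": 4, "memory": 3,
          "connector": 2, "graphics card": 2, "printer": 1}

page1 = select_constraints(counts, limit=2)
page2 = select_constraints(counts, limit=2, offset=2)
by_term = select_constraints(counts, sort="index", limit=-1, mincount=2)
print(page1)    # [('electronics', 14), ('currency', 4)]
print(page2)    # [('memory', 3), ('connector', 2)]
print(by_term)
```

Paging through a long facet list then amounts to keeping facet.limit fixed and increasing facet.offset by that amount on each request.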

The facet.missing Parameter

If set to true, this parameter indicates that, in addition to the Term-based constraints of a facet field, a count of all results that match the query but which have no facet value for the field should be computed and returned in the response.

The default value is false.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.missing.

The facet.method Parameter

The facet.method parameter selects the type of algorithm or method Solr should use when faceting a field.

Setting

Results

enum

Enumerates all terms in a field, calculating the set intersection of documents that match the term with documents that match the query. This method is recommended for faceting multi-valued fields that have only a few distinct values. The average number of values per document does not matter. For example, faceting on a field with U.S. States such as Alabama, Alaska, ... Wyoming would lead to fifty cached filters which would be used over and over again. The filterCache should be large enough to hold all the cached filters.

fc

Calculates facet counts by iterating over documents that match the query and summing the terms that appear in each document. This is currently implemented using an UnInvertedField cache if the field either is multi-valued or is tokenized (according to FieldType.isTokenized()). Each document is looked up in the cache to see what terms/values it contains, and a tally is incremented for each value. This method is excellent for situations where the number of indexed values for the field is high, but the number of values per document is low. For multi-valued fields, a hybrid approach is used that uses term filters from the filterCache for terms that match many documents. The letters fc stand for field cache.

fcs

Per-segment field faceting for single-valued string fields. Enable with facet.method=fcs and control the number of threads used with the threads local parameter. This parameter allows faceting to be faster in the presence of rapid index changes.

The default value is fc (except for fields using the BoolField field type) since it tends to use less memory and is faster when a field has many unique terms in the index.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.method.

The facet.enum.cache.minDf Parameter

This parameter indicates the minimum document frequency (the number of documents matching a term) for which the filterCache should be used when determining the constraint count for that term. This is only used with the facet.method=enum method of faceting.

A value greater than zero decreases the filterCache's memory usage, but increases the time required for the query to be processed. If you are faceting on a field with a very large number of terms, and you wish to decrease memory usage, try setting this parameter to a value between 25 and 50, and run a few tests. Then, optimize the parameter setting as necessary.

The default value is 0, causing the filterCache to be used for all terms in the field.

This parameter can be specified on a per-field basis with the syntax of f.<fieldname>.facet.enum.cache.minDf.

Over-Request Parameters

In some situations, the accuracy in selecting the "top" constraints returned for a facet in a distributed Solr query can be improved by "over-requesting" the number of desired constraints (i.e., facet.limit) from each of the individual shards. In these situations, each shard is by default asked for the top "10 + (1.5 * facet.limit)" constraints.

In some situations, depending on how your docs are partitioned across your shards, and what facet.limit value you used, you may find it advantageous to increase or decrease the amount of over-requesting Solr does.  This can be achieved by setting the facet.overrequest.count (defaults to 10) and facet.overrequest.ratio (defaults to 1.5) parameters.
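
The arithmetic behind over-requesting can be checked with a few lines of Python. The function name is hypothetical, and truncating the product to an integer is an assumption about rounding; only the formula and the defaults (10 and 1.5) come from the text above.

```python
def shard_request_limit(facet_limit, overrequest_count=10, overrequest_ratio=1.5):
    """Constraints requested from each shard: count + (ratio * facet.limit)."""
    return overrequest_count + int(overrequest_ratio * facet_limit)

print(shard_request_limit(100))  # 160 with the defaults
print(shard_request_limit(20))   # 40 with the defaults
# Setting count=0 and ratio=1.0 leaves the limit unchanged in this model.
print(shard_request_limit(100, overrequest_count=0, overrequest_ratio=1.0))  # 100
```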

The facet.threads Parameter

This parameter causes the underlying fields used in faceting to be loaded in parallel, using up to the number of threads specified. Specify it as facet.threads=N, where N is the maximum number of threads used. Omitting this parameter or specifying a thread count of 0 will not spawn any threads, and only the main request thread will be used. Specifying a negative number of threads will create up to Integer.MAX_VALUE threads.

Range Faceting

You can use Range Faceting on any date field or any numeric field that supports range queries. This is particularly useful for stitching together a series of range queries (as facet by query) for things like prices. As of Solr 3.1, Range Faceting is preferred over Date Faceting (described below).

Parameter

Description

facet.range

Specifies the field to facet by range.

facet.range.start

Specifies the start of the facet range.

facet.range.end

Specifies the end of the facet range.

facet.range.gap

Specifies the span of the range as a value to be added to the lower bound.

facet.range.hardend

A boolean parameter that specifies how Solr handles a range gap that cannot be evenly divided between the range start and end values. If true, the last range constraint will have the facet.range.end value as an upper bound. If false, the last range will have the smallest possible upper bound greater than facet.range.end such that the range is the exact width of the specified range gap. The default value for this parameter is false.

facet.range.include

Specifies inclusion and exclusion preferences for the upper and lower bounds of the range. See the facet.range.include topic for more detailed information.

facet.range.other

Specifies counts for Solr to compute in addition to the counts for each facet range constraint.

facet.range.method

Specifies the algorithm or method to use for calculating facets.

The facet.range Parameter

The facet.range parameter defines the field for which Solr should create range facets. For example:

facet.range=price&facet.range=age

facet.range=lastModified_dt

The facet.range.start Parameter

The facet.range.start parameter specifies the lower bound of the ranges. You can specify this parameter on a per field basis with the syntax of f.<fieldname>.facet.range.start. For example:

f.price.facet.range.start=0.0&f.age.facet.range.start=10

f.lastModified_dt.facet.range.start=NOW/DAY-30DAYS

The facet.range.end Parameter

The facet.range.end specifies the upper bound of the ranges. You can specify this parameter on a per field basis with the syntax of f.<fieldname>.facet.range.end. For example:

f.price.facet.range.end=1000.0&f.age.facet.range.end=99

f.lastModified_dt.facet.range.end=NOW/DAY+30DAYS

The facet.range.gap Parameter

The span of each range expressed as a value to be added to the lower bound. For date fields, this should be expressed using the DateMathParser syntax (for example, facet.range.gap=%2B1DAY, which is the URL-encoded form of '+1DAY'). You can specify this parameter on a per-field basis with the syntax of f.<fieldname>.facet.range.gap. For example:

f.price.facet.range.gap=100&f.age.facet.range.gap=10

f.lastModified_dt.facet.range.gap=+1DAY

The facet.range.hardend Parameter

The facet.range.hardend parameter is a Boolean parameter that specifies how Solr should handle cases where the facet.range.gap does not divide evenly between facet.range.start and facet.range.end. If true, the last range constraint will have the facet.range.end value as an upper bound. If false, the last range will have the smallest possible upper bound greater than facet.range.end such that the range is the exact width of the specified range gap. The default value for this parameter is false.

This parameter can be specified on a per field basis with the syntax f.<fieldname>.facet.range.hardend.
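
The effect of facet.range.hardend is easiest to see by generating the range bounds for a gap that does not divide the interval evenly. The sketch below is an illustrative model with a hypothetical function name, using integer bounds chosen for the example.

```python
def range_bounds(start, end, gap, hardend=False):
    """Return (lower, upper) bounds for each gap-based range."""
    ranges = []
    low = start
    while low < end:
        high = low + gap
        if hardend and high > end:
            # hardend=true: clip the last range at facet.range.end.
            high = end
        ranges.append((low, high))
        low = high
    return ranges

soft = range_bounds(0, 95, 10, hardend=False)
hard = range_bounds(0, 95, 10, hardend=True)
print(soft[-1])  # (90, 100): full gap width, upper bound beyond the end
print(hard[-1])  # (90, 95): clipped to the end value
```

Both settings produce the same number of ranges here; only the upper bound of the last range differs.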

The facet.range.include Parameter

By default, the ranges used to compute range faceting between facet.range.start and facet.range.end are inclusive of their lower bounds and exclusive of the upper bounds. The "before" range defined with the facet.range.other parameter is exclusive and the "after" range is inclusive. This default, equivalent to "lower" below, will not result in double counting at the boundaries. You can use the facet.range.include parameter to modify this behavior using the following options:

Option

Description

lower

All gap-based ranges include their lower bound.

upper

All gap-based ranges include their upper bound.

edge

The first and last gap ranges include their edge bounds (lower for the first one, upper for the last one) even if the corresponding upper/lower option is not specified.

outer

The "before" and "after" ranges will be inclusive of their bounds, even if the first or last ranges already include those boundaries.

all

Includes all options: lower, upper, edge, outer.

You can specify this parameter on a per field basis with the syntax of f.<fieldname>.facet.range.include, and you can specify it multiple times to indicate multiple choices.

Info

To ensure you avoid double-counting, do not choose both lower and upper, do not choose outer, and do not choose all.
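
The double-counting warning can be demonstrated with a small model of the gap-based ranges. This sketch follows the option descriptions above and is not Solr's implementation; the function name is hypothetical, and it deliberately ignores the "before" and "after" ranges, so the outer option has no effect here.

```python
def count_in_ranges(value, bounds, include=("lower",)):
    """Count how many gap-based ranges a single value falls into."""
    opts = set(include)
    if "all" in opts:
        opts |= {"lower", "upper", "edge", "outer"}
    start, end = bounds[0][0], bounds[-1][1]
    hits = 0
    for low, high in bounds:
        # "edge" makes the first range include its lower bound and the
        # last range include its upper bound.
        incl_low = "lower" in opts or ("edge" in opts and low == start)
        incl_high = "upper" in opts or ("edge" in opts and high == end)
        above = value > low or (incl_low and value == low)
        below = value < high or (incl_high and value == high)
        if above and below:
            hits += 1
    return hits

bounds = [(0, 10), (10, 20), (20, 30)]
print(count_in_ranges(10, bounds))                      # 1: default "lower"
print(count_in_ranges(10, bounds, ("lower", "upper")))  # 2: counted twice
print(count_in_ranges(30, bounds))                      # 0: upper end excluded
print(count_in_ranges(30, bounds, ("lower", "edge")))   # 1: edge includes it
```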

The facet.range.other Parameter

The facet.range.other parameter specifies that in addition to the counts for each range constraint between facet.range.start and facet.range.end, counts should also be computed for these options:

Option

Description

before

All records with field values lower than the lower bound of the first range.

after

All records with field values greater than the upper bound of the last range.

between

All records with field values between the start and end bounds of all ranges.

none

Do not compute any counts.

all

Compute counts for before, between, and after.

This parameter can be specified on a per field basis with the syntax of f.<fieldname>.facet.range.other. In addition to the all option, this parameter can be specified multiple times to indicate multiple choices, but none will override all other options.
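
The extra counts can be modeled as below, assuming the default facet.range.include behavior described earlier (the "before" range excludes the start value and the "after" range includes the end value) and a gap that divides the interval evenly. The function name and the sample values are made up for illustration.

```python
def other_counts(values, start, end, other=("all",)):
    """Compute the before/between/after counts requested via facet.range.other."""
    opts = set(other)
    if "none" in opts:
        # none overrides every other option.
        return {}
    if "all" in opts:
        opts = {"before", "between", "after"}
    result = {}
    if "before" in opts:
        result["before"] = sum(1 for v in values if v < start)
    if "between" in opts:
        result["between"] = sum(1 for v in values if start <= v < end)
    if "after" in opts:
        result["after"] = sum(1 for v in values if v >= end)
    return result

values = [5, 10, 15, 99, 100, 250]
print(other_counts(values, start=10, end=100))
# {'before': 1, 'between': 3, 'after': 2}
print(other_counts(values, 10, 100, other=("before", "none")))  # {}
```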

The facet.range.method Parameter

The facet.range.method parameter selects the type of algorithm or method Solr should use for range faceting. Both methods produce the same results, but performance may vary.

Method

Description

filter

This method generates the ranges based on the other facet.range parameters, and for each of them executes a filter that is later intersected with the main query result set to get the count. It makes use of the filterCache, so it benefits from a cache large enough to contain all ranges.

dv

This method iterates over the documents that match the main query, and for each of them finds the correct range for the value. This method makes use of docValues (if enabled for the field) or the fieldCache. The dv method is not supported for the DateRangeField field type or when using group.facets.

The default value for this parameter is filter.

The facet.mincount Parameter in Range Faceting

The facet.mincount parameter, the same one used in field faceting, is also applied to range faceting. When used, no ranges with a count below the minimum will be included in the response.

Info
Date Ranges & Time Zones

Range faceting on date fields is a common situation where the TZ parameter can be useful to ensure that the "facet counts per day" or "facet counts per month" are based on a meaningful definition of when a given day/month "starts" relative to a particular TimeZone.

For more information, see the examples in the Working with Dates section.

Pivot (Decision Tree) Faceting

Pivoting is a summarization tool that lets you automatically sort, count, total, or average data stored in a table. The results are typically displayed in a second table showing the summarized data. Pivot faceting lets you create a summary table of the results from faceting documents by multiple fields.

Another way to look at it is that the query produces a Decision Tree, in that Solr tells you "for facet A, the constraints/counts are X/N, Y/M, etc. If you were to constrain A by X, then the constraint counts for B would be S/P, T/Q, etc.". In other words, it tells you in advance what the "next" set of facet results would be for a field if you apply a constraint from the current facet results.

facet.pivot

The facet.pivot parameter defines the fields to use for the pivot. Multiple facet.pivot values will create multiple "facet_pivot" sections in the response. Separate the fields in each list with commas.

facet.pivot.mincount

The facet.pivot.mincount parameter defines the minimum number of documents that need to match in order for the facet to be included in results. The default is 1.

Using the "bin/solr -e techproducts" example, a query URL like this one will return the data below, with the pivot faceting results found in the section "facet_pivot":

Code Block
http://localhost:8983/solr/techproducts/select?q=*:*&facet.pivot=cat,popularity,inStock
   &facet.pivot=popularity,cat&facet=true&facet.field=cat&facet.limit=5
   &rows=0&wt=json&indent=true&facet.pivot.mincount=2
Code Block
  "facet_counts":{
    "facet_queries":{},
    "facet_fields":{
      "cat":[
        "electronics",14,
        "currency",4,
        "memory",3,
        "connector",2,
        "graphics card",2]},
    "facet_dates":{},
    "facet_ranges":{},
    "facet_pivot":{
      "cat,popularity,inStock":[{
          "field":"cat",
          "value":"electronics",
          "count":14,
          "pivot":[{
              "field":"popularity",
              "value":6,
              "count":5,
              "pivot":[{
                  "field":"inStock",
                  "value":true,
                  "count":5}]},
...
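
Because each pivot entry nests the next level under a "pivot" key, a short recursive walk is enough to flatten a facet_pivot section into rows. The sketch below runs against a hand-trimmed fragment shaped like the excerpt above (one branch only); the function name walk_pivot is hypothetical.

```python
import json

# One branch of the response above, trimmed by hand for this example.
response_fragment = json.loads("""
{"facet_pivot": {"cat,popularity,inStock": [
  {"field": "cat", "value": "electronics", "count": 14, "pivot": [
    {"field": "popularity", "value": 6, "count": 5, "pivot": [
      {"field": "inStock", "value": true, "count": 5}]}]}]}}
""")

def walk_pivot(nodes, path=()):
    """Yield (path, count) for every node in a facet_pivot tree, depth first."""
    for node in nodes:
        here = path + ((node["field"], node["value"]),)
        yield here, node["count"]
        # Leaf nodes have no "pivot" key, so default to an empty list.
        yield from walk_pivot(node.get("pivot", []), here)

rows = list(walk_pivot(response_fragment["facet_pivot"]["cat,popularity,inStock"]))
for path, count in rows:
    print(" > ".join(f"{field}={value}" for field, value in path), count)
```

Each printed row corresponds to one step of the "decision tree" reading described earlier: the path lists the constraints applied so far, and the count is the number of matching documents.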

Combining Stats Component With Pivots

In addition to some of the general local parameters supported by other types of faceting, a stats local parameter can be used with facet.pivot to refer to stats.field instances (by tag) that you would like to have computed for each pivot constraint.

In the example below, two different (overlapping) sets of statistics are computed for each of the facet.pivot result hierarchies:

Code Block
stats=true
stats.field={!tag=piv1,piv2 min=true max=true}price
stats.field={!tag=piv2 mean=true}popularity
facet=true
facet.pivot={!stats=piv1}cat,inStock
facet.pivot={!stats=piv2}manu,inStock

Results:

Code Block
"facet_pivot":{
  "cat,inStock":[{
      "field":"cat",
      "value":"electronics",
      "count":12,
      "pivot":[{
          "field":"inStock",
          "value":true,
          "count":8,
          "stats":{
            "stats_fields":{
              "price":{
                "min":74.98999786376953,
                "max":399.0}}}},
        {
          "field":"inStock",
          "value":false,
          "count":4,
          "stats":{
            "stats_fields":{
              "price":{
                "min":11.5,
                "max":649.989990234375}}}}],
      "stats":{
        "stats_fields":{
          "price":{
            "min":11.5,
            "max":649.989990234375}}}},
    {
      "field":"cat",
      "value":"currency",
      "count":4,
      "pivot":[{
          "field":"inStock",
          "value":true,
          "count":4,
          "stats":{
            "stats_fields":{
              "price":{
                ...
  "manu,inStock":[{
      "field":"manu",
      "value":"inc",
      "count":8,
      "pivot":[{
          "field":"inStock",
          "value":true,
          "count":7,