
prefix facet performance

9 messages

prefix facet performance

Maria Muslea
Hi,

I have ~40K documents in SOLR (not many) and a multivalued facet field that
contains at least 2K values per document.

The values of the facet field look like A/B, A/C, A/D, C/E, M/F, etc., and
I use facet.prefix:

q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/


with "concept" defined as:


<field name="concept" type="string" indexed="true" multiValued="true"/>


This generates the output that I am looking for, but it takes more than 10
seconds per query.


Is there any way that I could improve the facet query performance for this
example?


Thank you,

Maria

Re: prefix facet performance

Yonik Seeley
How many unique values in the index?
You could try facet.method=enum

-Yonik


On Tue, Apr 18, 2017 at 8:16 PM, Maria Muslea <[hidden email]> wrote:


Re: prefix facet performance

Maria Muslea
Hmmm, not sure. Probably in the range of 100K-500K.

Before writing the email I was just looking at:
http://yonik.com/facet-performance/

Wow, using facet.method=enum makes a big difference. I will read up on it to
understand what it does.

Thank you so much.

Maria

On Tue, Apr 18, 2017 at 5:21 PM, Yonik Seeley <[hidden email]> wrote:


Re: prefix facet performance

alessandro.benedetti
Hi Maria,
If you have 100,000-500,000 unique values for the field you are interested in, and the cardinality of your search results is actually quite small in comparison, I am not sure the term enum approach will help you that much...

To simplify: with the term enum approach, you iterate over each unique value; if it matches the prefix, you count the intersection of the result set with the posting list for that term.
In your case, your result set is likely to be much smaller than the number of unique values.
I assume you are using the fc approach, which in my opinion was not a bad idea.
Let's start from the algorithm you are using and the schema configuration for your field.
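A toy model may make this trade-off concrete. The sketch below is a simplification, not Solr's implementation: a small in-memory dict of hypothetical documents stands in for the index, the term-enum function walks every unique term as described above, and the fc-like function walks every value of every document in the result set. The helper names are hypothetical. By Maria's figures, with q=*:* the fc-like loop would touch on the order of 40,000 × 2,000 = 80 million values per query, while this naive enum loop touches each unique term once.

```python
# Toy model of the two counting strategies discussed above.
# Simplified illustration only, not Solr's implementation.
from collections import Counter


def build_postings(doc_terms):
    """Invert {doc id: [facet values]} into {term: set of doc ids}, sorted by term."""
    postings = {}
    for doc, terms in doc_terms.items():
        for term in terms:
            postings.setdefault(term, set()).add(doc)
    # A Lucene term dictionary is sorted; mimic that ordering here.
    return dict(sorted(postings.items()))


def enum_scan_counts(postings, result_set, prefix):
    """Term-enum style: visit every unique term, and for terms with the
    prefix intersect the posting list with the result set."""
    counts, terms_visited = {}, 0
    for term, docs in postings.items():
        terms_visited += 1
        if term.startswith(prefix):
            n = len(docs & result_set)
            if n:
                counts[term] = n
    return counts, terms_visited


def fc_counts(doc_terms, result_set, prefix):
    """Doc-oriented (fc-like) style: visit every value of every document in
    the result set and count the values that carry the prefix."""
    counts, values_visited = Counter(), 0
    for doc in result_set:
        for term in doc_terms[doc]:
            values_visited += 1
            if term.startswith(prefix):
                counts[term] += 1
    return dict(counts), values_visited


# Hypothetical three-document index using the value style from the thread.
doc_terms = {
    1: ["A/B", "A/C", "C/E"],
    2: ["A/B", "M/F"],
    3: ["A/D", "C/E", "M/F"],
}
postings = build_postings(doc_terms)
all_docs = set(doc_terms)  # q=*:* matches every document

print(enum_scan_counts(postings, all_docs, "A/"))  # ({'A/B': 2, 'A/C': 1, 'A/D': 1}, 5)
print(fc_counts(doc_terms, all_docs, "A/"))        # ({'A/B': 2, 'A/C': 1, 'A/D': 1}, 8)
```

Both functions return the same facet counts; the second element of each tuple (terms visited vs. values visited) is what differs, and it is the quantity that grows with unique terms in one case and with values per matching document in the other.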

Cheers
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: prefix facet performance

Maria Muslea
Actually, using facet.method=enum made a HUGE difference even in my case,
where I have many unique values. I am happy with the query response time
now.

Is there a way in SOLR to count the unique values for a field? If not, I
could reindex and count the unique values as I add them, to give you a
more accurate count of how many I have (there is a good chance that I have
more than 500K).

Thanks,
Maria

On Fri, Apr 21, 2017 at 1:16 AM, alessandro.benedetti <[hidden email]>
wrote:


Re: prefix facet performance

alessandro.benedetti
That is quite interesting!
You can use the Stats Component (in combination with JSON facets, if you need it) to calculate an accurate approximation of the number of unique values [1][2].
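A minimal sketch of the two request styles, assuming a hypothetical core URL and using the field name from this thread. The JSON Facet API offers unique() and hll() aggregations, and the Stats Component accepts a cardinality local parameter on stats.field. The code only builds the parameters and URLs; it does not contact a server, and the helper names are hypothetical.

```python
# Sketch only: builds request URLs for counting distinct values, sends nothing.
# Assumptions: BASE_URL is a hypothetical core; "concept" is the field from this thread.
import json
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL


def json_facet_distinct_params(field):
    # JSON Facet API aggregations: unique() counts distinct values,
    # hll() returns a HyperLogLog-based estimate.
    facet = {"distinct": f"unique({field})", "approx": f"hll({field})"}
    return {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}


def stats_cardinality_params(field):
    # Stats Component: the cardinality local param asks for an
    # HLL-based estimate of the number of distinct values.
    return {"q": "*:*", "rows": 0, "stats": "true",
            "stats.field": "{!cardinality=true}" + field}


def to_url(params):
    # urlencode percent-encodes braces, quotes and parentheses for us.
    return BASE_URL + "?" + urlencode(params)


print(to_url(json_facet_distinct_params("concept")))
print(to_url(stats_cardinality_params("concept")))
```

rows=0 keeps the response down to the aggregate itself, in the same way as the facet queries earlier in the thread.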

Good to know it improved your scenario; I may need to update my knowledge of term enum internals!
Can you describe your schema configuration for the field and the way you were faceting before, compared to the way you facet now (with the related benefit)?

[1] https://cwiki.apache.org/confluence/display/solr/The+Stats+Component
[2] http://yonik.com/solr-count-distinct/
---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: prefix facet performance

Maria Muslea
The field is:

<field name="concept" type="string" indexed="true" multiValued="true"/>

and using unique() I found that it has 700K+ unique values.

The query before (that takes ~10s):

wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/

the query after (that is almost instant):

wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum
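To reproduce the comparison, here is a small sketch that builds both requests above and times them. BASE_URL and the helper names are hypothetical; time_request needs a live Solr instance, so it only appears in the commented usage line.

```python
# Sketch: the "before" and "after" facet requests from this thread.
# Assumptions: BASE_URL is a hypothetical core; timing needs a running Solr.
import time
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:8983/solr/mycore/select"  # hypothetical core URL


def facet_params(prefix, method=None):
    params = {"wt": "json", "indent": "true", "q": "*:*", "rows": 0,
              "facet": "true", "facet.field": "concept", "facet.prefix": prefix}
    if method is not None:
        # e.g. "enum"; when omitted, Solr picks its default facet method
        params["facet.method"] = method
    return params


def facet_url(params):
    return BASE_URL + "?" + urlencode(params)


def time_request(params):
    """Wall-clock seconds for one request, including reading the body."""
    start = time.perf_counter()
    with urllib.request.urlopen(facet_url(params)) as resp:
        resp.read()
    return time.perf_counter() - start


before = facet_params("A/")
after = facet_params("A/", method="enum")
# Against a live core: print(time_request(before), time_request(after))
```

The only difference between the two requests is the facet.method parameter, so any timing gap comes from the faceting algorithm alone.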

Maria

On Fri, Apr 21, 2017 at 8:59 AM, alessandro.benedetti <[hidden email]>
wrote:


Re: prefix facet performance

Yonik Seeley
On Fri, Apr 21, 2017 at 4:25 PM, Maria Muslea <[hidden email]> wrote:

> The field is:
>
> <field name="concept" type="string" indexed="true" multiValued="true"/>
>
> and using unique() I found that it has 700K+ unique values.
>
> The query before (that takes ~10s):
>
> wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/
>
> the query after (that is almost instant):
>
> wt=json&indent=true&q=*:*&rows=0&facet=true&facet.field=concept&facet.prefix=A/&facet.method=enum

Ah, the fact that you specify a facet.prefix makes this perfectly
aligned for the "enum" method, which can skip directly to the first
term on-or-after "A/".
facet.method=enum goes term-by-term, calculating the intersection with
the facet domain.
In this case, it's the number of terms that start with "A/" that
matters, not the number of terms in the entire field (hence the
speedup).
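Yonik's explanation can be sketched as a toy model (not Solr's code): with the terms kept in sorted order, a binary search finds the first term on or after the prefix, and because all terms sharing a prefix are contiguous in sorted order, the walk can stop at the first term that does not match. The work then scales with the number of "A/" terms rather than with the 700K+ terms in the whole field. The data and the function name are hypothetical.

```python
# Toy model of facet.method=enum combined with facet.prefix.
# Simplified illustration only, not Solr's implementation.
from bisect import bisect_left


def enum_prefix_counts(sorted_terms, postings, domain, prefix):
    """Seek to the first term >= prefix, then walk only the terms that
    share the prefix, intersecting each posting list with the domain."""
    counts, visited = {}, 0
    i = bisect_left(sorted_terms, prefix)  # skip straight to the first candidate term
    while i < len(sorted_terms) and sorted_terms[i].startswith(prefix):
        term = sorted_terms[i]
        visited += 1
        n = len(postings[term] & domain)  # intersection with the facet domain
        if n:
            counts[term] = n
        i += 1
    return counts, visited


# Hypothetical postings: term -> set of doc ids containing it.
postings = {"A/B": {1, 2}, "A/C": {1}, "A/D": {3}, "C/E": {1, 3}, "M/F": {2, 3}}
sorted_terms = sorted(postings)

print(enum_prefix_counts(sorted_terms, postings, {1, 2, 3}, "A/"))
# ({'A/B': 2, 'A/C': 1, 'A/D': 1}, 3)
```

Only three of the five terms are visited here; with a real field the skipped portion is everything outside the "A/" range.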

-Yonik

Re: prefix facet performance

Maria Muslea
I see. Once I specify a prefix, the number of terms is MUCH smaller.

Thank you again for all your help.

Maria

On Fri, Apr 21, 2017 at 1:46 PM, Yonik Seeley <[hidden email]> wrote:
