Lucene Facets performance problems (version 4.7.2)

Lucene Facets performance problems (version 4.7.2)

Simona Russo
Hi all,

we use the Lucene Facet library, version 4.7.2.

We have an index with 45 million documents (about 15 GB) and
a taxonomy index with 57 million documents (about 2 GB).

The total facet search time reaches 15 seconds!

Is it possible to improve this time? Are there any tips for configuring the
taxonomy index to avoid this wasted time?


Thanks in advance

Re: Lucene Facets performance problems (version 4.7.2)

Erick Erickson
You haven't given us much to go on. What is the cardinality of the fields
you're faceting on? What does your query look like? How are you measuring
time? What is the output if you add &debug=true?

In short, your question is far too vague for us to give any meaningful
answer; any of a dozen recommendations could apply.

Best
Erick
On Feb 26, 2016 18:01, "Simona Russo" <[hidden email]> wrote:

> Hi all,
>
> we use the Lucene Facet library, version 4.7.2.
>
> We have an index with 45 million documents (about 15 GB) and
> a taxonomy index with 57 million documents (about 2 GB).
>
> The total facet search time reaches 15 seconds!
>
> Is it possible to improve this time? Are there any tips for configuring the
> taxonomy index to avoid this wasted time?
>
>
> Thanks in advance
>

RE: Lucene Facets performance problems (version 4.7.2)

Uwe Schindler
Hi Erick,

this was a question about Lucene, so "&debug=true" won't help. It is also about Lucene's faceting, not Solr's.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



Re: Lucene Facets performance problems (version 4.7.2)

Shai Erera
True, but Erick's questions are still valid :-). We need more info to
answer these questions. So Simona, the more info you can give us, the better
we'll be able to answer.


Re: Lucene Facets performance problems (version 4.7.2)

Rob Audenaerde
Hi Simona,

In addition to Erick's questions:

Are you talking about *search* time or facet-collection time? And how many
results are in your result set?

I have some experience with collecting facets from large result sets; this
is typically slow, because all the relevant facet fields have to be retrieved
for every faceted document. In Lucene 4.8 the RandomSamplingFacetsCollector,
which counts facets on a random sample of the hits, was brought back
(see https://issues.apache.org/jira/browse/LUCENE-5476).

-Rob
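
The sampling idea Rob mentions can be illustrated without any Lucene classes. The sketch below is not Lucene's RandomSamplingFacetsCollector; the class name, the `hitOrdinals` layout, and the scaling rule are assumptions chosen only to show why visiting a fraction of the hits reduces the per-document work, at the price of approximate counts.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

/** Illustrative only: this is NOT Lucene's RandomSamplingFacetsCollector. */
final class SampledFacetCounts {

    /**
     * Counts category ordinals over a random sample of the hits and scales
     * the sampled counts back up to estimate the totals.
     *
     * @param hitOrdinals hitOrdinals[i] holds the category ordinals of hit i
     *                    (hypothetical in-memory layout)
     * @param sampleSize  target number of hits to visit
     * @param seed        random seed, so that a run is repeatable
     */
    static Map<Integer, Long> estimate(int[][] hitOrdinals, int sampleSize, long seed) {
        Map<Integer, Long> counts = new HashMap<>();
        int total = hitOrdinals.length;
        if (total == 0 || sampleSize <= 0) {
            return counts;
        }
        // Probability of visiting each hit; 1.0 means "visit everything".
        double rate = Math.min(1.0, (double) sampleSize / total);
        Random random = new Random(seed);
        int visited = 0;
        for (int[] ordinals : hitOrdinals) {
            if (random.nextDouble() >= rate) {
                continue; // hit not sampled: its facet values are never read
            }
            visited++;
            for (int ord : ordinals) {
                counts.merge(ord, 1L, Long::sum);
            }
        }
        if (visited == 0) {
            return counts;
        }
        // Scale sampled counts up to the size of the full result set.
        double scale = (double) total / visited;
        counts.replaceAll((ord, c) -> Math.round(c * scale));
        return counts;
    }
}
```

When `sampleSize` is at least the number of hits, every hit is visited and the counts are exact; otherwise they are estimates whose error depends on the sample size.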


Re: Lucene Facets performance problems (version 4.7.2)

Simona Russo
Thanks for your quick feedback.

The problem occurs at the customer's site, and we are trying to reproduce it
in our environments in order to figure out which part of the query is slow.

Here is some more information:

   - the facet dimension is composed of 2 categories (for example
   "/cat1/cat2"), and the second category ("cat2") is a multivalued field
   - the cardinality is:
      - "cat1": about 15 million unique values
      - "cat2": every unique "cat1" contains at most 100 documents, and
      every document contains an average of 30 values of the field "cat2"
      (multivalued field)
   - we use the following statements to obtain the facets:

    FacetsCollector fc = new FacetsCollector();
    FacetsCollector.search(searcher, query, maxResults, fc);
    Facets facetsCount = new FastTaxonomyFacetCounts(indexFieldName, tReader, facetsConfig, fc);
    ...


After that, we make a recursive call like this to browse the path of the
category:

    FacetResult facetResult = facetsCount.getTopChildren(topN, dimensionName, arrayCurrentPath);
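
To see which of these steps dominates the 15 seconds, each one can be timed separately. The helper below is a minimal sketch using only the JDK; `PhaseTimer` and its method names are hypothetical and not part of Lucene.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Callable;

/** Hypothetical helper: accumulates wall-clock time per named phase. */
final class PhaseTimer {

    // LinkedHashMap keeps the phases in the order they were first timed.
    private final Map<String, Long> nanosByPhase = new LinkedHashMap<>();

    /** Runs the work, adds its elapsed time to the phase, returns its result. */
    <T> T time(String phase, Callable<T> work) {
        long start = System.nanoTime();
        try {
            return work.call();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Checked exceptions (e.g. IOException) are wrapped to keep call sites short.
            throw new IllegalStateException("phase '" + phase + "' failed", e);
        } finally {
            nanosByPhase.merge(phase, System.nanoTime() - start, Long::sum);
        }
    }

    Map<String, Long> nanosByPhase() {
        return nanosByPhase;
    }

    /** One line per phase, in milliseconds. */
    String report() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, Long> e : nanosByPhase.entrySet()) {
            sb.append(String.format("%-12s %8.1f ms%n", e.getKey(), e.getValue() / 1_000_000.0));
        }
        return sb.toString();
    }
}
```

One way to use it would be to wrap the FacetsCollector.search call, the FastTaxonomyFacetCounts constructor, and the recursive getTopChildren calls in separate `time(...)` phases such as "search", "counts", and "topChildren", then print `report()` after a few warm-up queries so that JIT compilation and cold caches do not distort the numbers.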



From the first tests performed in our lab, the slowest part seems to be the
creation of the new FastTaxonomyFacetCounts instance.
I don't know whether the problem is related to the category being a
multivalued field.
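
One possible contributor, to be verified against the 4.7.2 source rather than taken as fact: if the counting class keeps one int counter per taxonomy ordinal, and the taxonomy reader holds parent/children/sibling arrays of the same length, then the size of the taxonomy alone makes each request expensive. The arithmetic below is a sketch under that assumption; the class name is hypothetical.

```java
/** Back-of-the-envelope sizing; the allocation pattern itself is an assumption. */
final class TaxonomyArrayCost {

    /** Bytes needed for one int[] with one slot per taxonomy ordinal. */
    static long intArrayBytes(long categories) {
        return categories * Integer.BYTES;
    }

    public static void main(String[] args) {
        long categories = 57_000_000L; // taxonomy size reported in this thread
        long perArray = intArrayBytes(categories);
        // Assumed: one counts array per request, plus three shared arrays
        // (parents, children, siblings) loaded once per taxonomy reader.
        System.out.printf("one int[] per ordinal: %,d bytes%n", perArray);
        System.out.printf("three shared arrays:   %,d bytes%n", 3 * perArray);
    }
}
```

With 57 million categories, each such array is 228,000,000 bytes (about 217 MiB), so allocating, zeroing, and later scanning one per request could account for a noticeable share of the time, independently of how many documents match.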

We are still investigating so I have no other information, but if you have
other questions feel free to contact me.

Thanks
Simona




