Slow faceting performance on a docValues field


David smith-2
I have a query against a single 50M doc index (175GB) using Solr 4.10.2, that exhibits the following response times (via the debugQuery option in Solr Admin):
"process": {
 "time": 24709,
 "query": { "time": 54 }, "facet": { "time": 24574 },


The query time of 54ms is great and exactly as expected -- this example was a single-term search that returned 3 hits.
I am trying to get the facet time (24.5 seconds) to be sub-second, and am having no luck.  The facet part of the query is as follows:

"params": { "facet.range": "eventDate",
 "f.eventDate.facet.range.end": "2015-05-13T16:37:18.000Z",
 "f.eventDate.facet.range.gap": "+1DAY",
 "start": "0",
 "rows": "10",
 "f.eventDate.facet.range.start": "2005-03-13T16:37:18.000Z",
 "f.eventDate.facet.mincount": "1",
 "facet": "true",
 "debugQuery": "true",
 "_": "1421169383802"
 }

And, the relevant schema definition is as follows:

   <field name="eventDate" type="tdate" indexed="true" stored="true" multiValued="false" docValues="true"/>

    <!-- A Trie based date field for faster date range queries and date faceting. -->
    <fieldType name="tdate" class="solr.TrieDateField" precisionStep="6" positionIncrementGap="0"/>


During the 25-second query, the Solr JVM pegs one CPU, with little or no I/O activity detected on the drive that holds the 175GB index.  I have 48GB of RAM, half of it dedicated to the OS and the other half to the Solr JVM.

I do NOT have any fieldValue caches configured as yet, because my (perhaps too simplistic?) reading of the documentation was that DocValues eliminates the need for a field-level cache on this facet field.

Any suggestions welcome.

Regards,
David


Re: Slow faceting performance on a docValues field

Shawn Heisey-2
24GB of RAM to cache 175GB is probably not enough in the general case,
but if you're seeing very little disk I/O activity for this query, then
we'll leave that alone and you can worry about it later.

What I would try immediately is setting the facet.method parameter to
enum and seeing what that does to the facet time.  I've had good luck
generally with that, even in situations where the docs indicated that
the default (fc) was supposed to work better.  I have never explored the
relationship between facet.method and docValues, though.

I'm out of ideas after this.  I don't have enough experience with
faceting to help much.

Thanks,
Shawn


Re: Slow faceting performance on a docValues field

Tomás Fernández Löbbe
Range faceting won't use DocValues even if they are set; it translates
each gap into a filter. This means that it will end up using the
FilterCache, which should make follow-up queries faster if you repeat the
same gaps (and don't commit).
You may also want to try interval faceting, which uses DocValues instead
of filters. The API is different: you'll have to provide the intervals
yourself.
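
To put a rough number on "each gap becomes a filter", the Python sketch below counts the one-day gaps between the facet.range.start and facet.range.end values from the original post and divides the reported facet time by that count. It assumes that each gap costs one range query, as described above; the per-gap figure is only an average, and count_gaps is an illustrative helper, not part of Solr.

```python
from datetime import datetime, timedelta

# Values copied from the request parameters and debug output in this thread.
RANGE_START = datetime(2005, 3, 13, 16, 37, 18)
RANGE_END = datetime(2015, 5, 13, 16, 37, 18)
FACET_TIME_MS = 24574


def count_gaps(start, end, gap):
    """Count the buckets needed to cover [start, end) in steps of `gap`."""
    count = 0
    cursor = start
    while cursor < end:
        cursor += gap
        count += 1
    return count


gaps = count_gaps(RANGE_START, RANGE_END, timedelta(days=1))
ms_per_gap = FACET_TIME_MS / gaps
print(f"{gaps} one-day gaps, about {ms_per_gap:.1f} ms per gap")
```

If the assumption holds, the facet time depends on the number of gaps requested rather than on the number of hits, which would be consistent with a 3-hit query still taking many seconds.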

Tomás


Re: Slow faceting performance on a docValues field

David smith-2
In reply to this post by Shawn Heisey-2
Shawn,

Thanks for the suggestion, but experimentally, in my case the same query with facet.method=enum returns in almost the same amount of time.

Regards
David


Re: Slow faceting performance on a docValues field

David smith-2
In reply to this post by Tomás Fernández Löbbe
Tomás,


Thanks for the response -- the performance of my query makes perfect sense in light of your information.
I looked at interval faceting.  My required interval is 1 day, and I cannot change that requirement.  Unless I am misreading the doc, that means that to facet a 10-year range, the query needs to specify over 3,600 intervals?

f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]&f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]&etc,etc
 

Each query would be 185MB in size if I structure it this way.

I assume I must be misunderstanding how to use interval faceting with dates.  Are there any concrete examples you know of?  A Google search did not come up with much.
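
For illustration, a client could generate the daily intervals programmatically rather than writing them by hand. The Python sketch below is only a rough example: daily_interval_params is a hypothetical helper, the ten-year date range is chosen arbitrarily, and it assumes the interval syntax in which "[" marks an inclusive bound and ")" an exclusive one. It also prints the url-encoded size of the resulting parameters, which by this estimate stays well under a megabyte, although a request of that length would most likely still have to be sent as a POST body.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_interval_params(field, first_day, last_day):
    """Build one facet.interval.set parameter per calendar day (UTC)."""
    params = [("facet", "true"), ("facet.interval", field)]
    day = first_day
    while day <= last_day:
        next_day = day + timedelta(days=1)
        # '[' is inclusive and ')' is exclusive, so adjacent days do not overlap.
        interval = f"[{day.isoformat()}T00:00:00Z,{next_day.isoformat()}T00:00:00Z)"
        params.append((f"f.{field}.facet.interval.set", interval))
        day = next_day
    return params


params = daily_interval_params("eventDate", date(2005, 1, 1), date(2014, 12, 31))
query_string = urlencode(params)
print(len(params) - 2, "intervals,", len(query_string) // 1024, "KiB url-encoded")
```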

Kind regards,
Dave


Re: Slow faceting performance on a docValues field

Tomás Fernández Löbbe
No, you are not misreading. Right now there is no automatic way of
generating the intervals on the server side, as range faceting does... I
guess it won't work in your case. Maybe you should create a Jira issue to
add this feature to interval faceting.

Tomás


Re: Slow faceting performance on a docValues field

Alexandre Rafalovitch
You could probably write a custom SearchComponent to prepend and expand
the query for the required use case. Though if something then has to
parse that query back, it would still be an issue.

Regards,
 Alex



Re: Slow faceting performance on a docValues field

Tomás Fernández Löbbe
In reply to this post by Tomás Fernández Löbbe
Just a side question. In your first example the dates include a time,
but in the second (where you set intervals) the time is not set.
Could this be resolved by adding a field that holds only the date
(without the time), and then using regular field faceting with
facet.sort=index? If that's possible in your use case, it may be faster.
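
As a minimal sketch of that idea, the Python below truncates a timestamp to midnight UTC before indexing and builds the field-faceting parameters for the day-level field. The field name eventDay and the query text:example are hypothetical, truncate_to_day and day_facet_params are illustrative helpers, and the sketch assumes that index order matches date order for whatever field type is chosen.

```python
from datetime import datetime
from urllib.parse import urlencode


def truncate_to_day(timestamp):
    """Return midnight UTC of the day containing an ISO-8601 'Z' timestamp."""
    parsed = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.strftime("%Y-%m-%dT00:00:00Z")


def day_facet_params(query, day_field):
    """Field-facet parameters that return per-day counts in index order."""
    return [
        ("q", query),
        ("facet", "true"),
        ("facet.field", day_field),
        ("facet.sort", "index"),
        ("facet.mincount", "1"),
        ("facet.limit", "-1"),
    ]


print(truncate_to_day("2015-05-13T16:37:18.000Z"))
print(urlencode(day_facet_params("text:example", "eventDay")))
```

With facet.mincount=1, only days that actually occur in the result set are returned, so a 3-hit query would yield at most three buckets.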

Tomás


Re: Slow faceting performance on a docValues field

David smith-2
In reply to this post by Tomás Fernández Löbbe
What is stumping me is that the search result has 3 hits, yet faceting those 3 hits takes 24 seconds.  The documentation for facet.method=fc is quite explicit about how Solr does faceting:


"fc (stands for Field Cache) The facet counts are calculated by iterating over documents that match the query and summing the terms that appear in each document. This was the default method for single valued fields prior to Solr 1.4."

If a search yielded millions of hits, I could understand 24 seconds to calculate the facets.  But not for a search with only 3 hits.  


What am I missing?  

Regards,
David



 

     On Tuesday, January 13, 2015 1:12 PM, Tomás Fernández Löbbe <[hidden email]> wrote:
   

 No, you are not misreading, right now there is no automatic way of
generating the intervals on the server side similar to range faceting... I
guess it won't work in your case. Maybe you should create a Jira to add
this feature to interval faceting.

Tomás

On Tue, Jan 13, 2015 at 10:44 AM, David Smith <[hidden email]>
wrote:

> Tomás,
>
>
> Thanks for the response -- the performance of my query makes perfect sense
> in light of your information.
> I looked at Interval faceting.  My required interval is 1 day.  I cannot
> change that requirement.  Unless I am mis-reading the doc, that means to
> facet a 10 year range, the query needs to specify over 3,600 intervals ??
>
>
> f.eventDate.facet.interval.set=[2005-01-01T00:00:00.000Z,2005-01-01T23:59:59.999Z]&f.eventDate.facet.interval.set=[2005-01-02T00:00:00.000Z,2005-01-02T23:59:59.999Z]&etc,etc
>
>
> Each query would be 185MB in size if I structure it this way.
>
> I assume I must be mis-understanding how to use Interval faceting with
> dates.  Are there any concrete examples you know of?  A google search did
> not come up with much.
>
> Kind regards,
> Dave

Re: Slow faceting performance on a docValues field

Tomás Fernández Löbbe
"fc", "fcs" and "enum" apply only to field faceting (facet.field), not to range faceting, so facet.method will not change the timing of this query.

Tomás
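
To make the distinction concrete, here is a minimal sketch in Python of the two request shapes. The field name "category" is hypothetical and only there for illustration; eventDate and the range values are taken from the original post. The point is simply that facet.method belongs with a facet.field request and has no counterpart in the facet.range request.

```python
from urllib.parse import urlencode

# Field faceting: facet.method (enum, fc, fcs) is read for facet.field.
# "category" is a hypothetical field name, not one from the thread.
field_facet_params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.field", "category"),
    ("facet.method", "enum"),
]

# Range faceting, as in the original post: facet.method plays no part.
range_facet_params = [
    ("q", "*:*"),
    ("facet", "true"),
    ("facet.range", "eventDate"),
    ("f.eventDate.facet.range.start", "2005-03-13T16:37:18.000Z"),
    ("f.eventDate.facet.range.end", "2015-05-13T16:37:18.000Z"),
    ("f.eventDate.facet.range.gap", "+1DAY"),
]

# urlencode escapes the "+" in "+1DAY" so it is not read as a space.
field_query = urlencode(field_facet_params)
range_query = urlencode(range_facet_params)
print(field_query)
print(range_query)
```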

On Tue, Jan 13, 2015 at 11:24 AM, David Smith <[hidden email]>
wrote:

> What is stumping me is that the search result has 3 hits, yet faceting
> those 3 hits takes 24 seconds.  The documentation for facet.method=fc is
> quite explicit about how Solr does faceting:
>
>
> "fc (stands for Field Cache) The facet counts are calculated by iterating
> over documents that match the query and summing the terms that appear in
> each document. This was the default method for single valued fields prior
> to Solr 1.4."
>
> If a search yielded millions of hits, I could understand 24 seconds to
> calculate the facets.  But not for a search with only 3 hits.
>
>
> What am I missing?
>
> Regards,
> David

Re: Slow faceting performance on a docValues field

Shawn Heisey-2
In reply to this post by David smith-2
On 1/13/2015 11:44 AM, David Smith wrote:
> I looked at Interval faceting.  My required interval is 1 day.  I cannot change that requirement.  Unless I am mis-reading the doc, that means to facet a 10 year range, the query needs to specify over 3,600 intervals ??

I am very ignorant of how the internals work ... but it sounds like the
parameters you have chosen are basically making thousands of separate
facets, almost all of which will ultimately return zero, and therefore
be excluded from the results.

If my naive assessment of the situation is even close to accurate, then
I think the rest of this paragraph would apply:  If we assume that those
individual facets are running consecutively, each one would be
completing in single-digit-millisecond time to add up to about 25
seconds.  If we assume they are running in parallel, that's a LOT of
work to handle all at once, and the actual workload might look more like
it's consecutive because there aren't enough CPU resources to handle
them truly in parallel.  I don't know that thousands of facets can be
sped up very much.

Thanks,
Shawn
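
For the interval-faceting route David asks about, the "thousands of separate facets" can at least be generated mechanically on the client. The sketch below is an illustration under assumptions, not something posted in the thread: the helper name daily_interval_params is made up, the date range is borrowed from the original facet.range parameters, and it uses half-open intervals ("[start,end)") rather than the 23:59:59.999 end points in David's example.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def daily_interval_params(start, end, field="eventDate"):
    """Return one (key, value) pair per day in [start, end).

    Hypothetical helper: each pair is a facet.interval.set entry
    covering a single day, start inclusive and end exclusive.
    """
    key = f"f.{field}.facet.interval.set"
    params = []
    day = start
    while day < end:
        nxt = day + timedelta(days=1)
        value = f"[{day.isoformat()}T00:00:00Z,{nxt.isoformat()}T00:00:00Z)"
        params.append((key, value))
        day = nxt
    return params


# Date range taken from the facet.range.start/end in the original post.
intervals = daily_interval_params(date(2005, 3, 13), date(2015, 5, 13))

# Full parameter list, URL-encoded the way a client library would send it.
query_string = urlencode(
    [("facet", "true"), ("facet.interval", "eventDate")] + intervals
)

print(len(intervals), "intervals")
print(len(query_string), "characters once URL-encoded")
```

By this count the encoded request comes out in the hundreds of kilobytes, well short of the 185MB mentioned earlier in the thread, unless something else in the real request accounts for the difference. That is still too long for a typical GET URL, so it would need to go in a POST body.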

Re: Slow faceting performance on a docValues field

David smith-2
Shawn,
I've been thinking along your lines, and continued to run tests through the day.  The results surprised me.

For my index, Solr range faceting time is most closely related to the total number of documents in the index for the range specified.  The number of "buckets" in the range is a second factor.   

I found NO correlation whatsoever with the number of hits in the query.  Whether I have 3 hits or 1,500,000 hits, it takes ~24 seconds to facet the result for that same time period.  That is what surprised me.

For example, if my facet range is a 10 year period for which there exist 47M docs in the index, the facet time is 24 seconds.  If I switch my facet range to a different 10 year period with 1.3M docs, the facet time drops to less than 5 seconds.

If I go back to my original 10 year period (with 47M docs in the index), but facet by month instead of day, my facet time drops to 2.5 seconds.  Now, I can't meet my user needs this way, but it does show the relationship between # of buckets and faceting time.

Regards,

David
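
The figures above can be turned into a rough per-bucket cost, which is a quick way to check Shawn's "single-digit-millisecond" estimate. This is only back-of-the-envelope arithmetic: it assumes the +1MONTH test used the same start and end dates as the original facet.range parameters, which the thread does not state, and it takes "2.5 seconds" as 2500 ms.

```python
from datetime import date

# Range from the original facet.range.start / facet.range.end
# (the time of day is identical on both ends, so whole days suffice).
start, end = date(2005, 3, 13), date(2015, 5, 13)

# Number of buckets produced by each gap.
daily_buckets = (end - start).days
monthly_buckets = (end.year - start.year) * 12 + (end.month - start.month)

daily_ms = 24574    # facet time from the original debugQuery output
monthly_ms = 2500   # "2.5 seconds" reported above; assumed same date range

print(f"+1DAY:   {daily_buckets} buckets, {daily_ms / daily_buckets:.1f} ms each")
print(f"+1MONTH: {monthly_buckets} buckets, {monthly_ms / monthly_buckets:.1f} ms each")
```

Under those assumptions the daily facet works out to roughly 6.6 ms per bucket and the monthly one to roughly 20 ms per bucket. The bucket count falls by about 30x while the total time falls by about 10x, which fits the observation that the number of documents in the range matters most and the number of buckets is a second factor.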

Re: Slow faceting performance on a docValues field

gulats
Maybe quite late to the party, but for the benefit of future readers:
experimenting with facet.range.method might be helpful (for Solr versions
6 and above), as it allows range faceting to use docValues as well.
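
As a sketch of what that looks like, here is the original request with facet.range.method=dv added, built in Python. The host, port and collection name ("events") are placeholders, not values from the thread, and whether "dv" actually helps for a given index is something to measure rather than assume.

```python
from urllib.parse import urlencode

# Range-facet parameters from the original post, plus facet.range.method.
# "dv" asks Solr to compute the range counts from docValues instead of
# running one filter per gap ("filter" is the default method).
params = [
    ("q", "*:*"),
    ("rows", "0"),
    ("facet", "true"),
    ("facet.range", "eventDate"),
    ("f.eventDate.facet.range.start", "2005-03-13T16:37:18.000Z"),
    ("f.eventDate.facet.range.end", "2015-05-13T16:37:18.000Z"),
    ("f.eventDate.facet.range.gap", "+1DAY"),
    ("f.eventDate.facet.mincount", "1"),
    ("facet.range.method", "dv"),
]

# Placeholder host and collection; substitute your own.
url = "http://localhost:8983/solr/events/select?" + urlencode(params)
print(url)
```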


