Solr7: Bad query throughput around commit time


Nawab Zada Asad Iqbal
Hi,

I am committing every 5 minutes using a periodic cron job: "curl
http://localhost:8984/solr/core1/update?commit=true". Besides this, my app
doesn't do any soft or hard commits. Since upgrading to Solr 7, I am
noticing that query throughput plummets every 5 minutes, probably when the
commit happens. What can I do to improve this? It didn't happen like this
in Solr 4.5: I used to get a stable query throughput of 50-60 queries per
second, but now spikes to 60 qps are interleaved with drops to almost
**0**. Between those 5-minute marks I am able to achieve high throughput,
so I suspect the issue is related to indexing or merging, not the query
flow.

I have 48G allotted to each Solr process, and it seems that only ~50% is
being used at any time; similarly, CPU is not spiking beyond 50% either.
There is frequent merging (every 5 minutes), but I am not sure whether
that is the cause of the slowdown.

Here are my merge and cache settings:

Thanks
Nawab

<mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
  <int name="maxMergeAtOnce">5</int>
  <int name="segmentsPerTier">5</int>
  <int name="maxMergeAtOnceExplicit">10</int>
  <int name="floorSegmentMB">16</int>
  <!-- 50 GB -->
  <double name="maxMergedSegmentMB">50000</double>
  <double name="forceMergeDeletesPctAllowed">1</double>
</mergePolicyFactory>




<filterCache class="solr.FastLRUCache"
             size="10240"
             initialSize="5120"
             autowarmCount="1024"/>
<queryResultCache class="solr.LRUCache"
                 size="10240"
                 initialSize="5120"
                 autowarmCount="0"/>
<documentCache class="solr.LRUCache"
               size="10240"
               initialSize="5120"
               autowarmCount="0"/>


<useColdSearcher>false</useColdSearcher>

<maxWarmingSearchers>2</maxWarmingSearchers>

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
  </arr>
</listener>
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
  </arr>
</listener>

Re: Solr7: Bad query throughput around commit time

Erick Erickson
What evidence do you have that the changes you've made to your configs
are useful? There are several things in here that are suspect:

  <double name="forceMergeDeletesPctAllowed">1</double>

First, this is useless unless you are forceMerging/optimizing, which
you shouldn't be doing under most circumstances. And you're going to
be rewriting a lot of data every time. See:

https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

A filterCache size of 10240 is far in excess of what we usually
recommend. Each entry can be up to maxDoc/8 bytes, and you have 10K of
them. Why did you choose this? On the theory that "more is better"? If
you're using NOW in filter queries then you may not be using the
filterCache well; see:

https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
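
As a hedged illustration of the point in that article (the field name
timestamp_dt is hypothetical): a raw NOW in an fq produces a new cache
entry on every request, while rounding with date math makes entries
reusable. The same rounded form can double as a warming query in the
currently empty newSearcher listener:

```xml
<!-- solrconfig.xml sketch: fq=timestamp_dt:[NOW-7DAYS TO NOW] would
     almost never be re-hit in the filterCache because NOW changes every
     millisecond; rounding to /DAY keeps the entry stable all day. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">*:*</str>
      <str name="fq">timestamp_dt:[NOW/DAY-7DAYS TO NOW/DAY]</str>
    </lst>
  </arr>
</listener>
```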

autowarmCount="1024"

Every time you commit, you're firing off 1024 queries, which is going to
spike the CPU a lot. Again, this is super-excessive. I usually start
with 16 or so.

Why are you committing from a cron job? Why not just set your
autocommit settings and forget about it? That's what they're for.
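
A minimal sketch of what that might look like in solrconfig.xml; the
intervals here are placeholders to tune, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flush to disk every 60s without opening a new searcher -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: make new documents visible every 5 minutes -->
  <autoSoftCommit>
    <maxTime>300000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

With this in place, the cron curl becomes unnecessary.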

Your queryResultCache is likewise kind of large, but each entry takes up
much less space than a filterCache entry, so it's probably OK. I'd
still shrink it and set the autowarm to 16 or so to start, unless
you're seeing a pretty high hit ratio, which is unusual but does happen.

48G of heap is just asking for long GC pauses. How many docs do you
have in each core anyway? If you're really using this much heap, then
it'd be good to see what you can do to shrink it. Enabling docValues
for all fields you facet, sort, or group on will help a lot if you
haven't already.

How much memory is on the entire machine? And how much is used by _all_
the JVMs you're running on a particular machine? MMapDirectory needs as
much OS memory space as it can get; see:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Lately we've seen some structures that consume memory until a commit
happens (either soft or hard). I'd shrink my autocommit interval down to
60 seconds or even less, with openSearcher=false.

In short, I'd go back mostly to the default settings and build _up_ as
you can demonstrate improvements. You've changed enough things here
that untangling which one is the culprit will be hard. You want the
JVM to have as little memory as possible; unfortunately, that's
something you figure out by experimentation.

Best,
Erick

On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <[hidden email]> wrote:


Re: Solr7: Bad query throughput around commit time

Nawab Zada Asad Iqbal
Thanks for a quick and detailed response, Erick!

Unfortunately I don't have proof, but our servers with Solr 4.5 are
running really nicely with the above config. I had assumed that the same
or similar settings would also perform well with Solr 7, but that
assumption didn't hold, as a lot has changed in three major releases.
I have tweaked the cache values as you suggested, but increasing or
decreasing them doesn't make any noticeable difference.

At the moment, my one core has an 800GB index, ~450 million documents,
and 48G Xmx. GC pauses haven't been an issue though. One machine runs
with a 3TB drive, running 3 Solr processes (each with one core as
described above). I agree that it is a very atypical system, so I should
probably try different parameters with a fresh eye to find the solution.


I tried with autocommits (a hard commit with openSearcher=false every
half minute, and a soft commit every 5 minutes). That supported the
hypothesis that query throughput decreases after opening a new searcher
and **not** after committing the index. Cache hit ratios are all above
80% (even when I decreased the filterCache to 128, so I will keep it at
this lower value). The document cache hit ratio is really bad; it drops
to around 40% after newSearcher, but I guess that is expected, since it
cannot be warmed up anyway.


Thanks
Nawab



On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <[hidden email]>
wrote:


Re: Solr7: Bad query throughput around commit time

Kevin Risden-3
> One machine runs with a 3TB drive, running 3 solr processes (each with
one core as described above).

How much total memory on the machine?

Kevin Risden

On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <[hidden email]>
wrote:


Re: Solr7: Bad query throughput around commit time

Nawab Zada Asad Iqbal
~248 GB

Nawab


On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <[hidden email]> wrote:


Re: Solr7: Bad query throughput around commit time

Erick Erickson
Nawab:

bq: Cache hit ratios are all in 80+% (even when i decreased the
filterCache to 128)

This suggests that you use a relatively small handful of fq clauses,
which is perfectly fine. Having 450M docs and a cache size of 1024 is
_really_ scary! You had the potential for a 57G (yes, gigabyte)
filterCache. Fortunately, you apparently don't use enough different fq
clauses to fill it up, or they match very few documents. I cheated a
little: if the result set is small, the individual doc IDs are stored
rather than a bitset 450M bits wide. Your
admin>>core>>plugins/stats>>filterCache page should show you how many
evictions there are, which is another interesting stat.

As it is, your filterCache might use up 7G or so. Hefty, but you have
lots of RAM.
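
For reference, the arithmetic behind those figures, taking each entry at
maxDoc/8 bytes (one bit per document per cached filter):

```latex
\frac{450 \times 10^6}{8}\ \text{bytes} \approx 56\ \text{MB per entry},
\qquad 1024 \times 56\ \text{MB} \approx 57\ \text{GB},
\qquad 128 \times 56\ \text{MB} \approx 7\ \text{GB}.
```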

*************
bq: Document cache hit ratio is really bad

This is often the case. Getting documents really means, here, getting
the _stored_ values. The point of the documentCache is to keep entries
in a cache for the various elements of a single request to use. To
name just two:
> you get the stored values for the "fl" list
> you highlight.

These are separate, and each accesses the stored values. The problem is,
"accessing the stored values" means
1> reading the document from disk
2> decompressing a 16K block minimum.

I'm skipping the fact that returning docValues doesn't need the stored
data, but you get the idea.

Anyway, not having to read/decompress for both the "fl" list and
highlighting is what the documentCache is about. That's where the
recommendation to "size it as (max # of users) * (max rows)"
comes in (if you can afford the memory, certainly).
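
Applying that rule of thumb with made-up numbers (not from this thread):
say 50 concurrent queries with rows=20 gives 1000 in-flight documents,
so a size around 1024 would cover them:

```xml
<!-- sketch: size ≈ (max concurrent queries) × (max rows), rounded up.
     The documentCache cannot be autowarmed (internal doc IDs change per
     searcher), so autowarmCount stays 0. -->
<documentCache class="solr.LRUCache"
               size="1024"
               initialSize="1024"
               autowarmCount="0"/>
```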

Some users have situations where the documentCache hit ratio is much
better, but I'd be surprised if any core with 450M docs even got
close.

*************
bq: That supported the hypothesis that the query throughput decreases
after opening a new searcher and **not** after committing the index

Are you saying that you have something of a sawtooth pattern, i.e.
queries are slow "for a while" after opening a new searcher but then
improve until the next commit? This is usually an autowarming problem,
so you might address it with more precise autowarming. Look particularly
for anything that sorts, groups, or facets; any such fields should have
docValues=true set. Unfortunately this will require a complete re-index.
Don't be frightened by the fact that enabling docValues will cause your
index size on disk to grow; paradoxically, it will actually _lower_ the
JVM heap requirements. Essentially, the additional size on disk is the
serialized structure that would otherwise have to be built in the JVM.
Since it is pre-built at index time, it can be MMapped into OS memory
space instead of using JVM heap.
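
A schema.xml sketch of what enabling docValues looks like; the field and
type names here are hypothetical, and any field you sort, facet, or group
on is a candidate:

```xml
<!-- requires a full re-index after the change -->
<field name="category_s" type="string" indexed="true" stored="false" docValues="true"/>
<field name="created_dt" type="pdate"  indexed="true" stored="false" docValues="true"/>
```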

*************
450M docs and an 800G index is quite large and a prime candidate for
sharding, FWIW.

Best,
Erick




On Sat, Nov 11, 2017 at 4:52 PM, Nawab Zada Asad Iqbal <[hidden email]> wrote:

> ~248 gb
>
> Nawab
>
>
> On Sat, Nov 11, 2017 at 2:41 PM Kevin Risden <[hidden email]> wrote:
>
>> > One machine runs with a 3TB drive, running 3 solr processes (each with
>> one core as described above).
>>
>> How much total memory on the machine?
>>
>> Kevin Risden
>>
>> On Sat, Nov 11, 2017 at 1:08 PM, Nawab Zada Asad Iqbal <[hidden email]>
>> wrote:
>>
>> > Thanks for a quick and detailed response, Erick!
>> >
>> > Unfortunately I don't have proof, but our servers with Solr 4.5 are
>> > running really nicely with the above config. I had assumed that the
>> > same or similar settings would also perform well with Solr 7, but that
>> > assumption didn't hold, as a lot has changed in three major releases.
>> > I have tweaked the cache values as you suggested, but increasing or
>> > decreasing them doesn't seem to make any noticeable improvement.
>> >
>> > At the moment, my one core has an 800GB index, ~450 million documents,
>> > and 48G Xmx. GC pauses haven't been an issue, though. One machine runs
>> > with a 3TB drive, running 3 Solr processes (each with one core as
>> > described above). I agree that it is a very atypical system, so I
>> > should probably try different parameters with a fresh eye to find the
>> > solution.
>> >
>> > I tried with autocommits (a commit with openSearcher=false every half
>> > minute, and a soft commit every 5 minutes). That supported the
>> > hypothesis that the query throughput decreases after opening a new
>> > searcher and **not** after committing the index. Cache hit ratios are
>> > all in the 80+% range (even when I decreased the filterCache to 128,
>> > so I will keep it at this lower value). The documentCache hit ratio is
>> > really bad; it drops to around 40% after newSearcher. But I guess that
>> > is expected, since it cannot be warmed up anyway.
>> >
>> > Thanks
>> > Nawab
>> >
>> >
>> >
>> > On Thu, Nov 9, 2017 at 9:11 PM, Erick Erickson <[hidden email]>
>> > wrote:
>> >
>> > > What evidence do you have that the changes you've made to your configs
>> > > are useful? There's lots of things in here that are suspect:
>> > >
>> > >   <double name="forceMergeDeletesPctAllowed">1</double>
>> > >
>> > > First, this is useless unless you are forceMerging/optimizing. Which
>> > > you shouldn't be doing under most circumstances. And you're going to
>> > > be rewriting a lot of data every time. See:
>> > >
>> > > https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/
>> > >
>> > > filterCache size of size="10240" is far in excess of what we usually
>> > > recommend. Each entry can be up to maxDoc/8 and you have 10K of them.
>> > > Why did you choose this? On the theory that "more is better?" If
>> > > you're using NOW then you may not be using the filterCache well, see:
>> > >
>> > > https://lucidworks.com/2012/02/23/date-math-now-and-filter-queries/
>> > >
>> > > autowarmCount="1024"
>> > >
>> > > Every time you commit you're firing off 1024 queries which is going to
>> > > spike the CPU a lot. Again, this is super-excessive. I usually start
>> > > with 16 or so.
>> > >
>> > > Why are you committing from a cron job? Why not just set your
>> > > autocommit settings and forget about it? That's what they're for.
>> > >
>> > > Your queryResultCache is likewise kind of large, but it takes up much
>> > > less space than the filterCache per entry so it's probably OK. I'd
>> > > still shrink it and set the autowarm to 16 or so to start, unless
>> > > you're seeing a pretty high hit ratio, which is pretty unusual but
>> > > does happen.
>> > >
>> > > 48G of memory is just asking for long GC pauses. How many docs do you
>> > > have in each core anyway? If you're really using this much heap, then
>> > > it'd be good to see what you can do to shrink it. Enabling docValues
>> > > for all fields you facet, sort or group on will help that a lot if you
>> > > haven't already.
>> > >
>> > > How much memory on your entire machine? And how much is used by _all_
>> > > the JVMs you're running on a particular machine? MMapDirectory needs as
>> > > much OS memory space as it can get, see:
>> > >
>> http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html
>> > >
>> > > Lately we've seen some structures that consume memory until a commit
>> > > happens (either soft or hard). I'd shrink my autocommit down to 60
>> > > seconds or even less (openSearcher=false).
>> > >
>> > > In short, I'd go back mostly to the default settings and build _up_ as
>> > > you can demonstrate improvements. You've changed enough things here
>> > > that untangling which one is the culprit will be hard. You want the
>> > > JVM to have as little memory as possible, unfortunately that's
>> > > something you figure out by experimentation.
>> > >
>> > > Best,
>> > > Erick
>> > >
>> > > On Thu, Nov 9, 2017 at 8:42 PM, Nawab Zada Asad Iqbal <
>> [hidden email]>
>> > > wrote:
>> > > > Hi,
>> > > >
>> > > > I am committing every 5 minutes using a periodic cron job  "curl
>> > > > http://localhost:8984/solr/core1/update?commit=true". Besides this,
>> my
>> > > app
>> > > > doesn't do any soft or hard commits. With Solr 7 upgrade, I am
>> noticing
>> > > > that query throughput plummets every 5 minutes - probably when the
>> > commit
>> > > > happens.
>> > > > What can I do to improve this? It didn't happen like this in
>> > > > Solr 4.5.
>> > > > (i.e., I used to get a stable query throughput of 50-60 queries per
>> > > > second. Now there are spikes to 60 qps interleaved by drops to almost
>> > > > **0**).  Between those 5 minutes, I am able to achieve high
>> throughput,
>> > > > hence I guess that issue is related to indexing or merging, and not
>> > query
>> > > > flow.
>> > > >
>> > > > I have 48G allotted to each solr process, and it seems that only ~50%
>> > is
>> > > > being used at any time, similarly CPU is not spiking beyond 50%
>> either.
>> > > > There is frequent merging (every 5 minutes), but I am not sure if
>> > > > that is a cause of the slowdown.
>> > > >
>> > > > Here are my merge and cache settings:
>> > > >
>> > > > Thanks
>> > > > Nawab
>> > > >
>> > > > <mergePolicyFactory class="org.apache.solr.index.
>> > > TieredMergePolicyFactory">
>> > > >   <int name="maxMergeAtOnce">5</int>
>> > > >   <int name="segmentsPerTier">5</int>
>> > > >       <int name="maxMergeAtOnceExplicit">10</int>
>> > > >       <int name="floorSegmentMB">16</int>
>> > > >       <!-- 50 gb -->
>> > > >       <double name="maxMergedSegmentMB">50000</double>
>> > > >       <double name="forceMergeDeletesPctAllowed">1</double>
>> > > >
>> > > >     </mergePolicyFactory>
>> > > >
>> > > >
>> > > >
>> > > >
>> > > > <filterCache class="solr.FastLRUCache"
>> > > >              size="10240"
>> > > >              initialSize="5120"
>> > > >              autowarmCount="1024"/>
>> > > > <queryResultCache class="solr.LRUCache"
>> > > >                  size="10240"
>> > > >                  initialSize="5120"
>> > > >                  autowarmCount="0"/>
>> > > > <documentCache class="solr.LRUCache"
>> > > >                size="10240"
>> > > >                initialSize="5120"
>> > > >                autowarmCount="0"/>
>> > > >
>> > > >
>> > > > <useColdSearcher>false</useColdSearcher>
>> > > >
>> > > > <maxWarmingSearchers>2</maxWarmingSearchers>
>> > > >
>> > > > <listener event="newSearcher" class="solr.QuerySenderListener">
>> > > >   <arr name="queries">
>> > > >   </arr>
>> > > > </listener>
>> > > > <listener event="firstSearcher" class="solr.QuerySenderListener">
>> > > >   <arr name="queries">
>> > > >   </arr>
>> > > > </listener>
>> > >
>> >
>>

Re: Solr7: Bad query throughput around commit time

Nawab Zada Asad Iqbal
Thanks Erick! Yes, I see the 'sawtooth' pattern. I will try your
suggestion, but I am wondering: why were the queries performant with
Solr 4 without docValues? Have some defaults changed?

---



On Sat, Nov 11, 2017 at 8:28 PM, Erick Erickson <[hidden email]>
wrote:

> Nawab:
>
> bq: Cache hit ratios are all in 80+% (even when i decreased the
> filterCache to 128)
>
> This suggests that you use a relatively small handful of fq clauses,
> which is perfectly fine. Having 450M docs and a cache size of 1024 is
> _really_ scary! You had a potential for a 57G (yes, gigabyte)
> filterCache. Fortunately you apparently don't use enough different fq
> clauses to fill it up, or they match very few documents. I cheated a
> little: if the result set is small, the individual doc IDs are stored
> rather than a bitset 450M bits wide. Your
> admin>>core>>plugins/stats>>filterCache page should show you how many
> evictions there are, which is another interesting stat.
>
> As it is, your filterCache might use up 7G or so. Hefty, but you have
> lots of RAM.
>
> *************
> bq:  Document cache hitratio is really bad,
>
> This is often the case. Getting documents really means, here, getting
> the _stored_ values. The point of the documentCache is to keep entries
> in a cache for the various elements of a single request to use. To
> name just two:
> > you get the stored values for the "fl" list
> > you highlight.
>
> These are separate, and each accesses the stored values. Problem is,
> "accessing the stored values" means
> 1> reading the document from disk
> 2> decompressing a 16K block minimum.
>
> I'm skipping the fact that returning docValues doesn't need the stored
> data, but you get the idea.
>
> Anyway, not having to read/decompress for both the "fl" list and
> highlighting is what the documentCache is about. That's where the
> "size it as (max # of users) * (max rows)" recommendation comes in
> (if you can afford the memory, certainly).