JVM GC Issue


JVM GC Issue

Dominique Bejean
Hi,

We are encountering an issue with GC.

Randomly, nearly once a day, there are consecutive full GCs with no memory
reclaimed, so the old-generation heap usage grows up to the limit.

Solr stops responding and we have to force a restart.


We are using Solr 6.6.1 with an Oracle 1.8 JVM. The JVM settings are the
defaults, with the CMS collector.
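For reference, the stock CMS tuning that the Solr 6.x start script applies when no custom GC_TUNE is set looks roughly like the following. This is recalled from the 6.x bin/solr script, so treat the exact flags and values as approximate and verify them against your own install:

```sh
# Approximate Solr 6.x default GC_TUNE (CMS); verify against your bin/solr.
GC_TUNE="-XX:NewRatio=3 \
-XX:SurvivorRatio=4 \
-XX:TargetSurvivorRatio=90 \
-XX:MaxTenuringThreshold=8 \
-XX:+UseConcMarkSweepGC \
-XX:+UseParNewGC \
-XX:ConcGCThreads=4 -XX:ParallelGCThreads=4 \
-XX:+CMSScavengeBeforeRemark \
-XX:PretenureSizeThreshold=64m \
-XX:+UseCMSInitiatingOccupancyOnly \
-XX:CMSInitiatingOccupancyFraction=50 \
-XX:CMSMaxAbortablePrecleanTime=6000 \
-XX:+CMSParallelRemarkEnabled \
-XX:+ParallelRefProcEnabled"
```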


Solr is configured as standalone in master/slave replication. Slaves poll
for replication every 10 minutes.


Hardware has 4 vCPUs and 16 GB RAM.

Both Xms and Xmx are set to 6 GB.

The core contains 1,200,000 documents, with a lot of fields both stored and
indexed, without docValues enabled.

The core size on the file system is 11 GB.


All Solr caches are sized to 1024, with autowarmCount set to 256.


The query rate is low (10 per second). The only memory-consuming features
are sorting and faceting. There are 10 facets, with 10 to 2000 distinct
values each.


I don't see errors or warnings in the Solr logs about opening or warming
new searchers.

Here is a link to the gceasy report:
http://gceasy.io/my-gc-report.jsp?p=c2hhcmVkLzIwMTcvMTIvMS8tLXNvbHJfZ2MubG9nLjAuemlwLS0xMS0xNC0zMA==

Gceasy suggests increasing the heap size, but I do not agree.

Any suggestions about this issue?

Regards.

Dominique
--
Dominique Béjean
06 08 46 12 43

Re: JVM GC Issue

Toke Eskildsen-2
Dominique Bejean <[hidden email]> wrote:
> We are encountering issue with GC.

> Randomly nearly once a day there are consecutive full GC with no memory
> reclaimed.

[... 1.2M docs, Xmx 6GB ...]

> Gceasy suggest to increase heap size, but I do not agree

It does seem strange, given your apparently modest index & workload. Nothing you say sounds problematic to me, and you have covered the usual culprits: overlapping searchers, faceting and filterCache.

Is it possible for you to share the solr.log around the two times that memory usage peaked? 2017-11-30 17:00-19:00 and 2017-12-01 08:00-12:00.

If you cannot share, please check whether you have excessive traffic around that time or whether there is a lot of UnInverting going on (triggered by faceting on non-docValues String fields). I know your post implies that you have already checked, so this is more of a sanity check.


- Toke Eskildsen

Re: JVM GC Issue

Erick Erickson
Your autowarm counts are rather high, but as Toke says this doesn't
seem outrageous.

I have seen situations where Solr is running close to the limits of
its heap and GC only reclaims a tiny bit of memory each time. When you
say "full GC with no memory reclaimed", is that really no memory _at
all_, or "almost no memory"? This situation can be alleviated by
allocating more memory to the JVM.

Your JVM pressure would certainly be reduced by enabling docValues on
any field you sort, facet or group on. That would require a full
reindex, of course. Note that this makes your index on disk bigger, but
it reduces JVM pressure by roughly the same amount, so it's a win in
this situation.

Have you attached a memory profiler to the running Solr instance? I'd
be curious where the memory is being allocated.

Best,
Erick

On Fri, Dec 1, 2017 at 8:31 AM, Toke Eskildsen <[hidden email]> wrote:

> [...]

Re: JVM GC Issue

Dominique Bejean
Hi,


Thank you both for your responses.


I only have the solr log for the very last period covered by the GC log.


A grep command lets me count the queries per minute with hits > 1000 or
> 10000, i.e. those with the biggest impact on memory and CPU during
faceting.


hits > 1000:

     59 11:13
     45 11:14
     36 11:15
     45 11:16
     59 11:17
     40 11:18
     95 11:19
    123 11:20
    137 11:21
    123 11:22
     86 11:23
     26 11:24
     19 11:25
     17 11:26

hits > 10000:

     55 11:19
     78 11:20
     48 11:21
    134 11:22
     93 11:23
     10 11:24


So we see that at the time the GC starts to go nuts, the count of large
result sets increases.


The query field includes a phonetic filter, and because of it the results
are really not relevant. I will suggest:

1/ removing the phonetic filter, in order to have fewer irrelevant results
and therefore smaller result sets

2/ enabling docValues on the fields used for faceting

I expect this to decrease the GC workload and stabilize GC.


Regards


Dominique





On Fri, Dec 1, 2017 at 18:17, Erick Erickson <[hidden email]> wrote:

> [...]
--
Dominique Béjean
06 08 46 12 43

Re: JVM GC Issue

Erick Erickson
Dominique:

Actually, the memory requirements shouldn't really go up as the number
of hits increases. The general algorithm is (say rows=10):

- Calculate the score of each doc.
- If the score is zero, ignore the doc.
- If the score is greater than the minimum in the current top 10,
  replace the lowest-scoring doc in the current top 10 with the new doc
  (a PriorityQueue, last I knew).
- Otherwise, discard the doc.

When all docs have been scored, assemble the return from the top 10
(or whatever rows is set to).
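The steps above can be sketched in plain Java with a bounded min-heap. This is a toy illustration of the idea, not Lucene's actual collector; all names below are made up:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Toy illustration of bounded top-N collection: memory is O(rows),
// independent of how many docs match.
public class TopNCollector {
    // scores[doc] is the score of doc; returns the ids of the 'rows'
    // best-scoring docs, best first. Docs with score 0 are ignored.
    public static List<Integer> topN(float[] scores, int rows) {
        // Min-heap on score: the head is the worst doc currently kept.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>(Comparator.comparingDouble(d -> scores[d]));
        for (int doc = 0; doc < scores.length; doc++) {
            if (scores[doc] <= 0f) continue;               // score 0: ignore
            if (heap.size() < rows) {
                heap.add(doc);                             // queue not full yet
            } else if (scores[doc] > scores[heap.peek()]) {
                heap.poll();                               // evict current worst
                heap.add(doc);
            }                                              // else: discard doc
        }
        List<Integer> out = new ArrayList<>();
        while (!heap.isEmpty()) out.add(heap.poll());      // worst..best order
        Collections.reverse(out);                          // best first
        return out;
    }
}
```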

The key here is that most of the Solr index is kept in
MMapDirectory/OS space; see Uwe's excellent blog here:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html.
In terms of _searching_, very little of the Lucene index structure
is kept in JVM memory.

That said, faceting plays a bit loose with the rules. If you have
docValues set to true, most of the memory structures are in the OS
memory space, not the JVM. If you have docValues set to false, then
the "uninverted" structure is built in the JVM heap space.

Additionally, the JVM requirements are sensitive to the number of
unique values in the field being faceted on. For instance, say you
faceted on a date field with just facet.field=some_date_field. A
bucket would have to be allocated to hold the count for each and
every unique date value, i.e. one for each distinct millisecond in
your index, which might be something you're seeing. Conceptually this
is just an array[uniqueValues] of ints (longs? I'm not sure). This
should be relatively easy to test by omitting the facets while
measuring.
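That one-counter-per-unique-value shape can be illustrated with a toy sketch (hypothetical names; real Solr faceting counts into an int[] indexed by term ordinal rather than a map, but the key point is the same: heap cost scales with the number of distinct values, not with rows returned):

```java
import java.util.HashMap;
import java.util.Map;

// Toy sketch of facet counting: one counter per unique value, so the
// heap cost grows with the number of distinct values in the field.
public class FacetSketch {
    // perDocValue[i] is the facet value of doc i (already "uninverted");
    // null means the doc has no value for the field.
    public static Map<String, Integer> countFacets(String[] perDocValue) {
        Map<String, Integer> counts = new HashMap<>();
        for (String v : perDocValue) {
            if (v != null) counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }
}
```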

Where the number of rows _does_ make a difference is in the return
packet. Say I have rows=10. In that case I create a single return
packet with the "fl" fields of all 10 docs. If rows=10,000, then that
return packet is obviously 1,000 times as large and must be assembled
in memory.

I rather doubt the phonetic filter is to blame, but you can test this
by just omitting the field containing the phonetic filter from the
search query. I've certainly been wrong before...

Best,
Erick

On Fri, Dec 1, 2017 at 2:31 PM, Dominique Bejean
<[hidden email]> wrote:

> [...]

Re: JVM GC Issue

Dominique Bejean
Hi,

Thank you for the explanations about faceting. I was thinking the hit
count had a bigger impact on the facet memory lifecycle.

Regardless of the hit count, there is a query peak at the time the issue
occurs. It is modest compared with what Solr should be able to handle,
but it could be enough to explain the growing GC activity:

    198 10:07
    208 10:08
    267 10:09
    285 10:10
    244 10:11
    286 10:12
    277 10:13
    252 10:14
    183 10:15
    302 10:16
    299 10:17
    273 10:18
    348 10:19
    468 10:20
    496 10:21
    673 10:22
    496 10:23
    101 10:24

At the time the issue occurs, we also see the CPU activity grow very high;
maybe there is a lack of CPU. So, I will suggest all of the actions that
reduce pressure on heap memory:

   - enable docValues
   - halve the cache sizes, to go back to the Solr defaults
   - refine the fl parameter, as I know it can be optimized
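For the docValues action, the change is a per-field schema edit followed by a full reindex. A hypothetical schema.xml snippet (the field name is illustrative, taken from this thread's facet fields):

```xml
<!-- Hypothetical example: a facet field declared with docValues enabled -->
<field name="marque_facet" type="string" indexed="true" stored="false"
       docValues="true"/>
```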

Concerning the phonetic filter, it will be removed anyway, since a large
number of results are really irrelevant.

Regards.

Dominique


On Sat, Dec 2, 2017 at 04:25, Erick Erickson <[hidden email]> wrote:

> [...]
--
Dominique Béjean
06 08 46 12 43

Re: JVM GC Issue

Toke Eskildsen-2
Dominique Bejean <[hidden email]> wrote:
> Hi, Thank you for the explanations about faceting. I was thinking the hit
> count had a biggest impact on facet memory lifecycle.

Only if you have a very high facet.limit. Could you provide us with a typical query, including all the parameters?

- Toke Eskildsen

Re: JVM GC Issue

Dominique Bejean
Hi Toke,

Nearly 30% of the requests are setting facet.limit=200

On 42,000 requests, the number of times each field is used for faceting is:

$ grep 'facet=true' select.log | grep -oP 'facet.field=([^&])*' | sort | uniq -c | sort -r

 23119 facet.field=category_path
  8643 facet.field=EUR_0_price_decimal
  5560 facet.field=type_pratique_facet
  5560 facet.field=size_facet_facet
  5560 facet.field=marque_facet
  5560 facet.field=is_marketplace_origin_facet
  5560 facet.field=gender_facet
  5560 facet.field=color_facet_facet
  5560 facet.field=club_facet
  5560 facet.field=age_facet
  3290 facet.field=durete_facet
  3290 facet.field=diametre_roues_facet
   169 facet.field=EUR_1_price_decimal
    38 facet.field=EUR_4_price_decimal

The largest unique-value counts for these fields are:

category_path 3025
marque_facet 2100
size_facet_facet 1400
type_pratique_facet 166
color_facet_facet 165

Here are 2 typical queries:

2017-11-20 10:13:27.585 INFO  (qtp592179046-15153) [   x:french]
o.a.s.c.S.Request [french]  webapp=/solr path=/select
params={mm=100%25&facet.field=category_path&facet.field=age_facet&facet.field=is_marketplace_origin_facet&facet.field=type_pratique_facet&facet.field=gender_facet&facet.field=color_facet_facet&facet.field=size_facet_facet&facet.field=EUR_0_price_decimal&facet.field=EUR_0_price_decimal&facet.field=marque_facet&facet.field=club_facet&facet.field=diametre_roues_facet&facet.field=durete_facet&q.alt=*:*&
json.nl=map&fl=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal&start=0&fq=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(category_id:"3389")+AND+(filter_visibility_int:"2"+OR+filter_visibility_int:"4")&rows=48&bq=is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+&q=*:*&facet.limit=200&spellcheck.q=&defType=edismax&qf=textSearch&stats=true&facet.mincount=1&facet=true&wt=json&stats.field=EUR_0_price_decimal&stats.field=sort_EUR_0_special_price_decimal&spellcheck.collate=true&timestamp=1511172807}
hits=953 status=0 QTime=26

2017-11-20 10:17:28.193 INFO  (qtp592179046-17115) [   x:french]
o.a.s.c.S.Request [french]  webapp=/solr path=/select
params={mm=100%25&facet.field=category_path&facet.field=age_facet&facet.field=is_marketplace_origin_facet&facet.field=type_pratique_facet&facet.field=gender_facet&facet.field=color_facet_facet&facet.field=size_facet_facet&facet.field=EUR_0_price_decimal&facet.field=marque_facet&facet.field=club_facet&
json.nl=map&fl=products_id,product_type_static,name_varchar,store_id,website_id,EUR_0_price_decimal&start=0&fq=(store_id:"1")+AND+(website_id:"1")+AND+(product_status:"1")+AND+(filter_visibility_int:"3"+OR+filter_visibility_int:"4")8&bq=name_boost_exact:"velo"^120+name_boost:"velo"^100+name_relative_boost:velo^80+category_boost:velo^60+is_marketplace_origin_boost_exact:"OUI"^210+is_marketplace_origin_boost:OUI^207+is_marketplace_origin_relative_boost:OUI^203+is_marketplace_origin_boost_exact:"NON"^210+is_marketplace_origin_boost:NON^207+is_marketplace_origin_relative_boost:NON^203+size_facet_boost_exact:"velo"^299+size_facet_boost:velo^296+size_facet_relative_boost:velo^292+marque_boost_exact:"velo"^359+marque_boost:velo^356+marque_relative_boost:velo^352+&q=velo&facet.limit=200&spellcheck.q=velo&defType=edismax&qf=textSearch&stats=true&facet.mincount=1&facet=true&wt=json&stats.field=EUR_0_price_decimal&stats.field=sort_EUR_0_special_price_decimal&spellcheck.collate=true&timestamp=1511173047}
hits=6761 status=0 QTime=38




Dominique


On Sat, Dec 2, 2017 at 16:23, Toke Eskildsen <[hidden email]> wrote:

> [...]
--
Dominique Béjean
06 08 46 12 43

Re: JVM GC Issue

S G
I am a bit curious about the docValues implementation.
I understand that docValues do not use JVM memory and
make use of the OS cache instead; that is why they are more performant.

But to return any response from docValues, the values in the
docValues' column-oriented structures need to be brought
into the JVM's memory, and that will then increase the pressure
on the JVM's memory anyway. So how do docValues actually
help from a memory perspective?

Thanks
SG


On Sat, Dec 2, 2017 at 12:39 AM, Dominique Bejean <[hidden email]> wrote:

> [...]

Re: JVM GC Issue

Yonik Seeley
On Sat, Dec 2, 2017 at 8:59 PM, S G <[hidden email]> wrote:
> I am a bit curious on the docValues implementation.
> I understand that docValues do not use JVM memory and
> they make use of OS cache - that is why they are more performant.
>
> But to return any response from the docValues, the values in the
> docValues' column-oriented-structures would need to be brought
> into the JVM's memory. And that will then increase the pressure
> on the JVM's memory anyways. So how do docValues actually
> help from memory perspective?

docValues are not more performant than the on-heap fieldCache once a
fieldCache entry has been populated.
docValues do help with memory in a number of ways:
1) off-heap memory use degrades much more gracefully... heap memory
will just cause an OutOfMemoryError when the heap size is exceeded
2) off-heap memory (memory used by the OS cache) can be dynamically
shared with other processes running on the system (the OS allocates as
needed)
3) easier to configure (as opposed to max-heap size) since the OS will
just automatically use all free memory to cache relevant parts of
memory-mapped disk files
4) off-heap memory does not need to be garbage collected (helps people
who have huge GC pauses)

-Yonik
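Yonik's off-heap points can be made concrete with a minimal stdlib-only Java sketch (an illustration, not actual Lucene code; the temp file here just stands in for an index file). Mapping a file exposes its bytes through the OS page cache: they never become heap objects, so reading them adds nothing for the garbage collector to trace.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapSketch {
    public static void main(String[] args) throws IOException {
        // A tiny file standing in for a docValues segment file on disk.
        Path file = Files.createTempFile("docvalues", ".dat");
        Files.write(file, new byte[] {10, 20, 30, 40});

        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // map() exposes the file through the OS page cache: the bytes
            // live off-heap and are paged in by the kernel on first access.
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            int sum = 0;
            for (int i = 0; i < buf.limit(); i++) {
                sum += buf.get(i); // no on-heap copy of the file contents
            }
            System.out.println(sum); // prints 100
        }
        Files.delete(file);
    }
}
```

This is, in miniature, what Lucene's MMapDirectory does across multi-gigabyte segment files.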

Re: JVM GC Issue

Shawn Heisey-2
In reply to this post by S G
On 12/2/2017 6:59 PM, S G wrote:
> I am a bit curious on the docValues implementation.
> I understand that docValues do not use JVM memory and
> they make use of OS cache - that is why they are more performant.
>
> But to return any response from the docValues, the values in the
> docValues' column-oriented-structures would need to be brought
> into the JVM's memory. And that will then increase the pressure
> on the JVM's memory anyways. So how do docValues actually
> help from memory perspective?

What I'm writing below is my understanding of docValues.  If it turns
out that I've gotten any of it wrong, that is MY error, not Solr's.

When there are no docValues, Solr must do something called "uninverting
the index" in order to satisfy certain operations -- primarily faceting,
grouping, and sorting.

A Lucene index is an inverted index.  This means that it is a big list
of terms, and then each of those entries is a second list that describes
which fields and documents have the term, as well as some other
information like positions.  Uninverting the index is pretty efficient,
but it does take time.  The uninverted index structure is a list of all
terms for a specific field.  Then there's a second phase -- the info in
the uninverted field is read and processed for the query operation,
which will use heap.  I do not know if there are additional phases.
There might be.

In case you don't know, in the Lucene index, docValues data on disk
consists of every entry in the index for one field, written sequentially
in an uncompressed format.

This means that for those query types, docValues is *exactly* what Solr
needs for the first phase.  And instead of generating it into heap
memory and then reading it, Solr can just read the data right off the
disk (which the OS caches, so it might be REALLY fast and use OS memory)
in order to handle second and later phases.  This is faster than
building an uninverted field, and consumes no heap memory.

As I mentioned, the uninverted data is built from indexed terms.  The
contents of docValues data are the same as a stored field -- the original
indexed data.  Because docValues cannot be added to fields using
solr.TextField, the only type that undergoes text analysis, there's no
possibility of a difference between an uninverted field and docValues.

Thanks,
Shawn
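
A toy sketch of the uninverting step described above (plain Java, not Solr internals): flipping a term -> docs map into a doc -> term array, which is the on-heap structure that sorting and faceting actually consume when docValues are absent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class UninvertSketch {
    public static void main(String[] args) {
        // Toy inverted index for one field: term -> list of doc ids.
        Map<String, List<Integer>> inverted = new HashMap<>();
        inverted.put("red",  List.of(0, 2));
        inverted.put("blue", List.of(1));

        // "Uninverting" flips it to doc id -> term, the per-document view
        // that sorting/faceting needs. Without docValues, Solr builds this
        // structure on the heap; with docValues it already exists on disk.
        String[] uninverted = new String[3];
        for (Map.Entry<String, List<Integer>> e : inverted.entrySet()) {
            for (int doc : e.getValue()) {
                uninverted[doc] = e.getKey();
            }
        }
        System.out.println(String.join(",", uninverted)); // prints red,blue,red
    }
}
```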

Re: JVM GC Issue

S G
I think the below article explains it well:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

I was thinking that docValues need to be copied into the JVM from the OS
cache.
Turns out that is not required, as the docValues file is mapped into the
process's virtual address space by the OS.
The JVM need not load the values into its own heap, since it can access
the mapped virtual memory directly.
The OS decides whether the docValues pages are resident in physical memory
(when their addresses are actually accessed by the JVM) or remain on disk.


Thx - SG
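
A hypothetical fixed-width "column" file makes the random-access point concrete: once mapped, the value for any doc is a direct offset computation, and the OS pages in only the region that is touched. This is a stdlib sketch of the idea, not the actual docValues on-disk encoding.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ColumnReadSketch {
    public static void main(String[] args) throws IOException {
        // Fixed-width "column" of int values for docs 0..4 -- the rough
        // shape of a numeric docValues field: one value per doc, sequential.
        Path col = Files.createTempFile("column", ".dat");
        ByteBuffer out = ByteBuffer.allocate(5 * Integer.BYTES);
        for (int v : new int[] {7, 11, 13, 17, 19}) out.putInt(v);
        Files.write(col, out.array());

        try (FileChannel ch = FileChannel.open(col, StandardOpenOption.READ)) {
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
            // The value for doc 3 is just an offset into the mapping; only
            // the touched page is faulted in, nothing is copied to the heap.
            int doc3 = buf.getInt(3 * Integer.BYTES);
            System.out.println(doc3); // prints 17
        }
        Files.delete(col);
    }
}
```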

On Sun, Dec 3, 2017 at 12:02 PM, Shawn Heisey <[hidden email]> wrote:

> On 12/2/2017 6:59 PM, S G wrote:
>
>> I am a bit curious on the docValues implementation.
>> I understand that docValues do not use JVM memory and
>> they make use of OS cache - that is why they are more performant.
>>
>> But to return any response from the docValues, the values in the
>> docValues' column-oriented-structures would need to be brought
>> into the JVM's memory. And that will then increase the pressure
>> on the JVM's memory anyways. So how do docValues actually
>> help from memory perspective?
>>
>
> What I'm writing below is my understanding of docValues.  If it turns out
> that I've gotten any of it wrong, that is MY error, not Solr's.
>
> When there are no docValues, Solr must do something called "uninverting
> the index" in order to satisfy certain operations -- primarily faceting,
> grouping, and sorting.
>
> A Lucene index is an inverted index.  This means that it is a big list of
> terms, and then each of those entries is a second list that describes which
> fields and documents have the term, as well as some other information like
> positions.  Uninverting the index is pretty efficient, but it does take
> time.  The uninverted index structure is a list of all terms for a specific
> field.  Then there's a second phase -- the info in the uninverted field is
> read and processed for the query operation, which will use heap.  I do not
> know if there are additional phases. There might be.
>
> In case you don't know, in the Lucene index, docValues data on disk
> consists of every entry in the index for one field, written sequentially in
> an uncompressed format.
>
> This means that for those query types, docValues is *exactly* what Solr
> needs for the first phase.  And instead of generating it into heap memory
> and then reading it, Solr can just read the data right off the disk (which
> the OS caches, so it might be REALLY fast and use OS memory) in order to
> handle second and later phases.  This is faster than building an uninverted
> field, and consumes no heap memory.
>
> As I mentioned, the uninverted data is built from indexed terms.  The
> contents of docValue data is the same as a stored field -- the original
> indexed data.  Because docValues cannot be added to fields using
> solr.TextField, the only type that undergoes text analysis, there's no
> possibility of a difference between an uninverted field and docValues.
>
> Thanks,
> Shawn
>