Replica is going into recovery in Solr 6.1.0


Replica is going into recovery in Solr 6.1.0

vishal patel-2
I am using Solr 6.1.0 with Java 8 and G1GC in production. We have 2 shards, and each shard has 1 replica. Suddenly one replica went into recovery mode, and requests became slow in production.
From the GC logs, the maximum minor GC pause at that time was 1 min 6.8 s, and long minor GC pauses occurred several other times as well.

My logs :
https://drive.google.com/file/d/158z3nzLsnHGouyRnXgfzCjwD4iadgKSp/view?usp=sharing
https://drive.google.com/file/d/1E4jyffvIWVJB7EeEMXBXyqaK2ZfAA8kk/view?usp=sharing

I do not know why the long GC pauses happened. Our platform performs heavy searching and indexing.
Are the long GC pauses caused by searching or by indexing?
If a GC pause is long, why does the replica go into recovery? Can we configure how long an update request waits?
What is the minimum GC pause time that sends a replica into recovery mode?

Is this issue relevant to my problem? https://issues.apache.org/jira/browse/SOLR-9310

Regards,
Vishal Patel

Sent from Outlook<http://aka.ms/weboutlook>

Re: Replica is going into recovery in Solr 6.1.0

vishal patel-2
Is there anyone looking at this?

________________________________
From: vishal patel <[hidden email]>
Sent: Wednesday, February 12, 2020 3:45 PM
To: [hidden email] <[hidden email]>
Subject: Replica is going into recovery in Solr 6.1.0

Re: Replica is going into recovery in Solr 6.1.0

Rajdeep Sahoo
What is your memory configuration

On Thu, 13 Feb, 2020, 9:46 AM vishal patel, <[hidden email]>
wrote:

Re: Replica is going into recovery in Solr 6.1.0

Walter Underwood
In reply to this post by vishal patel-2
Your JVM had very bad GC trouble. The 5 second GCs were enough to cause problems. The one minute GC is really, really bad. I’m not surprised the replica went down.

Look at the graphs for memory usage in the new and old spaces. It looks like memory ran out. Maybe the heap is too small, but it might be something else.
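To check pause times like these, here is a minimal shell sketch that lists the longest stop-the-world pauses in a Java 8 GC log. It assumes the log was written with -XX:+PrintGCApplicationStoppedTime (one of the flags posted in this thread), which prints lines containing "Total time for which application threads were stopped: N seconds". The function name longest_pauses is hypothetical.

```shell
#!/usr/bin/env bash
# Print the longest safepoint pauses (in seconds) found in a Java 8 GC log.
# Assumes -XX:+PrintGCApplicationStoppedTime, which logs lines such as:
#   ...: Total time for which application threads were stopped: 0.0123456 seconds, ...
longest_pauses() {
  local log_file="$1"       # path to the GC log
  local count="${2:-5}"     # how many pauses to show; default 5
  # Keep only the stopped-time fragment, take the number (5th field),
  # sort numerically in descending order, and show the top entries.
  grep -o 'application threads were stopped: [0-9.]* seconds' "$log_file" \
    | awk '{ print $5 }' \
    | LC_ALL=C sort -rn \
    | head -n "$count"
}
```

For example, `longest_pauses solr_gc.log 10` shows the ten longest pauses; the log file name here is an assumption, so point it at wherever your GC log is written.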

What GC are you using?

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

Re: Replica is going into recovery in Solr 6.1.0

vishal patel-2
In reply to this post by Rajdeep Sahoo
My configuration:

-XX:+AggressiveOpts -XX:ConcGCThreads=12 -XX:G1HeapRegionSize=33554432 -XX:G1ReservePercent=20 -XX:InitialHeapSize=68719476736 -XX:InitiatingHeapOccupancyPercent=10 -XX:+ManagementServer -XX:MaxHeapSize=68719476736 -XX:ParallelGCThreads=36 -XX:+ParallelRefProcEnabled -XX:PrintFLSStatistics=1 -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:ThreadStackSize=256 -XX:+UseG1GC -XX:-UseLargePages -XX:-UseLargePagesIndividualAllocation -XX:+UseStringDeduplication
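The heap-related flags above are given in bytes. A small shell helper (flag_mib, a hypothetical name) can convert them. Under this sketch, MaxHeapSize=68719476736 comes out as 65536 MiB, i.e. a 64 GiB heap, and G1HeapRegionSize=33554432 comes out as 32 MiB.

```shell
#!/usr/bin/env bash
# Print the value of a numeric -XX:Name=value JVM flag, converted from bytes to MiB.
# Usage: flag_mib "<space-separated flags>" <FlagName>
flag_mib() {
  local flags="$1" name="$2" bytes
  # Split the flag string one flag per line and extract that exact flag's value;
  # if the flag appears more than once, keep the last occurrence.
  bytes=$(printf '%s\n' "$flags" | tr ' ' '\n' \
    | sed -n "s/^-XX:${name}=\([0-9][0-9]*\)\$/\1/p" | tail -n 1)
  # Fail if the flag is absent or not numeric.
  [ -n "$bytes" ] || return 1
  echo $(( bytes / 1024 / 1024 ))
}
```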

________________________________
From: Rajdeep Sahoo <[hidden email]>
Sent: Thursday, February 13, 2020 10:03 AM
To: [hidden email] <[hidden email]>
Subject: Re: Replica is going into recovery in Solr 6.1.0

Re: Replica is going into recovery in Solr 6.1.0

vishal patel-2
In reply to this post by Walter Underwood
What GC are you using? -- G1GC

________________________________
From: Walter Underwood <[hidden email]>
Sent: Thursday, February 13, 2020 11:09 AM
To: [hidden email] <[hidden email]>
Subject: Re: Replica is going into recovery in Solr 6.1.0

Re: Replica is going into recovery in Solr 6.1.0

Walter Underwood
In reply to this post by vishal patel-2
You have a 64GB heap. That is extremely unusual. You can only do that if the instance has 80 GB or more of RAM. If you don’t have enough RAM, the JVM will start using swap space and cause extremely long GC pauses.

How much RAM do you have?

How did you choose these GC settings?

We have been using these settings with Java 8 in prod for three years with no GC problems.

SOLR_HEAP=8g
# Use G1 GC  -- wunder 2017-01-23
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"

If you don’t have a very, very good reason for your GC settings, use these instead.

wunder

> On Feb 12, 2020, at 10:47 PM, vishal patel <[hidden email]> wrote:

Re: Replica is going into recovery in Solr 6.1.0

Erick Erickson
What Walter said. Also, you _must_ leave quite a bit of free RAM for the OS, because Lucene's MMapDirectory relies on the OS page cache to hold the index files. See:

https://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

Basically until you can get your GC pauses under control, you’ll have an unstable collection.
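On Linux, one quick way to see how much RAM the OS is actually using for file caching, and whether the machine has dipped into swap (which Walter warns causes extremely long GC pauses), is to read /proc/meminfo. This is a minimal sketch assuming a Linux host; the function name meminfo_summary is hypothetical, and figures are truncated to whole GB.

```shell
#!/usr/bin/env bash
# Summarise RAM, page cache and swap usage from /proc/meminfo (Linux only).
# /proc/meminfo reports values in kB; they are shown here in whole GB
# (1 GB = 1048576 kB), truncated toward zero.
meminfo_summary() {
  local file="${1:-/proc/meminfo}"   # optional path, handy for testing
  awk '
    /^MemTotal:/  { total = $2 }
    /^Cached:/    { cached = $2 }      # anchored, so SwapCached: does not match
    /^SwapTotal:/ { swap_total = $2 }
    /^SwapFree:/  { swap_free = $2 }
    END {
      printf "total=%dGB cached=%dGB swap_used=%dGB\n",
        total / 1048576, cached / 1048576, (swap_total - swap_free) / 1048576
    }' "$file"
}
```

A low "cached" figure relative to the index size, or any steadily growing "swap_used", points at the memory pressure described above.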

How big are your indexes on disk? How many docs per replica? How many replicas per host?

Best,
Erick

> On Feb 13, 2020, at 5:16 PM, Walter Underwood <[hidden email]> wrote:

Re: Replica is going into recovery in Solr 6.1.0

vishal patel-2
The server has 256 GB of RAM in total, and the following applications run on it:

Application            Memory
Application1           50 GB
Application2           30 GB
Application3            8 GB
Application4            2 GB
Solr shard1            64 GB
Solr shard2 replica    64 GB

Note: shard2 and the shard1 replica run on another server. Each Solr instance normally uses a steady 35 to 40 GB of memory, so we set the heap to 64 GB. We are using NRT.

How big are your indexes on disk? This server: shard1 115 GB, shard2 replica 96 GB. Other server: shard1 replica 114 GB, shard2 100 GB.
How many docs per replica? Approximately 30,959,714 docs.
How many replicas per host? Each server hosts one shard and one replica.

Regards,
Vishal

________________________________
From: Erick Erickson <[hidden email]>
Sent: Friday, February 14, 2020 4:00 AM
To: [hidden email] <[hidden email]>
Subject: Re: Replica is going into recovery in Solr 6.1.0

Re: Replica is going into recovery in Solr 6.1.0

Walter Underwood
I don’t see anything in your description that requires a large heap. This is a terrible JVM configuration.

Do this:

* Use the GC configuration I recommended, with an 8 GB heap.
* Run one copy of Solr. That hosts both shard1 and the shard2 replica.

That increases the RAM available for OS and file buffers from 38 GB to 158 GB.

Solr is slow when the index must be fetched from disk. It is fast when the index can be cached in RAM file buffers. Assuming 2 GB for the OS and other daemons, right now you can fit 17% of the two indexes in file buffers. With a single smaller JVM, you can fit 74% of the indexes into RAM. This should make a huge speed difference. You’ll also see GC pauses of 200 ms or less.
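The arithmetic above can be reproduced with a small shell function. This is a sketch under the stated assumptions: 2 GB for the OS and other daemons, the per-application figures Vishal listed, and a combined index of 115 GB + 96 GB for shard1 and the shard2 replica on this server. The name page_cache_fit is hypothetical. It counts only the configured heap sizes and ignores JVM memory outside the heap (metaspace, thread stacks, direct buffers), so treat the results as upper bounds.

```shell
#!/usr/bin/env bash
# Estimate how much RAM is left once every process has its allocation,
# and what share of the index fits in the remaining OS page cache.
# Arguments (whole GB): total RAM, total allocated to JVMs/apps,
# reserve for the OS and other daemons, on-disk index size.
page_cache_fit() {
  local total_ram=$1 allocated=$2 os_reserve=$3 index_size=$4
  local left=$(( total_ram - allocated ))   # RAM not claimed by any allocation
  local cache=$(( left - os_reserve ))      # what remains for file buffers
  # Percentage of the index that fits, rounded to the nearest whole percent.
  local pct=$(( (cache * 100 + index_size / 2) / index_size ))
  echo "${left}GB ${pct}%"
}
```

Usage with the figures from this thread:

    page_cache_fit 256 $((50 + 30 + 8 + 2 + 64 + 64)) 2 $((115 + 96))   # prints: 38GB 17%
    page_cache_fit 256 $((50 + 30 + 8 + 2 + 8)) 2 $((115 + 96))         # prints: 158GB 74%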

wunder

> On Feb 13, 2020, at 9:40 PM, vishal patel <[hidden email]> wrote: