Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

vishal patel-2

We have 2 shards and 2 replicas on our production servers. Somehow replica1 became leader while a commit process was running on shard1.
Log ::

***shard1***
2019-04-08 12:52:09.930 INFO  (searcherExecutor-30-thread-1-processing-n:shard1:8983_solr x:productData s:shard1 c:productData r:core_node1) [c:productData s:shard1 r:core_node1 x:productData] o.a.s.c.QuerySenderListener QuerySenderListener done.
2019-04-08 12:54:01.397 INFO  (qtp1239731077-1359101) [c:product s:shard1 r:core_node1 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  webapp=/solr path=/update params={wt=javabin&version=2}{add=[PRO23241768 (1630250393598427136)]} 0 111711

***replica1***
2019-04-08 12:52:09.581 INFO  (qtp1239731077-1021605) [c:product s:shard1 r:core_node3 x:product] o.a.s.u.p.LogUpdateProcessorFactory [product]  webapp=/solr path=/update params={update.distrib=FROMLEADER&distrib.from=shard1:8983/solr/product/&wt=javabin&version=2}{add=[PRO23241768 (1630250393598427136)]} 0 0
2019-04-08 12:52:19.717 INFO  (zkCallback-4-thread-207-processing-n:replica1:8983_solr) [   ] o.a.s.c.c.ZkStateReader A live node change: [WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/live_nodes], has occurred - updating... (live nodes size: [4])

PRO23241768 was successfully updated at 12:52:09.581 on replica1, but the update was not logged on shard1 until 12:54:01.397. It took around 1.86 minutes (111711 ms). In between, at 12:52:19.717, replica1 tried to become leader and succeeded.
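
(As a quick way to confirm which replica currently holds leadership, the Collections API CLUSTERSTATUS call can be queried. The Python sketch below assumes the requests library and a node reachable at localhost:8983, not our actual production hosts.)

# Sketch: list the current leader of each shard in a collection via the
# Collections API CLUSTERSTATUS call. Host, port and collection name are
# placeholders for illustration.
import requests

SOLR = "http://localhost:8983/solr"
COLLECTION = "product"

resp = requests.get(SOLR + "/admin/collections",
                    params={"action": "CLUSTERSTATUS",
                            "collection": COLLECTION,
                            "wt": "json"})
resp.raise_for_status()

shards = resp.json()["cluster"]["collections"][COLLECTION]["shards"]
for shard_name, shard in shards.items():
    for replica_name, replica in shard["replicas"].items():
        if replica.get("leader") == "true":
            print("%s: leader is %s on %s"
                  % (shard_name, replica_name, replica["node_name"]))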

My production solr.xml:
<int name="zkClientTimeout">${zkClientTimeout:600000}</int>
<int name="distribUpdateSoTimeout">${distribUpdateSoTimeout:600000}</int>
<int name="distribUpdateConnTimeout">${distribUpdateConnTimeout:60000}</int>

<shardHandlerFactory name="shardHandlerFactory" class="HttpShardHandlerFactory">
 <int name="socketTimeout">${socketTimeout:600000}</int>
 <int name="connTimeout">${connTimeout:60000}</int>
</shardHandlerFactory>

Collections: product and productData.

Versions:
Solr: 6.1.0
ZooKeeper: 3.4.6


Why did shard1 take 1.8 minutes for the update? And if the update was slow, why did replica1 try to become leader? Do we need to change any timeout?

Note: the PRO23241768 update used a soft commit, and the log level was INFO.

Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Shawn Heisey-2
On 4/17/2019 6:25 AM, vishal patel wrote:
> Why did shard1 take 1.8 minutes for the update? And if the update was slow, why did replica1 try to become leader? Do we need to change any timeout?

There's no information here that can tell us why the update took so
long.  My best guess would be long GC pauses due to the heap size being
too small.  But there might be other causes.

Indexing a single document should be VERY fast.  Even a large document
should only take a handful of milliseconds.

If the request included "commit=true" as a parameter, then it might be
the commit that was slow, not the indexing.  You'll need to check the
logs to determine that.
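
One way to check, sketched roughly in Python below, is to scan the log for
LogUpdateProcessorFactory lines whose trailing elapsed-time field is large
(the log path and threshold are placeholders, not something Solr provides):

# Sketch: find slow /update requests in a Solr log. The
# LogUpdateProcessorFactory lines shown earlier in this thread end with
# "<status> <elapsed-ms>", so we pull out the last number on each line.
import re

LOG_FILE = "solr.log"          # placeholder path
THRESHOLD_MS = 10000           # flag anything slower than 10 seconds

pattern = re.compile(r"LogUpdateProcessorFactory.*path=/update.*\s(\d+)\s(\d+)\s*$")

with open(LOG_FILE, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = pattern.search(line)
        if m and int(m.group(2)) >= THRESHOLD_MS:
            tag = "[commit=true] " if "commit=true" in line else ""
            print(tag + line.rstrip())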

The reason that the leader changed was almost certainly the fact that
the update took so long.  SolrCloud would have decided that the node was
down if any operation took that long.

Thanks,
Shawn

Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Erick Erickson
Specifically, a _leader_ being put into the down or recovering state is almost always because ZooKeeper cannot ping it and get a response back before it times out. This also points to large GC pauses on the Solr node. Using something like GCViewer on the GC logs at the time of the problem will help a lot.
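
If GCViewer isn't handy, the same pauses can be pulled out of the GC log with something rough like the Python sketch below, which just matches the stopped-time lines that -XX:+PrintGCApplicationStoppedTime produces (the log path and threshold are placeholders):

# Sketch: print long stop-the-world pauses from a JVM GC log by matching
# the "Total time for which application threads were stopped" lines.
import re

GC_LOG = "solr_gc.log"         # placeholder path
THRESHOLD_SECS = 5.0           # flag pauses longer than 5 seconds

pattern = re.compile(
    r"Total time for which application threads were stopped: ([0-9.]+) seconds")

with open(GC_LOG, encoding="utf-8", errors="replace") as f:
    for line in f:
        m = pattern.search(line)
        if m and float(m.group(1)) >= THRESHOLD_SECS:
            print(line.rstrip())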

A _follower_ can go into recovery when an update takes too long but that’s “leader initiated recovery” and originates _from_ the leader, which is much different than the leader going into a down state.

Best,
Erick


Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

vishal patel-2
Thanks for your reply.

You are right. I checked the GC log and, using GCViewer, I noticed that the pause time was 111.4546597 secs.

GC Log :

2019-04-08T13:52:09.198+0100: 796799.689: [CMS-concurrent-mark: 1.676/30.552 secs] [Times: user=93.42 sys=34.11, real=30.55 secs]
2019-04-08T13:52:09.198+0100: 796799.689: [CMS-concurrent-preclean-start]
2019-04-08T13:52:09.603+0100: 796800.094: [CMS-concurrent-preclean: 0.387/0.405 secs] [Times: user=8.47 sys=1.13, real=0.40 secs]
2019-04-08T13:52:09.603+0100: 796800.095: [CMS-concurrent-abortable-preclean-start]
{Heap before GC invocations=112412 (full 55591):
 par new generation   total 13107200K, used 11580169K [0x0000000080000000, 0x0000000440000000, 0x0000000440000000)
  eden space 10485760K, 100% used [0x0000000080000000, 0x0000000300000000, 0x0000000300000000)
  from space 2621440K,  41% used [0x0000000300000000, 0x0000000342cc2600, 0x00000003a0000000)
  to   space 2621440K,   0% used [0x00000003a0000000, 0x00000003a0000000, 0x0000000440000000)
 concurrent mark-sweep generation total 47185920K, used 28266850K [0x0000000440000000, 0x0000000f80000000, 0x0000000f80000000)
 Metaspace       used 49763K, capacity 50614K, committed 53408K, reserved 55296K
2019-04-08T13:52:09.939+0100: 796800.430: [GC (Allocation Failure) 796800.431: [ParNew
Desired survivor size 2415919104 bytes, new threshold 8 (max 8)
- age   1:  197413992 bytes,  197413992 total
- age   2:  170743472 bytes,  368157464 total
- age   3:  218531128 bytes,  586688592 total
- age   4:    3636992 bytes,  590325584 total
- age   5:   18608784 bytes,  608934368 total
- age   6:  163869560 bytes,  772803928 total
- age   7:   55349616 bytes,  828153544 total
- age   8:    5124472 bytes,  833278016 total
: 11580169K->985493K(13107200K), 111.4543849 secs] 39847019K->29253720K(60293120K), 111.4546597 secs] [Times: user=302.38 sys=109.81, real=111.46 secs]
Heap after GC invocations=112413 (full 55591):
 par new generation   total 13107200K, used 985493K [0x0000000080000000, 0x0000000440000000, 0x0000000440000000)
  eden space 10485760K,   0% used [0x0000000080000000, 0x0000000080000000, 0x0000000300000000)
  from space 2621440K,  37% used [0x00000003a0000000, 0x00000003dc265470, 0x0000000440000000)
  to   space 2621440K,   0% used [0x0000000300000000, 0x0000000300000000, 0x00000003a0000000)
 concurrent mark-sweep generation total 47185920K, used 28268227K [0x0000000440000000, 0x0000000f80000000, 0x0000000f80000000)
 Metaspace       used 49763K, capacity 50614K, committed 53408K, reserved 55296K
}
2019-04-08T13:54:01.394+0100: 796911.885: Total time for which application threads were stopped: 111.4638238 seconds, Stopping threads took: 0.0069189 seconds


Can I set a maximum timeout in solr.xml or in any ZooKeeper config file for when a GC pause exceeds 2 seconds? What should we do when GC pause times are longer than that?


Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Shawn Heisey-2
On 4/18/2019 1:00 AM, vishal patel wrote:
> Thanks for your reply.
>
> You are right. I checked the GC log and, using GCViewer, I noticed that the pause time was 111.4546597 secs.

<snip>

> 2019-04-08T13:52:09.939+0100: 796800.430: [GC (Allocation Failure) 796800.431: [ParNew
> Desired survivor size 2415919104 bytes, new threshold 8 (max 8)
> - age   1:  197413992 bytes,  197413992 total
> - age   2:  170743472 bytes,  368157464 total
> - age   3:  218531128 bytes,  586688592 total
> - age   4:    3636992 bytes,  590325584 total
> - age   5:   18608784 bytes,  608934368 total
> - age   6:  163869560 bytes,  772803928 total
> - age   7:   55349616 bytes,  828153544 total
> - age   8:    5124472 bytes,  833278016 total
> : 11580169K->985493K(13107200K), 111.4543849 secs] 39847019K->29253720K(60293120K), 111.4546597 secs] [Times: user=302.38 sys=109.81, real=111.46 secs]

 From another log entry that you sent I can see that your heap size is
over 50 GB.  This is a really really big heap, and the log entry I
included above is probably a full GC.  The GC config that Solr ships
with will try to avoid full GCs ... but if the heap is too small for
what Solr is being asked to do, then they will not be possible to avoid,
which I think might be what is happening here.

In this case, it's probably preferable to figure out why the heap has to
be so big and fix that rather than increase the heap size ... but
increasing the heap size might also help, assuming you have any memory
left on the server.

Can you tell me how many documents are being handled by this Solr
instance?  You can get this by finding the maxDoc count from all
cores and adding those numbers together.
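
A rough way to collect that, sketched in Python below, is the CoreAdmin STATUS API on each node (host and port are placeholders, and the requests library is assumed):

# Sketch: sum maxDoc across all cores hosted by one Solr node using the
# CoreAdmin STATUS API.
import requests

SOLR = "http://localhost:8983/solr"   # placeholder host/port

resp = requests.get(SOLR + "/admin/cores",
                    params={"action": "STATUS", "wt": "json"})
resp.raise_for_status()

total = 0
for core, info in resp.json()["status"].items():
    max_doc = info["index"]["maxDoc"]
    total += max_doc
    print("%s: maxDoc=%d" % (core, max_doc))
print("total maxDoc on this node: %d" % total)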

Can you gather and share the screenshot described on the following wiki
page?

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

I would also like to see the solrconfig.xml from at least one of the
large indexes.

> Can I set a maximum timeout in solr.xml or in any ZooKeeper config file for when a GC pause exceeds 2 seconds? What should we do when GC pause times are longer than that?

A GC pause lasting 111 seconds is an indication of a problem that you
can't fix with increased timeouts.

Thanks,
Shawn

Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

vishal patel-2

Can you tell me how many documents are being handled by this Solr 
instance?  You can get this by finding the maxDoc count from all
cores and adding those numbers together.
-- 228634803 maxDoc on one shard [we have 26 collections in production, with 2 shards and 2 replicas]

Can you gather and share the screenshot described on the following wiki 
page?
-- I have attached a screenshot of memory and CPU for one shard [PID: 2976].

I have also attached solrconfig.xml of one collection.


Re: Replica becomes leader when shard was taking a time to update document - Solr 6.1.0

Shawn Heisey-2
On 4/22/2019 3:19 AM, vishal patel wrote:
> -- 228634803 maxDoc on one shard [we have 26 collections in production,
> with 2 shards and 2 replicas]

228 million is quite a lot of documents.

> Can you gather and share the screenshot described on the following wiki
> page?

There seem to be two Solr instances on this machine, not one.  You said
there's one shard ... so why would you need more than one Solr instance?

If I'm reading things correctly, it looks like your 228 million
documents are about 30 gigabytes in size.  They must be very small
documents.

> I have also attached solrconfig.xml of one collection.

There are no caches -- they're commented out.  That is the other thing
that might need large amounts of memory.  For instance, the filterCache
that you have commented out ... if a filterCache of size 10240 were
actually there, with 228 million documents, that cache would require
more memory than you have in the server -- for just the cache.
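
As a back-of-the-envelope illustration, assuming the worst case where each filterCache entry is a bitset with one bit per document:

# Sketch: worst-case filterCache memory if every entry were a full bitset.
max_doc = 228634803            # maxDoc reported for one shard
cache_size = 10240             # the commented-out filterCache size

bytes_per_entry = max_doc / 8.0          # one bit per document
total_bytes = bytes_per_entry * cache_size

print("per entry:  %.0f MiB" % (bytes_per_entry / 2**20))   # ~27 MiB
print("full cache: %.0f GiB" % (total_bytes / 2**30))       # ~273 GiB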

Thoughts after seeing all this:  Even with 228 million documents, it
doesn't seem like you would need a 50GB heap.  But the GC log snippet
seems to indicate that you do, and that even that size might not be big
enough ... so I'm wondering what is needing all that memory.  Are you
doing massively complex queries, like huge facets, or grouping?  If you
are, you might need an even larger heap.  Which I think means that
you're going to need to run only one Solr instance on this machine, not
two, so that you have additional memory available.

Thanks,
Shawn