Three questions about huge tlog problem and CDCR

Three questions about huge tlog problem and CDCR

Louis
* Environment: SolrCloud 7.7.0, 3 nodes / bidirectional CDCR / CDCR buffer
disabled

Hello All,

I have a problem with tlogs: they keep getting bigger and bigger.

They don't seem to be deleted at all, even after a hard commit, so the
total size of the tlog files is now more than 21GB.

I actually see multiple tlog folders, for example:

 2.5GB tlog/
 6.7GB tlog.20190815170021077/
 6.7GB tlog.20190316225613751/
 ...

Are they all necessary for recovery? What are the tlog.2019XXXX folders?
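
For anyone wanting to reproduce the listing above, here is a minimal sketch that sums the size of every tlog* folder under a core's data directory. The helper names and the path you pass in are assumptions for illustration; the script only reads file sizes and deletes nothing.

```python
import os
from pathlib import Path


def dir_size_bytes(path):
    """Total size in bytes of all regular files under path (symlinks skipped)."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root, name)
            if not os.path.islink(file_path):
                total += os.path.getsize(file_path)
    return total


def tlog_dir_sizes(data_dir):
    """Map each tlog* directory directly under data_dir to its size in bytes."""
    sizes = {}
    for entry in sorted(Path(data_dir).iterdir()):
        if entry.is_dir() and entry.name.startswith("tlog"):
            sizes[entry.name] = dir_size_bytes(entry)
    return sizes
```

Calling tlog_dir_sizes() on the data directory of one replica returns a dict keyed by folder name, which makes it easy to see whether tlog/ or one of the timestamped tlog.2019XXXX folders accounts for most of the space.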


Based on my understanding, tlog files are used for recovery when a graceful
shutdown fails.

1) As long as I stop all nodes gracefully, is it safe to delete the tlog
files manually with rm -rf ./tlogs?

2) I think the tlog files are not being deleted because CDCR is not working
properly, so the tlogs stay around until they have been synchronized. Since
synchronization never happens, the tlogs keep growing. Does my theory make
sense?

3) We set the schedule of our replicator element to 1 hour, and that of the
updateLogSynchronizer element to 1 hour as well. Could CDCR be failing
because these intervals are too long?



--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Three questions about huge tlog problem and CDCR

Louis
Found a typo. Correction: "updateLogSynchronizer" is set to 60000 (1 min),
not 1 hour.




Re: Three questions about huge tlog problem and CDCR

Erick Erickson
This usually indicates that the connection between DCs is broken and one or the other is falling behind.
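
One way to check whether a DC is falling behind is to poll the CDCR API's QUEUES and ERRORS actions on the source collection. The sketch below only builds the request URLs and fetches the raw JSON. The base URL and collection name are placeholders, and the helper names are made up for illustration.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def cdcr_url(base_url, collection, action):
    """Build a CDCR API URL, e.g. <base>/<collection>/cdcr?action=QUEUES&wt=json."""
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/cdcr?{query}"


def monitoring_urls(base_url, collection):
    """URLs for the two monitoring actions most relevant to a growing tlog."""
    return [cdcr_url(base_url, collection, action) for action in ("QUEUES", "ERRORS")]


def fetch_json(url, timeout=10):
    """Fetch a URL and decode the JSON body (needs a reachable Solr node)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

For example, monitoring_urls("http://localhost:8983/solr", "mycollection") yields the two URLs to poll. A queue size that keeps growing, or a non-empty error list, would be consistent with a broken connection between the DCs.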

Note: “bidirectional” does _not_ mean that you can index to both DCs simultaneously, but rather that you can switch from indexing in one DC to the other.

Best,
Erick

> On Dec 19, 2019, at 1:01 AM, alwaysbluesky <[hidden email]> wrote:
>
> Found a typo. Correction: "updateLogSynchronizer" is set to 60000 (1 min),
> not 1 hour.


Re: Three questions about huge tlog problem and CDCR

Louis
Thank you for the advice.

By the way, when I upload a new collection configuration to ZooKeeper,
enable bidirectional CDCR for the collections on both the prod and DR sides
(<collection>/cdcr?action=START), and reload the collections, CDCR usually
doesn't work. It only starts working after I restart all nodes in the
cluster on both prod and DR.

Should I normally restart Solr after enabling or disabling CDCR? Is
reloading the collections without a Solr restart not enough to apply the
CDCR change?




Re: Three questions about huge tlog problem and CDCR

Damien Kamerman
Did you run /cdcr?action=DISABLEBUFFER on both sides?
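
To confirm the buffer state on each side, the STATUS action can be queried after DISABLEBUFFER. The sketch below parses a STATUS response body. It assumes the JSON has a "status" section with "process" and "buffer" entries, and it accepts either a map or a flat name/value list because the exact JSON shape can depend on response-writer settings. The function names are illustrative only.

```python
import json


def status_url(base_url, collection):
    """URL for the CDCR STATUS action on one collection."""
    return f"{base_url.rstrip('/')}/{collection}/cdcr?action=STATUS&wt=json"


def parse_status(body):
    """Return the 'status' section of a CDCR STATUS response as a dict."""
    status = json.loads(body).get("status", {})
    if isinstance(status, list):
        # Flat [name, value, name, value, ...] form: pair the items up.
        status = dict(zip(status[0::2], status[1::2]))
    return status


def buffer_disabled(body):
    """True if the response reports the CDCR buffer as disabled."""
    return parse_status(body).get("buffer") == "disabled"
```

Running the check against both clusters (for instance hypothetical hosts such as http://prod-solr:8983/solr and http://dr-solr:8983/solr) shows whether DISABLEBUFFER actually took effect on each side.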


On Fri, 20 Dec 2019 at 05:22, alwaysbluesky <[hidden email]>
wrote:

> Thank you for the advice.
>
> By the way, when I upload a new collection configuration to ZooKeeper and
> enable bidirectional CDCR for the collections on both prod and dr
> side(<collection>/cdcr?action=START), and reload the collections, CDCR
> usually didn't work. So if I restarted entire nodes in the cluster on both
> prod and dr, CDCR started working.
>
> Should I normally restart Solr after enabling/disabling the CDCR? Reloading
> the collections without Solr restart is not enough to apply the CDCR
> change?

Re: Three questions about huge tlog problem and CDCR

Louis
Sure.

I disabled the buffer and started CDCR by calling the API on both sides.

When I index, the size of the tlog folder stays within 1MB while the size
of the index folder keeps increasing.

So I assumed that the tlog was being consumed by the target node and
cleared, and that the data was being forwarded to the target node. But when
I checked the target node, its index was still empty; the data had been
loaded only on the source node.
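
A quick way to quantify the gap described above is to compare document counts on both sides with a rows=0 query. This is a rough sketch: the helper names are made up, and it assumes the standard JSON select response with a response.numFound field.

```python
import json
from urllib.parse import urlencode


def count_url(base_url, collection):
    """URL for a match-all query that returns only the document count."""
    query = urlencode({"q": "*:*", "rows": 0, "wt": "json"})
    return f"{base_url.rstrip('/')}/{collection}/select?{query}"


def num_found(body):
    """Extract numFound from a JSON select response body."""
    return json.loads(body)["response"]["numFound"]


def replication_gap(source_body, target_body):
    """Documents present on the source but not yet visible on the target."""
    return num_found(source_body) - num_found(target_body)
```

A gap that never shrinks after a commit on the target would match the symptom here: updates reach the source but are not being forwarded.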


