Negative CDCR Queue Size?


Webster Homer
Several times I have noticed that the CDCR action=QUEUES returns a negative queueSize. When this happens, we seem to be missing data in the target collection. How can this happen? What does a negative queue size mean? The timestamp is an empty string.

We have two targets for a source. One looks like this, with a negative queue size:
"queues": ["uc1f-ecom-mzk01.sial.com:2181,uc1f-ecom-mzk02.sial.com:2181,uc1f-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",-1,"lastTimestamp",""]],

The other is healthy:
"ae1b-ecom-mzk01.sial.com:2181,ae1b-ecom-mzk02.sial.com:2181,ae1b-ecom-mzk03.sial.com:2181/solr",["ucb-catalog-material-180317",["queueSize",246980,"lastTimestamp","2018-11-06T16:21:53.265Z"]]

We are not seeing CDCR errors.

What could cause this behavior?
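
To make the shape of that response concrete, here is a minimal sketch of how one might scan an already-parsed QUEUES response for targets reporting a negative queueSize. It assumes the flattened key/value layout shown in the output above (ZooKeeper host, then a list of collection name and stats pairs). The class name CdcrQueueCheck, the method name uninitializedTargets, and the shortened host names are hypothetical and are not part of Solr.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CdcrQueueCheck {

    /**
     * Walks a parsed "queues" value laid out as alternating key/value entries:
     * [zkHost, [collection, [statName, statValue, ...], ...], ...]
     * and returns "zkHost -> collection" for every target with a negative queueSize.
     */
    public static List<String> uninitializedTargets(List<?> queues) {
        List<String> result = new ArrayList<>();
        for (int i = 0; i + 1 < queues.size(); i += 2) {
            String zkHost = String.valueOf(queues.get(i));
            List<?> collections = (List<?>) queues.get(i + 1);
            for (int j = 0; j + 1 < collections.size(); j += 2) {
                String collection = String.valueOf(collections.get(j));
                List<?> stats = (List<?>) collections.get(j + 1);
                for (int k = 0; k + 1 < stats.size(); k += 2) {
                    boolean isQueueSize = "queueSize".equals(stats.get(k));
                    if (isQueueSize && ((Number) stats.get(k + 1)).longValue() < 0) {
                        result.add(zkHost + " -> " + collection);
                    }
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Host names are shortened placeholders; the stats mirror the output quoted above.
        List<Object> queues = Arrays.<Object>asList(
            "uc1f-zk:2181/solr",
            Arrays.<Object>asList("ucb-catalog-material-180317",
                Arrays.<Object>asList("queueSize", -1, "lastTimestamp", "")),
            "ae1b-zk:2181/solr",
            Arrays.<Object>asList("ucb-catalog-material-180317",
                Arrays.<Object>asList("queueSize", 246980, "lastTimestamp", "2018-11-06T16:21:53.265Z")));

        for (String target : uninitializedTargets(queues)) {
            System.out.println("Negative queueSize: " + target);
        }
    }
}
```

A check like this could run from a monitoring job, so that a negative value is flagged as soon as it appears rather than when data is found missing in the target.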

Re: Negative CDCR Queue Size?

Erick Erickson
What version of Solr? CDCR has changed quite a bit in the 7.x code
line, so it's important to know the version.


RE: Negative CDCR Queue Size?

Webster Homer
I'm sorry, I should have included that. We are running Solr 7.2. We use CDCR for almost all of our collections. We have experienced several intermittent problems with CDCR; this one seems to be new, or at least I hadn't seen it before.


Re: Negative CDCR Queue Size?

Amrit Sarkar
Hi Webster,

A queue size of -1 suggests that the log reader for that target is not
initialized, and you should see a WARN in the logs indicating that something
went wrong for the respective target. I am also posting the source code for
reference.

Could you look for that WARN in the logs, and check on the respective source
and target that CDCR is configured and was running correctly, without any
manual intervention?

Also, you mentioned a number of intermittent issues with CDCR. I see you have
reported a few Jiras; I would be grateful if you could report the rest.

Code:

for (CdcrReplicatorState state : replicatorManager.getReplicatorStates()) {
  NamedList queueStats = new NamedList();
  CdcrUpdateLog.CdcrLogReader logReader = state.getLogReader();
  if (logReader == null) {
    // No log reader for this target: log a WARN and report -1 as the queue size.
    String collectionName = req.getCore().getCoreDescriptor().getCloudDescriptor().getCollectionName();
    String shard = req.getCore().getCoreDescriptor().getCloudDescriptor().getShardId();
    log.warn("The log reader for target collection {} is not initialised @ {}:{}",
        state.getTargetCollection(), collectionName, shard);
    queueStats.add(CdcrParams.QUEUE_SIZE, -1L);
  } else {
    // Normal case: report how many update log records are still waiting to be forwarded.
    queueStats.add(CdcrParams.QUEUE_SIZE, logReader.getNumberOfRemainingRecords());
  }
  queueStats.add(CdcrParams.LAST_TIMESTAMP, state.getTimestampOfLastProcessedOperation());
  // Group the per-collection stats under the target's ZooKeeper host string.
  if (hosts.get(state.getZkHost()) == null) {
    hosts.add(state.getZkHost(), new NamedList());
  }
  ((NamedList) hosts.get(state.getZkHost())).add(state.getTargetCollection(), queueStats);
}
rsp.add(CdcrParams.QUEUES, hosts);

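
The branch that produces the -1 can be isolated in a small standalone sketch. This is only an illustration of the sentinel logic in the excerpt above, not Solr code: the LogReader interface is a hypothetical stand-in for CdcrUpdateLog.CdcrLogReader, and the class and method names are made up for the example.

```java
public class QueueSizeSentinel {

    /** Hypothetical stand-in for CdcrUpdateLog.CdcrLogReader, reduced to the one method used here. */
    public interface LogReader {
        long getNumberOfRemainingRecords();
    }

    /**
     * Mirrors the branch in the excerpt: a missing (null) reader is reported
     * as -1, otherwise the number of remaining records is returned.
     */
    public static long queueSize(LogReader reader) {
        return reader == null ? -1L : reader.getNumberOfRemainingRecords();
    }

    public static void main(String[] args) {
        // Target whose log reader was never initialized.
        System.out.println(queueSize(null));
        // Healthy target with records still queued.
        System.out.println(queueSize(() -> 246980L));
    }
}
```

Read this way, -1 is a marker for "no reader exists for this target" rather than a count, which would also be consistent with the empty lastTimestamp in the original report.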
Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

