SolrCloud upgrade concern

gnandre
Hi,

I am trying to upgrade my system from Solr master-slave architecture to
SolrCloud architecture.
Meanwhile, I stumbled upon this very negative post about SolrCloud.

https://lucene.472066.n3.nabble.com/A-Last-Message-to-the-Solr-Users-td4452980.html


Given that it is from one of the initial authors of SolrCloud
functionality, I am having second thoughts about the upgrade and I am
somewhat concerned.

I will greatly appreciate any advice/feedback on this from the Solr community.

Re: SolrCloud upgrade concern

David Hastings
Ha, I'm on that thread. I didn't know they got stored on a site, that's good to
know!

I stand by what I said in there, so I have nothing more to add.

On Thu, Jan 16, 2020 at 3:29 PM Arnold Bronley <[hidden email]>
wrote:

> [...]

Re: SolrCloud upgrade concern

Jason Gerlowski
Hi Arnold,

The stability and complexity issues Mark highlighted in his post
aren't just imagined - there are real, sometimes serious, bugs in
SolrCloud features.  But at the same time there are many, many stable
deployments out there where SolrCloud is a real success story for
users.  As a small example, I work at a company (Lucidworks) where our main
product (Fusion) is built heavily on top of SolrCloud and we see it
deployed successfully every day.

In no way am I trying to minimize Mark's concerns (or David's).  There
are stability bugs.  But the extent to which they affect you
depends a lot on what your deployment looks like.  How many nodes?
How many collections?  How tightly are you trying to squeeze your
hardware?  Is your network flaky?  Are you looking to use any of
SolrCloud's newer, less stable features like CDCR, etc.?

Is SolrCloud better for you than Master/Slave?  It depends on what
you're hoping to gain by a move to SolrCloud, and on your answers to
some of the questions above.  I would be leery of following any
recommendations that are made without regard for your reason for
switching or your deployment details.  Those things are always the
biggest driver in terms of success.

Good luck making your decision!

Best,

Jason

Re: SolrCloud upgrade concern

gnandre
Thanks for this reply, Jason.

I am mostly worried about the CDCR feature. I am relying heavily on it.
I am planning to use Solr 8.3. It has been a long time since CDCR
was first introduced, so I wonder what the state of CDCR is in 8.3. Is it
stable now?

On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski <[hidden email]> wrote:

> [...]

Re: SolrCloud upgrade concern

Jason Gerlowski
Hi Arnold,

From what I've seen in the community, CDCR had an initial burst of
development around when it was contributed, but hasn't seen much
attention or improvement since.  So while it's been around for a few
years, I'm not sure it's improved much in terms of stability or
compatibility with other Solr features.

Some of the bigger-ticket issues still open around CDCR:
- SOLR-11959 no support for basic-auth
- SOLR-12842 infinite retry of failed update-requests (leads to
sync/recovery problems)
- SOLR-12057 no real support for NRT/TLOG/PULL replicas
- SOLR-10679 no support for collection aliases

These are in addition to other, more architectural issues: CDCR can be
a bottleneck on clusters with high ingestion rates, and CDCR uses
full-index replication more than traditional indexing setups, which
can cause issues with modern index sizes, etc.

So, unfortunately, there's no real good news in terms of CDCR maturing much in
recent releases.  Joel Bernstein actually filed a JIRA recently suggesting its
removal entirely, though I don't think that has gone anywhere.

That said, I gather from what you said that you're already using CDCR
successfully with Master-Slave.  If none of these pitfalls are biting
you in your current Master-Slave setup, you might not be bothered by
them any more in SolrCloud.  Most of the problems with CDCR are
applicable in master-slave as well as SolrCloud.  I wouldn't recommend
CDCR if you were starting from scratch, and I still recommend you
consider other options.  But since you're already using it with some
success, it might be an orthogonal concern to your potential migration
to SolrCloud.

Best of luck deciding!

Jason

On Fri, May 22, 2020 at 7:06 PM gnandre <[hidden email]> wrote:

> [...]

Re: SolrCloud upgrade concern

gnandre
Thanks, Jason. This is very helpful.

I should clarify, though, that I am not currently using CDCR with my
existing master-slave architecture. What I meant to say earlier was that we
will be relying heavily on the CDCR feature if we migrate from the Solr
master-slave architecture to the SolrCloud architecture. Are there any
alternatives to CDCR? AFAIK, if you want to replicate between different
data centers, then CDCR is the only option. Also, when you say a lot of
customers are using SolrCloud successfully, how are they working around the
CDCR situation? Do they not have any data center use cases? Is there some
list maintained somewhere where one can find which companies are using
SolrCloud successfully?



On Wed, May 27, 2020 at 9:27 AM Jason Gerlowski <[hidden email]>
wrote:

> [...]

Re: SolrCloud upgrade concern

Erick Erickson
The biggest issue with CDCR is that it’s rather fragile and requires monitoring;
it’s not a “fire and forget” type of functionality. For instance, because the
tlogs are used as a queueing mechanism, if communication between DCs is broken
for any reason, the tlogs will grow forever until the connection is
re-established. Plus there are the other issues Jason pointed out.

So yes, some companies do use CDCR to communicate between separate
DCs. But they also put in some “roll your own” monitoring to ensure
things don’t go haywire.
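To make the tlog point concrete, here is a toy Python model (not CDCR code) of a source DC that keeps every update in its log until the target takes it, plus the kind of threshold check a “roll your own” monitor might run. The update rate, batch size, outage pattern, and threshold are made-up numbers for illustration, and `backlog_alerts` is a hypothetical rule; how you would actually sample the backlog in production is left open.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TlogQueue:
    """Toy model of a source DC that keeps updates in its log until the target takes them."""
    pending: List[int] = field(default_factory=list)
    next_id: int = 0

    def index(self, n: int) -> None:
        # Local indexing continues whether or not the other DC is reachable.
        for _ in range(n):
            self.pending.append(self.next_id)
            self.next_id += 1

    def forward(self, link_up: bool, batch: int) -> int:
        # Entries can only be shipped (and trimmed from the log) while the link is up.
        if not link_up:
            return 0
        sent = min(batch, len(self.pending))
        del self.pending[:sent]
        return sent


def simulate(link: List[bool], updates_per_tick: int, batch: int) -> List[int]:
    """Return the backlog size after each tick for a given up/down link pattern."""
    queue = TlogQueue()
    sizes = []
    for link_up in link:
        queue.index(updates_per_tick)
        queue.forward(link_up, batch)
        sizes.append(len(queue.pending))
    return sizes


def backlog_alerts(sizes: List[int], threshold: int) -> List[int]:
    """Hypothetical monitoring rule: flag every sample whose backlog exceeds the threshold."""
    return [i for i, size in enumerate(sizes) if size > threshold]


if __name__ == "__main__":
    link = [True] * 3 + [False] * 4 + [True] * 3  # four ticks of outage
    sizes = simulate(link, updates_per_tick=10, batch=25)
    print(sizes)                      # [0, 0, 0, 10, 20, 30, 40, 25, 10, 0]
    print(backlog_alerts(sizes, 15))  # [4, 5, 6, 7]
```

The backlog only grows during the outage and drains once the link returns; if the outage never ends, nothing in the model ever trims the log, which is the “grow forever” case.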

Alternatives:
1> use something that’s built from the ground up to provide reliable
    messaging between DCs. Kafka or similar has been mentioned. Write
    your updates to the Kafka queue and consume them in both DCs.
    These kinds of solutions are a lot more robust.

2> replicate your system-of-record, rather than Solr, into each DC and
    treat the DCs as separate installations. If you adopt this approach,
    some of the streaming capabilities can be used to monitor that they stay
    in sync. For instance, have a background or periodic task (a complete run
    will take a while) wrap two “search” streams in a “unique” decorator;
    anything except an empty result identifies docs not on both DCs.

3> Oh dear. This one is “interesting”. Wrap a “topic” stream on DC1 in
    an update decorator for DC2 and wrap both of those in a daemon decorator.
    That’s gobbledygook, and you’ll have to dig through the docs a bit for
    that to make sense. Essentially, the topic stream is one of the very few
    streams that does not (IIRC) require all fields in the fl list to be
    docValues. It fires the first time and establishes a checkpoint, finding
    all docs up to that point. Thereafter, it gets docs that have changed since
    the last time it ran. It uses a tiny collection for record keeping. Each
    time the topic stream finds new docs, it passes them to the update stream,
    which sends them to the other DC. Wrapping the whole thing in a daemon
    decorator means it runs periodically in the background. The one
    shortcoming is that this approach doesn’t propagate deletes. That’s enough
    of that until you tell us whether it sounds worth pursuing ;)
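A minimal sketch of the idea behind alternative 1, in Python: an in-memory append-only list stands in for a Kafka topic (no real Kafka client is used), and each DC's indexer keeps its own read offset. `SharedLog`, `DcIndexer`, and `poll` are hypothetical names. The point is that an outage in one DC only makes that DC lag; when it comes back it resumes from its own offset, and deletes travel through the log like any other operation.

```python
from typing import Dict, List, Tuple


class SharedLog:
    """In-memory stand-in for a durable, ordered topic. This is NOT a Kafka client."""

    def __init__(self) -> None:
        self.entries: List[Tuple[str, dict]] = []

    def append(self, op: str, doc: dict) -> None:
        # The producer writes each update once; every DC reads the same ordered log.
        self.entries.append((op, doc))


class DcIndexer:
    """Consumer running in one DC: applies log entries to that DC's local index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.offset = 0  # position in the log, tracked independently per DC
        self.index: Dict[str, dict] = {}

    def poll(self, log: SharedLog, max_entries: int = 100) -> int:
        batch = log.entries[self.offset:self.offset + max_entries]
        for op, doc in batch:
            if op == "add":
                self.index[doc["id"]] = doc
            elif op == "delete":
                self.index.pop(doc["id"], None)
        self.offset += len(batch)  # only advance past what was actually applied
        return len(batch)


if __name__ == "__main__":
    log = SharedLog()
    log.append("add", {"id": "a"})
    log.append("add", {"id": "b"})
    log.append("delete", {"id": "a"})
    dc1, dc2 = DcIndexer("dc1"), DcIndexer("dc2")
    dc1.poll(log)                  # DC1 consumes everything
    dc2.poll(log, max_entries=1)   # DC2 falls behind (e.g. a network outage)...
    dc2.poll(log)                  # ...then catches up from its own offset
    print(dc1.index == dc2.index)  # True
```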
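The consistency check in alternative 2 boils down to walking two id-sorted result sets side by side and reporting ids that appear on only one side. The Python below sketches that logic on plain iterators; it is not Solr's “unique” decorator or any other streaming-expression API, and it assumes both DCs return ids sorted ascending with the same ordering and without duplicates. `ids_not_in_both` is a hypothetical name.

```python
from typing import Iterable, Iterator, Optional, Tuple


def ids_not_in_both(dc1_ids: Iterable[str], dc2_ids: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (id, where_found) for every id present in only one of two sorted id streams."""
    a, b = iter(dc1_ids), iter(dc2_ids)
    x: Optional[str] = next(a, None)
    y: Optional[str] = next(b, None)
    while x is not None or y is not None:
        if y is None or (x is not None and x < y):
            # x cannot appear later in b, because b is sorted and already past x.
            yield x, "dc1 only"
            x = next(a, None)
        elif x is None or y < x:
            yield y, "dc2 only"
            y = next(b, None)
        else:
            # Same id on both sides: in sync, advance both streams.
            x, y = next(a, None), next(b, None)


if __name__ == "__main__":
    for doc_id, where in ids_not_in_both(["a", "b", "d"], ["a", "c", "d", "e"]):
        print(doc_id, where)
```

Because it only ever holds one id from each side in memory, the same walk works on arbitrarily large exports; an empty result means the two DCs hold the same set of ids.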
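Alternative 3 relies on the topic stream's checkpointing. The toy Python model below (not Solr's implementation) shows the pattern and why deletes are lost: each run ships only documents whose version is above the stored checkpoint, then advances the checkpoint. The integer versions are an assumption standing in for whatever change marker the topic stream tracks internally, `TopicCopier` is a hypothetical name, and a scheduler calling `run_once` repeatedly plays the role of the daemon decorator.

```python
from typing import Dict, Tuple

Doc = dict
VersionedStore = Dict[str, Tuple[int, Doc]]  # id -> (version, document)


class TopicCopier:
    """Toy model of checkpointed incremental copying from one DC to another."""

    def __init__(self) -> None:
        # Highest version already shipped; Solr keeps its checkpoints in a small collection.
        self.checkpoint = 0

    def run_once(self, source: VersionedStore, target: Dict[str, Doc]) -> int:
        # Only documents that changed since the last run are picked up.
        changed = {i: (v, d) for i, (v, d) in source.items() if v > self.checkpoint}
        for doc_id, (_, doc) in changed.items():
            target[doc_id] = doc
        if changed:
            self.checkpoint = max(v for v, _ in changed.values())
        return len(changed)


if __name__ == "__main__":
    source: VersionedStore = {"a": (1, {"id": "a"}), "b": (2, {"id": "b"})}
    target: Dict[str, Doc] = {}
    copier = TopicCopier()
    copier.run_once(source, target)  # first run ships everything up to the checkpoint
    source["c"] = (3, {"id": "c"})   # a new doc arrives in DC1...
    del source["a"]                  # ...and another is deleted
    copier.run_once(source, target)  # ships only "c"
    print(sorted(target))            # ['a', 'b', 'c']: the delete of "a" never reached DC2
```

A deleted document simply stops showing up in the source query, so there is nothing for the copier to ship; propagating deletes would need a separate mechanism.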

So overall, you _can_ use CDCR to connect remote DCs, but it takes time and energy
to make it robust. Its advantage is that it’s entirely contained within Solr. But it’s not
getting much attention lately, meaning nobody has decided the functionality is important
enough to them to donate the time/resources to make it more robust. Were someone
to take an active interest in it, it could likely be kept around as a plugin that core Solr
is not responsible for.

Best,
Erick

> On May 27, 2020, at 4:43 PM, gnandre <[hidden email]> wrote:
> [...]

Re: SolrCloud upgrade concern

Jan Høydahl / Cominvent
I had a client who asked a lot about CDCR a few years ago, but I kept recommending
against it and recommended they go for Erick’s alternative (2), since they
needed to replicate their Oracle DBs in each DC anyway. It is a much cleaner design to let
each cluster have a local data source and always stay in sync with its local DB than to
replicate both the DB and the index.

There are of course use cases where you want to sync a read-only copy of indices
to multiple DCs. I hope we’ll see a third-party tool for that someday, something that
can sit outside your Solr clusters, monitor ZK of each cluster, and do some magic :)

Jan

> On 28 May 2020, at 01:17, Erick Erickson <[hidden email]> wrote:
> [...]

Re: SolrCloud upgrade concern

gnandre
Thanks for all this information. It clears up a lot of confusion surrounding
the CDCR feature. That said, if CDCR functionality is so
fragile in SolrCloud and not worth pursuing much, does it make sense to add
a warning about its possible shortcomings to the documentation?

On Thu, May 28, 2020 at 9:02 AM Jan Høydahl <[hidden email]> wrote:

> I had a client who asked a lot about CDCR a few years ago, but I kept
> recommending
> aginst it and recommended them to go for Ericks’s alternative (2), since
> they anyway
> needed to replicate their Oracle DBs in each DC as well. Much cleaner
> design to let
> each cluster have a local datasource and always stay in sync with local DB
> than to
> replicate both DB and index.
>
> There are of course use cases where you want to sync a read-only copy of
> indices
> to multiple DCs. I hope we’ll see a 3rd party tool for that some day,
> something that
> can sit outside your Solr clusters, monitor ZK of each cluster, and do
> some magic :)
>
> Jan
>
> > 28. mai 2020 kl. 01:17 skrev Erick Erickson <[hidden email]>:
> >
> > The biggest issue with CDCR is it’s rather fragile and requires
> monitoring,
> > it’s not a “fire and forget” type of functionality. For instance, the
> use of the
> > tlogs as a queueing mechanism means that if, for any reason, the
> communications
> > between DCs is broken, the tlogs will grow forever until the connection
> is
> > re-established. Plus the other issues Jason pointed out.
> >
> > So yes, some companies do use CDCR to communicate between separate
> > DCs. But they also put in some “roll your own” type of monitoring to
> insure
> > things don’t go haywire.
> >
> > Alternatives:
> > 1> use something that’s built from the ground up to provide reliable
> >     messaging between DCs. Kafka or similar has been mentioned. Write
> >     your updates to the Kafka queue and consume them in both DCs.
> >     These kinds of solutions have a lot more robustness.
> >
> > 2> reproduce your system-of-record rather than Solr in the DCs and
> >   treat the DCs as separate installations. If you adopt this approach,
> >  some of the streaming capabilities can be used to monitor that they stay
> >  in sync. For instance have a background or periodic task that’ll take a
> while
> >  for a complete run wrap two "search" streams in a "unique” decorator,
> >  anything except an empty result identifies docs not on both DCs.
> >
> > 3> Oh Dear. This one is “interesting”. Wrap a “topic" stream on DC1 in
> >    an update decorator for DC2 and wrap both of those in a daemon
> decorator.
> >   That’s gobbledygook, and you’ll have to dig through the docs a bit for
> >   that to make sense. Essentially the topic stream is one of the very
> few
> >   streams that does not (IIRC) require all values in the fl list be
> docValues.
> >   It fires the first time and establishes a checkpoint, finding all docs
> up to that point.
> >   Thereafter, it’ll get docs that have changed since the last time it
> ran. It uses a tiny
> >   collection for record keeping. Each time the topic stream finds new
> docs, it passes
> >  them to the update stream which sends them to another DC. Wrapping the
> whole
> >  thing in a daemon decorator means it periodically runs in the
> background. The one
> >  shortcoming is that this approach doesn’t propagate deletes. That’s
> enough of that
> >  until you tell us whether it sounds worth pursuing ;)
> >
> > So overall, you _can_ use CDCR to connect remote DCs, but it takes time
> and energy
> > to make it robust. Its advantage is that it’s entirely contained within
> Solr. But it’s not
> > getting much attention lately, meaning nobody has decided the
> functionality is important
> > enough to them to donate the time/resources to make it more robust. Were
> someone
> > to take an active interest in it, likely it could be kept around as a
> plugin that core Solr
> > is not responsible for.
> >
> > Best,
> > Erick
> >
> >> On May 27, 2020, at 4:43 PM, gnandre <[hidden email]> wrote:
> >>
> >> Thanks, Jason. This is very helpful.
> >>
> >> I should clarify, though, that I am not using CDCR currently with my
> >> existing master-slave architecture. What I meant to say earlier was
> >> that we will be relying heavily on the CDCR feature if we migrate from
> >> Solr master-slave architecture to SolrCloud architecture. Are there any
> >> alternatives to CDCR? AFAIK, if you want to replicate between different
> >> data centers, then CDCR is the only option. Also, when you say a lot of
> >> customers are using SolrCloud successfully, how are they working around
> >> the CDCR situation? Do they not have any data center use cases? Is there
> >> some list maintained somewhere where one can find which companies are
> >> using SolrCloud successfully?
> >>
> >>
> >>
> >> On Wed, May 27, 2020 at 9:27 AM Jason Gerlowski <[hidden email]>
> >> wrote:
> >>
> >>> Hi Arnold,
> >>>
> >>> From what I saw in the community, CDCR saw an initial burst of
> >>> development around when it was contributed, but hasn't seen much
> >>> attention or improvement since.  So while it's been around for a few
> >>> years, I'm not sure it's improved much in terms of stability or
> >>> compatibility with other Solr features.
> >>>
> >>> Some of the bigger-ticket issues still open around CDCR:
> >>> - SOLR-11959: no support for basic auth
> >>> - SOLR-12842: infinite retry of failed update requests (leads to
> >>>   sync/recovery problems)
> >>> - SOLR-12057: no real support for NRT/TLOG/PULL replicas
> >>> - SOLR-10679: no support for collection aliases
> >>>
> >>> These are in addition to other, more architectural issues: CDCR can be
> >>> a bottleneck on clusters with high ingestion rates; CDCR uses
> >>> full-index replication more than traditional indexing setups do, which
> >>> can cause issues with modern index sizes; etc.
> >>>
> >>> So, unfortunately, there's no real good news in terms of CDCR maturing
> >>> much in recent releases.  Joel Bernstein actually filed a JIRA recently
> >>> suggesting its removal entirely, though I don't think it's gone anywhere.
> >>>
> >>> That said, I gather from what you said that you're already using CDCR
> >>> successfully with Master-Slave.  If none of these pitfalls are biting
> >>> you in your current Master-Slave setup, you might not be bothered by
> >>> them any more in SolrCloud.  Most of the problems with CDCR are
> >>> applicable in master-slave as well as SolrCloud.  I wouldn't recommend
> >>> CDCR if you were starting from scratch, and I still recommend you
> >>> consider other options.  But since you're already using it with some
> >>> success, it might be an orthogonal concern to your potential migration
> >>> to SolrCloud.
> >>>
> >>> Best of luck deciding!
> >>>
> >>> Jason
> >>>
> >>> On Fri, May 22, 2020 at 7:06 PM gnandre <[hidden email]> wrote:
> >>>>
> >>>> Thanks for this reply, Jason.
> >>>>
> >>>> I am mostly worried about the CDCR feature. I am relying heavily on it.
> >>>> I am planning to use Solr 8.3, though. It has been a long time since
> >>>> CDCR was first introduced. I wonder what the state of CDCR is in 8.3.
> >>>> Is it stable now?
> >>>>
> >>>> On Wed, Jan 22, 2020, 8:01 AM Jason Gerlowski <[hidden email]> wrote:
> >>>>
> >>>>> Hi Arnold,
> >>>>>
> >>>>> The stability and complexity issues Mark highlighted in his post
> >>>>> aren't just imagined: there are real, sometimes serious, bugs in
> >>>>> SolrCloud features.  But at the same time there are many, many stable
> >>>>> deployments out there where SolrCloud is a real success story for
> >>>>> users.  As a small example, I work at a company (Lucidworks) where our
> >>>>> main product (Fusion) is built heavily on top of SolrCloud and we see
> >>>>> it deployed successfully every day.
> >>>>>
> >>>>> In no way am I trying to minimize Mark's concerns (or David's).
> >>>>> There are stability bugs.  But the extent to which those affect you
> >>>>> depends a lot on what your deployment looks like.  How many nodes?
> >>>>> How many collections?  How tightly are you trying to squeeze your
> >>>>> hardware?  Is your network flaky?  Are you looking to use any of
> >>>>> SolrCloud's newer, less stable features like CDCR, etc.?
> >>>>>
> >>>>> Is SolrCloud better for you than Master/Slave?  It depends on what
> >>>>> you're hoping to gain by a move to SolrCloud, and on your answers to
> >>>>> some of the questions above.  I would be leery of following any
> >>>>> recommendations that are made without regard for your reason for
> >>>>> switching or your deployment details.  Those things are always the
> >>>>> biggest driver in terms of success.
> >>>>>
> >>>>> Good luck making your decision!
> >>>>>
> >>>>> Best,
> >>>>>
> >>>>> Jason
> >>>>>
> >>>
> >
>
>