Solr Cloud on Docker?

Solr Cloud on Docker?

Walter Underwood
Does anyone have experience running a big Solr Cloud cluster on Docker containers? By “big”, I mean 35 million docs, 40 nodes, 8 shards, with 36 CPU instances. We are running version 6.6.2 right now, but could upgrade.

If people have specific things to do or avoid, I’d really appreciate it.

I got a couple of responses on the Slack channel, but I’d love more stories from the trenches. This is a direction for our company architecture.

We have a master/slave cluster (Solr 4.10.4) that is awesome. I can absolutely see running the slaves as containers. For Solr Cloud? Makes me nervous.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

Re: Solr Cloud on Docker?

Dwane Hall
Hey Walter,

I recently migrated our Solr cluster to Docker and am very pleased I did so. We run relatively large servers with multiple Solr instances per physical host, and having managed Solr upgrades on bare-metal installs since Solr 5, I've found containerisation a blessing (we are currently on Solr 7.7.2). In our case we run 20 Solr nodes per host over 5 hosts, totalling 100 Solr instances. Here I host 3 collections of varying size: the first contains 60m docs (8 shards), the second 360m (12 shards), and the third 1.3b (30 shards), all with 2 NRT replicas. The docs are primarily database sourced but are not tiny by any means.

Here are some of my comments from our migration journey:
- Running Solr on Docker should be no different to bare metal. You still need to test for your environment and conditions and follow the guides and best practices outlined in the excellent Lucidworks blog post https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/.
- The recent Solr Docker images are built with Java 11, so if you store your indexes in HDFS you'll have to build your own Docker image, as Hadoop is not yet certified with Java 11 (or use an older Solr version image built with Java 8).
- As Docker will be responsible for quite a few Solr nodes, it becomes important to make sure the Docker daemon is enabled as a systemd service (via systemctl) so that it restarts after a failure or a reboot of the host. Additionally, the Docker restart=always policy is useful for restarting failed containers automatically if a single container dies (e.g. JVM explosions). I've deliberately blown up the JVM in test conditions and found the containers/Solr recover really well under Docker.
- I use Docker Compose to spin up our environment and it has been excellent for maintaining consistent settings across Solr nodes and hosts. Additionally using a .env file makes most of the Solr environment variables per node configurable in an external file.
- I'd recommend Docker Swarm if you plan on running Solr over multiple physical hosts. Unfortunately we had an incompatible OS so I was unable to utilise this approach. The same incompatibility existed for K8s but Lucidworks has another great article on this approach if you're more fortunate with your environment than us https://lucidworks.com/post/running-solr-on-kubernetes-part-1/.
- Our Solr instances are TLS secured and use the basic authentication plugin and the rule-based authorization plugin. There's nothing I have not been able to configure with the default Docker images using environment variables passed into the container. This makes upgrades to Solr versions really easy, as you just need to grab the image and pass in your environment details to the container for any new Solr version.
- If possible I'd start with the Solr 8 Docker image. The project underwent a large refactor to align it with the install script based on community feedback. If you start with an earlier version you'll need to refactor when you eventually move to Solr version 8. The Solr Docker page has more details on this.
- Martijn Koster (the project lead) is excellent and very responsive to questions on the project page. Read through the Q&A page before reaching out; I found a lot of my questions already answered there. Additionally, he provides a number of example Docker configurations, from command-line parameters to docker-compose files running multiple instances and ZooKeeper quorums.
- The Docker extra hosts parameter (extra_hosts in Compose) is useful for adding entries to your container's hosts file, particularly if you have multiple NICs with internal and external interfaces and you want to force communication over a specific one.
- We use the Solr Prometheus exporter to collect node metrics. I've found I've needed to reduce the metrics to collect, as having this many nodes overwhelmed it occasionally. From memory it had something to do with concurrent modification of the Future objects the collector uses, and it sometimes misses collection cycles. This is not Docker related; it comes down to the size of the Solr cluster and the exporter's ability to handle it.
- We use the zkCli script a lot for updating configsets. As I did not want to have to copy them into a container to update them, I just download a copy of the Solr binaries and use it solely for this ZooKeeper script. It's not elegant, but a number of our devs are not familiar with Docker and this was a nice compromise. Another alternative is to just use the REST API to do any configset manipulation.
- We load balance all of these nodes to external clients using an HAProxy Docker image. This, combined with the Docker restart policy and Solr's replication and autoscaling capabilities, provides a very stable environment for us.
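
To make the Compose-related points in the list above more concrete (restart policy, per-node environment variables, extra hosts), here is a minimal Python sketch that generates a Compose definition for several Solr nodes on one host. It is not the configuration described in this thread. The image tag, port numbering, ZooKeeper addresses, data path, and the "solr-internal" host entry are all hypothetical. SOLR_HEAP, SOLR_PORT and ZK_HOST are assumed to be picked up by the Solr start script inside the image, which you should verify for your image version.

```python
import json


def solr_service(node: int, heap: str, zk_host: str, data_root: str) -> dict:
    """Build one Compose service entry for a single Solr node (illustrative only)."""
    port = 8983 + (node - 1)  # hypothetical scheme: node 1 -> 8983, node 2 -> 8984, ...
    return {
        "image": "solr:8",                      # assumed image tag
        "restart": "always",                    # restart the container if it dies
        "ports": [f"{port}:{port}"],
        "environment": {
            "SOLR_HEAP": heap,                  # assumed to be read by bin/solr
            "SOLR_PORT": str(port),
            "ZK_HOST": zk_host,
        },
        # hypothetical bind mount, one host directory per node
        "volumes": [f"{data_root}/node{node}:/var/solr"],
        # hypothetical entry added to the container's hosts file
        "extra_hosts": ["solr-internal:10.0.0.10"],
    }


def compose_file(nodes: int, heap: str, zk_host: str, data_root: str) -> dict:
    """Build a Compose document with `nodes` Solr services named solr1..solrN."""
    services = {
        f"solr{n}": solr_service(n, heap, zk_host, data_root)
        for n in range(1, nodes + 1)
    }
    return {"version": "3.7", "services": services}


if __name__ == "__main__":
    cfg = compose_file(
        nodes=20,
        heap="16g",
        zk_host="zk1:2181,zk2:2181,zk3:2181",   # hypothetical ZooKeeper ensemble
        data_root="/data/solr",                 # hypothetical host path
    )
    # JSON is valid YAML, so this output can be saved and used as a Compose file.
    print(json.dumps(cfg, indent=2))
```

Generating the file from one function keeps the settings identical across nodes, which is the consistency benefit mentioned above. A .env file achieves much the same thing without any scripting.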

All in all migrating and running Solr on Docker has been brilliant. It was primarily driven by a need to scale our environment vertically on large hardware instances as running 100 nodes on bare metal was too big a maintenance and administrative burden for us with a small Dev and support team. To date it's been very stable and reliable so I would recommend the approach if you are in a similar situation.

Thanks,

Dwane






________________________________
From: Walter Underwood <[hidden email]>
Sent: Saturday, 14 December 2019 6:04 PM
To: [hidden email] <[hidden email]>
Subject: Solr Cloud on Docker?


Re: Solr Cloud on Docker?

Dominique Bejean
Hi Dwane,

Thank you for sharing this great Solr/Docker user story.

Given your Solr/JVM memory requirements (heap size + Metaspace +
off-heap size), are you specifying particular settings in your
docker-compose files (mem_limit, mem_reservation, mem_swappiness, ...)?
I suppose you are limiting the total memory used by all the dockerised
Solr instances in order to keep free memory on the host for MMapDirectory?

In short, can you explain how you manage memory?

Regards

Dominique




On Mon, 23 Dec 2019 at 00:17, Dwane Hall <[hidden email]> wrote:


Re: Solr Cloud on Docker?

Scott Stults
One of our clients has been running a big Solr Cloud (100-ish nodes, TB
index, billions of docs) in Kubernetes for over a year and it's been
wonderful. I think during that time the biggest scrapes we got into were
when we ran out of disk space. Performance and reliability have been solid
otherwise. As Dwane alluded to, a lot of operational pitfalls can be
avoided if you do your Docker orchestration through Kubernetes.


k/r,
Scott

On Tue, Jan 28, 2020 at 3:34 AM Dominique Bejean <[hidden email]>
wrote:


--
Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
| 434.409.2780
http://www.opensourceconnections.com

Re: Solr Cloud on Docker?

Dwane Hall
Hey Dominique,

From a memory management perspective I don't do any container resource limiting specifically in Docker (although, as you mention, you certainly can). In our circumstances these hosts are used specifically for Solr, so I planned and tested my capacity beforehand. We have ~768G of RAM on each of these 5 hosts, so with 20x16G heaps we had ~320G of heap being used by Solr plus some overhead for Docker and the other OS services, leaving ~400G for the OS cache and whatever wants to grab it on each host. Not everyone will have servers this large, which is why we really had to take advantage of multiple Solr instances per host, and why Docker became important for our cluster operation management.

Our disks are not SSDs either, and all instances write to the same RAID 5 spinning-disk array, which is bind mounted to the containers. With this configuration we've been able to achieve consistent median response times of under 500ms across the largest collection, although this obviously varies with query type (no terms, leading wildcards, etc.). Our QPS is not huge, ranging from 2-20/sec, but if we need to scale further or speed up response times there are certainly wins that can be made at the disk level. For our current circumstances we're very content with the deployment.
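
The memory arithmetic above can be written as a small Python sketch for anyone who wants to try their own numbers. The function name is made up for illustration, and the 48G overhead figure is an assumption chosen only so that the result lines up with the rounded ~400G quoted above; the message does not state an exact overhead.

```python
def page_cache_budget_gb(host_ram_gb: int, nodes: int, heap_gb: int, overhead_gb: int):
    """Return (total JVM heap, RAM left for the OS page cache) in GB for one host.

    overhead_gb covers Docker, other OS services and non-heap JVM memory; it is
    an estimate you have to supply and validate by testing.
    """
    total_heap = nodes * heap_gb
    remaining = host_ram_gb - total_heap - overhead_gb
    if remaining < 0:
        raise ValueError("heaps plus overhead exceed the host's RAM")
    return total_heap, remaining


if __name__ == "__main__":
    # Figures from the message: ~768G per host, 20 nodes with 16G heaps each.
    # overhead_gb=48 is an assumed value, not a measured one.
    heap, cache = page_cache_budget_gb(host_ram_gb=768, nodes=20, heap_gb=16, overhead_gb=48)
    print(f"heap: {heap}G, left for OS cache: {cache}G")
```

The remaining figure is what the OS has available for caching index files, which is the memory MMapDirectory depends on, so it is worth comparing against your on-disk index size per host.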

I'm not sure if you've read Toke's blog on his experiences at the Royal Danish Library, but I found it really useful when capacity planning and recommend reading it (https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/).

As always, it's recommended to test for your own conditions, and best of luck with your deployment!

Dwane

________________________________
From: Scott Stults <[hidden email]>
Sent: Thursday, 30 January 2020 1:45 AM
To: [hidden email] <[hidden email]>
Subject: Re: Solr Cloud on Docker?


Re: Solr Cloud on Docker?

Dominique Bejean
Thank you Dwane. Great info :)


On Wed, 5 Feb 2020 at 11:49, Dwane Hall <[hidden email]> wrote:

> Hey Dominique,
>
> From a memory management perspective I don't do any container resource
> limiting specifically in Docker (although, as you mention, you certainly
> can). In our circumstances these hosts are used exclusively for Solr, so I
> planned and tested my capacity beforehand. We have ~768G of RAM on each of
> these 5 hosts, so with 20x16G heaps we had ~320G of heap being used by Solr
> and some overhead for Docker and the other OS services, leaving ~400G for
> the OS cache and whatever else wants to grab it on each host. Not everyone
> will have servers this large, which is why we really had to take advantage
> of multiple Solr instances per host, and why Docker became important for
> managing our cluster operations. Our disks are not SSDs either, and all
> instances write to the same RAID 5 spinning-disk array, which is bind
> mounted into the containers. With this configuration we've been able to
> achieve consistent median response times of under 500ms across the largest
> collection, although this obviously varies with query type (no terms,
> leading wildcards, etc.). Our QPS is not huge, ranging from 2-20/sec, but
> if we need to scale further or speed up response times there are certainly
> wins that can be made at the disk level. For our current circumstances
> we're very content with the deployment.
>
> I'm not sure if you've read Toke's blog on his experiences at the Royal
> Danish Library, but I found it really useful when capacity planning and
> recommend reading it (
> https://sbdevel.wordpress.com/2016/11/30/70tb-16b-docs-4-machines-1-solrcloud/
> ).
>
> As always, it's recommended to test for your own conditions, and best of
> luck with your deployment!
>
> Dwane
>
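As a rough illustration of the memory budget in Dwane's reply above, the per-host arithmetic can be sketched as below. The RAM, node count, and heap size are the figures from the message; the 48G overhead for Docker and OS services is an assumed value chosen only so the result matches the "~400G" he mentions, and the function name is hypothetical.

```python
def host_memory_budget(total_ram_gb, nodes_per_host, heap_per_node_gb, overhead_gb):
    """Return (total_heap_gb, page_cache_gb) for a single host.

    page_cache_gb is what remains for the OS file cache after the JVM
    heaps and a fixed overhead allowance have been subtracted.
    """
    total_heap_gb = nodes_per_host * heap_per_node_gb
    page_cache_gb = total_ram_gb - total_heap_gb - overhead_gb
    if page_cache_gb < 0:
        raise ValueError("heaps plus overhead exceed physical RAM")
    return total_heap_gb, page_cache_gb


# Figures from the message: ~768G of RAM, 20 Solr nodes, 16G heap each.
# overhead_gb=48 is an assumption, not a number from the thread.
total_heap, page_cache = host_memory_budget(768, 20, 16, 48)
print(f"heap: {total_heap}G, left for OS cache: {page_cache}G")
```

Changing `nodes_per_host` or `heap_per_node_gb` shows quickly how much file cache is given up for each extra Solr instance on a host.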
> ------------------------------
> *From:* Scott Stults <[hidden email]>
> *Sent:* Thursday, 30 January 2020 1:45 AM
> *To:* [hidden email] <[hidden email]>
> *Subject:* Re: Solr Cloud on Docker?
>
> One of our clients has been running a big Solr Cloud (100-ish nodes, TB
> index, billions of docs) in Kubernetes for over a year and it's been
> wonderful. I think during that time the biggest scrapes we got into were
> when we ran out of disk space. Performance and reliability have been solid
> otherwise. As Dwane alluded to, a lot of operational pitfalls can be
> avoided if you do your Docker orchestration through Kubernetes.
>
>
> k/r,
> Scott
>
> On Tue, Jan 28, 2020 at 3:34 AM Dominique Bejean <[hidden email]> wrote:
>
> > Hi Dwane,
> >
> > Thank you for sharing this great Solr/Docker user story.
> >
> > Given your Solr/JVM memory requirements (heap size + Metaspace +
> > off-heap size), are you specifying particular settings in your
> > docker-compose files (mem_limit, mem_reservation, mem_swappiness, ...)?
> > I suppose you are limiting the total memory used by all dockerised Solr
> > instances in order to keep memory free on the host for MMapDirectory?
> >
> > In short, can you explain the memory management?
> >
> > Regards
> >
> > Dominique
> >
> >
> >
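For readers wondering what the settings in Dominique's question might look like, here is a minimal sketch that derives per-container values for the three Compose keys he names (`mem_limit`, `mem_reservation`, `mem_swappiness`). Note that Dwane's reply says he does not set any such limits; the 4G off-heap allowance and the function name are assumptions made for illustration only.

```python
import json


def memory_settings(heap_gb, offheap_gb, swappiness=0):
    """Build Compose-style memory keys for one Solr container.

    The hard limit is heap plus an off-heap allowance; the reservation
    is set to the heap size alone. Both choices are illustrative.
    """
    limit_gb = heap_gb + offheap_gb
    return {
        "mem_limit": f"{limit_gb}g",
        "mem_reservation": f"{heap_gb}g",
        "mem_swappiness": swappiness,
    }


# 16G heap is from the thread; 4G off-heap headroom is an assumption.
print(json.dumps(memory_settings(16, 4), indent=2))
```

Any real values would need to be tested against the workload, as the thread repeatedly recommends.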
> >
> > On Mon, 23 Dec 2019 at 00:17, Dwane Hall <[hidden email]> wrote:
> >
> > > Hey Walter,
> > >
> > > I recently migrated our Solr cluster to Docker and am very pleased I
> > > did so. We run relatively large servers with multiple Solr instances
> > > per physical host, and having managed Solr upgrades on bare metal
> > > installs since Solr 5, containerisation has been a blessing (currently
> > > Solr 7.7.2). In our case we run 20 Solr nodes per host over 5 hosts,
> > > totalling 100 Solr instances. Here I host 3 collections of varying
> > > size. The first contains 60m docs (8 shards), the second 360m (12
> > > shards), and the third 1.3b (30 shards), all with 2 NRT replicas. The
> > > docs are primarily database sourced but are not tiny by any means.
> > >
> > > Here are some of my comments from our migration journey:
> > > - Running Solr on Docker should be no different to bare metal. You
> > > still need to test for your environment and conditions and follow the
> > > guides and best practices outlined in the excellent Lucidworks blog
> > > post
> > > https://lucidworks.com/post/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/.
> > > - The recent Solr Docker images are built with Java 11, so if you
> > > store your indexes in HDFS you'll have to build your own Docker image,
> > > as Hadoop is not yet certified with Java 11 (or use an older Solr
> > > version image built with Java 8).
> > > - As Docker will be responsible for quite a few Solr nodes, it becomes
> > > important to make sure the Docker daemon is configured under systemd
> > > to restart after a failure or a reboot of the host. Additionally, the
> > > Docker --restart=always policy is useful for restarting failed
> > > containers automatically if a single container dies (e.g. JVM
> > > explosions). I've deliberately blown up the JVM in test conditions and
> > > found the containers/Solr recover really well under Docker.
> > > - I use Docker Compose to spin up our environment and it has been
> > > excellent for maintaining consistent settings across Solr nodes and
> > > hosts. Additionally, using a .env file makes most of the per-node Solr
> > > environment variables configurable in an external file.
> > > - I'd recommend Docker Swarm if you plan on running Solr over multiple
> > > physical hosts. Unfortunately we had an incompatible OS so I was
> > > unable to utilise this approach. The same incompatibility existed for
> > > K8s, but Lucidworks has another great article on this approach if
> > > you're more fortunate with your environment than us:
> > > https://lucidworks.com/post/running-solr-on-kubernetes-part-1/.
> > > - Our Solr instances are TLS secured and use the basic auth plugin and
> > > the rule-based authorization plugin. There's nothing I have not been
> > > able to configure with the default Docker images using environment
> > > variables passed into the container. This makes upgrades to Solr
> > > versions really easy, as you just need to grab the image and pass in
> > > your environment details to the container for any new Solr version.
> > > - If possible I'd start with the Solr 8 Docker image. The project
> > > underwent a large refactor to align it with the install script, based
> > > on community feedback. If you start with an earlier version you'll
> > > need to refactor when you eventually move to Solr version 8. The Solr
> > > Docker page has more details on this.
> > > - Martijn Koster (the project lead) is excellent and very responsive
> > > to questions on the project page. Read through the Q&A page before
> > > reaching out; I found a lot of my questions already answered there.
> > > Additionally, he provides a number of example Docker configurations,
> > > from command line parameters to docker-compose files running multiple
> > > instances and ZooKeeper quorums.
> > > - The Docker extra hosts parameter is useful for adding extra hosts to
> > > your container's hosts file, particularly if you have multiple NICs
> > > with internal and external interfaces and you want to force
> > > communication over a specific one.
> > > - We use the Solr Prometheus exporter to collect node metrics. I've
> > > found I've needed to reduce the metrics to collect, as having this
> > > many nodes overwhelmed it occasionally. From memory it had something
> > > to do with concurrent modification of Future objects the collector
> > > uses, and it sometimes misses collection cycles. This is not Docker
> > > related but related to the size of the Solr cluster and the exporter's
> > > ability to handle it.
> > > - We use the zkCli script a lot for updating configsets. As I did not
> > > want to have to copy them into a container to update them, I just
> > > download a copy of the Solr binaries and use it entirely for this
> > > ZooKeeper script. It's not elegant, but a number of our devs are not
> > > familiar with Docker and this was a nice compromise. Another
> > > alternative is to just use the REST API to do any configset
> > > manipulation.
> > > - We load balance all of these nodes to external clients using an
> > > HAProxy Docker image. This, combined with the Docker restart policy
> > > and Solr replication and autoscaling capabilities, provides a very
> > > stable environment for us.
> > >
> > > All in all, migrating and running Solr on Docker has been brilliant.
> > > It was primarily driven by a need to scale our environment vertically
> > > on large hardware instances, as running 100 nodes on bare metal was
> > > too big a maintenance and administrative burden for us with a small
> > > dev and support team. To date it's been very stable and reliable, so I
> > > would recommend the approach if you are in a similar situation.
> > >
> > > Thanks,
> > >
> > > Dwane
> > >
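As a rough sketch of the restart-policy, Compose, .env, and extra-hosts points in Dwane's list above, the snippet below generates a Compose-style service definition for several Solr nodes on one host. It is not Dwane's actual configuration: the service names, host data path, port layout, the `/var/solr` container path, and the example `extra_hosts` entry are all assumptions. The output is JSON, which is also valid YAML, so no third-party library is needed.

```python
import json


def solr_services(node_count, image="solr:8", base_port=8983,
                  host_data_root="/data/solr", extra_hosts=None):
    """Build a Compose-style dict with one service per Solr node."""
    services = {}
    for i in range(1, node_count + 1):
        name = f"solr{i}"
        service = {
            "image": image,
            # Restart the container automatically if it dies.
            "restart": "always",
            # Per-node Solr settings live in an external file.
            "env_file": [".env"],
            # Each node gets its own host port mapped to Solr's 8983.
            "ports": [f"{base_port + i - 1}:8983"],
            # Bind mount a per-node directory from the host (assumed paths).
            "volumes": [f"{host_data_root}/{name}:/var/solr"],
        }
        if extra_hosts:
            # Extra entries for the container's hosts file.
            service["extra_hosts"] = list(extra_hosts)
        services[name] = service
    return {"version": "3.7", "services": services}


compose = solr_services(2, extra_hosts=["zk1.internal:10.0.0.11"])
print(json.dumps(compose, indent=2))
```

Generating the file from one function keeps the settings identical across nodes, which is the consistency benefit the list attributes to Compose.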
> > >
> > >
> > >
> > >
> > >
> > > ________________________________
> > > From: Walter Underwood <[hidden email]>
> > > Sent: Saturday, 14 December 2019 6:04 PM
> > > To: [hidden email] <[hidden email]>
> > > Subject: Solr Cloud on Docker?
> > >
> > > Does anyone have experience running a big Solr Cloud cluster on Docker
> > > containers? By “big”, I mean 35 million docs, 40 nodes, 8 shards, with
> > > 36 CPU instances. We are running version 6.6.2 right now, but could
> > > upgrade.
> > >
> > > If people have specific things to do or avoid, I’d really appreciate
> > > it.
> > >
> > > I got a couple of responses on the Slack channel, but I’d love more
> > > stories from the trenches. This is a direction for our company
> > > architecture.
> > >
> > > We have a master/slave cluster (Solr 4.10.4) that is awesome. I can
> > > absolutely see running the slaves as containers. For Solr Cloud? Makes
> > > me nervous.
> > >
> > > wunder
> > > Walter Underwood
> > > [hidden email]
> > > http://observer.wunderwood.org/  (my blog)
> > >
> > >
> >
>
>
> --
> Scott Stults | Founder & Solutions Architect | OpenSource Connections, LLC
> | 434.409.2780
> http://www.opensourceconnections.com
>

Re: Solr Cloud on Docker?

Karl Stoney
Nothing much to add to the below, apart from that we also successfully run Solr on Kubernetes. It took some implementation effort, but we're now at a point where we can run `kubectl scale --replicas=x statefulset/solr` and increase capacity in minutes, with Solr's autoscaling taking care of the new shard creation.

Very happy.
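
As a rough illustration of the scaling step Karl describes, the command from his message can be wrapped in a small helper. The `statefulset/solr` resource name comes from the message; the function names and the dry-run default are assumptions, and the sketch only prints the command unless `dry_run=False` is passed on a machine where `kubectl` is installed and configured.

```python
import subprocess


def scale_command(replicas, resource="statefulset/solr"):
    """Build the kubectl argument list for scaling the Solr StatefulSet."""
    if replicas < 0:
        raise ValueError("replicas must be non-negative")
    return ["kubectl", "scale", f"--replicas={replicas}", resource]


def scale(replicas, dry_run=True):
    """Print the command, and run it only when dry_run is False."""
    cmd = scale_command(replicas)
    print(" ".join(cmd))
    if not dry_run:
        # check=True raises CalledProcessError if kubectl exits non-zero.
        subprocess.run(cmd, check=True)
    return cmd


scale(12)
```

Whether new replicas actually receive shards depends on the autoscaling setup Karl mentions, which this sketch does not touch.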
________________________________
From: Dominique Bejean <[hidden email]>
Sent: 05 February 2020 17:53
To: Dwane Hall <[hidden email]>
Cc: Scott Stults <[hidden email]>; [hidden email] <[hidden email]>
Subject: Re: Solr Cloud on Docker?

Thank you Dwane. Great info :)


This e-mail is sent on behalf of Auto Trader Group Plc, Registered Office: 1 Tony Wilson Place, Manchester, Lancashire, M15 4FN (Registered in England No. 9439967). This email and any files transmitted with it are confidential and may be legally privileged, and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the sender. This email message has been swept for the presence of computer viruses.