Maximum number of SolrCloud collections in limited hardware resource


Maximum number of SolrCloud collections in limited hardware resource

Sharif Shahriar
Hi Guys,
We are in a use-case where we need to create a large number of
collections (1,000 to 1,500) in a SolrCloud cluster. Most of the collections
will have a very limited number of documents (100 to 1,000), and some
collections are empty. We are using a single shard and 2 replicas. For each
replica we are using a machine with 12 GB RAM and a 32 GB SSD.

Now the problem is, when we create about 1,400 collections (all of them
empty, i.e. no documents added yet), the Solr service goes down with an out
of memory exception. We have a few questions here:

1. When we are creating collections, each collection takes about 8 MB
to 12 MB of memory even when there are no documents yet. Is there any way to
configure SolrCloud so that it uses less memory per collection
initially (like 1 MB per collection)? Then we would be able to create
1,500 collections using about 3 GB of the machine's RAM.

2. Is there any way to clear/flush the caches of SolrCloud, especially for
collections that we haven't accessed for a while? (Maybe we could take those
inactive collections out of memory and load them back when they are needed
again.)

3. Is there any way to reclaim garbage memory from SolrCloud (for example,
memory left over after deleting documents and collections)?

Our target is to create the maximum number of collections without
increasing the hardware resources, while keeping the highly accessed
collections & documents in memory. We'll appreciate your help.





Best Regards,
*Sharif Shahriar Ahmed*

Re: Maximum number of SolrCloud collections in limited hardware resource

Shawn Heisey
On 6/27/2018 5:10 AM, Sharif Shahrair wrote:
> Now the problem is, when we create about 1400 collection(all of them are
> empty i.e. no document is added yet) the solr service goes down showing out
> of memory exception. We have few questions here-
>
> 1. When we are creating collections, each collection is taking about 8 MB
> to 12 MB of memory when there is no document yet. Is there any way to
> configure SolrCloud in a way that it takes low memory for each collection
> initially(like 1MB for each collection), then we would be able to create
> 1500 collection using about 3GB of machines RAM?

Solr doesn't dictate how much memory it allocates for a collection.  It
allocates what it needs, and if the heap size is too small for that,
then you get OOME.

You're going to need a lot more than two Solr servers to handle that
many collections, and they're going to need more than 12GB of memory. 
You should already have at least three servers in your setup, because
ZooKeeper requires three servers for redundancy.

http://zookeeper.apache.org/doc/r3.4.12/zookeeperAdmin.html#sc_zkMulitServerSetup
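
For reference, a minimal three-node ZooKeeper ensemble of the kind the
linked page describes is configured roughly like this in each node's
zoo.cfg (hostnames and paths below are placeholders, not values from this
thread):

```
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
# One line per ensemble member: server.<id>=<host>:<peer-port>:<election-port>
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```

Each node also needs a myid file in dataDir containing its own server id.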

Handling a large number of collections is one area where SolrCloud needs
improvement.  Work is constantly happening towards this goal, but it's a
very complex piece of software, so making design changes is not trivial.

> 2. Is there any way to clear/flush the cache of SolrCloud, specially from
> those collections which we don't access for while(May be we can take those
> inactive collections out of memory and load them back when they are needed
> again)?

Unfortunately the functionality that allows index cores to be unloaded
(which we have colloquially called "LotsOfCores") does not work when
Solr is running in SolrCloud mode. SolrCloud functionality would break if
its cores get unloaded.  It would take a fair amount of development
effort to allow the two features to work together.

> 3. Is there any way to collect the Garbage Memory from SolrCloud(may be
> created by deleting documents and collections) ?

Java handles garbage collection automatically.  It's possible to
explicitly ask the system to collect garbage, but any good programming
guide for Java will recommend that programmers should NOT explicitly
trigger GC.  While it might be possible for Solr's memory usage to
become more efficient through development effort, it's already pretty
good.  To our knowledge, Solr does not currently have any memory leak
bugs, and if any are found, they are taken seriously and fixed as fast
as we can fix them.

> Our target is without increasing the hardware resources, create maximum
> number of collections, and keeping the highly accessed collections &
> documents in memory. We'll appreciate your help.

That goal will require a fair amount of hardware.  You may have no
choice but to increase your hardware resources.

Thanks,
Shawn


Re: Maximum number of SolrCloud collections in limited hardware resource

Emir Arnautović
Hi,
It is probably best if you merge some (or all) of your collections and add a discriminator field that is used to filter down to a single tenant's documents. If you go with multiple collections serving multiple tenants, you would have to have logic on top of it to resolve a tenant to a collection. Unfortunately, Solr does not have aliases with filtering, like ES does, which would come in handy in such cases.
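
As a sketch of the merged-collection approach: assuming a discriminator field named tenant_id (an illustrative name, not something Solr provides out of the box), every indexed document carries the tenant's id, and every query adds a filter query restricting results to that tenant, e.g.:

```
/solr/shared_collection/select?q=*:*&fq=tenant_id:tenant_42
```

The fq clause is cached independently of the main query, so repeated queries for the same tenant reuse the cached filter.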
If you stick with multiple collections, you can turn off caches completely, monitor latency, and turn caches back on for a collection when its latency reaches some threshold.
Caches are invalidated on commit, so submitting a dummy doc and committing should invalidate the caches. An alternative is to reload the collection.
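
For reference, the reload alternative is a single Collections API call, which closes and reopens the collection's cores (and with them their caches); the host and collection name below are placeholders:

```
http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection
```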

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> [Shawn Heisey's reply of 27 Jun 2018 quoted in full]


Re: Maximum number of SolrCloud collections in limited hardware resource

Sharif Shahriar
Hi Emir,
Thanks a lot for your reply. In your reply you mentioned:

> If you stick with multiple collections, you can turn off caches completely,
> monitor latency and turn on caches for collections when it is reaching some
> threshold.

How can this be done? Is there any configuration to turn off the caches
completely in SolrCloud?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Maximum number of SolrCloud collections in limited hardware resource

Erick Erickson
Just set the size parameter in solrconfig.xml to 0.

Best,
Erick
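
Concretely, that means setting the size of each cache to 0 in the <query> section of the collection's solrconfig.xml, along the lines of the sketch below (the cache classes shown are typical defaults for Solr 7.x and may differ in your config):

```xml
<query>
  <!-- size="0" disables the cache; autowarmCount="0" skips warming on commit -->
  <filterCache class="solr.FastLRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <queryResultCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
  <documentCache class="solr.LRUCache" size="0" initialSize="0" autowarmCount="0"/>
</query>
```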

On Wed, Jul 4, 2018 at 10:37 PM, Sharif Shahriar <[hidden email]> wrote:

> [Sharif Shahriar's question of 4 Jul 2018 quoted in full]

Re: Maximum number of SolrCloud collections in limited hardware resource

Alexandre Rafalovitch
In reply to this post by Shawn Heisey
Does it need to be SolrCloud? If it is just for replication, maybe the data
can be double-indexed from the client, or you could use old-style
replication. Then you could use LotsOfCores autoloading.

Regards,
    Alex

On Wed, Jun 27, 2018, 8:46 AM Shawn Heisey, <[hidden email]> wrote:

> [Shawn Heisey's reply of 27 Jun 2018 quoted in full]

Re: Maximum number of SolrCloud collections in limited hardware resource

Sharif Shahriar
In reply to this post by Erick Erickson
Hi Erick,
Setting the size parameter to 0 in solrconfig.xml can stop document caching,
but it cannot control how much memory a collection takes initially when it
is created, right?





Re: Maximum number of SolrCloud collections in limited hardware resource

Sharif Shahriar
In reply to this post by Shawn Heisey
Thanks a lot, Shawn, for your detailed reply.


