From solr to solr cloud


Vignan Malyala
Hi,
I currently have 500 collections in my standalone Solr. Because the data
grows day by day, I want to convert it to SolrCloud.
Can you suggest how to do this successfully?
How many shards should there be?
How many nodes should there be?
Are the so-called nodes separate machines that I need to provision?
How many ZooKeeper nodes should there be?
Are the ZooKeeper nodes also separate machines?
How many machines in total do I need for a scalable SolrCloud?

Please answer these questions in detail. None of the documents on the web
are clear about production environments.
Thanks in advance.

Re: From solr to solr cloud

David Hastings
Are you noticing performance decreases in standalone Solr as of now?


Re: From solr to solr cloud

Paras Lehana
Do you mean 500 cores? Tell us more about the data. How many documents do
you have per core, and what performance issues are you facing?

On Fri, 6 Dec 2019 at 01:01, David Hastings <[hidden email]>
wrote:

> are you noticing performance decreases in stand alone solr as of now?


--
--
Regards,

*Paras Lehana* [65871]
Development Engineer, Auto-Suggest,
IndiaMART Intermesh Ltd.

8th Floor, Tower A, Advant-Navis Business Park, Sector 142,
Noida, UP, IN - 201303

Mob.: +91-9560911996
Work: 01203916600 | Extn:  *8173*


Re: From solr to solr cloud

Shawn Heisey-2
In reply to this post by Vignan Malyala
On 12/5/2019 12:28 PM, Vignan Malyala wrote:
> I currently have 500 collections in my stand alone solr. Bcoz of day by day
> increase in Data, I want to convert it into solr cloud.
> Can you suggest me how to do it successfully.
> How many shards should be there?
> How many nodes should be there?
> Are so called nodes different machines i should take?
> How many zoo keeper nodes should be there?
> Are so called zoo keeper nodes different machines i should take?
> Total how many machines i have to take to implement scalable solr cloud?

500 collections is large enough that running it in SolrCloud is likely
to encounter scalability issues.  SolrCloud's design does not do well
with that many collections in the cluster, even if there are a lot of
machines.

There's a lot of comment history on this issue:

https://issues.apache.org/jira/browse/SOLR-7191

Generally speaking, each machine should only house one Solr node,
whether you're running cloud or not.  If each one requires a really huge
heap, it might be worthwhile to split it, but that's the only time I
would do so.  And I would generally prefer to add more machines than to
run multiple Solr nodes on one machine.

One thing you might do, if the way your data is divided will permit it,
is to run multiple SolrCloud clusters.  Multiple clusters can all use
one ZooKeeper ensemble.
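
A minimal sketch of one way to share an ensemble, assuming each cluster is
given its own ZooKeeper chroot (a path suffix on the connection string).
The host names and chroot paths are hypothetical, and the helper below is
only an illustration of the string format, not part of Solr.

```python
def zk_host(ensemble, chroot):
    """Build a ZK_HOST-style connection string: all hosts, then one chroot."""
    if not chroot.startswith("/"):
        raise ValueError("chroot must start with '/'")
    # The chroot is appended once, after the last host:port pair.
    return ",".join(ensemble) + chroot


# Hypothetical three-machine ensemble shared by two separate clusters.
ENSEMBLE = ["zk1:2181", "zk2:2181", "zk3:2181"]
cluster_a = zk_host(ENSEMBLE, "/solr-a")
cluster_b = zk_host(ENSEMBLE, "/solr-b")

print(cluster_a)
print(cluster_b)
```

Each cluster's Solr nodes would then be started with their own string, so
the two clusters keep their state under different paths in the same
ensemble.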

ZooKeeper requires a minimum of three machines for fault tolerance.
With 3 or 4 machines in the ensemble, you can survive one machine
failure.  To survive two failures requires at least 5 machines.
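
The numbers above follow from ZooKeeper needing a strict majority of the
ensemble to stay up. A small sketch of that arithmetic (the function names
are made up for illustration):

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many machines can fail while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (3, 4, 5):
    print(f"{n} machines: quorum {quorum_size(n)}, "
          f"survives {tolerated_failures(n)} failure(s)")
```

This is why a fourth machine adds no extra fault tolerance over three, and
why ensembles are usually sized with an odd number of machines.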

Thanks,
Shawn

Re: From solr to solr cloud

Vignan Malyala
In reply to this post by Paras Lehana
Yes! 500 collections.
Each collection/core has around 50k to 50L (50 lakh, i.e. 5 million)
documents, depending on the client. We made one core for each client. Each
JSON document has 15 fields.
It is already in production as a standalone Solr server.

We want to move it to SolrCloud now, so that it is scalable in the future.
How do I do that (obviously at minimum cost)?

On Fri, Dec 6, 2019 at 11:14 AM Paras Lehana <[hidden email]>
wrote:

> Do you mean 500 cores? Tell us about the data more. How many documents per
> core do you have or what performance issues are you facing?

Re: From solr to solr cloud

Vignan Malyala
In reply to this post by Shawn Heisey-2
Hi Shawn,

Thanks for your response!

Yes! 500 collections.
Each collection/core has around 50k to 50L (50 lakh, i.e. 5 million)
documents, depending on the client. We made one core for each client. Each
JSON document has 15 fields.
It is already in production as a standalone Solr server.
We want to move it to SolrCloud now, so that it is scalable in the future.
How do I do that?

As I understand from your response, I have to create 3 ZooKeeper
instances and some machines that house 1 Solr node each.
Is that the optimal solution? *And how many machines do I need to house
the Solr nodes, keeping in mind the 500 collections?*

Thanks in advance!

On Fri, Dec 6, 2019 at 11:44 AM Shawn Heisey <[hidden email]> wrote:

> ZooKeeper requires a minimum of three machines for fault tolerance.
> With 3 or 4 machines in the ensemble, you can survive one machine
> failure.  To survive two failures requires at least 5 machines.
>
> Thanks,
> Shawn
>

Re: From solr to solr cloud

Erick Erickson
Because you use individual collections, you really don’t have to care
about getting it all right up front.

Each collection can be created on a specified set of nodes, see the “createNodeSet”
parameter of the collections API “CREATE” command.
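
A minimal sketch of what such a CREATE request could look like, built as a
URL only (nothing is sent). The base URL, collection name, and node names
are hypothetical; node names are assumed to follow the usual
host:port_solr form shown in the cluster's live nodes list.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, nodes,
                          num_shards=1, replication_factor=1):
    """Build a Collections API CREATE URL pinned to specific nodes."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "replicationFactor": replication_factor,
        # Comma-separated node names that may host the new collection.
        "createNodeSet": ",".join(nodes),
    }
    # Keep ':' and ',' unescaped so the node list stays readable.
    return f"{base_url}/admin/collections?{urlencode(params, safe=':,')}"


# Hypothetical values for one client's collection.
url = create_collection_url(
    "http://solr1:8983/solr",
    "client_042",
    ["solr1:8983_solr", "solr2:8983_solr"],
)
print(url)
```

With one collection per client, the same helper can be called with a
different node list per collection to spread clients across machines.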

And let’s say you figure out later that you need more hardware and want to move
some of your existing collections to new hardware. Use the MOVEREPLICA API
command.

So say you start out with 1 machine hosting 500 collections.
You get more and more and more clients and your machine gets overloaded. Or
one of your collections grows disproportionately to the others. You spin up a
new machine and MOVEREPLICA for some number of replicas on your
original machine to the new hardware.
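
The move described above could be scripted roughly as follows. This sketch
only builds the MOVEREPLICA URLs; the base URL, collection names, replica
names, and target node are hypothetical and would come from your own
cluster state.

```python
from urllib.parse import urlencode


def move_replica_url(base_url, collection, replica, target_node):
    """Build a Collections API MOVEREPLICA URL for one replica."""
    params = {
        "action": "MOVEREPLICA",
        "collection": collection,
        "replica": replica,          # replica name, e.g. core_node2
        "targetNode": target_node,   # node that should receive the replica
    }
    return f"{base_url}/admin/collections?{urlencode(params, safe=':')}"


# Hypothetical: move two clients' replicas to a newly added machine.
NEW_NODE = "solr2:8983_solr"
to_move = [("client_001", "core_node2"), ("client_002", "core_node2")]
urls = [
    move_replica_url("http://solr1:8983/solr", coll, rep, NEW_NODE)
    for coll, rep in to_move
]
for u in urls:
    print(u)
```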

Also consider that at some point, it may be desirable to have multiple “pods”.
By that I mean it can get awkward to have thousands of collections hosted on
a single ZooKeeper ensemble. Again, because you have individual collections,
you can declare one “pod” (ZooKeeper + Solr nodes) full and spin up
another one, i.e. totally separate hardware and separate ZK ensembles. The pods
don’t know about each other at all.

Best,
Erick

> On Dec 6, 2019, at 3:12 AM, Vignan Malyala <[hidden email]> wrote:
>
> Is that the optimized solution? *And how many machines do I need to build
> to house solr nodes keeping in mind 500 collections?*