Solr server configuration

Deepak Nair
Hello,

We want to implement Solr 7.x for one of our clients, with the requirements below.

1. The data to be indexed will come from 2 Oracle databases, each with 2 tables of around 10 columns.

2. The data volume is expected to reach around 10 million in each table.

3. 4000+ users will query the indexed data from a UI. The peak load is expected to be around 2000 queries/sec.

4. The implementation will be on a standalone or clustered Unix environment.

I would like to know the best server configuration for this kind of requirement, e.g. how many VMs, how much RAM, what heap size, etc.

Thanks,
Deepak

Re: Solr server configuration

Erick Erickson
First, it's totally impossible to answer in the abstract, see:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

Second, indexing DB tables directly into Solr is usually the wrong
approach. Solr is not a replacement for a relational DB: it does not
function as a DB and is not optimized for joins, etc. It's a _search
engine_ and does that superlatively.

At the very least, the most common recommendation, if you have the
space, is to de-normalize the data. My point is that you need to think
about this problem in terms of _search_, not "move some tables to Solr
and use Solr like a DB". That means that even if someone could answer
your questions as asked, the answer wouldn't help much.
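
To make the de-normalization point concrete, here is a minimal sketch in Python. It assumes the two tables in a database are related parent/child through a foreign key, which the original question does not state; the function name, the column names (id, parent_id) and the "child_" prefix are all hypothetical. Each parent row becomes one flat document, and the child columns are folded in as multi-valued fields, so no join is needed at query time.

```python
from collections import defaultdict


def denormalize(parents, children, key="id", parent_key="parent_id"):
    """Fold child rows into their parent row: one flat document per parent.

    Assumption: `children` reference `parents` through `parent_key`.
    Child columns become list-valued fields prefixed with "child_".
    """
    # Group child rows by the parent they point to.
    grouped = defaultdict(list)
    for child in children:
        grouped[child[parent_key]].append(child)

    docs = []
    for parent in parents:
        doc = dict(parent)  # copy so the input row is not modified
        for child in grouped.get(parent[key], []):
            for column, value in child.items():
                if column in (key, parent_key):
                    continue  # skip the child's own key and the foreign key
                doc.setdefault("child_" + column, []).append(value)
        docs.append(doc)
    return docs


# Hypothetical rows standing in for two related Oracle tables.
customers = [{"id": 1, "name": "Acme"}, {"id": 2, "name": "Globex"}]
orders = [
    {"id": 10, "parent_id": 1, "item": "bolt"},
    {"id": 11, "parent_id": 1, "item": "nut"},
]
documents = denormalize(customers, orders)
```

Which fields end up in the flat document should follow from what users actually search and filter on, not from the table layout.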

Best,
Erick


Re: Solr server configuration

Emir Arnautović
Hi Deepak,
Here is another blog post with some thoughts on how it can be estimated:

http://www.od-bits.com/2018/01/solrelasticsearch-capacity-planning.html
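
As a rough illustration of the kind of arithmetic involved (a sketch under stated assumptions, not taken from the blog post), the numbers in the original question can be plugged into a back-of-the-envelope estimate in Python. It assumes "10 million" counts rows per table. The per-replica throughput has to be measured on a prototype with real data and real queries; the 250 queries/sec used below is a placeholder, not a Solr benchmark, and the function names and the headroom parameter are hypothetical.

```python
import math


def total_documents(databases, tables_per_db, rows_per_table):
    """Upper bound on document count if every row became one document."""
    return databases * tables_per_db * rows_per_table


def replicas_needed(peak_qps, measured_qps_per_replica, headroom=0.5):
    """Replicas required to serve peak_qps.

    headroom is the fraction of a replica's measured capacity you are
    willing to use at peak (0.5 means run each replica at half capacity).
    """
    if not 0 < headroom <= 1:
        raise ValueError("headroom must be in (0, 1]")
    usable_qps = measured_qps_per_replica * headroom
    return math.ceil(peak_qps / usable_qps)


# From the question: 2 databases, 2 tables each, ~10 million per table.
docs = total_documents(databases=2, tables_per_db=2, rows_per_table=10_000_000)

# From the question: peak of 2000 queries/sec.
# PLACEHOLDER: 250 queries/sec per replica must be replaced by a measurement.
replicas = replicas_needed(peak_qps=2000, measured_qps_per_replica=250)
```

The document count is only an upper bound, since de-normalizing the tables may merge rows into fewer documents.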

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/


