Optimal RAM to index size ratio

9 messages

Optimal RAM to index size ratio

SOLR4189
Hi all,

I have a collection with many shards. Each shard is on a separate Solr node
(VM) with a 40GB index, 4 CPUs, and an SSD.

When I run performance tests with 50GB RAM per node (10GB for the JVM and
40GB for the index) and with 25GB RAM per node (10GB for the JVM and 15GB
for the index), I get the same query times (80th, 90th, and 95th
percentiles). I ran a long test: 8 hours of production queries and updates.

What does this mean? Does the whole index not have to fit in RAM? Could it
be due to the SSD? How can I check this?

Thank you.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Optimal RAM to index size ratio

BlackIce
Do you load the index onto a RAM disk? I was under the impression that the
JVM contained everything that had to do with Solr (I might be wrong). If
that's the case and you are not loading the index onto a RAM disk, I don't
think you would see any difference in either scenario.

On Mon, Apr 15, 2019 at 3:33 PM SOLR4189 <[hidden email]> wrote:

Re: Optimal RAM to index size ratio

Emir Arnautović
In reply to this post by SOLR4189
Hi,
The recommendation to have enough RAM to fit your entire index in memory is a sort of worst-case scenario (maybe better called the best-case scenario), where your index is optimal and fully used all the time. The OS loads into memory the pages that are used (and some that might be used), so even if you have 40GB of index files on disk, files you never touch will not be loaded into memory. Why would you not use some files? Maybe some fields are stored but you never retrieve them, or you enabled docValues but never use them, or you query only a subset of your documents and old documents are never part of the results…
The best approach is to run your Solr with a monitoring tool, see how much RAM is actually used on average and at peak, and provision that value with some headroom. You can also put an alert on used RAM and react if/when your system starts requiring more. One such tool is our https://sematext.com/spm
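For example, on a Linux node you can watch how much RAM the kernel is actually using as page cache while the test runs; a minimal sketch with the stock `free` utility:

```shell
# "buff/cache" (shown as "cached" on older kernels) is mostly page cache.
# If Solr is the only thing running on the box, it approximates how much
# index data the OS has cached. Add "-s 60" to re-sample every 60 seconds
# during a long test.
free -h
```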

HTH,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/




Re: Optimal RAM to index size ratio

SOLR4189
In reply to this post by BlackIce
No, I don't load the index into RAM, but I run queries for 8 hours, so the
OS should load the necessary files (segments) into RAM during my tests. So
in the case where I set 25GB of RAM, not all the files can be loaded into
RAM, and I thought I would see degradation in query times, but I didn't.




Re: Optimal RAM to index size ratio

Shawn Heisey-2
In reply to this post by SOLR4189

Achieving good performance does not necessarily require that you have
enough memory to cache the entire index.

The OS disk cache only caches data that is actually accessed.  Running
thousands of queries is going to access certain parts of the index
frequently, but it is unlikely to actually access ALL of the data in the
index.

The most important part of the index that will be accessed on every
query is the data produced by the schema attribute 'indexed="true"'.
That's the actual inverted index.  The percentage of the full index that
this part consumes will be highly dependent on your schema and the
actual contents of the documents that you index -- I cannot give you a
percentage.  Some setups need half the index cached.  Some need a lot
more.  I've heard of some people having great performance with only ten
percent of the index cached, but I suspect that this is not common.

If you go to this page, click on the "Asking for help on a
memory/performance issue" link in the table of contents, and look at the
screenshots, you'll see a lot of numbers:

https://wiki.apache.org/solr/SolrPerformanceProblems

An important number for you to check on your systems is labeled "cached
Mem" in the Linux/UNIX screenshot, showing about 18GB, and "Cached" in
the Windows screenshot, showing about 8GB.  This is the actual amount of
data in the OS disk cache.  If Solr is the only thing on the system,
then it should be pretty close to the amount of index data that the
system has cached.  You'll probably find that on the 50GB system that
only a fraction of the available memory has actually been used.  You may
even find that the same is true on the smaller system.

The OS disk cache can only contain data that has actually been read.  If
a part of the index data is never accessed by queries, it will not be in
the OS disk cache.
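On Linux, that "cached Mem" number comes straight from /proc/meminfo, so you can grab it in a script; a sketch (the field is named `Cached` on current kernels, and the value is reported in kB):

```shell
# Print the page-cache size in GB -- the "cached Mem" figure that top and
# free report. Linux-only.
awk '/^Cached:/ {printf "page cache: %.1f GB\n", $2 / 1024 / 1024}' /proc/meminfo
```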

Thanks,
Shawn

Re: Optimal RAM to index size ratio

BlackIce
In reply to this post by SOLR4189
I'm not that proficient with Solr. I've used it, but I have yet to fully
dive into it; this topic really interests me, though.

In those 8-hour tests, does ALL the information get accessed, or just part
of it? That could be why you don't see any difference: the test may only
touch a subset of the information in that time period, and that subset fits
into RAM in both cases...

SSDs will be slower than RAM anyway.


Re: Optimal RAM to index size ratio

Erick Erickson
To pile on a bit:

Your *.fdt files contain “stored=true” data. By and large I ignore them for this discussion. Say I execute a query with “rows=10”. The fdt (and fdx) files are only accessed for the 10 docs returned so they have little impact on query time. Or rather, they have a reasonably constant effect on query time. So when trying to get a feel for the RAM/index-size ratio these can be pretty much ignored.

Second, you say you ran your tests for 8 hours. How distinct are the queries? If you run some relatively small set of queries, all the bits of the index that you need will be loaded into RAM very early and just repeating the same set of queries a zillion times measures nothing interesting.

I like at least 5,000 distinct queries that I then randomize when load testing, more if possible and all real user queries if possible. If you can’t get real user queries, you have to guess I’m afraid.
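A sketch of that kind of query-set preparation, assuming the real user queries sit one per line in a log file (the file names here are stand-ins, not anything from the thread):

```shell
# Toy stand-in for a real query log (one user query per line); in practice
# this file would come from your search logs.
printf 'solr ram\nindex size\nsolr ram\npage cache\n' > queries.log

# Deduplicate, shuffle, and keep up to 5,000 distinct queries for the
# load test.
sort -u queries.log | shuf -n 5000 > loadtest-queries.txt
wc -l < loadtest-queries.txt   # → 3
```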

Best,
Erick




Re: Optimal RAM to index size ratio

SOLR4189
All my queries are from production environments, from real customers. I
built a query player that replays queries with the same time intervals as
in production (all customers' queries, with the time intervals between
them, are saved in Splunk). So all queries are distinct.




Re: Optimal RAM to index size ratio

Jeff Courtade
In reply to this post by SOLR4189
In Linux, the OS will cache files in RAM for quick reading.

You can force them into RAM by doing cat filename > /dev/null.

I do this with all my index files after a reboot and see better query times.

Optimal RAM is enough RAM for all the indexes, plus the JVM, plus 20 percent...

Generally.
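A sketch of that warm-up trick over a whole Solr data directory; the path is an assumption (adjust it to your install), and the script is a no-op if the directory doesn't exist:

```shell
# Read every file under the index directories once so the kernel pulls
# them into the page cache. /var/solr/data is a guess at the data dir;
# override with SOLR_DATA=/your/path.
SOLR_DATA=${SOLR_DATA:-/var/solr/data}
if [ -d "$SOLR_DATA" ]; then
  find "$SOLR_DATA" -type f -path '*/index/*' -exec cat {} + > /dev/null
fi
```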

--
Jeff Courtade
M: 240.507.6116
