Solr soft commits

Solr soft commits

Shivam Omar
Hi,

I need some help understanding Solr soft commits. Soft commits are about visibility and are fast, so they are advised for NRT use cases. Does a soft commit also honor merge policies and merge segments for documents held in memory? For example, if I keep the hard commit interval very high and let a few million documents accumulate in memory by using soft commits with no hard commit, can that affect Solr query-time performance?


Shivam

DISCLAIMER
This email and any files transmitted with it are intended solely for the person or the entity to whom they are addressed and may contain information which is Confidential and Privileged. Any misuse of the information contained in this email, including but not limited to retransmission or dissemination of the said information by person or entities other than the intended recipient is unauthorized and strictly prohibited. If you are not the intended recipient of this email, please delete this email and contact the sender immediately.

Re: Solr soft commits

Shawn Heisey-2
On 5/10/2018 9:48 AM, Shivam Omar wrote:
> I need some help understanding Solr soft commits. Soft commits are about visibility and are fast, so they are advised for NRT use cases.

Soft commits *MIGHT* be faster than hard commits.  There are situations
where the performance of a soft commit and a hard commit with
openSearcher=true will be about the same, particularly if indexing is
very heavy.
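
One way to check this on your own index is to issue both kinds of explicit commit and time them. The sketch below builds the request URLs for Solr's /update handler using the commit, softCommit and openSearcher request parameters. The base URL and collection name are placeholders, not values from this thread, and the timing helper is only an illustration.

```python
import time
import urllib.request
from urllib.parse import urlencode


def commit_url(base_url, collection, soft):
    """Build an explicit commit URL for Solr's /update handler."""
    if soft:
        # Soft commit: opens a new searcher for visibility.
        params = {"softCommit": "true"}
    else:
        # Hard commit that also opens a new searcher.
        params = {"commit": "true", "openSearcher": "true"}
    return f"{base_url}/{collection}/update?{urlencode(params)}"


def time_commit(url):
    """Return the wall-clock seconds one commit request takes (needs a running Solr)."""
    start = time.monotonic()
    with urllib.request.urlopen(url) as response:
        response.read()
    return time.monotonic() - start


# Placeholder values; adjust to your installation.
print(commit_url("http://localhost:8983/solr", "mycollection", soft=True))
print(commit_url("http://localhost:8983/solr", "mycollection", soft=False))
```

Timing both under your real indexing load shows whether the soft commit is actually cheaper in your case.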

> Does a soft commit also honor merge policies and merge segments for documents held in memory? For example, if I keep the hard commit interval very high and let a few million documents accumulate in memory by using soft commits with no hard commit, can that affect Solr query-time performance?

Segments in memory are very likely not eligible for merging, but I do
not actually know whether that is the case.

Using soft commits will NOT keep millions of documents in memory.  Solr
uses NRTCachingDirectoryFactory by default, which wraps Lucene's
NRTCachingDirectory, and uses it with default values, which are far too
low to accommodate millions of documents.  See the Javadoc for the
directory for details:

https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html

That page shows a directory creation with memory values of 5 and 60 MB,
but the defaults in the factory code (which is what Solr normally uses)
are 4 and 48.  I'm pretty sure that you can increase these values in
solrconfig.xml, but really large values are not recommended.  Large
enough values to accommodate millions of documents would require the
Java heap to also be large, likely with no real performance advantage.

If segment sizes exceed these values, then they will not be cached in
memory.  Older segments and segments that do not meet the size
requirements are flushed to disk.
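
To make the size limits concrete, here is a simplified model of the check described above, using the 4 MB and 48 MB factory defaults. It is not the Lucene code: the real directory works from size estimates supplied with each write, and details may differ between versions. As far as I know the factory reads its limits from the maxMergeSizeMB and maxCachedMB settings in solrconfig.xml. The document count and average document size are hypothetical.

```python
MB = 1024 * 1024


def would_cache(segment_bytes, cached_bytes, max_merge_size_mb=4.0, max_cached_mb=48.0):
    """Simplified model: cache a new segment in RAM only if it fits both limits."""
    fits_per_segment_limit = segment_bytes <= max_merge_size_mb * MB
    fits_total_limit = cached_bytes + segment_bytes <= max_cached_mb * MB
    return fits_per_segment_limit and fits_total_limit


# Hypothetical workload: 2 million documents at roughly 1 KB each.
docs = 2_000_000
avg_doc_bytes = 1024
total_bytes = docs * avg_doc_bytes

print(would_cache(1 * MB, cached_bytes=0))       # True: a small segment fits
print(would_cache(total_bytes, cached_bytes=0))  # False: far above the 4 MB limit
print(round(total_bytes / MB))                   # about 1953 MB, versus a 48 MB cache
```

Under these assumptions the data is about forty times larger than the whole cache, so nearly all of it ends up on disk no matter which kind of commit is used.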

Thanks,
Shawn


Re: Solr soft commits

Shivam Omar


From: Shawn Heisey
Sent: Thursday, May 10, 9:43 PM
Subject: Re: Solr soft commits
To: [hidden email]


> On 5/10/2018 9:48 AM, Shivam Omar wrote:
> > I need some help understanding Solr soft commits. Soft commits are about visibility and are fast, so they are advised for NRT use cases.
>
> Soft commits *MIGHT* be faster than hard commits.  There are situations where the performance of a soft commit and a hard commit with openSearcher=true will be about the same, particularly if indexing is very heavy.

Thanks Shawn. So there are cases where a soft commit will not be faster than a hard commit with openSearcher=true. We have a case where we need to do bulk deletions. In that case, will a soft commit be faster than a hard commit?

> > Does a soft commit also honor merge policies and merge segments for documents held in memory? For example, if I keep the hard commit interval very high and let a few million documents accumulate in memory by using soft commits with no hard commit, can that affect Solr query-time performance?
>
> Segments in memory are very likely not eligible for merging, but I do not actually know whether that is the case.
>
> Using soft commits will NOT keep millions of documents in memory.  Solr uses NRTCachingDirectoryFactory by default, which wraps Lucene's NRTCachingDirectory, and uses it with default values, which are far too low to accommodate millions of documents.  See the Javadoc for the directory for details:
>
> https://lucene.apache.org/core/7_3_0/core/org/apache/lucene/store/NRTCachingDirectory.html
>
> That page shows a directory creation with memory values of 5 and 60 MB, but the defaults in the factory code (which is what Solr normally uses) are 4 and 48.  I'm pretty sure that you can increase these values in solrconfig.xml, but really large values are not recommended.  Large enough values to accommodate millions of documents would require the Java heap to also be large, likely with no real performance advantage.
>
> If segment sizes exceed these values, then they will not be cached in memory.  Older segments and segments that do not meet the size requirements are flushed to disk.

Does that mean that once the memory threshold is crossed, a soft commit will cause Lucene to flush data to disk, as a hard commit does? Also, does a soft commit carry a higher query-time performance cost than a hard commit?

Thanks, Shawn


Re: Solr soft commits

Mark Miller-3
In reply to this post by Shivam Omar
A soft commit does not control merging. The IndexWriter controls merging
and hard commits go through the IndexWriter. A soft commit tells Solr to
try to open a new SolrIndexSearcher with the latest view of the index. It
does this with a mix of using the on-disk index and talking to the
IndexWriter to see updates that have not been committed.

Opening a new SolrIndexSearcher using the IndexWriter this way does have a
cost. You may flush segments, you may apply deletes, you may have to
rebuild partial or full in-memory data structures. It's generally much
faster than a hard commit to get a refreshed view of the index though.

Given how SolrCloud was designed, it's usually best to set an auto hard
commit to something that works for you, given how large it will make tlogs
(affecting recovery times), and how much RAM is used. Then use soft commits
for visibility. It's best to use them as infrequently as your use case
allows.
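
As a rough illustration of that setup, the sketch below builds a payload for Solr's Config API (a POST to the collection's /config endpoint) that sets an auto hard commit without opening a searcher and a separate auto soft commit for visibility. The 60 second and 10 second intervals are assumptions chosen only for the example; pick values that fit your tlog size, recovery time and visibility needs. The same settings can be written as autoCommit and autoSoftCommit elements in solrconfig.xml.

```python
import json


def commit_settings(hard_commit_ms, soft_commit_ms):
    """Build a Config API 'set-property' command for the auto commit intervals."""
    return {
        "set-property": {
            # Hard commit: flushes to stable storage and rolls over the tlog.
            "updateHandler.autoCommit.maxTime": hard_commit_ms,
            # Leave visibility to soft commits.
            "updateHandler.autoCommit.openSearcher": False,
            # Soft commit: controls how quickly new documents become searchable.
            "updateHandler.autoSoftCommit.maxTime": soft_commit_ms,
        }
    }


# Illustrative intervals only (assumptions, not recommendations).
payload = json.dumps(commit_settings(60_000, 10_000), indent=2)
print(payload)
```

Keeping the hard commit interval moderate bounds the tlog size, while the soft commit interval is the knob to relax as far as the use case allows.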

- Mark

about.me/markrmiller

Re: Solr soft commits

Shawn Heisey-2
In reply to this post by Shivam Omar
On 5/10/2018 8:28 PM, Shivam Omar wrote:
> Thanks Shawn. So there are cases where a soft commit will not be faster than a hard commit with openSearcher=true. We have a case where we need to do bulk deletions. In that case, will a soft commit be faster than a hard commit?

I actually have no idea whether deletions get put in memory by the
NRTCachingDirectory or not.  If they don't, then soft commits with
deletes would have no performance advantages over hard commits. 
Somebody who knows the Lucene code REALLY well will need to comment here.

> Does that mean that once the memory threshold is crossed, a soft commit will cause Lucene to flush data to disk, as a hard commit does? Also, does a soft commit carry a higher query-time performance cost than a hard commit?

If the machine has enough memory to effectively cache the index, then a
query after a hard commit should be just as fast as a query after a soft
commit.  When Solr must actually read the disk to process a query,
that's when things get slow.  If the machine has enough memory (not
assigned to any program) for effective disk caching, then the data it
needs to process a query will be in memory regardless of what kind of
commit is done.

Thanks,
Shawn