merge policy & autocommit

Danilo Tomasoni
Hello all,

We have a Solr instance with around 40 million documents.

During the bulk import phase we noticed high I/O and CPU load. It looks
like it's related to autocommit, because if I disable autocommit the
system load is very low.

I know that disabling autocommit is not recommended, but I'm wondering
whether there is a minimum hardware requirement for that recommendation
to hold.

Our system is not very powerful in terms of I/O read/write speed (around
100 MB/s). Is it possible that this relatively low I/O performance,
combined with autocommit, slows our Solr instance down so much that it
becomes unresponsive?

Could the same be true for the merge policy? How does I/O speed
affect the merge policy parameters?

I kept the default merge policy configuration, but it looks like it never
merges segments. How can I tell whether a merge is happening?


Thank you

Danilo

--
Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
http://www.cosbi.eu

Re: merge policy & autocommit

Shawn Heisey
On 10/28/2019 7:23 AM, Danilo Tomasoni wrote:
> We have a Solr instance with around 40 million documents.
>
> During the bulk import phase we noticed high I/O and CPU load. It looks
> like it's related to autocommit, because if I disable autocommit the
> system load is very low.
>
> I know that disabling autocommit is not recommended, but I'm wondering
> whether there is a minimum hardware requirement for that recommendation
> to hold.

What are your settings for autoCommit and autoSoftCommit?  If the
settings refer to system properties, have you defined those
system properties?  Could you restart Solr and then share a
solr.log file that covers everything since that restart?

The settings that Solr has shipped with for quite a while are to enable
autoCommit with a 15 second maxTime, no maxDoc, and openSearcher set to
false.  The autoSoftCommit setting is not enabled by default.

These settings work well, though I personally think 15 seconds is
perhaps too frequent, and like to set it to something like one minute
instead.
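To check which values are actually in effect, the `${name:default}` placeholders in solrconfig.xml can be resolved by hand. The Python sketch below is a simplified illustration, not Solr's own substitution code: it assumes the fragment matches the autoCommit/autoSoftCommit section of the shipped solrconfig.xml, and the function names (`resolve`, `effective_commit_settings`) are hypothetical. Passing `{"solr.autoCommit.maxTime": "60000"}` mimics starting Solr with `-Dsolr.autoCommit.maxTime=60000`, i.e. the one-minute interval suggested above.

```python
import re
import xml.etree.ElementTree as ET

# Modelled on the autoCommit/autoSoftCommit section of the solrconfig.xml
# that Solr ships; property names and defaults are taken from that file.
DEFAULT_FRAGMENT = """
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>${solr.autoCommit.maxTime:15000}</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>
</updateHandler>
"""

# Matches ${name} or ${name:default}
PLACEHOLDER = re.compile(r"\$\{([^}:]+)(?::([^}]*))?\}")


def resolve(text, props):
    """Replace each ${name:default} with props[name], else the default.

    Simplified: this sketch raises KeyError when a property is undefined
    and has no default."""
    def sub(m):
        name, default = m.group(1), m.group(2)
        if name in props:
            return props[name]
        if default is None:
            raise KeyError(f"property {name!r} is not defined and has no default")
        return default
    return PLACEHOLDER.sub(sub, text)


def effective_commit_settings(xml_text, props):
    """Return the autoCommit/autoSoftCommit values after substitution."""
    root = ET.fromstring(xml_text.strip())
    hard = root.find("autoCommit")
    soft = root.find("autoSoftCommit")
    soft_ms = int(resolve(soft.findtext("maxTime"), props))
    return {
        "autoCommit.maxTime_ms": int(resolve(hard.findtext("maxTime"), props)),
        "autoCommit.openSearcher": resolve(hard.findtext("openSearcher"), props).strip() == "true",
        # The shipped default of -1 means "disabled"; this sketch treats any
        # non-positive value the same way and reports it as None.
        "autoSoftCommit.maxTime_ms": soft_ms if soft_ms > 0 else None,
    }


if __name__ == "__main__":
    print(effective_commit_settings(DEFAULT_FRAGMENT, {}))
    # {'autoCommit.maxTime_ms': 15000, 'autoCommit.openSearcher': False, 'autoSoftCommit.maxTime_ms': None}
    print(effective_commit_settings(DEFAULT_FRAGMENT, {"solr.autoCommit.maxTime": "60000"}))
    # {'autoCommit.maxTime_ms': 60000, 'autoCommit.openSearcher': False, 'autoSoftCommit.maxTime_ms': None}
```

If the output shows a much shorter maxTime than expected, or openSearcher set to true, that is the first thing to compare against the settings described above.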

With openSearcher set to false, autoCommit will not affect document
visibility.  If automatically making index changes visible is desired,
it is better to configure autoSoftCommit in addition to autoCommit ...
and super short intervals are not recommended.
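The visibility point can be illustrated with a toy timeline. This Python sketch is a simplified model, not Solr's implementation: it assumes each maxTime timer starts at the first update after the previous commit and fires exactly maxTime milliseconds later, and it ignores commit duration, maxDocs and explicit commits. `commit_times` and `visible_at` are hypothetical names.

```python
from bisect import bisect_right
from typing import List, Optional


def commit_times(add_times_ms: List[int], max_time_ms: int) -> List[int]:
    """Simplified maxTime timer: the first update after a commit starts the
    timer, and the commit fires max_time_ms later (commit duration ignored)."""
    commits: List[int] = []
    fire_at: Optional[int] = None
    for t in sorted(add_times_ms):
        if fire_at is not None and t >= fire_at:
            commits.append(fire_at)    # the pending commit fired before this add
            fire_at = None
        if fire_at is None:
            fire_at = t + max_time_ms  # this add starts a new timer
    if fire_at is not None:
        commits.append(fire_at)
    return commits


def visible_at(add_times_ms: List[int], hard_ms: int, open_searcher: bool,
               soft_ms: Optional[int]) -> List[Optional[int]]:
    """Time at which each add becomes searchable, or None if no automatic
    commit in this model ever opens a searcher."""
    opening: List[int] = []
    if open_searcher:
        opening += commit_times(add_times_ms, hard_ms)
    if soft_ms is not None:
        opening += commit_times(add_times_ms, soft_ms)
    opening.sort()
    result: List[Optional[int]] = []
    for t in add_times_ms:
        i = bisect_right(opening, t)  # first searcher-opening commit after t
        result.append(opening[i] if i < len(opening) else None)
    return result


if __name__ == "__main__":
    adds = [0, 5_000, 20_000]
    # Shipped defaults: hard commit after 15 s, openSearcher=false, no soft commit
    print(visible_at(adds, 15_000, False, None))    # [None, None, None]
    # Same hard commits plus a 60 s autoSoftCommit
    print(visible_at(adds, 15_000, False, 60_000))  # [60000, 60000, 60000]
```

In this model the hard commits still happen with the shipped defaults; they just never make anything visible. Adding an autoSoftCommit bounds the visibility delay by its maxTime.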

> Our system is not very powerful in terms of I/O read/write speed (around
> 100 MB/s). Is it possible that this relatively low I/O performance,
> combined with

100MB/sec is not what I would call low I/O.  It's the minimum that you
can expect from modern commodity SATA hard drives, and some of those can
go even faster.  It's also roughly equivalent to the maximum real-world
achievable throughput of a gigabit network connection with TCP-based
protocols.
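The gigabit comparison can be checked with a back-of-the-envelope calculation. The sketch below assumes a standard 1500-byte MTU, IPv4, TCP timestamps (12 option bytes) and counts Ethernet framing overhead; it gives a theoretical ceiling for a single stream, and real transfers usually land somewhat below it. `tcp_goodput_mb_s` is a hypothetical helper name.

```python
def tcp_goodput_mb_s(link_bits_per_s: float = 1e9, mtu: int = 1500,
                     tcp_option_bytes: int = 12) -> float:
    """Upper bound on TCP payload throughput over Ethernet, in MB/s (10**6 bytes)."""
    # Payload per frame: MTU minus IPv4 header, TCP header and TCP options
    payload = mtu - 20 - 20 - tcp_option_bytes
    # Bytes on the wire per frame: MTU plus Ethernet header, FCS,
    # preamble + start-of-frame delimiter, and inter-frame gap
    on_wire = mtu + 14 + 4 + 8 + 12
    return link_bits_per_s / 8 / 1e6 * payload / on_wire


if __name__ == "__main__":
    print(f"raw line rate: {1e9 / 8 / 1e6:.0f} MB/s")            # raw line rate: 125 MB/s
    print(f"TCP payload ceiling: {tcp_goodput_mb_s():.1f} MB/s")  # TCP payload ceiling: 117.7 MB/s
```

A ceiling of roughly 118 MB/s, minus real-world losses, is in the same range as the ~100 MB/s disk figure.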

> autocommit, slows our Solr instance down so much that it
> becomes unresponsive?

If it's configured correctly, autoCommit should have very little effect
on performance.  Hard commits that do not open a new searcher should
happen VERY quickly.  It seems very strange to me that disabling a
correctly configured autoCommit would substantially affect indexing speeds.

> Could the same be true for the merge policy? How does I/O speed
> affect the merge policy parameters?
>
> I kept the default merge policy configuration, but it looks like it never
> merges segments. How can I tell whether a merge is happening?

If you have segments that are radically different sizes, then merging is
happening.  With default settings, merges from the first level should
produce segments roughly ten times the size of the ones created by
indexing.  Second level merges will probably produce segments roughly
100 times the size of the smallest ones.  Segment merging is a normal
part of Lucene operation; it would be very unusual for it not to occur.
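One way to see whether merges have happened is to look at per-segment sizes on disk. The Python sketch below assumes Lucene's usual file naming (per-segment files start with `_` plus a base-36 segment name) and uses a hypothetical index path; it sums file sizes per segment so that segments of radically different sizes stand out. Files belonging to segments that were merged away but not yet deleted would also be counted, so treat the output as approximate.

```python
import os
import re
import sys
from collections import defaultdict
from typing import Dict

# Lucene names per-segment files "_<id>" followed by "." or "_", e.g.
# "_0.si", "_0.cfs", "_a_Lucene80_0.doc", "_a_1.liv". Files such as
# "segments_5" and "write.lock" do not start with "_" and are skipped.
SEGMENT_FILE = re.compile(r"^(_[0-9a-z]+)(?=[._])")


def segment_sizes(index_dir: str) -> Dict[str, int]:
    """Sum on-disk bytes per segment name in a Lucene index directory."""
    sizes: Dict[str, int] = defaultdict(int)
    for name in os.listdir(index_dir):
        m = SEGMENT_FILE.match(name)
        path = os.path.join(index_dir, name)
        if m and os.path.isfile(path):
            sizes[m.group(1)] += os.path.getsize(path)
    return dict(sizes)


if __name__ == "__main__":
    # Hypothetical default path; pass the core's data/index directory instead.
    index_dir = sys.argv[1] if len(sys.argv) > 1 else "/var/solr/data/mycore/data/index"
    if not os.path.isdir(index_dir):
        print(f"no index directory at {index_dir}")
    else:
        sizes = segment_sizes(index_dir)
        for seg, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
            print(f"{seg:>8} {size / 1e6:10.1f} MB")
        if sizes:
            largest, smallest = max(sizes.values()), min(sizes.values())
            print(f"largest/smallest ratio: {largest / max(smallest, 1):.0f}x")
```

Run it with the index directory as the only argument. A few large segments alongside many small ones is the pattern that indicates merging has taken place.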

Merging will affect I/O, but it is extremely rare for merging to happen
super-quickly.  The fastest I have ever seen merging on a single Solr
core proceed is about 30 megabytes per second, though usually that
system achieved about 20 megabytes per second.  Merging involves
considerable computational work; it's not just a straight data copy.
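To put those rates in perspective, the sketch below estimates merge durations for the first two merge levels described above. The flushed-segment size is a made-up figure, the 10x factor is the rough ratio from the text, and the calculation assumes the quoted rates apply to the size of the merged output; none of these numbers are measurements, and the helper names are hypothetical.

```python
FLUSHED_SEGMENT_MB = 10  # hypothetical size of a newly flushed segment
LEVEL_FACTOR = 10        # "roughly ten times" per merge level, as described above


def merged_segment_mb(level: int) -> int:
    """Approximate size of a segment produced by a merge at the given level."""
    return FLUSHED_SEGMENT_MB * LEVEL_FACTOR ** level


def merge_seconds(output_mb: float, rate_mb_s: float) -> float:
    """Wall-clock estimate if the merge produces its output at rate_mb_s."""
    return output_mb / rate_mb_s


if __name__ == "__main__":
    for level in (1, 2):
        size = merged_segment_mb(level)
        for rate in (20, 30):  # the typical and fastest rates quoted above
            print(f"level {level}: ~{size} MB at {rate} MB/s -> ~{merge_seconds(size, rate):.0f} s")
```

Under these assumptions, even reading the inputs and writing the output (roughly twice the quoted rate, so 40 to 60 MB/s) stays below the ~100 MB/s the disk in question can sustain.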

Thanks,
Shawn