defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Danilo Tomasoni
Hello all,
We are running a Solr instance with around 41 million documents on a SATA class 10 disk spinning at around 10,000 rpm.
We are experiencing very slow query responses (on the order of hours) with an average of 205 segments.
We ran a test on an ordinary PC with an SSD, and there the same Solr instance, with the same data and the same number of segments, was around 45 times faster.
We also tried a forced optimize to improve performance, but it was very slow, so we abandoned it.

Since we still don't have enterprise-grade server SSDs, we are wondering whether, in the meantime, defragmenting the solrdata folder could help.
The idea is that, due to many updates, each segment file is fragmented across different physical blocks.
Put another way, each segment file is non-contiguous on disk, and this can slow down Solr's responses.
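Before defragmenting, it is worth measuring whether the segment files are actually fragmented. Here is a minimal sketch. It assumes a Linux host with the `filefrag` tool from e2fsprogs installed, and `filefrag` may need root on some filesystems. The sketch runs `filefrag` on each file in an index directory and records how many extents each file occupies. The function names and the directory you pass in are illustrative, not part of Solr.

```python
import re
import subprocess
from pathlib import Path

# filefrag prints one summary line per file, e.g. "/idx/_0.fdt: 12 extents found".
# Some versions append ", perfection would be N extents", so the pattern is
# not anchored at the end of the line.
EXTENTS_RE = re.compile(r":\s+(\d+) extents? found")


def parse_extent_count(line: str) -> int:
    """Extract the extent count from one line of filefrag output."""
    match = EXTENTS_RE.search(line)
    if match is None:
        raise ValueError(f"unrecognized filefrag output: {line!r}")
    return int(match.group(1))


def fragmentation_report(index_dir: str) -> dict[str, int]:
    """Run filefrag on each regular file in index_dir.

    Returns {file name: extent count}. One extent means the file is fully
    contiguous on disk.
    """
    report: dict[str, int] = {}
    for path in sorted(Path(index_dir).iterdir()):
        if not path.is_file():
            continue
        result = subprocess.run(
            ["filefrag", str(path)], capture_output=True, text=True, check=True
        )
        report[path.name] = parse_extent_count(result.stdout.strip())
    return report
```

For example, `sorted(fragmentation_report("/path/to/index").items(), key=lambda kv: kv[1], reverse=True)[:10]` lists the ten files with the most extents. On ext4 with the default 4 KiB block size, a single extent covers at most 128 MiB. A multi-gigabyte segment file will therefore always show several extents, even when it is laid out well.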

What do you suggest?
Is this roughly equivalent to a forced optimize, or could it be faster?

Thank you.
Danilo

Danilo Tomasoni

Fondazione The Microsoft Research - University of Trento Centre for Computational and Systems Biology (COSBI)
Piazza Manifattura 1,  38068 Rovereto (TN), Italy
[hidden email]
http://www.cosbi.eu

As for the European General Data Protection Regulation 2016/679 on the protection of natural persons with regard to the processing of personal data, we inform you that all the data we possess are object of treatment in the respect of the normative provided for by the cited GDPR.
It is your right to be informed on which of your data are used and how; you may ask for their correction, cancellation or you may oppose to their use by written request sent by recorded delivery to The Microsoft Research – University of Trento Centre for Computational and Systems Biology Scarl, Piazza Manifattura 1, 38068 Rovereto (TN), Italy.
Please don't print this e-mail unless you really need to

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Dario Rigolin
Hi Danilo, in my experience, SSD or a RAM disk is now the only way to
speed up queries. It depends on how much storage your 41M docs occupy.
If you don't have enterprise SSDs, you can add a consumer SSD as a fast
cache (the Linux caching modules "flashcache" / "bcache" can use a cheap
SSD as a data cache while your data stays safely stored on the SATA disks).

I don't think you can improve performance without changing the storage
technology.

Regards.
Dario

On Mon 22 Feb 2021 at 08:52, Danilo Tomasoni <[hidden email]>
wrote:



--

Dario Rigolin
Comperio srl - CTO
Mobile: +39 347 7232652 - Office: +39 0425 471482
Skype: dario.rigolin

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Dmitri Maziuk
On 2021-02-22 1:52 AM, Danilo Tomasoni wrote:
> Hello all,
> we are running a solr instance with around 41 MLN documents on a SATA class 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..) with an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr instance with the same data and the same number of segments was around 45 times faster.

What is your actual hardware and OS, as opposed to "normal pc"?

Dima

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Walter Underwood
A forced merge might improve speed by 20%. Going from spinning disk to SSD
will improve speed by 20X or more. Don’t waste your time even thinking about
forced merges.

You need to get SSDs.

The even bigger speedup is to get enough RAM that the OS can keep the
Solr index files in file system buffers. Check how much space is used by
your indexes, then make sure that at least that much RAM is available
beyond what the OS and the Solr JVM use.

Some people make the mistake of giving a huge heap to the JVM, thinking
this will improve caching. This almost always makes things worse, by
using RAM that could be used for caching files. 8GB of heap is usually enough.
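Walter's sizing rule can be written as simple arithmetic. Here is a minimal sketch. It assumes you know the machine's total RAM and the configured JVM heap. The `os_reserve` value (memory the OS and other processes need) is a hypothetical input you have to estimate yourself, and all names are illustrative.

```python
from pathlib import Path

GIB = 1024 ** 3


def index_size_bytes(index_dir: str) -> int:
    """Total size in bytes of all regular files under index_dir (recursive)."""
    return sum(p.stat().st_size for p in Path(index_dir).rglob("*") if p.is_file())


def page_cache_headroom(total_ram: int, jvm_heap: int, os_reserve: int,
                        index_bytes: int) -> int:
    """RAM left over for the OS page cache, minus the size of the index.

    A negative result means the whole index cannot stay cached, so some
    queries will have to go to disk.
    """
    free_for_cache = total_ram - jvm_heap - os_reserve
    return free_for_cache - index_bytes
```

Consider a hypothetical 64 GiB machine with an 8 GiB heap and 2 GiB reserved for the OS. A 40 GiB index leaves `page_cache_headroom(64 * GIB, 8 * GIB, 2 * GIB, 40 * GIB) == 14 * GIB` of headroom. On a 32 GiB machine, the same index would be 18 GiB short. `index_size_bytes` gives the on-disk size of a core's index directory, which you can pass in as `index_bytes`.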

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Feb 21, 2021, at 11:52 PM, Danilo Tomasoni <[hidden email]> wrote:

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Shawn Heisey-2
On 2/22/2021 12:52 AM, Danilo Tomasoni wrote:
> we are running a solr instance with around 41 MLN documents on a SATA class 10 disk with around 10.000 rpm.
> We are experiencing very slow query responses (in the order of hours..) with an average of 205 segments.
> We made a test with a normal pc and an SSD disk, and there the same solr instance with the same data and the same number of segments was around 45 times faster.
> Force optimize was also tried to improve the performances, but it was very slow, so we abandoned it.
>
> Since we still don't have enterprise server ssd disks, we are now wondering if in the meanwhile defragmenting the solrdata folder can help.
> The idea is that due to many updates, each segment file is fragmented across different phisical blocks.
> Put in another way, each segment file is non-contiguous on disk, and this can slow-down the solr response.

The absolute best thing you can do to improve Solr performance is add
memory.

The OS automatically uses unallocated memory to cache data on the disk.
Because memory is far faster than any disk, even SSD, it performs better.

I wrote a wiki page about it:

https://cwiki.apache.org/confluence/display/solr/SolrPerformanceProblems

If you have sufficient memory, the speed of your disks will have little
effect on performance.  It's only in cases where there is not enough
memory that disk performance will matter.
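On Linux (the OP later confirms CentOS), the page cache described here can be observed directly. Here is a minimal sketch that reads `/proc/meminfo`. In that file, `Cached` is roughly how much file data is currently held in RAM, and `MemAvailable` (where the kernel provides it) is the kernel's estimate of memory available without swapping. Comparing these to the index size shows whether the index can plausibly stay cached. The function names are illustrative.

```python
def parse_meminfo(text: str) -> dict[str, int]:
    """Parse Linux /proc/meminfo text into {field: value}.

    Fields reported in "kB" (which the kernel means as KiB) are converted to
    bytes; unitless fields such as HugePages_Total are kept as plain counts.
    """
    info: dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        parts = rest.split()
        if not sep or not parts:
            continue
        value = int(parts[0])
        if len(parts) > 1 and parts[1] == "kB":
            value *= 1024
        info[key.strip()] = value
    return info


def read_meminfo(path: str = "/proc/meminfo") -> dict[str, int]:
    """Read and parse /proc/meminfo on the current (Linux) host."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

For example, `read_meminfo()["Cached"] / 1024**3` gives the current page cache size in GiB. Watching it while running queries shows whether the index files are staying in memory.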

Thanks,
Shawn


Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Dmitri Maziuk
On 2021-02-22 11:18 AM, Shawn Heisey wrote:

> The OS automatically uses unallocated memory to cache data on the disk.
> Because memory is far faster than any disk, even SSD, it performs better.

That depends on the OS. From "defragmenting solrdata folder" I suspect the OP
is on Windows, whose filesystems and memory management do not always
work the way the Unix textbook says.

Dima

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Walter Underwood
True, but Windows does cache files. It has been a couple of decades since I ran search on Windows, but Ultraseek got large gains from setting some sort of system property to make it act like a file server and give file caching equal priority with program caching.

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Feb 22, 2021, at 9:22 AM, dmitri maziuk <[hidden email]> wrote:

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Danilo Tomasoni
Thank you all for the suggestions.
The OS is not Windows, it's CentOS. A colleague thinks that even on Linux, defragmenting can improve performance by about 2X because it keeps the data contiguous on disk.

We cannot use flashcache because we run Solr on virtual machines.
We will look more closely at Shawn's memory suggestion.
Thank you very much.

Danilo

________________________________
From: Walter Underwood <[hidden email]>
Sent: Monday, 22 February 2021 18:25
To: [hidden email] <[hidden email]>
Subject: Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?


Re: Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

Dmitri Maziuk
On 2021-02-23 1:53 AM, Danilo Tomasoni wrote:
> Thank you all for the suggestions,
> The OS is not windows, it's centos, a colleague thinks that even on linux defragmenting can improve performance about 2X because it keeps the data contiguous on disk.

You may want to check the filesystem you're using and read up on XFS vs
EXT4.

FWIW, we've had reasonable success with the ZFS on Linux binary drivers
(look on GitHub) for CentOS 6 and, somewhat less so, CentOS 7, with
effectively RAID-10'ed HDDs and a regular SSD for read and write caching.

Either way, check with `df` first: if you're more than ~75% full, you
need a bigger disk no matter what else you do.
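The `df` check can also be scripted, for example as part of a monitoring job. Here is a minimal sketch using Python's standard `shutil.disk_usage`. The 75% threshold is the rule of thumb above, and the function names are illustrative.

```python
import shutil


def usage_fraction(path: str) -> float:
    """Fraction of the filesystem containing `path` that is in use.

    Computed as used / (used + free), which is how df derives its Use%
    column; `free` here is the space available to unprivileged users, so
    blocks reserved for root are excluded.
    """
    usage = shutil.disk_usage(path)
    return usage.used / (usage.used + usage.free)


def needs_bigger_disk(path: str, threshold: float = 0.75) -> bool:
    """True if the filesystem is fuller than the ~75% rule of thumb."""
    return usage_fraction(path) > threshold
```

For example, `needs_bigger_disk("/path/to/solr/data")` returns `True` once that filesystem is more than 75% full. Because `df` rounds its percentage up, the two can differ by a fraction of a percent.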

Dima