What is the bottleneck for an optimise operation?

22 messages
What is the bottleneck for an optimise operation? – I’m currently performing an optimise operation on a ~190GB index with about 4 million documents. The process has been running for hours. This i...
The optimize operation is no longer recommended for Solr, as background merges have become a lot smarter. It is an extremely expensive operation that ...
Agreed, and the fact that it needs three times the space is part of the reason it takes so long: that 190GB index ends up writing another 380GB until...
Re: What is the bottleneck for an optimise operation? / solve the disk space and time issues by specifying multiple segments to optimize – You can solve the disk space and time issues by specifying multiple segments to optimize down to instead of a single segment. When we reindex...
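The approach described above uses the `maxSegments` parameter on Solr's update handler, which merges the index down to at most N segments instead of a single one, cutting both the temporary disk usage and the runtime. A sketch of the call; the host, port, collection name, and segment count are placeholder assumptions, not values from the thread:

```shell
# Optimize down to at most 16 segments instead of a single segment.
# This rewrites far less data than a full single-segment optimize.
# Host, port, and collection name below are placeholder assumptions.
curl "http://localhost:8983/solr/collection1/update?optimize=true&maxSegments=16"
```

This must be run against a live Solr instance, so it is shown here only as a command sketch.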
Yes, we already do it outside Solr. See https://github.com/ICIJ/extract which we developed for this purpose. My guess is that the documents are ve...
What do you have for merge configuration in solrconfig.xml? You should be able to tune it to - approximately - whatever you want without doing t...
This is the current config: <indexConfig> <ramBufferSizeMB>100</ramBufferSizeMB> &l...
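The truncated `<indexConfig>` above can be tuned so that background merges keep the segment count low without a manual optimize, as the previous message suggests. A sketch using the Solr 6.x `mergePolicyFactory` syntax; the numeric values are illustrative assumptions, not settings from the thread:

```xml
<indexConfig>
  <ramBufferSizeMB>100</ramBufferSizeMB>
  <!-- TieredMergePolicy: lower segmentsPerTier and maxMergeAtOnce mean
       background merges maintain fewer, larger segments.
       All values below are examples, not recommendations. -->
  <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <double name="maxMergedSegmentMB">20480.0</double>
  </mergePolicyFactory>
</indexConfig>
```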
Hi Matthew, OCR is something that can be parallelized outside of Solr/Tika. Do one OCR task per core, and you can have all cores running at 1...
Hi Rick, We already do this with 30 eight-core machines running seven jobs each, working off a shared queue. See https://github.com/ICIJ/extrac...
I typically end up with about 60-70 segments after indexing. What configuration do you use to bring it down to 16? > On 2 Mar 2017, at 7:42 ...
Thank you. The question remains however, if this is such a hefty operation then why is it walking to the destination instead of running, so to spe...
On Thu, 2017-03-02 at 15:39 +0000, Caruana, Matthew wrote: > Thank you. The question remains however, if this is such a hefty > operation ...
Thank you, you’re right - only one of the four cores is hitting 100%. This is the correct answer. The bottleneck is CPU, exacerbated by an absence ...
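The single-core observation above fits how a forced merge works: one merge executes on one thread. Solr's merge scheduler can run several background merges concurrently, but it will not parallelise a single optimize down to one segment. A sketch of the relevant solrconfig.xml knob, with illustrative values (assumptions, not thread-provided settings):

```xml
<indexConfig>
  <!-- ConcurrentMergeScheduler runs separate merges on parallel threads,
       but a single forced merge still runs on one thread, so this helps
       ongoing background merging rather than a one-shot optimize. -->
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">6</int>
    <int name="maxThreadCount">3</int>
  </mergeScheduler>
</indexConfig>
```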
Matthew: What load testing have you done on optimized vs. unoptimized indexes? Is there enough of a performance gain to be worth the trouble?...
We index rarely and in bulk as we’re an organisation that deals in enabling access to leaked documents for journalists. The indexes are mostly ...
Well, historically during the really old days, optimize made a major difference. As Lucene evolved that difference was smaller, and in recent _a...
Hi, It's simply expensive. You are rewriting your whole index. Why are you running optimize? Are you seeing performance problems you are ...
On 3/2/2017 8:04 AM, Caruana, Matthew wrote: > I’m currently performing an optimise operation on a ~190GB index with about 4 million documents...
6.4.0 added a lot of metrics to low-level calls. That makes many operations slow. Go back to 6.3.0 or wait for 6.4.2. Meanwhile, stop running o...
Thank you, these are useful tips. We were previously working with a 4GB heap and getting OOMs in Solr while updating (probably from the analyse...
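On the heap point above: the heap is set at startup rather than in solrconfig.xml. A sketch using the stock `bin/solr` launcher; the 8g figure is an illustrative assumption, not a size recommended in the thread:

```shell
# Start Solr with an 8 GB heap instead of the default (placeholder size).
bin/solr start -m 8g

# Or set it persistently in solr.in.sh:
# SOLR_HEAP="8g"
```

These commands act on a local Solr installation, so they are shown only as a sketch.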
It's _very_ unlikely that optimize will help with OOMs, so that's very probably a red herring. It's likely that the document causing the issue is v...
Hi Matthew, I'm guessing it's the EBS. With EBS we've seen: * cpu.system going up in some kernels * low read/write speeds and maxed out IO a...
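One way to test the EBS theory above is to watch per-device utilisation and latency while the optimize runs. A sketch using `iostat` from the sysstat package (availability of the tool and the device layout are assumptions about your environment):

```shell
# Report extended per-device stats, in MB, every 5 seconds.
# Sustained %util near 100 and a high await on the index volume
# point to IO as the bottleneck; idle disks with one pegged core
# point back to CPU.
iostat -xm 5
```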