Index optimization takes too long

Index optimization takes too long

weiwang19
Hello,

After a recent schema change, it takes almost 40 minutes to optimize the
index. The schema change enabled docValues for all sort/facet fields,
which increased the index size from 12G to 14G. Before the change,
optimization took only 5 minutes.

I have tried increasing maxMergeAtOnceExplicit, because the default of 30
could be too low:

<int name="maxMergeAtOnceExplicit">100</int>

But it doesn't seem to help. Any suggestions?

Thanks,
Wei

Re: Index optimization takes too long

Shawn Heisey-2
On 11/2/2018 5:00 PM, Wei wrote:
> After a recent schema change,  it takes almost 40 minutes to optimize the
> index.  The schema change is to enable docValues for all sort/facet fields,
> which increase the index size from 12G to 14G. Before the change it only
> takes 5 minutes to do the optimization.

An optimize is not just a straight data copy.  Lucene is actually
completely recalculating the index data structures.  It will never
proceed at the full data rate your disks are capable of achieving.

I do not know how docValues actually work during a segment merge, but
given exactly how the info relates to the inverted index, it's probably
even more complicated than the rest of the data structures in a Lucene
index.

On one of the systems I used to manage, back in March of 2017, I was
seeing a 50GB index take 1.73 hours to optimize.  I do not recall
whether I had docValues at that point, but I probably did.

http://lucene.472066.n3.nabble.com/What-is-the-bottleneck-for-an-optimise-operation-tt4323039.html#a4323140
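
For a rough sense of scale, the figures quoted in this thread can be converted into an effective throughput. This is only a back-of-the-envelope sketch: it assumes the quoted sizes are decimal gigabytes (1 GB = 1000 MB) and uses final index size as a stand-in for the amount of data processed, which an optimize only approximates. The function name is made up for illustration.

```python
def throughput_mb_per_s(index_gb: float, minutes: float) -> float:
    """Effective rate in MB/s, treating 1 GB as 1000 MB (an assumption)."""
    return index_gb * 1000 / (minutes * 60)


# Figures as quoted in the thread; index size is only a proxy for work done.
cases = {
    "before docValues (12 GB, 5 min)": throughput_mb_per_s(12, 5),
    "after docValues (14 GB, 40 min)": throughput_mb_per_s(14, 40),
    "50 GB index (1.73 h)": throughput_mb_per_s(50, 1.73 * 60),
}

for label, rate in cases.items():
    print(f"{label}: {rate:.1f} MB/s")
```

Under these assumptions, the 40-minute optimize works out to roughly 6 MB/s and the 50GB case to roughly 8 MB/s, while the 5-minute optimize works out to about 40 MB/s.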

There's not much you can do to make this go faster. Putting massively
faster CPUs in the machine MIGHT make a difference, but it probably
wouldn't be a BIG difference.  I'm talking about clock speed, not core
count.

Thanks,
Shawn


Re: Index optimization takes too long

Deepak Goel
In reply to this post by weiwang19
I would start by monitoring hardware (CPU, memory, disk) and software
(heap, threads) utilization to see where the bottleneck is, or which
resource is used the most, and then tune for that resource.

I would also look at profiling the software.


Deepak


Re: Index optimization takes too long

David Hastings
On a side note, does adding docvalues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docvalues? I was under the impression you had to reindex the content.


Re: Index optimization takes too long

Shawn Heisey-2
On 11/3/2018 5:32 AM, Dave wrote:
> On a side note, does adding docvalues to an already indexed field, and then optimizing, prevent the need to reindex to take advantage of docvalues? I was under the impression you had to reindex the content.

You must reindex when changing the schema to add docValues.  An optimize
will not build the new data structures. It will only rebuild the data
structures that are already there.

Thanks,
Shawn


Re: Index optimization takes too long

Erick Erickson
Going from my phone, so this will be terse.  See uninvertingmergeupdateprocessor
(or something like that). Also, there's an idea in SOLR-12259 IIRC, but
that'll be in 7.6 at the earliest.


Re: Index optimization takes too long

weiwang19
Thanks everyone! I checked the system metrics during the optimization
process. CPU usage is quite low, there is no I/O wait,  and memory usage is
not much different from before the docValues change.  So I wonder what
could be the bottleneck.

Thanks,
Wei


Re: Index optimization takes too long

Toke Eskildsen-2
On Sat, 2018-11-03 at 21:41 -0700, Wei wrote:
> Thanks everyone! I checked the system metrics during the optimization
> process. CPU usage is quite low, there is no I/O wait,  and memory
> usage is not much different from before the docValues change.  So I
> wonder what could be the bottleneck.

Are you looking at overall CPU usage or single-core? When we run force
merge, we have a single core at 100% while the rest are idle.
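
To illustrate why the distinction matters, here is a minimal sketch of how one saturated core can look like "quite low" CPU usage in an overall average. The 16-core machine and the function names are assumptions for illustration only, not details from this thread.

```python
def overall_cpu_percent(per_core_percent):
    """Average utilization across all cores, as an overall CPU gauge reports it."""
    return sum(per_core_percent) / len(per_core_percent)


def busiest_core_percent(per_core_percent):
    """Utilization of the single busiest core."""
    return max(per_core_percent)


# Hypothetical 16-core machine: one core pinned by the merge, the rest idle.
cores = [100.0] + [0.0] * 15

print(overall_cpu_percent(cores))   # 6.25
print(busiest_core_percent(cores))  # 100.0
```

In this hypothetical case the overall figure reads 6.25% even though the merge thread is CPU-bound, so checking per-core utilization is the more telling measurement.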


NB: There is currently a thread "Static index, fastest way to do
forceMerge" on the Lucene users mailing list, which seems to closely
parallel this thread.

- Toke Eskildsen, Royal Danish Library