Lucene index directory grows and shrinks

Lucene index directory grows and shrinks

Raffaele Gambelli
Hi all,

I'm using Jackrabbit 2.18.0 which uses lucene-core 3.6.0.

I'm working on an application whose index directory has reached 37 G. A few days ago, disk usage quickly reached 100% and then returned to its pre-growth level.

I believe this was caused by rapid growth of the Lucene index directory. Searching for similar reports, I found only this article, which describes something very similar: https://helpx.adobe.com/uk/experience-manager/kb/lucene-index-directory-growth.html

I would like to know more about this behaviour. First of all, can you confirm this growth and shrinkage?

Thanks in advance, best regards
Raffaele Gambelli
WebRainbow® Software Developer

P +39 051 8550 576
M #
E [hidden email]
W https://westpole.webex.com/meet/R.Gambelli
A Via Ettore Cristoni, 84 - 40030 Casalecchio di Reno

This email for the D.lgs.196/2003 (Privacy Code) and European Regulation 679/2016/UE (GDPR) may contain confidential and/or privileged information for the exclusive use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient, you must not use, copy, disclose or take any action based on this message or any information here. If you have received this email in error, please contact us (email:[hidden email]) by reply email and delete all copies. Legal privilege is not waived because you have read this email. Thank you for your cooperation.


Re: Lucene index directory grows and shrinks

Atri Sharma-3
These are typical symptoms of an index merge.

However, it is hard to say more without more data. What is
your segment size limit? Have you changed the default merge frequency
or max segments configuration? Do you have an estimate of the ratio of
segments that have reached the max limit to total segments?

Atri

On Mon, Nov 4, 2019 at 7:12 PM Raffaele Gambelli <[hidden email]> wrote:

>
> Hi all,
>
> I'm using Jackrabbit 2.18.0 which uses lucene-core 3.6.0.
>
> I'm working on an application that has reached 37 G of directory index, a few days ago, disk occupancy has quickly reached 100% and then returned to pre-growth employment.
>
> I believe that was caused by a rapid growth of Lucene index directory, looking for such an event I've found only this article describing something really similar https://helpx.adobe.com/uk/experience-manager/kb/lucene-index-directory-growth.html
>
> I would like to know more info about this behaviour, first of all can you confirm this growth and shrinkage?
>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


R: Lucene index directory grows and shrinks

Raffaele Gambelli
Thanks for your quick reply. I'm quite a beginner with Lucene concepts, and Jackrabbit hides almost everything about the way it uses Lucene internally.

Anyway, here is the size of each sub-directory in my index. Please note the biggest one, 25G: is that normal?

...repository/workspaces/default/index$ du -h .
2.5G    ./_12ey1
14M     ./_1dr9s
20M     ./_1dr8d
2.8G    ./_1b9pj
5.8M    ./_1drqc
19M     ./_1dr4q
2.5G    ./_17lmu
4.0M    ./_1drmx
11M     ./_1drbf
4.3M    ./_1drok
13M     ./_1drq1
40K     ./_1drqe
11M     ./_1drhc
260M    ./_1dr3g
664M    ./_1by44
2.5G    ./_14tet
281M    ./_1c4wj
25G     ./_zzgq
274M    ./_1d2nc
638M    ./_1ctf0
580K    ./_1drqf
304K    ./_1drqd
6.5M    ./_1dr6m
325M    ./_1djfp
37G
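
To put the 25G entry in proportion, here is a small Python sketch that parses the listing above and reports the largest sub-directory's share of the total. The helper names `parse_size` and `summarize` are made up for this message, the units are assumed to be the binary K/M/G that `du -h` prints, and since every entry is already rounded the computed total will only approximate the one du reports.

```python
import re

# Sizes copied from the du -h listing above.
DU_LISTING = """
2.5G ./_12ey1
14M ./_1dr9s
20M ./_1dr8d
2.8G ./_1b9pj
5.8M ./_1drqc
19M ./_1dr4q
2.5G ./_17lmu
4.0M ./_1drmx
11M ./_1drbf
4.3M ./_1drok
13M ./_1drq1
40K ./_1drqe
11M ./_1drhc
260M ./_1dr3g
664M ./_1by44
2.5G ./_14tet
281M ./_1c4wj
25G ./_zzgq
274M ./_1d2nc
638M ./_1ctf0
580K ./_1drqf
304K ./_1drqd
6.5M ./_1dr6m
325M ./_1djfp
"""

# Assumption: du -h uses powers of 1024 for its K/M/G suffixes.
UNIT_BYTES = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}


def parse_size(text):
    """Convert a du -h style size such as '2.5G' into a number of bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)([KMG])", text)
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, unit = match.groups()
    return float(number) * UNIT_BYTES[unit]


def summarize(listing):
    """Return (total_bytes, largest_name, largest_share_of_total)."""
    sizes = {}
    for line in listing.strip().splitlines():
        size_text, path = line.split()
        sizes[path.split("/")[-1]] = parse_size(size_text)
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get)
    return total, largest, sizes[largest] / total


total, largest, share = summarize(DU_LISTING)
print(f"total ~{total / UNIT_BYTES['G']:.1f}G, largest {largest} holds {share:.0%}")
```

Run against the listing, it shows that the single 25G sub-directory accounts for roughly two thirds of the whole index.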

I also tried to download the index directory to my local machine to inspect it with Luke, which I know a little, but the download keeps getting interrupted by network problems.

> What is your segment size limit?

I don't know. Where could I see that limit?

> Have you changed the default merge frequency or max segments configuration?

Is the merge frequency the mergeFactor? If so, I'm using the default, which is 10 according to https://jackrabbit.apache.org/archive/wiki/JCR/Search_115513504.html

As for max segments, I don't know. Where could I see it?

Bye

-----Original message-----
From: Atri Sharma <[hidden email]>
Sent: Monday, 4 November 2019 14:46
To: [hidden email]
Subject: Re: Lucene index directory grows and shrinks

This are typical symptoms of an index merge.

However, it is hard to predict more without knowing more data. What is your segment size limit? Have you changed the default merge frequency or max segments configuration? Would you have an estimate of ratio of number of segments reaching max limit / total segments?

Atri


R: Lucene index directory grows and shrinks

Raffaele Gambelli
As far as you know, is this behaviour, which you described as "typical", documented in detail somewhere?

It is fundamental for me to understand it better, including how big an index can grow, so that I can allocate the right amount of disk space.

Thank you very much

-----Original message-----
From: Raffaele Gambelli <[hidden email]>
Sent: Monday, 4 November 2019 15:16
To: [hidden email]
Subject: R: Lucene index directory grows and shrinks


Re: Lucene index directory grows and shrinks

Erick Erickson
Here’s a neat visualization: http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

The short form is this:

- A “segment” is all the files with a particular prefix in your index directory, e.g. _12ey1* is one segment.
- Segments are created as documents are indexed and commits occur.
- Periodically, segments are “merged”: some number of segments are combined into a single new segment, and then the old segments are deleted.
- During the merge, both the old and new segments occupy index space.
- Deleted documents continue to occupy disk space until the segment containing them is merged. NOTE: updating a document deletes the old version and adds a new one, so the old version counts as a “deleted” document for this discussion.

So it’s quite common for deletes to accumulate until they are merged away. You have two sources of fluctuation:
1> deleted docs
2> the merging process.
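
A rough way to picture both effects is the toy Python model below. It is only a sketch: the `Segment` class, the `merge` helper and the byte counts are invented for illustration and are not anything Lucene or Jackrabbit exposes. The model assumes a merge copies only live documents into the new segment and that the old segments stay on disk until the new one is complete.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Hypothetical stand-in for one segment on disk."""
    name: str
    live_bytes: int      # space used by documents that are still live
    deleted_bytes: int   # space still held by deleted (or updated) documents

    @property
    def size(self) -> int:
        return self.live_bytes + self.deleted_bytes


def disk_usage(segments):
    """Total bytes occupied by the given segments."""
    return sum(s.size for s in segments)


def merge(segments, names, new_name):
    """Merge the named segments; return (peak_bytes, segments_after_merge).

    The peak is reached when the new segment is fully written but the
    old ones have not been deleted yet.
    """
    chosen = [s for s in segments if s.name in names]
    kept = [s for s in segments if s.name not in names]
    # Only live documents are copied, so deleted bytes are reclaimed here.
    merged = Segment(new_name, sum(s.live_bytes for s in chosen), 0)
    peak = disk_usage(segments) + merged.size
    return peak, kept + [merged]


segments = [
    Segment("_a", live_bytes=80, deleted_bytes=20),
    Segment("_b", live_bytes=50, deleted_bytes=50),
    Segment("_c", live_bytes=10, deleted_bytes=0),
]
before = disk_usage(segments)
peak, after = merge(segments, {"_a", "_b"}, "_d")
print("before:", before, "peak:", peak, "after:", disk_usage(after))
```

In this made-up example usage rises above the starting size while the merge runs and then drops below it, because the space held by deleted documents is given back once the old segments are removed.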

In your case, I see one segment of around 25G. That indicates your index has been optimized at some point. I’d also guess you’re on Lucene prior to release 7.5, so whenever you optimize again, _all_ segments will be merged into a single new segment, meaning your index will _at least_ double in size temporarily.
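
For sizing the disk, that rule of thumb can be turned into a back-of-the-envelope check. The Python sketch below uses the 37G figure from this thread; the function names are hypothetical, and the doubling is treated as a lower bound, since it ignores anything else that may be written or held open during the merge.

```python
GIB = 1024 ** 3


def peak_size_during_full_merge(index_bytes):
    """Lower bound on the peak: the old segments plus one new segment
    as large as the whole index (assumes nothing is reclaimed)."""
    return 2 * index_bytes


def extra_space_needed(index_bytes, free_bytes):
    """Bytes still missing before a full merge could fit, or 0 if it fits."""
    needed = peak_size_during_full_merge(index_bytes) - index_bytes
    return max(0, needed - free_bytes)


index_bytes = 37 * GIB
print("peak at least:", peak_size_during_full_merge(index_bytes) // GIB, "G")
print("missing with 20G free:", extra_space_needed(index_bytes, 20 * GIB) // GIB, "G")
```

The free-space figure could be read with `shutil.disk_usage(path).free` from the Python standard library, where `path` is wherever the index lives on your machine.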

As for how this happens, you’d have to ask the Jackrabbit folks, since I don’t know that app.

For the gory details on optimize, see: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/. Even though that’s labeled Solr, it’s really about Lucene, and the doc applies to anything that uses Lucene with the Tiered Merge Policy (which has been the default for some time). Whether Jackrabbit does anything with this, though, I don’t have a clue.

Best,
Erick


> On Nov 4, 2019, at 11:19 AM, Raffaele Gambelli <[hidden email]> wrote:
>
> For what you know, is this behaviour which you defined "typical" described deeply somewhere?
>
> It is foundamental for me to better understand it even to know how big an index can grow, in a way that I can allocate the right disk space.
>
> Thank you very much