debugging growing index size

classic Classic list List threaded Threaded
14 messages Options
Reply | Threaded
Open this post in threaded view
|

debugging growing index size

Rob Audenaerde
Hi all,

I'm still debugging the growing-index size. I think closing index readers
might help (work in progress), but I can't really see them holding on to
files (at least, using lsof ). Restarting the application sheds some light,
I see logging on files that are no longer referenced.

What I see is that there are files in the index-directory, that seem to
longer referenced..

I put the output of the infoStream online, because is it rather big (30MB
gzipped):  http://www.audenaerde.org/lucene/merges.log.gz

Output of lsof:  (executed 'sudo lsof *' in the index directory  ). This is
on an CentOS box (maybe that influences stuff as well?)

COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
java    30581 apache  mem    REG  253,0 3176094924 18880508
_4gs5_Lucene50_0.dvd
java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
java    30581 apache  mem    REG  253,0  369563337 18880631
_4gs5_Lucene50_0.tim
java    30581 apache  mem    REG  253,0  176344058 18880623
_4gs5_Lucene50_0.pos
java    30581 apache  mem    REG  253,0  378055201 18880606
_4gs5_Lucene50_0.doc
java    30581 apache  mem    REG  253,0  372579599 18880400
_4i5a_Lucene50_0.dvd
java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
java    30581 apache  mem    REG  253,0  363493917 18880533
_4ct1_Lucene50_0.dvd
java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
java    30581 apache  mem    REG  253,0   46271330 18880661
_4ct1_Lucene50_0.tim
java    30581 apache  mem    REG  253,0   26911387 18880653
_4ct1_Lucene50_0.pos
java    30581 apache  mem    REG  253,0   54678249 18880568
_4ct1_Lucene50_0.doc
java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
java    30581 apache  mem    REG  253,0   45032159 18880389
_4i5a_Lucene50_0.tim
java    30581 apache  mem    REG  253,0   26486772 18880388
_4i5a_Lucene50_0.pos
java    30581 apache  mem    REG  253,0   55411002 18880362
_4i5a_Lucene50_0.doc
java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
java    30581 apache   30wW  REG  253,0          0 18877901 write.lock

Output of some of the biggest files in the index directory:

-rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
-rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
-rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
-rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
-rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
-rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd

Note that the 3ou7 and 3k3a segments no longer appear to be in use?
Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Michael McCandless-2
Hmm, curious.

I looked at the [large] infoStream output and I see segment _3ou7
present on init of IW, a few getReader calls referencing it, then a
forceMerge that indeed merges it away, yet I do NOT see IW attempting
deletion of its files.

And indeed I see plenty (too many: many times per second?) of commits
after that, so the index itself is no longer referencing _3ou7.

If you are failing to close all NRT readers then I would expect _3ou7
to be in the lsof output, but it's not.

The NRT readers close method has logic that notifies IndexWriter when
it's done "needing" the files, to emulate "delete on last close"
semantics for filesystems like HDFS that don't do that ... it's
possible something is wrong here.

Can you set the (public, static) boolean
IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
log?  This causes IW to log the ref count of each file it's tracking
...

I'll also add a bit more verbosity to IW when NRT readers are opened
and close, for 5.4.0.

Mike McCandless

http://blog.mikemccandless.com


On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
<[hidden email]> wrote:

> Hi all,
>
> I'm still debugging the growing-index size. I think closing index readers
> might help (work in progress), but I can't really see them holding on to
> files (at least, using lsof ). Restarting the application sheds some light,
> I see logging on files that are no longer referenced.
>
> What I see is that there are files in the index-directory, that seem to
> longer referenced..
>
> I put the output of the infoStream online, because is it rather big (30MB
> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>
> Output of lsof:  (executed 'sudo lsof *' in the index directory  ). This is
> on an CentOS box (maybe that influences stuff as well?)
>
> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> java    30581 apache  mem    REG  253,0 3176094924 18880508
> _4gs5_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> java    30581 apache  mem    REG  253,0  369563337 18880631
> _4gs5_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0  176344058 18880623
> _4gs5_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0  378055201 18880606
> _4gs5_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0  372579599 18880400
> _4i5a_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> java    30581 apache  mem    REG  253,0  363493917 18880533
> _4ct1_Lucene50_0.dvd
> java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> java    30581 apache  mem    REG  253,0   46271330 18880661
> _4ct1_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0   26911387 18880653
> _4ct1_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0   54678249 18880568
> _4ct1_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> java    30581 apache  mem    REG  253,0   45032159 18880389
> _4i5a_Lucene50_0.tim
> java    30581 apache  mem    REG  253,0   26486772 18880388
> _4i5a_Lucene50_0.pos
> java    30581 apache  mem    REG  253,0   55411002 18880362
> _4i5a_Lucene50_0.doc
> java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
>
> Output of some of the biggest files in the index directory:
>
> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
>
> Note that the 3ou7 and 3k3a segments no longer appear to be in use?

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Rob Audenaerde
Curious indeed!

I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the
logs. Will get back with them in a day hopefully.

Thanks for the extra logging!

-Rob

On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
[hidden email]> wrote:

> Hmm, curious.
>
> I looked at the [large] infoStream output and I see segment _3ou7
> present on init of IW, a few getReader calls referencing it, then a
> forceMerge that indeed merges it away, yet I do NOT see IW attempting
> deletion of its files.
>
> And indeed I see plenty (too many: many times per second?) of commits
> after that, so the index itself is no longer referencing _3ou7.
>
> If you are failing to close all NRT readers then I would expect _3ou7
> to be in the lsof output, but it's not.
>
> The NRT readers close method has logic that notifies IndexWriter when
> it's done "needing" the files, to emulate "delete on last close"
> semantics for filesystems like HDFS that don't do that ... it's
> possible something is wrong here.
>
> Can you set the (public, static) boolean
> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> log?  This causes IW to log the ref count of each file it's tracking
> ...
>
> I'll also add a bit more verbosity to IW when NRT readers are opened
> and close, for 5.4.0.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> <[hidden email]> wrote:
> > Hi all,
> >
> > I'm still debugging the growing-index size. I think closing index readers
> > might help (work in progress), but I can't really see them holding on to
> > files (at least, using lsof ). Restarting the application sheds some
> light,
> > I see logging on files that are no longer referenced.
> >
> > What I see is that there are files in the index-directory, that seem to
> > longer referenced..
> >
> > I put the output of the infoStream online, because is it rather big (30MB
> > gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> >
> > Output of lsof:  (executed 'sudo lsof *' in the index directory  ). This
> is
> > on an CentOS box (maybe that influences stuff as well?)
> >
> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> > java    30581 apache  mem    REG  253,0 3176094924 18880508
> > _4gs5_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> > java    30581 apache  mem    REG  253,0  369563337 18880631
> > _4gs5_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0  176344058 18880623
> > _4gs5_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0  378055201 18880606
> > _4gs5_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0  372579599 18880400
> > _4i5a_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> > java    30581 apache  mem    REG  253,0  363493917 18880533
> > _4ct1_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> > java    30581 apache  mem    REG  253,0   46271330 18880661
> > _4ct1_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0   26911387 18880653
> > _4ct1_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0   54678249 18880568
> > _4ct1_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> > java    30581 apache  mem    REG  253,0   45032159 18880389
> > _4i5a_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0   26486772 18880388
> > _4i5a_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0   55411002 18880362
> > _4i5a_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> >
> > Output of some of the biggest files in the index directory:
> >
> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
> >
> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Michael McCandless-2
Hi Rob,

A couple more things:

Can you print the value of MMapDirectory.UNMAP_SUPPORTED?

Also, can you try your test using NIOFSDirectory instead?  Curious if
that changes things...

Mike McCandless

http://blog.mikemccandless.com


On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
<[hidden email]> wrote:

> Curious indeed!
>
> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the
> logs. Will get back with them in a day hopefully.
>
> Thanks for the extra logging!
>
> -Rob
>
> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> [hidden email]> wrote:
>
>> Hmm, curious.
>>
>> I looked at the [large] infoStream output and I see segment _3ou7
>> present on init of IW, a few getReader calls referencing it, then a
>> forceMerge that indeed merges it away, yet I do NOT see IW attempting
>> deletion of its files.
>>
>> And indeed I see plenty (too many: many times per second?) of commits
>> after that, so the index itself is no longer referencing _3ou7.
>>
>> If you are failing to close all NRT readers then I would expect _3ou7
>> to be in the lsof output, but it's not.
>>
>> The NRT readers close method has logic that notifies IndexWriter when
>> it's done "needing" the files, to emulate "delete on last close"
>> semantics for filesystems like HDFS that don't do that ... it's
>> possible something is wrong here.
>>
>> Can you set the (public, static) boolean
>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
>> log?  This causes IW to log the ref count of each file it's tracking
>> ...
>>
>> I'll also add a bit more verbosity to IW when NRT readers are opened
>> and close, for 5.4.0.
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>> <[hidden email]> wrote:
>> > Hi all,
>> >
>> > I'm still debugging the growing-index size. I think closing index readers
>> > might help (work in progress), but I can't really see them holding on to
>> > files (at least, using lsof ). Restarting the application sheds some
>> light,
>> > I see logging on files that are no longer referenced.
>> >
>> > What I see is that there are files in the index-directory, that seem to
>> > longer referenced..
>> >
>> > I put the output of the infoStream online, because is it rather big (30MB
>> > gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>> >
>> > Output of lsof:  (executed 'sudo lsof *' in the index directory  ). This
>> is
>> > on an CentOS box (maybe that influences stuff as well?)
>> >
>> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
>> > java    30581 apache  mem    REG  253,0 3176094924 18880508
>> > _4gs5_Lucene50_0.dvd
>> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
>> > java    30581 apache  mem    REG  253,0  369563337 18880631
>> > _4gs5_Lucene50_0.tim
>> > java    30581 apache  mem    REG  253,0  176344058 18880623
>> > _4gs5_Lucene50_0.pos
>> > java    30581 apache  mem    REG  253,0  378055201 18880606
>> > _4gs5_Lucene50_0.doc
>> > java    30581 apache  mem    REG  253,0  372579599 18880400
>> > _4i5a_Lucene50_0.dvd
>> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
>> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
>> > java    30581 apache  mem    REG  253,0  363493917 18880533
>> > _4ct1_Lucene50_0.dvd
>> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
>> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
>> > java    30581 apache  mem    REG  253,0   46271330 18880661
>> > _4ct1_Lucene50_0.tim
>> > java    30581 apache  mem    REG  253,0   26911387 18880653
>> > _4ct1_Lucene50_0.pos
>> > java    30581 apache  mem    REG  253,0   54678249 18880568
>> > _4ct1_Lucene50_0.doc
>> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
>> > java    30581 apache  mem    REG  253,0   45032159 18880389
>> > _4i5a_Lucene50_0.tim
>> > java    30581 apache  mem    REG  253,0   26486772 18880388
>> > _4i5a_Lucene50_0.pos
>> > java    30581 apache  mem    REG  253,0   55411002 18880362
>> > _4i5a_Lucene50_0.doc
>> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
>> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
>> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
>> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
>> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
>> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
>> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
>> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
>> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
>> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
>> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
>> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
>> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
>> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
>> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
>> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
>> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
>> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
>> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
>> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
>> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
>> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
>> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
>> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
>> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
>> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
>> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
>> >
>> > Output of some of the biggest files in the index directory:
>> >
>> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
>> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
>> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
>> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
>> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
>> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
>> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
>> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
>> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
>> >
>> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Rob Audenaerde
In reply to this post by Michael McCandless-2
I got the data (beware, it is about 180MB download, xz-zipped, unpacked it
is about 6.6 GB).

Unfortunately,  I accidentally restarted the application so the index-files
and lsof output could not be determined for this run. Hopefully the
infoStream log with the extra logging will provide enough information. I
will work that next week if needed.

The infoStream can be downloaded here:

http://www.audenaerde.org/lucene/merges.log.xz

The value of MMapDirectory.UNMAP_SUPPORTED= true

I'm currently trying to create a build with NIOFSDirectory instead.

-Rob



On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
[hidden email]> wrote:

> Hmm, curious.
>
> I looked at the [large] infoStream output and I see segment _3ou7
> present on init of IW, a few getReader calls referencing it, then a
> forceMerge that indeed merges it away, yet I do NOT see IW attempting
> deletion of its files.
>
> And indeed I see plenty (too many: many times per second?) of commits
> after that, so the index itself is no longer referencing _3ou7.
>
> If you are failing to close all NRT readers then I would expect _3ou7
> to be in the lsof output, but it's not.
>
> The NRT readers close method has logic that notifies IndexWriter when
> it's done "needing" the files, to emulate "delete on last close"
> semantics for filesystems like HDFS that don't do that ... it's
> possible something is wrong here.
>
> Can you set the (public, static) boolean
> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> log?  This causes IW to log the ref count of each file it's tracking
> ...
>
> I'll also add a bit more verbosity to IW when NRT readers are opened
> and close, for 5.4.0.
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> <[hidden email]> wrote:
> > Hi all,
> >
> > I'm still debugging the growing-index size. I think closing index readers
> > might help (work in progress), but I can't really see them holding on to
> > files (at least, using lsof ). Restarting the application sheds some
> light,
> > I see logging on files that are no longer referenced.
> >
> > What I see is that there are files in the index-directory, that seem to
> > longer referenced..
> >
> > I put the output of the infoStream online, because is it rather big (30MB
> > gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> >
> > Output of lsof:  (executed 'sudo lsof *' in the index directory  ). This
> is
> > on an CentOS box (maybe that influences stuff as well?)
> >
> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> > java    30581 apache  mem    REG  253,0 3176094924 18880508
> > _4gs5_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> > java    30581 apache  mem    REG  253,0  369563337 18880631
> > _4gs5_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0  176344058 18880623
> > _4gs5_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0  378055201 18880606
> > _4gs5_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0  372579599 18880400
> > _4i5a_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> > java    30581 apache  mem    REG  253,0  363493917 18880533
> > _4ct1_Lucene50_0.dvd
> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> > java    30581 apache  mem    REG  253,0   46271330 18880661
> > _4ct1_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0   26911387 18880653
> > _4ct1_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0   54678249 18880568
> > _4ct1_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> > java    30581 apache  mem    REG  253,0   45032159 18880389
> > _4i5a_Lucene50_0.tim
> > java    30581 apache  mem    REG  253,0   26486772 18880388
> > _4i5a_Lucene50_0.pos
> > java    30581 apache  mem    REG  253,0   55411002 18880362
> > _4i5a_Lucene50_0.doc
> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> >
> > Output of some of the biggest files in the index directory:
> >
> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54 _4ct1_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06 _4gs5_Lucene50_0.tim
> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09 _4i5a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06 _4gs5_Lucene50_0.doc
> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55 _3k3a_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10 _4gs5_Lucene50_0.dvd
> >
> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Rob Audenaerde
In reply to this post by Michael McCandless-2
I'm currently running using NIOFS. It seems to prevent the issue from
appearing.

This is a second run (with applied deletes etc)

raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
-rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd

raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
-rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
-rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
-rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
-rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd



On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
[hidden email]> wrote:

> Hi Rob,
>
> A couple more things:
>
> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>
> Also, can you try your test using NIOFSDirectory instead?  Curious if
> that changes things...
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> <[hidden email]> wrote:
> > Curious indeed!
> >
> > I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the
> > logs. Will get back with them in a day hopefully.
> >
> > Thanks for the extra logging!
> >
> > -Rob
> >
> > On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> > [hidden email]> wrote:
> >
> >> Hmm, curious.
> >>
> >> I looked at the [large] infoStream output and I see segment _3ou7
> >> present on init of IW, a few getReader calls referencing it, then a
> >> forceMerge that indeed merges it away, yet I do NOT see IW attempting
> >> deletion of its files.
> >>
> >> And indeed I see plenty (too many: many times per second?) of commits
> >> after that, so the index itself is no longer referencing _3ou7.
> >>
> >> If you are failing to close all NRT readers then I would expect _3ou7
> >> to be in the lsof output, but it's not.
> >>
> >> The NRT readers close method has logic that notifies IndexWriter when
> >> it's done "needing" the files, to emulate "delete on last close"
> >> semantics for filesystems like HDFS that don't do that ... it's
> >> possible something is wrong here.
> >>
> >> Can you set the (public, static) boolean
> >> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
> >> log?  This causes IW to log the ref count of each file it's tracking
> >> ...
> >>
> >> I'll also add a bit more verbosity to IW when NRT readers are opened
> >> and close, for 5.4.0.
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> >> <[hidden email]> wrote:
> >> > Hi all,
> >> >
> >> > I'm still debugging the growing-index size. I think closing index
> readers
> >> > might help (work in progress), but I can't really see them holding on
> to
> >> > files (at least, using lsof ). Restarting the application sheds some
> >> light,
> >> > I see logging on files that are no longer referenced.
> >> >
> >> > What I see is that there are files in the index-directory, that seem
> to
> >> > longer referenced..
> >> >
> >> > I put the output of the infoStream online, because is it rather big
> (30MB
> >> > gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> >> >
> >> > Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
> This
> >> is
> >> > on an CentOS box (maybe that influences stuff as well?)
> >> >
> >> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> >> > java    30581 apache  mem    REG  253,0 3176094924 18880508
> >> > _4gs5_Lucene50_0.dvd
> >> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> >> > java    30581 apache  mem    REG  253,0  369563337 18880631
> >> > _4gs5_Lucene50_0.tim
> >> > java    30581 apache  mem    REG  253,0  176344058 18880623
> >> > _4gs5_Lucene50_0.pos
> >> > java    30581 apache  mem    REG  253,0  378055201 18880606
> >> > _4gs5_Lucene50_0.doc
> >> > java    30581 apache  mem    REG  253,0  372579599 18880400
> >> > _4i5a_Lucene50_0.dvd
> >> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> >> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> >> > java    30581 apache  mem    REG  253,0  363493917 18880533
> >> > _4ct1_Lucene50_0.dvd
> >> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> >> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> >> > java    30581 apache  mem    REG  253,0   46271330 18880661
> >> > _4ct1_Lucene50_0.tim
> >> > java    30581 apache  mem    REG  253,0   26911387 18880653
> >> > _4ct1_Lucene50_0.pos
> >> > java    30581 apache  mem    REG  253,0   54678249 18880568
> >> > _4ct1_Lucene50_0.doc
> >> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> >> > java    30581 apache  mem    REG  253,0   45032159 18880389
> >> > _4i5a_Lucene50_0.tim
> >> > java    30581 apache  mem    REG  253,0   26486772 18880388
> >> > _4i5a_Lucene50_0.pos
> >> > java    30581 apache  mem    REG  253,0   55411002 18880362
> >> > _4i5a_Lucene50_0.doc
> >> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> >> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> >> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> >> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> >> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> >> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> >> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> >> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> >> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> >> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> >> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> >> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> >> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> >> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> >> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> >> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> >> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> >> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> >> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> >> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> >> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> >> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> >> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> >> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> >> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> >> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> >> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> >> >
> >> > Output of some of the biggest files in the index directory:
> >> >
> >> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> >> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> _4ct1_Lucene50_0.dvd
> >> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> _4gs5_Lucene50_0.tim
> >> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> _4i5a_Lucene50_0.dvd
> >> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> _4gs5_Lucene50_0.doc
> >> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> >> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> >> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> _3k3a_Lucene50_0.dvd
> >> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> _4gs5_Lucene50_0.dvd
> >> >
> >> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

wmartinusa
Hi Rob:


Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?

MappedByteBuffer …. what OS are you on Rob? What JVM?

http://bugs.java.com/view_bug.do?bug_id=4724038

http://mail-archives.apache.org/mod_mbox/lucene-dev/201509.mbox/%3C55F0461A.2070005@...%3E

hth
-will



> On Nov 13, 2015, at 11:23 AM, Rob Audenaerde <[hidden email]> wrote:
>
> I'm currently running using NIOFS. It seems to prevent the issue from
> appearing.
>
> This is a second run (with applied deletes etc)
>
> raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>
> raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>
>
>
> On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> [hidden email]> wrote:
>
>> Hi Rob,
>>
>> A couple more things:
>>
>> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>>
>> Also, can you try your test using NIOFSDirectory instead?  Curious if
>> that changes things...
>>
>> Mike McCandless
>>
>> http://blog.mikemccandless.com
>>
>>
>> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
>> <[hidden email]> wrote:
>>> Curious indeed!
>>>
>>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate the
>>> logs. Will get back with them in a day hopefully.
>>>
>>> Thanks for the extra logging!
>>>
>>> -Rob
>>>
>>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
>>> [hidden email]> wrote:
>>>
>>>> Hmm, curious.
>>>>
>>>> I looked at the [large] infoStream output and I see segment _3ou7
>>>> present on init of IW, a few getReader calls referencing it, then a
>>>> forceMerge that indeed merges it away, yet I do NOT see IW attempting
>>>> deletion of its files.
>>>>
>>>> And indeed I see plenty (too many: many times per second?) of commits
>>>> after that, so the index itself is no longer referencing _3ou7.
>>>>
>>>> If you are failing to close all NRT readers then I would expect _3ou7
>>>> to be in the lsof output, but it's not.
>>>>
>>>> The NRT readers close method has logic that notifies IndexWriter when
>>>> it's done "needing" the files, to emulate "delete on last close"
>>>> semantics for filesystems like HDFS that don't do that ... it's
>>>> possible something is wrong here.
>>>>
>>>> Can you set the (public, static) boolean
>>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate this
>>>> log?  This causes IW to log the ref count of each file it's tracking
>>>> ...
>>>>
>>>> I'll also add a bit more verbosity to IW when NRT readers are opened
>>>> and close, for 5.4.0.
>>>>
>>>> Mike McCandless
>>>>
>>>> http://blog.mikemccandless.com
>>>>
>>>>
>>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>>>> <[hidden email]> wrote:
>>>>> Hi all,
>>>>>
>>>>> I'm still debugging the growing-index size. I think closing index
>> readers
>>>>> might help (work in progress), but I can't really see them holding on
>> to
>>>>> files (at least, using lsof ). Restarting the application sheds some
>>>> light,
>>>>> I see logging on files that are no longer referenced.
>>>>>
>>>>> What I see is that there are files in the index-directory, that seem
>> to
>>>>> longer referenced..
>>>>>
>>>>> I put the output of the infoStream online, because is it rather big
>> (30MB
>>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>>>>>
>>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
>> This
>>>> is
>>>>> on an CentOS box (maybe that influences stuff as well?)
>>>>>
>>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
>>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
>>>>> _4gs5_Lucene50_0.dvd
>>>>> java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
>>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
>>>>> _4gs5_Lucene50_0.tim
>>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
>>>>> _4gs5_Lucene50_0.pos
>>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
>>>>> _4gs5_Lucene50_0.doc
>>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
>>>>> _4i5a_Lucene50_0.dvd
>>>>> java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
>>>>> java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
>>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
>>>>> _4ct1_Lucene50_0.dvd
>>>>> java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
>>>>> java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
>>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
>>>>> _4ct1_Lucene50_0.tim
>>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
>>>>> _4ct1_Lucene50_0.pos
>>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
>>>>> _4ct1_Lucene50_0.doc
>>>>> java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
>>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
>>>>> _4i5a_Lucene50_0.tim
>>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
>>>>> _4i5a_Lucene50_0.pos
>>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
>>>>> _4i5a_Lucene50_0.doc
>>>>> java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
>>>>> java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
>>>>> java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
>>>>> java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
>>>>> java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
>>>>> java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
>>>>> java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
>>>>> java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
>>>>> java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
>>>>> java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
>>>>> java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
>>>>> java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
>>>>> java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
>>>>> java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
>>>>> java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
>>>>> java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
>>>>> java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
>>>>> java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
>>>>> java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
>>>>> java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
>>>>> java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
>>>>> java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
>>>>> java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
>>>>> java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
>>>>> java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
>>>>> java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
>>>>> java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
>>>>>
>>>>> Output of some of the biggest files in the index directory:
>>>>>
>>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
>>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
>> _4ct1_Lucene50_0.dvd
>>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
>> _4gs5_Lucene50_0.tim
>>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
>> _4i5a_Lucene50_0.dvd
>>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
>> _4gs5_Lucene50_0.doc
>>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
>>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
>>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
>> _3k3a_Lucene50_0.dvd
>>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
>> _4gs5_Lucene50_0.dvd
>>>>>
>>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [hidden email]
>>>> For additional commands, e-mail: [hidden email]
>>>>
>>>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: debugging growing index size

Uwe Schindler
In reply to this post by Rob Audenaerde
Did you disable unmapping using MMapDirectory#setEnableUnmap() ? By default it should be enabled, but maybe you disabled it for some reason?

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Rob Audenaerde [mailto:[hidden email]]
> Sent: Friday, November 13, 2015 5:24 PM
> To: [hidden email]
> Subject: Re: debugging growing index size
>
> I'm currently running using NIOFS. It seems to prevent the issue from
> appearing.
>
> This is a second run (with applied deletes etc)
>
> raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>
> raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>
>
>
> On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> [hidden email]> wrote:
>
> > Hi Rob,
> >
> > A couple more things:
> >
> > Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> >
> > Also, can you try your test using NIOFSDirectory instead?  Curious if
> > that changes things...
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> > <[hidden email]> wrote:
> > > Curious indeed!
> > >
> > > I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
> the
> > > logs. Will get back with them in a day hopefully.
> > >
> > > Thanks for the extra logging!
> > >
> > > -Rob
> > >
> > > On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> > > [hidden email]> wrote:
> > >
> > >> Hmm, curious.
> > >>
> > >> I looked at the [large] infoStream output and I see segment _3ou7
> > >> present on init of IW, a few getReader calls referencing it, then a
> > >> forceMerge that indeed merges it away, yet I do NOT see IW
> attempting
> > >> deletion of its files.
> > >>
> > >> And indeed I see plenty (too many: many times per second?) of
> commits
> > >> after that, so the index itself is no longer referencing _3ou7.
> > >>
> > >> If you are failing to close all NRT readers then I would expect _3ou7
> > >> to be in the lsof output, but it's not.
> > >>
> > >> The NRT readers close method has logic that notifies IndexWriter when
> > >> it's done "needing" the files, to emulate "delete on last close"
> > >> semantics for filesystems like HDFS that don't do that ... it's
> > >> possible something is wrong here.
> > >>
> > >> Can you set the (public, static) boolean
> > >> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-generate
> this
> > >> log?  This causes IW to log the ref count of each file it's tracking
> > >> ...
> > >>
> > >> I'll also add a bit more verbosity to IW when NRT readers are opened
> > >> and close, for 5.4.0.
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> > >> <[hidden email]> wrote:
> > >> > Hi all,
> > >> >
> > >> > I'm still debugging the growing-index size. I think closing index
> > readers
> > >> > might help (work in progress), but I can't really see them holding on
> > to
> > >> > files (at least, using lsof ). Restarting the application sheds some
> > >> light,
> > >> > I see logging on files that are no longer referenced.
> > >> >
> > >> > What I see is that there are files in the index-directory, that seem
> > to
> > >> > longer referenced..
> > >> >
> > >> > I put the output of the infoStream online, because is it rather big
> > (30MB
> > >> > gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> > >> >
> > >> > Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
> > This
> > >> is
> > >> > on an CentOS box (maybe that influences stuff as well?)
> > >> >
> > >> > COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> > >> > java    30581 apache  mem    REG  253,0 3176094924 18880508
> > >> > _4gs5_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> > >> > java    30581 apache  mem    REG  253,0  369563337 18880631
> > >> > _4gs5_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0  176344058 18880623
> > >> > _4gs5_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0  378055201 18880606
> > >> > _4gs5_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0  372579599 18880400
> > >> > _4i5a_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> > >> > java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> > >> > java    30581 apache  mem    REG  253,0  363493917 18880533
> > >> > _4ct1_Lucene50_0.dvd
> > >> > java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> > >> > java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> > >> > java    30581 apache  mem    REG  253,0   46271330 18880661
> > >> > _4ct1_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0   26911387 18880653
> > >> > _4ct1_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0   54678249 18880568
> > >> > _4ct1_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> > >> > java    30581 apache  mem    REG  253,0   45032159 18880389
> > >> > _4i5a_Lucene50_0.tim
> > >> > java    30581 apache  mem    REG  253,0   26486772 18880388
> > >> > _4i5a_Lucene50_0.pos
> > >> > java    30581 apache  mem    REG  253,0   55411002 18880362
> > >> > _4i5a_Lucene50_0.doc
> > >> > java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> > >> > java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> > >> > java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> > >> > java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> > >> > java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> > >> > java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> > >> > java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> > >> > java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> > >> > java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> > >> > java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> > >> > java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> > >> > java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> > >> > java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> > >> > java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> > >> > java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> > >> > java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> > >> > java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> > >> > java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> > >> > java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> > >> > java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> > >> > java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> > >> > java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> > >> > java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> > >> > java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> > >> > java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> > >> > java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> > >> > java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> > >> >
> > >> > Output of some of the biggest files in the index directory:
> > >> >
> > >> > -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > >> > -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> > _4ct1_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> > _4gs5_Lucene50_0.tim
> > >> > -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> > _4i5a_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> > _4gs5_Lucene50_0.doc
> > >> > -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > >> > -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > >> > -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> > _3k3a_Lucene50_0.dvd
> > >> > -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> > _4gs5_Lucene50_0.dvd
> > >> >
> > >> > Note that the 3ou7 and 3k3a segments no longer appear to be in use?
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [hidden email]
> > >> For additional commands, e-mail: [hidden email]
> > >>
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
> >
> >


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: debugging growing index size

Uwe Schindler
In reply to this post by wmartinusa
Hi,

Lucene has the workaround, so it should not happen, UNLESS you explicitly disable the hack using MMapDirectory#setEnableUnmap(false).

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: will martin [mailto:[hidden email]]
> Sent: Friday, November 13, 2015 6:04 PM
> To: [hidden email]
> Subject: Re: debugging growing index size
>
> Hi Rob:
>
>
> Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
> Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
>
> MappedByteBuffer …. what OS are you on Rob? What JVM?
>
> http://bugs.java.com/view_bug.do?bug_id=4724038
>
> http://mail-archives.apache.org/mod_mbox/lucene-
> dev/201509.mbox/%[hidden email]%3E
>
> hth
> -will
>
>
>
> > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> <[hidden email]> wrote:
> >
> > I'm currently running using NIOFS. It seems to prevent the issue from
> > appearing.
> >
> > This is a second run (with applied deletes etc)
> >
> > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> >
> > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> >
> >
> >
> > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> > [hidden email]> wrote:
> >
> >> Hi Rob,
> >>
> >> A couple more things:
> >>
> >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> >>
> >> Also, can you try your test using NIOFSDirectory instead?  Curious if
> >> that changes things...
> >>
> >> Mike McCandless
> >>
> >> http://blog.mikemccandless.com
> >>
> >>
> >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> >> <[hidden email]> wrote:
> >>> Curious indeed!
> >>>
> >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
> the
> >>> logs. Will get back with them in a day hopefully.
> >>>
> >>> Thanks for the extra logging!
> >>>
> >>> -Rob
> >>>
> >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> >>> [hidden email]> wrote:
> >>>
> >>>> Hmm, curious.
> >>>>
> >>>> I looked at the [large] infoStream output and I see segment _3ou7
> >>>> present on init of IW, a few getReader calls referencing it, then a
> >>>> forceMerge that indeed merges it away, yet I do NOT see IW
> attempting
> >>>> deletion of its files.
> >>>>
> >>>> And indeed I see plenty (too many: many times per second?) of
> commits
> >>>> after that, so the index itself is no longer referencing _3ou7.
> >>>>
> >>>> If you are failing to close all NRT readers then I would expect _3ou7
> >>>> to be in the lsof output, but it's not.
> >>>>
> >>>> The NRT readers close method has logic that notifies IndexWriter when
> >>>> it's done "needing" the files, to emulate "delete on last close"
> >>>> semantics for filesystems like HDFS that don't do that ... it's
> >>>> possible something is wrong here.
> >>>>
> >>>> Can you set the (public, static) boolean
> >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
> generate this
> >>>> log?  This causes IW to log the ref count of each file it's tracking
> >>>> ...
> >>>>
> >>>> I'll also add a bit more verbosity to IW when NRT readers are opened
> >>>> and close, for 5.4.0.
> >>>>
> >>>> Mike McCandless
> >>>>
> >>>> http://blog.mikemccandless.com
> >>>>
> >>>>
> >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> >>>> <[hidden email]> wrote:
> >>>>> Hi all,
> >>>>>
> >>>>> I'm still debugging the growing-index size. I think closing index
> >> readers
> >>>>> might help (work in progress), but I can't really see them holding on
> >> to
> >>>>> files (at least, using lsof ). Restarting the application sheds some
> >>>> light,
> >>>>> I see logging on files that are no longer referenced.
> >>>>>
> >>>>> What I see is that there are files in the index-directory, that seem
> >> to
> >>>>> longer referenced..
> >>>>>
> >>>>> I put the output of the infoStream online, because is it rather big
> >> (30MB
> >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> >>>>>
> >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
> >> This
> >>>> is
> >>>>> on an CentOS box (maybe that influences stuff as well?)
> >>>>>
> >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
> >>>>> _4gs5_Lucene50_0.dvd
> >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546 _4gs5.fdt
> >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
> >>>>> _4gs5_Lucene50_0.tim
> >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
> >>>>> _4gs5_Lucene50_0.pos
> >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
> >>>>> _4gs5_Lucene50_0.doc
> >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
> >>>>> _4i5a_Lucene50_0.dvd
> >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748 _4g37.cfs
> >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721 _4fb3.cfs
> >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
> >>>>> _4ct1_Lucene50_0.dvd
> >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806 _4gjc.cfs
> >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553 _4ct1.fdt
> >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
> >>>>> _4ct1_Lucene50_0.tim
> >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
> >>>>> _4ct1_Lucene50_0.pos
> >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
> >>>>> _4ct1_Lucene50_0.doc
> >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328 _4i5a.fdt
> >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
> >>>>> _4i5a_Lucene50_0.tim
> >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
> >>>>> _4i5a_Lucene50_0.pos
> >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
> >>>>> _4i5a_Lucene50_0.doc
> >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340 _4hkn.cfs
> >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324 _4gpz.cfs
> >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524 _4i11.cfs
> >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575 _4i0t.cfs
> >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624 _4i0i.cfs
> >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717 _4hzp.cfs
> >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583 _4hze.cfs
> >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411 _4gs5.nvd
> >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732 _4hod.cfs
> >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495 _4i84.cfs
> >>>>> java    30581 apache  mem    REG  253,0     835726 18880660 _4i7z.cfs
> >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648 _4i7w.cfs
> >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401 _4i4o.cfs
> >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440 _4i4a.cfs
> >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512 _4i7v.cfs
> >>>>> java    30581 apache  mem    REG  253,0     941773 18880613 _4i7x.cfs
> >>>>> java    30581 apache  mem    REG  253,0     984023 18880588 _4i7o.cfs
> >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619 _4i7y.cfs
> >>>>> java    30581 apache  mem    REG  253,0     466371 18880515 _4ct1.nvd
> >>>>> java    30581 apache  mem    REG  253,0     723280 18880573 _4i7q.cfs
> >>>>> java    30581 apache  mem    REG  253,0     806289 18880517 _4i7h.cfs
> >>>>> java    30581 apache  mem    REG  253,0      17362 18880520 _4i9s.cfs
> >>>>> java    30581 apache  mem    REG  253,0     698362 18880531 _4i9r.cfs
> >>>>> java    30581 apache  mem    REG  253,0     483215 18880406 _4i5a.nvd
> >>>>> java    30581 apache  mem    REG  253,0      14110 18880416 _4i9v.cfs
> >>>>> java    30581 apache  mem    REG  253,0       6121 18880412 _4i9t.cfs
> >>>>> java    30581 apache   30wW  REG  253,0          0 18877901 write.lock
> >>>>>
> >>>>> Output of some of the biggest files in the index directory:
> >>>>>
> >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> >> _4ct1_Lucene50_0.dvd
> >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> >> _4gs5_Lucene50_0.tim
> >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> >> _4i5a_Lucene50_0.dvd
> >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> >> _4gs5_Lucene50_0.doc
> >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> >> _3k3a_Lucene50_0.dvd
> >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> >> _4gs5_Lucene50_0.dvd
> >>>>>
> >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
> >>>>
> >>>> ---------------------------------------------------------------------
> >>>> To unsubscribe, e-mail: [hidden email]
> >>>> For additional commands, e-mail: [hidden email]
> >>>>
> >>>>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: debugging growing index size

Rob Audenaerde
I haven't disabled unmapping, and I am running out-of-the-box
FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
explicitly constructed a NIOFSDIrectoryReader

OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
later for more details.
On Nov 13, 2015 18:07, "Uwe Schindler" <[hidden email]> wrote:

> Hi,
>
> Lucene has the workaround, so it should not happen, UNLESS you explicitly
> disable the hack using MMapDirectory#setEnableUnmap(false).
>
> Uwe
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
> > -----Original Message-----
> > From: will martin [mailto:[hidden email]]
> > Sent: Friday, November 13, 2015 6:04 PM
> > To: [hidden email]
> > Subject: Re: debugging growing index size
> >
> > Hi Rob:
> >
> >
> > Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
> >
> > MappedByteBuffer …. what OS are you on Rob? What JVM?
> >
> > http://bugs.java.com/view_bug.do?bug_id=4724038
> >
> > http://mail-archives.apache.org/mod_mbox/lucene-
> > dev/201509.mbox/%[hidden email]%3E
> >
> > hth
> > -will
> >
> >
> >
> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> > <[hidden email]> wrote:
> > >
> > > I'm currently running using NIOFS. It seems to prevent the issue from
> > > appearing.
> > >
> > > This is a second run (with applied deletes etc)
> > >
> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> > >
> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> > >
> > >
> > >
> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> > > [hidden email]> wrote:
> > >
> > >> Hi Rob,
> > >>
> > >> A couple more things:
> > >>
> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> > >>
> > >> Also, can you try your test using NIOFSDirectory instead?  Curious if
> > >> that changes things...
> > >>
> > >> Mike McCandless
> > >>
> > >> http://blog.mikemccandless.com
> > >>
> > >>
> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> > >> <[hidden email]> wrote:
> > >>> Curious indeed!
> > >>>
> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
> > the
> > >>> logs. Will get back with them in a day hopefully.
> > >>>
> > >>> Thanks for the extra logging!
> > >>>
> > >>> -Rob
> > >>>
> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> > >>> [hidden email]> wrote:
> > >>>
> > >>>> Hmm, curious.
> > >>>>
> > >>>> I looked at the [large] infoStream output and I see segment _3ou7
> > >>>> present on init of IW, a few getReader calls referencing it, then a
> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
> > attempting
> > >>>> deletion of its files.
> > >>>>
> > >>>> And indeed I see plenty (too many: many times per second?) of
> > commits
> > >>>> after that, so the index itself is no longer referencing _3ou7.
> > >>>>
> > >>>> If you are failing to close all NRT readers then I would expect
> _3ou7
> > >>>> to be in the lsof output, but it's not.
> > >>>>
> > >>>> The NRT readers close method has logic that notifies IndexWriter
> when
> > >>>> it's done "needing" the files, to emulate "delete on last close"
> > >>>> semantics for filesystems like HDFS that don't do that ... it's
> > >>>> possible something is wrong here.
> > >>>>
> > >>>> Can you set the (public, static) boolean
> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
> > generate this
> > >>>> log?  This causes IW to log the ref count of each file it's tracking
> > >>>> ...
> > >>>>
> > >>>> I'll also add a bit more verbosity to IW when NRT readers are opened
> > >>>> and close, for 5.4.0.
> > >>>>
> > >>>> Mike McCandless
> > >>>>
> > >>>> http://blog.mikemccandless.com
> > >>>>
> > >>>>
> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> > >>>> <[hidden email]> wrote:
> > >>>>> Hi all,
> > >>>>>
> > >>>>> I'm still debugging the growing-index size. I think closing index
> > >> readers
> > >>>>> might help (work in progress), but I can't really see them holding
> on
> > >> to
> > >>>>> files (at least, using lsof ). Restarting the application sheds
> some
> > >>>> light,
> > >>>>> I see logging on files that are no longer referenced.
> > >>>>>
> > >>>>> What I see is that there are files in the index-directory, that
> seem
> > >> to
> > >>>>> longer referenced..
> > >>>>>
> > >>>>> I put the output of the infoStream online, because is it rather big
> > >> (30MB
> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> > >>>>>
> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
> > >> This
> > >>>> is
> > >>>>> on an CentOS box (maybe that influences stuff as well?)
> > >>>>>
> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
> > >>>>> _4gs5_Lucene50_0.dvd
> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
> _4gs5.fdt
> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
> > >>>>> _4gs5_Lucene50_0.tim
> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
> > >>>>> _4gs5_Lucene50_0.pos
> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
> > >>>>> _4gs5_Lucene50_0.doc
> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
> > >>>>> _4i5a_Lucene50_0.dvd
> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
> _4g37.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
> _4fb3.cfs
> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
> > >>>>> _4ct1_Lucene50_0.dvd
> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
> _4gjc.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
> _4ct1.fdt
> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
> > >>>>> _4ct1_Lucene50_0.tim
> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
> > >>>>> _4ct1_Lucene50_0.pos
> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
> > >>>>> _4ct1_Lucene50_0.doc
> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
> _4i5a.fdt
> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
> > >>>>> _4i5a_Lucene50_0.tim
> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
> > >>>>> _4i5a_Lucene50_0.pos
> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
> > >>>>> _4i5a_Lucene50_0.doc
> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
> _4hkn.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
> _4gpz.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
> _4i11.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
> _4i0t.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
> _4i0i.cfs
> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
> _4hzp.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
> _4hze.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
> _4gs5.nvd
> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
> _4hod.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
> _4i84.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
> _4i7z.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
> _4i7w.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
> _4i4o.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
> _4i4a.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
> _4i7v.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
> _4i7x.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
> _4i7o.cfs
> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
> _4i7y.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
> _4ct1.nvd
> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
> _4i7q.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
> _4i7h.cfs
> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
> _4i9s.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
> _4i9r.cfs
> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
> _4i5a.nvd
> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
> _4i9v.cfs
> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
> _4i9t.cfs
> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
> write.lock
> > >>>>>
> > >>>>> Output of some of the biggest files in the index directory:
> > >>>>>
> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> > >> _4ct1_Lucene50_0.dvd
> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> > >> _4gs5_Lucene50_0.tim
> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> > >> _4i5a_Lucene50_0.dvd
> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> > >> _4gs5_Lucene50_0.doc
> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> > >> _3k3a_Lucene50_0.dvd
> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> > >> _4gs5_Lucene50_0.dvd
> > >>>>>
> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
> > >>>>
> > >>>>
> ---------------------------------------------------------------------
> > >>>> To unsubscribe, e-mail: [hidden email]
> > >>>> For additional commands, e-mail: [hidden email]
> > >>>>
> > >>>>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [hidden email]
> > >> For additional commands, e-mail: [hidden email]
> > >>
> > >>
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>
Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Michael McCandless-2
So with MMapDir at defaults (unmap is enabled) you see old files, with
no open file handles as reported by lsof, still existing in your index
directory, taking lots of space.

But with NIOFSDirectory the issue doesn't happen?  Are you sure?

I'll look at the 6.6 GB infoStream to see what it says about the ref counts.

Did you fix the issue in your app where you're not closing all opened
NRT readers?

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
<[hidden email]> wrote:

> I haven't disabled unmapping, and I am running out-of-the-box
> FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
> explicitly constructed a NIOFSDIrectoryReader
>
> OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
> later for more details.
> On Nov 13, 2015 18:07, "Uwe Schindler" <[hidden email]> wrote:
>
>> Hi,
>>
>> Lucene has the workaround, so it should not happen, UNLESS you explicitly
>> disable the hack using MMapDirectory#setEnableUnmap(false).
>>
>> Uwe
>>
>> -----
>> Uwe Schindler
>> H.-H.-Meier-Allee 63, D-28213 Bremen
>> http://www.thetaphi.de
>> eMail: [hidden email]
>>
>> > -----Original Message-----
>> > From: will martin [mailto:[hidden email]]
>> > Sent: Friday, November 13, 2015 6:04 PM
>> > To: [hidden email]
>> > Subject: Re: debugging growing index size
>> >
>> > Hi Rob:
>> >
>> >
>> > Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
>> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
>> >
>> > MappedByteBuffer …. what OS are you on Rob? What JVM?
>> >
>> > http://bugs.java.com/view_bug.do?bug_id=4724038
>> >
>> > http://mail-archives.apache.org/mod_mbox/lucene-
>> > dev/201509.mbox/%[hidden email]%3E
>> >
>> > hth
>> > -will
>> >
>> >
>> >
>> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
>> > <[hidden email]> wrote:
>> > >
>> > > I'm currently running using NIOFS. It seems to prevent the issue from
>> > > appearing.
>> > >
>> > > This is a second run (with applied deletes etc)
>> > >
>> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>> > >
>> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>> > >
>> > >
>> > >
>> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
>> > > [hidden email]> wrote:
>> > >
>> > >> Hi Rob,
>> > >>
>> > >> A couple more things:
>> > >>
>> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>> > >>
>> > >> Also, can you try your test using NIOFSDirectory instead?  Curious if
>> > >> that changes things...
>> > >>
>> > >> Mike McCandless
>> > >>
>> > >> http://blog.mikemccandless.com
>> > >>
>> > >>
>> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
>> > >> <[hidden email]> wrote:
>> > >>> Curious indeed!
>> > >>>
>> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
>> > the
>> > >>> logs. Will get back with them in a day hopefully.
>> > >>>
>> > >>> Thanks for the extra logging!
>> > >>>
>> > >>> -Rob
>> > >>>
>> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
>> > >>> [hidden email]> wrote:
>> > >>>
>> > >>>> Hmm, curious.
>> > >>>>
>> > >>>> I looked at the [large] infoStream output and I see segment _3ou7
>> > >>>> present on init of IW, a few getReader calls referencing it, then a
>> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
>> > attempting
>> > >>>> deletion of its files.
>> > >>>>
>> > >>>> And indeed I see plenty (too many: many times per second?) of
>> > commits
>> > >>>> after that, so the index itself is no longer referencing _3ou7.
>> > >>>>
>> > >>>> If you are failing to close all NRT readers then I would expect
>> _3ou7
>> > >>>> to be in the lsof output, but it's not.
>> > >>>>
>> > >>>> The NRT readers close method has logic that notifies IndexWriter
>> when
>> > >>>> it's done "needing" the files, to emulate "delete on last close"
>> > >>>> semantics for filesystems like HDFS that don't do that ... it's
>> > >>>> possible something is wrong here.
>> > >>>>
>> > >>>> Can you set the (public, static) boolean
>> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
>> > generate this
>> > >>>> log?  This causes IW to log the ref count of each file it's tracking
>> > >>>> ...
>> > >>>>
>> > >>>> I'll also add a bit more verbosity to IW when NRT readers are opened
>> > >>>> and close, for 5.4.0.
>> > >>>>
>> > >>>> Mike McCandless
>> > >>>>
>> > >>>> http://blog.mikemccandless.com
>> > >>>>
>> > >>>>
>> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>> > >>>> <[hidden email]> wrote:
>> > >>>>> Hi all,
>> > >>>>>
>> > >>>>> I'm still debugging the growing-index size. I think closing index
>> > >> readers
>> > >>>>> might help (work in progress), but I can't really see them holding
>> on
>> > >> to
>> > >>>>> files (at least, using lsof ). Restarting the application sheds
>> some
>> > >>>> light,
>> > >>>>> I see logging on files that are no longer referenced.
>> > >>>>>
>> > >>>>> What I see is that there are files in the index-directory, that
>> seem
>> > >> to
>> > >>>>> longer referenced..
>> > >>>>>
>> > >>>>> I put the output of the infoStream online, because is it rather big
>> > >> (30MB
>> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>> > >>>>>
>> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
>> > >> This
>> > >>>> is
>> > >>>>> on an CentOS box (maybe that influences stuff as well?)
>> > >>>>>
>> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
>> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
>> > >>>>> _4gs5_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
>> _4gs5.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
>> > >>>>> _4gs5_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
>> > >>>>> _4gs5_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
>> > >>>>> _4gs5_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
>> > >>>>> _4i5a_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
>> _4g37.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
>> _4fb3.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
>> > >>>>> _4ct1_Lucene50_0.dvd
>> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
>> _4gjc.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
>> _4ct1.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
>> > >>>>> _4ct1_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
>> > >>>>> _4ct1_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
>> > >>>>> _4ct1_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
>> _4i5a.fdt
>> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
>> > >>>>> _4i5a_Lucene50_0.tim
>> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
>> > >>>>> _4i5a_Lucene50_0.pos
>> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
>> > >>>>> _4i5a_Lucene50_0.doc
>> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
>> _4hkn.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
>> _4gpz.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
>> _4i11.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
>> _4i0t.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
>> _4i0i.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
>> _4hzp.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
>> _4hze.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
>> _4gs5.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
>> _4hod.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
>> _4i84.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
>> _4i7z.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
>> _4i7w.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
>> _4i4o.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
>> _4i4a.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
>> _4i7v.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
>> _4i7x.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
>> _4i7o.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
>> _4i7y.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
>> _4ct1.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
>> _4i7q.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
>> _4i7h.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
>> _4i9s.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
>> _4i9r.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
>> _4i5a.nvd
>> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
>> _4i9v.cfs
>> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
>> _4i9t.cfs
>> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
>> write.lock
>> > >>>>>
>> > >>>>> Output of some of the biggest files in the index directory:
>> > >>>>>
>> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
>> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
>> > >> _4ct1_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
>> > >> _4gs5_Lucene50_0.tim
>> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
>> > >> _4i5a_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
>> > >> _4gs5_Lucene50_0.doc
>> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
>> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
>> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
>> > >> _3k3a_Lucene50_0.dvd
>> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
>> > >> _4gs5_Lucene50_0.dvd
>> > >>>>>
>> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>> > >>>>
>> > >>>>
>> ---------------------------------------------------------------------
>> > >>>> To unsubscribe, e-mail: [hidden email]
>> > >>>> For additional commands, e-mail: [hidden email]
>> > >>>>
>> > >>>>
>> > >>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: [hidden email]
>> > >> For additional commands, e-mail: [hidden email]
>> > >>
>> > >>
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: [hidden email]
>> > For additional commands, e-mail: [hidden email]
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: [hidden email]
>> For additional commands, e-mail: [hidden email]
>>
>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

Re: debugging growing index size

Michael McCandless-2
I can see that segments that are deleted never decRef to 0, e.g.
_1dtf, I assume because NRT readers are not being closed.

I think you should fix that bug, so every NRT reader ever opened is
also closed (SearcherManager is a simple way to ensure this), and then
let's regroup if disk space is still being leaked with a single
IndexWriter.

Mike McCandless

http://blog.mikemccandless.com


On Fri, Nov 13, 2015 at 2:29 PM, Michael McCandless
<[hidden email]> wrote:

> So with MMapDir at defaults (unmap is enabled) you see old files, with
> no open file handles as reported by lsof, still existing in your index
> directory, taking lots of space.
>
> But with NIOFSDirectory the issue doesn't happen?  Are you sure?
>
> I'll look at the 6.6 GB infoStream to see what it says about the ref counts.
>
> Did you fix the issue in your app where you're not closing all opened
> NRT readers?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
> <[hidden email]> wrote:
>> I haven't disabled unmapping, and I am running out-of-the-box
>> FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
>> explicitly constructed a NIOFSDIrectoryReader
>>
>> OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
>> later for more details.
>> On Nov 13, 2015 18:07, "Uwe Schindler" <[hidden email]> wrote:
>>
>>> Hi,
>>>
>>> Lucene has the workaround, so it should not happen, UNLESS you explicitly
>>> disable the hack using MMapDirectory#setEnableUnmap(false).
>>>
>>> Uwe
>>>
>>> -----
>>> Uwe Schindler
>>> H.-H.-Meier-Allee 63, D-28213 Bremen
>>> http://www.thetaphi.de
>>> eMail: [hidden email]
>>>
>>> > -----Original Message-----
>>> > From: will martin [mailto:[hidden email]]
>>> > Sent: Friday, November 13, 2015 6:04 PM
>>> > To: [hidden email]
>>> > Subject: Re: debugging growing index size
>>> >
>>> > Hi Rob:
>>> >
>>> >
>>> > Doesn’t this look like known SE issue JDK-4724038 and discussed by Peter
>>> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
>>> >
>>> > MappedByteBuffer …. what OS are you on Rob? What JVM?
>>> >
>>> > http://bugs.java.com/view_bug.do?bug_id=4724038
>>> >
>>> > http://mail-archives.apache.org/mod_mbox/lucene-
>>> > dev/201509.mbox/%[hidden email]%3E
>>> >
>>> > hth
>>> > -will
>>> >
>>> >
>>> >
>>> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
>>> > <[hidden email]> wrote:
>>> > >
>>> > > I'm currently running using NIOFS. It seems to prevent the issue from
>>> > > appearing.
>>> > >
>>> > > This is a second run (with applied deletes etc)
>>> > >
>>> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
>>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>>> > >
>>> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
>>> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17 _z20_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13 _y0s_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19 _12r5_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17 _110e_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12 _xod_Lucene50_0.dvd
>>> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
>>> > >
>>> > >
>>> > >
>>> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
>>> > > [hidden email]> wrote:
>>> > >
>>> > >> Hi Rob,
>>> > >>
>>> > >> A couple more things:
>>> > >>
>>> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
>>> > >>
>>> > >> Also, can you try your test using NIOFSDirectory instead?  Curious if
>>> > >> that changes things...
>>> > >>
>>> > >> Mike McCandless
>>> > >>
>>> > >> http://blog.mikemccandless.com
>>> > >>
>>> > >>
>>> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
>>> > >> <[hidden email]> wrote:
>>> > >>> Curious indeed!
>>> > >>>
>>> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and recreate
>>> > the
>>> > >>> logs. Will get back with them in a day hopefully.
>>> > >>>
>>> > >>> Thanks for the extra logging!
>>> > >>>
>>> > >>> -Rob
>>> > >>>
>>> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
>>> > >>> [hidden email]> wrote:
>>> > >>>
>>> > >>>> Hmm, curious.
>>> > >>>>
>>> > >>>> I looked at the [large] infoStream output and I see segment _3ou7
>>> > >>>> present on init of IW, a few getReader calls referencing it, then a
>>> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
>>> > attempting
>>> > >>>> deletion of its files.
>>> > >>>>
>>> > >>>> And indeed I see plenty (too many: many times per second?) of
>>> > commits
>>> > >>>> after that, so the index itself is no longer referencing _3ou7.
>>> > >>>>
>>> > >>>> If you are failing to close all NRT readers then I would expect
>>> _3ou7
>>> > >>>> to be in the lsof output, but it's not.
>>> > >>>>
>>> > >>>> The NRT readers close method has logic that notifies IndexWriter
>>> when
>>> > >>>> it's done "needing" the files, to emulate "delete on last close"
>>> > >>>> semantics for filesystems like HDFS that don't do that ... it's
>>> > >>>> possible something is wrong here.
>>> > >>>>
>>> > >>>> Can you set the (public, static) boolean
>>> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
>>> > generate this
>>> > >>>> log?  This causes IW to log the ref count of each file it's tracking
>>> > >>>> ...
>>> > >>>>
>>> > >>>> I'll also add a bit more verbosity to IW when NRT readers are opened
>>> > >>>> and close, for 5.4.0.
>>> > >>>>
>>> > >>>> Mike McCandless
>>> > >>>>
>>> > >>>> http://blog.mikemccandless.com
>>> > >>>>
>>> > >>>>
>>> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
>>> > >>>> <[hidden email]> wrote:
>>> > >>>>> Hi all,
>>> > >>>>>
>>> > >>>>> I'm still debugging the growing-index size. I think closing index
>>> > >> readers
>>> > >>>>> might help (work in progress), but I can't really see them holding
>>> on
>>> > >> to
>>> > >>>>> files (at least, using lsof ). Restarting the application sheds
>>> some
>>> > >>>> light,
>>> > >>>>> I see logging on files that are no longer referenced.
>>> > >>>>>
>>> > >>>>> What I see is that there are files in the index-directory, that
>>> seem
>>> > >> to
>>> > >>>>> longer referenced..
>>> > >>>>>
>>> > >>>>> I put the output of the infoStream online, because is it rather big
>>> > >> (30MB
>>> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
>>> > >>>>>
>>> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
>>> > >> This
>>> > >>>> is
>>> > >>>>> on an CentOS box (maybe that influences stuff as well?)
>>> > >>>>>
>>> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE NAME
>>> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
>>> > >>>>> _4gs5_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
>>> _4gs5.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
>>> > >>>>> _4gs5_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
>>> > >>>>> _4gs5_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
>>> > >>>>> _4gs5_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
>>> > >>>>> _4i5a_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
>>> _4g37.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
>>> _4fb3.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
>>> > >>>>> _4ct1_Lucene50_0.dvd
>>> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
>>> _4gjc.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
>>> _4ct1.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
>>> > >>>>> _4ct1_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
>>> > >>>>> _4ct1_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
>>> > >>>>> _4ct1_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
>>> _4i5a.fdt
>>> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
>>> > >>>>> _4i5a_Lucene50_0.tim
>>> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
>>> > >>>>> _4i5a_Lucene50_0.pos
>>> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
>>> > >>>>> _4i5a_Lucene50_0.doc
>>> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
>>> _4hkn.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
>>> _4gpz.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
>>> _4i11.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
>>> _4i0t.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
>>> _4i0i.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
>>> _4hzp.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
>>> _4hze.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
>>> _4gs5.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
>>> _4hod.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
>>> _4i84.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
>>> _4i7z.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
>>> _4i7w.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
>>> _4i4o.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
>>> _4i4a.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
>>> _4i7v.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
>>> _4i7x.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
>>> _4i7o.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
>>> _4i7y.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
>>> _4ct1.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
>>> _4i7q.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
>>> _4i7h.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
>>> _4i9s.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
>>> _4i9r.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
>>> _4i5a.nvd
>>> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
>>> _4i9v.cfs
>>> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
>>> _4i9t.cfs
>>> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
>>> write.lock
>>> > >>>>>
>>> > >>>>> Output of some of the biggest files in the index directory:
>>> > >>>>>
>>> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
>>> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
>>> > >> _4ct1_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
>>> > >> _4gs5_Lucene50_0.tim
>>> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
>>> > >> _4i5a_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
>>> > >> _4gs5_Lucene50_0.doc
>>> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
>>> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
>>> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
>>> > >> _3k3a_Lucene50_0.dvd
>>> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
>>> > >> _4gs5_Lucene50_0.dvd
>>> > >>>>>
>>> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in use?
>>> > >>>>
>>> > >>>>
>>> ---------------------------------------------------------------------
>>> > >>>> To unsubscribe, e-mail: [hidden email]
>>> > >>>> For additional commands, e-mail: [hidden email]
>>> > >>>>
>>> > >>>>
>>> > >>
>>> > >> ---------------------------------------------------------------------
>>> > >> To unsubscribe, e-mail: [hidden email]
>>> > >> For additional commands, e-mail: [hidden email]
>>> > >>
>>> > >>
>>> >
>>> >
>>> > ---------------------------------------------------------------------
>>> > To unsubscribe, e-mail: [hidden email]
>>> > For additional commands, e-mail: [hidden email]
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: [hidden email]
>>> For additional commands, e-mail: [hidden email]
>>>
>>>

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: debugging growing index size

Uwe Schindler
In reply to this post by Michael McCandless-2
I agree. On Linux it is impossible that MMapDirectory is the reason! Only on windows you cannot delete still open/mapped files.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]

> -----Original Message-----
> From: Michael McCandless [mailto:[hidden email]]
> Sent: Friday, November 13, 2015 8:30 PM
> To: Lucene Users <[hidden email]>
> Subject: Re: debugging growing index size
>
> So with MMapDir at defaults (unmap is enabled) you see old files, with
> no open file handles as reported by lsof, still existing in your index
> directory, taking lots of space.
>
> But with NIOFSDirectory the issue doesn't happen?  Are you sure?
>
> I'll look at the 6.6 GB infoStream to see what it says about the ref counts.
>
> Did you fix the issue in your app where you're not closing all opened
> NRT readers?
>
> Mike McCandless
>
> http://blog.mikemccandless.com
>
>
> On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
> <[hidden email]> wrote:
> > I haven't disabled unmapping, and I am running out-of-the-box
> > FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
> > explicitly constructed a NIOFSDIrectoryReader
> >
> > OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
> > later for more details.
> > On Nov 13, 2015 18:07, "Uwe Schindler" <[hidden email]> wrote:
> >
> >> Hi,
> >>
> >> Lucene has the workaround, so it should not happen, UNLESS you
> explicitly
> >> disable the hack using MMapDirectory#setEnableUnmap(false).
> >>
> >> Uwe
> >>
> >> -----
> >> Uwe Schindler
> >> H.-H.-Meier-Allee 63, D-28213 Bremen
> >> http://www.thetaphi.de
> >> eMail: [hidden email]
> >>
> >> > -----Original Message-----
> >> > From: will martin [mailto:[hidden email]]
> >> > Sent: Friday, November 13, 2015 6:04 PM
> >> > To: [hidden email]
> >> > Subject: Re: debugging growing index size
> >> >
> >> > Hi Rob:
> >> >
> >> >
> >> > Doesn’t this look like known SE issue JDK-4724038 and discussed by
> Peter
> >> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
> >> >
> >> > MappedByteBuffer …. what OS are you on Rob? What JVM?
> >> >
> >> > http://bugs.java.com/view_bug.do?bug_id=4724038
> >> >
> >> > http://mail-archives.apache.org/mod_mbox/lucene-
> >> > dev/201509.mbox/%[hidden email]%3E
> >> >
> >> > hth
> >> > -will
> >> >
> >> >
> >> >
> >> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> >> > <[hidden email]> wrote:
> >> > >
> >> > > I'm currently running using NIOFS. It seems to prevent the issue from
> >> > > appearing.
> >> > >
> >> > > This is a second run (with applied deletes etc)
> >> > >
> >> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> _xod_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> _110e_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> _12r5_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> _y0s_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> _z20_Lucene50_0.dvd
> >> > >
> >> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> _z20_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> _y0s_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> _12r5_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> _110e_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> _xod_Lucene50_0.dvd
> >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09 _y_Lucene50_0.dvd
> >> > >
> >> > >
> >> > >
> >> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> >> > > [hidden email]> wrote:
> >> > >
> >> > >> Hi Rob,
> >> > >>
> >> > >> A couple more things:
> >> > >>
> >> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> >> > >>
> >> > >> Also, can you try your test using NIOFSDirectory instead?  Curious if
> >> > >> that changes things...
> >> > >>
> >> > >> Mike McCandless
> >> > >>
> >> > >> http://blog.mikemccandless.com
> >> > >>
> >> > >>
> >> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> >> > >> <[hidden email]> wrote:
> >> > >>> Curious indeed!
> >> > >>>
> >> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and
> recreate
> >> > the
> >> > >>> logs. Will get back with them in a day hopefully.
> >> > >>>
> >> > >>> Thanks for the extra logging!
> >> > >>>
> >> > >>> -Rob
> >> > >>>
> >> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> >> > >>> [hidden email]> wrote:
> >> > >>>
> >> > >>>> Hmm, curious.
> >> > >>>>
> >> > >>>> I looked at the [large] infoStream output and I see segment _3ou7
> >> > >>>> present on init of IW, a few getReader calls referencing it, then a
> >> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
> >> > attempting
> >> > >>>> deletion of its files.
> >> > >>>>
> >> > >>>> And indeed I see plenty (too many: many times per second?) of
> >> > commits
> >> > >>>> after that, so the index itself is no longer referencing _3ou7.
> >> > >>>>
> >> > >>>> If you are failing to close all NRT readers then I would expect
> >> _3ou7
> >> > >>>> to be in the lsof output, but it's not.
> >> > >>>>
> >> > >>>> The NRT readers close method has logic that notifies IndexWriter
> >> when
> >> > >>>> it's done "needing" the files, to emulate "delete on last close"
> >> > >>>> semantics for filesystems like HDFS that don't do that ... it's
> >> > >>>> possible something is wrong here.
> >> > >>>>
> >> > >>>> Can you set the (public, static) boolean
> >> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
> >> > generate this
> >> > >>>> log?  This causes IW to log the ref count of each file it's tracking
> >> > >>>> ...
> >> > >>>>
> >> > >>>> I'll also add a bit more verbosity to IW when NRT readers are
> opened
> >> > >>>> and close, for 5.4.0.
> >> > >>>>
> >> > >>>> Mike McCandless
> >> > >>>>
> >> > >>>> http://blog.mikemccandless.com
> >> > >>>>
> >> > >>>>
> >> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> >> > >>>> <[hidden email]> wrote:
> >> > >>>>> Hi all,
> >> > >>>>>
> >> > >>>>> I'm still debugging the growing-index size. I think closing index
> >> > >> readers
> >> > >>>>> might help (work in progress), but I can't really see them holding
> >> on
> >> > >> to
> >> > >>>>> files (at least, using lsof ). Restarting the application sheds
> >> some
> >> > >>>> light,
> >> > >>>>> I see logging on files that are no longer referenced.
> >> > >>>>>
> >> > >>>>> What I see is that there are files in the index-directory, that
> >> seem
> >> > >> to
> >> > >>>>> longer referenced..
> >> > >>>>>
> >> > >>>>> I put the output of the infoStream online, because is it rather big
> >> > >> (30MB
> >> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> >> > >>>>>
> >> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index directory  ).
> >> > >> This
> >> > >>>> is
> >> > >>>>> on an CentOS box (maybe that influences stuff as well?)
> >> > >>>>>
> >> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE
> NAME
> >> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
> >> > >>>>> _4gs5_Lucene50_0.dvd
> >> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
> >> _4gs5.fdt
> >> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
> >> > >>>>> _4gs5_Lucene50_0.tim
> >> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
> >> > >>>>> _4gs5_Lucene50_0.pos
> >> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
> >> > >>>>> _4gs5_Lucene50_0.doc
> >> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
> >> > >>>>> _4i5a_Lucene50_0.dvd
> >> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
> >> _4g37.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
> >> _4fb3.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
> >> > >>>>> _4ct1_Lucene50_0.dvd
> >> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
> >> _4gjc.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
> >> _4ct1.fdt
> >> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
> >> > >>>>> _4ct1_Lucene50_0.tim
> >> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
> >> > >>>>> _4ct1_Lucene50_0.pos
> >> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
> >> > >>>>> _4ct1_Lucene50_0.doc
> >> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
> >> _4i5a.fdt
> >> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
> >> > >>>>> _4i5a_Lucene50_0.tim
> >> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
> >> > >>>>> _4i5a_Lucene50_0.pos
> >> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
> >> > >>>>> _4i5a_Lucene50_0.doc
> >> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
> >> _4hkn.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
> >> _4gpz.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
> >> _4i11.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
> >> _4i0t.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
> >> _4i0i.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
> >> _4hzp.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
> >> _4hze.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
> >> _4gs5.nvd
> >> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
> >> _4hod.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
> >> _4i84.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
> >> _4i7z.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
> >> _4i7w.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
> >> _4i4o.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
> >> _4i4a.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
> >> _4i7v.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
> >> _4i7x.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
> >> _4i7o.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
> >> _4i7y.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
> >> _4ct1.nvd
> >> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
> >> _4i7q.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
> >> _4i7h.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
> >> _4i9s.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
> >> _4i9r.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
> >> _4i5a.nvd
> >> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
> >> _4i9v.cfs
> >> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
> >> _4i9t.cfs
> >> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
> >> write.lock
> >> > >>>>>
> >> > >>>>> Output of some of the biggest files in the index directory:
> >> > >>>>>
> >> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> >> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> >> > >> _4ct1_Lucene50_0.dvd
> >> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> >> > >> _4gs5_Lucene50_0.tim
> >> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> >> > >> _4i5a_Lucene50_0.dvd
> >> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> >> > >> _4gs5_Lucene50_0.doc
> >> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> >> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> >> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> >> > >> _3k3a_Lucene50_0.dvd
> >> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> >> > >> _4gs5_Lucene50_0.dvd
> >> > >>>>>
> >> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in
> use?
> >> > >>>>
> >> > >>>>
> >> ---------------------------------------------------------------------
> >> > >>>> To unsubscribe, e-mail: java-user-
> [hidden email]
> >> > >>>> For additional commands, e-mail: java-user-
> [hidden email]
> >> > >>>>
> >> > >>>>
> >> > >>
> >> > >> ---------------------------------------------------------------------
> >> > >> To unsubscribe, e-mail: [hidden email]
> >> > >> For additional commands, e-mail: [hidden email]
> >> > >>
> >> > >>
> >> >
> >> >
> >> > ---------------------------------------------------------------------
> >> > To unsubscribe, e-mail: [hidden email]
> >> > For additional commands, e-mail: [hidden email]
> >>
> >>
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: [hidden email]
> >> For additional commands, e-mail: [hidden email]
> >>
> >>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Reply | Threaded
Open this post in threaded view
|

RE: debugging growing index size

Rob Audenaerde
Thank you all,

I will further fix and investigate!
On Nov 14, 2015 10:00, "Uwe Schindler" <[hidden email]> wrote:

> I agree. On Linux it is impossible that MMapDirectory is the reason! Only
> on windows you cannot delete still open/mapped files.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: [hidden email]
>
> > -----Original Message-----
> > From: Michael McCandless [mailto:[hidden email]]
> > Sent: Friday, November 13, 2015 8:30 PM
> > To: Lucene Users <[hidden email]>
> > Subject: Re: debugging growing index size
> >
> > So with MMapDir at defaults (unmap is enabled) you see old files, with
> > no open file handles as reported by lsof, still existing in your index
> > directory, taking lots of space.
> >
> > But with NIOFSDirectory the issue doesn't happen?  Are you sure?
> >
> > I'll look at the 6.6 GB infoStream to see what it says about the ref
> counts.
> >
> > Did you fix the issue in your app where you're not closing all opened
> > NRT readers?
> >
> > Mike McCandless
> >
> > http://blog.mikemccandless.com
> >
> >
> > On Fri, Nov 13, 2015 at 12:22 PM, Rob Audenaerde
> > <[hidden email]> wrote:
> > > I haven't disabled unmapping, and I am running out-of-the-box
> > > FSDirectory.open(). As I can see it tries to pick MMap.  For the test I
> > > explicitly constructed a NIOFSDIrectoryReader
> > >
> > > OS is (from the top of my head)  CentOS 6.x, Java 1.8.0u33. I can check
> > > later for more details.
> > > On Nov 13, 2015 18:07, "Uwe Schindler" <[hidden email]> wrote:
> > >
> > >> Hi,
> > >>
> > >> Lucene has the workaround, so it should not happen, UNLESS you
> > explicitly
> > >> disable the hack using MMapDirectory#setEnableUnmap(false).
> > >>
> > >> Uwe
> > >>
> > >> -----
> > >> Uwe Schindler
> > >> H.-H.-Meier-Allee 63, D-28213 Bremen
> > >> http://www.thetaphi.de
> > >> eMail: [hidden email]
> > >>
> > >> > -----Original Message-----
> > >> > From: will martin [mailto:[hidden email]]
> > >> > Sent: Friday, November 13, 2015 6:04 PM
> > >> > To: [hidden email]
> > >> > Subject: Re: debugging growing index size
> > >> >
> > >> > Hi Rob:
> > >> >
> > >> >
> > >> > Doesn’t this look like known SE issue JDK-4724038 and discussed by
> > Peter
> > >> > Levart and Uwe Schindler on a lucene-dev thread 9/9/2015?
> > >> >
> > >> > MappedByteBuffer …. what OS are you on Rob? What JVM?
> > >> >
> > >> > http://bugs.java.com/view_bug.do?bug_id=4724038
> > >> >
> > >> > http://mail-archives.apache.org/mod_mbox/lucene-
> > >> > dev/201509.mbox/%[hidden email]%3E
> > >> >
> > >> > hth
> > >> > -will
> > >> >
> > >> >
> > >> >
> > >> > > On Nov 13, 2015, at 11:23 AM, Rob Audenaerde
> > >> > <[hidden email]> wrote:
> > >> > >
> > >> > > I'm currently running using NIOFS. It seems to prevent the issue
> from
> > >> > > appearing.
> > >> > >
> > >> > > This is a second run (with applied deletes etc)
> > >> > >
> > >> > > raudenaerd@:/<6>index/index$sudo ls -lSra *.dvd
> > >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09
> _y_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> > _xod_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> > _110e_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> > _12r5_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> > _y0s_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> > _z20_Lucene50_0.dvd
> > >> > >
> > >> > > raudenaerde:/<6>index/index$sudo ls -lSaa *.dvd
> > >> > > -rw-r--r--. 1 apache apache 222062059 Nov 13 17:17
> > _z20_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 151149886 Nov 13 17:13
> > _y0s_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache 112855516 Nov 13 17:19
> > _12r5_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  53699972 Nov 13 17:17
> > _110e_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache  39048886 Nov 13 17:12
> > _xod_Lucene50_0.dvd
> > >> > > -rw-r--r--. 1 apache apache      7993 Nov 13 16:09
> _y_Lucene50_0.dvd
> > >> > >
> > >> > >
> > >> > >
> > >> > > On Thu, Nov 12, 2015 at 3:40 PM, Michael McCandless <
> > >> > > [hidden email]> wrote:
> > >> > >
> > >> > >> Hi Rob,
> > >> > >>
> > >> > >> A couple more things:
> > >> > >>
> > >> > >> Can you print the value of MMapDirectory.UNMAP_SUPPORTED?
> > >> > >>
> > >> > >> Also, can you try your test using NIOFSDirectory instead?
> Curious if
> > >> > >> that changes things...
> > >> > >>
> > >> > >> Mike McCandless
> > >> > >>
> > >> > >> http://blog.mikemccandless.com
> > >> > >>
> > >> > >>
> > >> > >> On Thu, Nov 12, 2015 at 7:28 AM, Rob Audenaerde
> > >> > >> <[hidden email]> wrote:
> > >> > >>> Curious indeed!
> > >> > >>>
> > >> > >>> I will turn on the IndexFileDeleter.VERBOSE_REF_COUNTS and
> > recreate
> > >> > the
> > >> > >>> logs. Will get back with them in a day hopefully.
> > >> > >>>
> > >> > >>> Thanks for the extra logging!
> > >> > >>>
> > >> > >>> -Rob
> > >> > >>>
> > >> > >>> On Thu, Nov 12, 2015 at 11:34 AM, Michael McCandless <
> > >> > >>> [hidden email]> wrote:
> > >> > >>>
> > >> > >>>> Hmm, curious.
> > >> > >>>>
> > >> > >>>> I looked at the [large] infoStream output and I see segment
> _3ou7
> > >> > >>>> present on init of IW, a few getReader calls referencing it,
> then a
> > >> > >>>> forceMerge that indeed merges it away, yet I do NOT see IW
> > >> > attempting
> > >> > >>>> deletion of its files.
> > >> > >>>>
> > >> > >>>> And indeed I see plenty (too many: many times per second?) of
> > >> > commits
> > >> > >>>> after that, so the index itself is no longer referencing _3ou7.
> > >> > >>>>
> > >> > >>>> If you are failing to close all NRT readers then I would expect
> > >> _3ou7
> > >> > >>>> to be in the lsof output, but it's not.
> > >> > >>>>
> > >> > >>>> The NRT readers close method has logic that notifies
> IndexWriter
> > >> when
> > >> > >>>> it's done "needing" the files, to emulate "delete on last
> close"
> > >> > >>>> semantics for filesystems like HDFS that don't do that ... it's
> > >> > >>>> possible something is wrong here.
> > >> > >>>>
> > >> > >>>> Can you set the (public, static) boolean
> > >> > >>>> IndexFileDeleter.VERBOSE_REF_COUNTS to true, and then re-
> > >> > generate this
> > >> > >>>> log?  This causes IW to log the ref count of each file it's
> tracking
> > >> > >>>> ...
> > >> > >>>>
> > >> > >>>> I'll also add a bit more verbosity to IW when NRT readers are
> > opened
> > >> > >>>> and close, for 5.4.0.
> > >> > >>>>
> > >> > >>>> Mike McCandless
> > >> > >>>>
> > >> > >>>> http://blog.mikemccandless.com
> > >> > >>>>
> > >> > >>>>
> > >> > >>>> On Wed, Nov 11, 2015 at 6:09 AM, Rob Audenaerde
> > >> > >>>> <[hidden email]> wrote:
> > >> > >>>>> Hi all,
> > >> > >>>>>
> > >> > >>>>> I'm still debugging the growing-index size. I think closing
> index
> > >> > >> readers
> > >> > >>>>> might help (work in progress), but I can't really see them
> holding
> > >> on
> > >> > >> to
> > >> > >>>>> files (at least, using lsof ). Restarting the application
> sheds
> > >> some
> > >> > >>>> light,
> > >> > >>>>> I see logging on files that are no longer referenced.
> > >> > >>>>>
> > >> > >>>>> What I see is that there are files in the index-directory,
> that
> > >> seem
> > >> > >> to
> > >> > >>>>> longer referenced..
> > >> > >>>>>
> > >> > >>>>> I put the output of the infoStream online, because is it
> rather big
> > >> > >> (30MB
> > >> > >>>>> gzipped):  http://www.audenaerde.org/lucene/merges.log.gz
> > >> > >>>>>
> > >> > >>>>> Output of lsof:  (executed 'sudo lsof *' in the index
> directory  ).
> > >> > >> This
> > >> > >>>> is
> > >> > >>>>> on an CentOS box (maybe that influences stuff as well?)
> > >> > >>>>>
> > >> > >>>>> COMMAND   PID   USER   FD   TYPE DEVICE   SIZE/OFF     NODE
> > NAME
> > >> > >>>>> java    30581 apache  mem    REG  253,0 3176094924 18880508
> > >> > >>>>> _4gs5_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0  505758610 18880546
> > >> _4gs5.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0  369563337 18880631
> > >> > >>>>> _4gs5_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0  176344058 18880623
> > >> > >>>>> _4gs5_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0  378055201 18880606
> > >> > >>>>> _4gs5_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0  372579599 18880400
> > >> > >>>>> _4i5a_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0   82017447 18880748
> > >> _4g37.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   85376507 18880721
> > >> _4fb3.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0  363493917 18880533
> > >> > >>>>> _4ct1_Lucene50_0.dvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0    9421892 18880806
> > >> _4gjc.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   76877461 18880553
> > >> _4ct1.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0   46271330 18880661
> > >> > >>>>> _4ct1_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0   26911387 18880653
> > >> > >>>>> _4ct1_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0   54678249 18880568
> > >> > >>>>> _4ct1_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0   76556587 18880328
> > >> _4i5a.fdt
> > >> > >>>>> java    30581 apache  mem    REG  253,0   45032159 18880389
> > >> > >>>>> _4i5a_Lucene50_0.tim
> > >> > >>>>> java    30581 apache  mem    REG  253,0   26486772 18880388
> > >> > >>>>> _4i5a_Lucene50_0.pos
> > >> > >>>>> java    30581 apache  mem    REG  253,0   55411002 18880362
> > >> > >>>>> _4i5a_Lucene50_0.doc
> > >> > >>>>> java    30581 apache  mem    REG  253,0   70484185 18880340
> > >> _4hkn.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   10873921 18880324
> > >> _4gpz.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   17230506 18880524
> > >> _4i11.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    6706969 18880575
> > >> _4i0t.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   15135578 18880624
> > >> _4i0i.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0   15368310 18880717
> > >> _4hzp.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    5146140 18880583
> > >> _4hze.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    2917380 18880411
> > >> _4gs5.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0    6871469 18880732
> > >> _4hod.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    2860341 18880495
> > >> _4i84.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     835726 18880660
> > >> _4i7z.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1005595 18880648
> > >> _4i7w.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    5639672 18880401
> > >> _4i4o.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    4388371 18880440
> > >> _4i4a.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1151845 18880512
> > >> _4i7v.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     941773 18880613
> > >> _4i7x.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     984023 18880588
> > >> _4i7o.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0    1790005 18880619
> > >> _4i7y.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     466371 18880515
> > >> _4ct1.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0     723280 18880573
> > >> _4i7q.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     806289 18880517
> > >> _4i7h.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0      17362 18880520
> > >> _4i9s.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     698362 18880531
> > >> _4i9r.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0     483215 18880406
> > >> _4i5a.nvd
> > >> > >>>>> java    30581 apache  mem    REG  253,0      14110 18880416
> > >> _4i9v.cfs
> > >> > >>>>> java    30581 apache  mem    REG  253,0       6121 18880412
> > >> _4i9t.cfs
> > >> > >>>>> java    30581 apache   30wW  REG  253,0          0 18877901
> > >> write.lock
> > >> > >>>>>
> > >> > >>>>> Output of some of the biggest files in the index directory:
> > >> > >>>>>
> > >> > >>>>> -rw-r--r--. 1 apache apache  358684577 Nov 11 08:04 _4fjn.cfs
> > >> > >>>>> -rw-r--r--. 1 apache apache  363493917 Nov 11 07:54
> > >> > >> _4ct1_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache  369563337 Nov 11 08:06
> > >> > >> _4gs5_Lucene50_0.tim
> > >> > >>>>> -rw-r--r--. 1 apache apache  372579599 Nov 11 08:09
> > >> > >> _4i5a_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache  378055201 Nov 11 08:06
> > >> > >> _4gs5_Lucene50_0.doc
> > >> > >>>>> -rw-r--r--. 1 apache apache  427401813 Nov 10 08:14 _3ou7.cfs
> > >> > >>>>> -rw-r--r--. 1 apache apache  505758610 Nov 11 08:04 _4gs5.fdt
> > >> > >>>>> -rw-r--r--. 1 apache apache 1107391579 Nov 10 07:55
> > >> > >> _3k3a_Lucene50_0.dvd
> > >> > >>>>> -rw-r--r--. 1 apache apache 3176094924 Nov 11 08:10
> > >> > >> _4gs5_Lucene50_0.dvd
> > >> > >>>>>
> > >> > >>>>> Note that the 3ou7 and 3k3a segments no longer appear to be in
> > use?
> > >> > >>>>
> > >> > >>>>
> > >> ---------------------------------------------------------------------
> > >> > >>>> To unsubscribe, e-mail: java-user-
> > [hidden email]
> > >> > >>>> For additional commands, e-mail: java-user-
> > [hidden email]
> > >> > >>>>
> > >> > >>>>
> > >> > >>
> > >> > >>
> ---------------------------------------------------------------------
> > >> > >> To unsubscribe, e-mail: [hidden email]
> > >> > >> For additional commands, e-mail:
> [hidden email]
> > >> > >>
> > >> > >>
> > >> >
> > >> >
> > >> >
> ---------------------------------------------------------------------
> > >> > To unsubscribe, e-mail: [hidden email]
> > >> > For additional commands, e-mail: [hidden email]
> > >>
> > >>
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: [hidden email]
> > >> For additional commands, e-mail: [hidden email]
> > >>
> > >>
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: [hidden email]
> > For additional commands, e-mail: [hidden email]
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [hidden email]
> For additional commands, e-mail: [hidden email]
>
>