Disk Free decrease in a directory containing only live lucene indexes

Riccardo Tasso
Hi,
I'm running a Lucene-based application on a Linux system.

The application writes and reads many Lucene indexes under the same
directory, which doesn't contain any other data.

We are monitoring the index directory and we noticed that the disk usage
reported by the df utility grows more rapidly than the usage reported by
the du utility.

When we terminate the application, the two utilities report the same disk
usage, and it equals the value that du reported while the application was
running.

Can you tell what the reason is?

Thanks,
 Riccardo

Re: Disk Free decrease in a directory containing only live lucene indexes

Uwe Schindler
Hi,

That's easy to explain: While indexing, IndexWriter constantly creates new files (new segments). Those segments are merged from time to time into larger segments. If you have an IndexReader open for searching while indexing is going on, it sees a specific snapshot (point in time) of the index until it is reopened to pick up the latest updates.

At the same time, IndexWriter merges segments and deletes the old segments that were merged away. An IndexReader opened in parallel still sees the older state of the index, so it keeps its files open, including those older segments. Unix has "delete on last close" semantics: disk space is only freed once the last user of a file has closed it. Deleting a file just removes the directory entry (which is what "du" looks at), but the inode and its allocated disk space are freed later (which is what "df" sees). That is also why both tools agree once the application is terminated: all files are closed at that point and the space is released.
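
The "delete on last close" behaviour can be reproduced without Lucene. The following is a minimal Python sketch, assuming a Unix-like system; the file name old_segment.bin, the 1 MiB payload and the helper names du_bytes and demo are made up for illustration and are not part of Lucene. The open file handle stands in for an IndexReader that still holds an old segment, and os.unlink stands in for IndexWriter deleting that segment after a merge.

```python
import os
import tempfile


def du_bytes(directory):
    """Sum the allocated bytes of all files reachable by name (roughly what du does)."""
    total = 0
    for root, _dirs, files in os.walk(directory):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            total += st.st_blocks * 512  # st_blocks is counted in 512-byte units
    return total


def demo():
    with tempfile.TemporaryDirectory() as index_dir:
        # Hypothetical file name; real Lucene segment files are named differently.
        path = os.path.join(index_dir, "old_segment.bin")
        with open(path, "wb") as f:
            f.write(os.urandom(1 << 20))
            f.flush()
            os.fsync(f.fileno())

        with open(path, "rb") as reader:      # stands in for the open IndexReader
            visible_before = du_bytes(index_dir)

            os.unlink(path)                   # stands in for deleting a merged segment
            path_exists = os.path.exists(path)
            visible_after = du_bytes(index_dir)

            # The name is gone, but the data is still there for the open handle,
            # so the file system cannot release the space yet.
            still_readable = len(reader.read())
            links = os.fstat(reader.fileno()).st_nlink
        # Only here, after the last handle is closed, is the space released.

    return {
        "visible_before": visible_before,
        "visible_after": visible_after,
        "path_exists": path_exists,
        "still_readable": still_readable,
        "links": links,
    }


if __name__ == "__main__":
    print(demo())
```

After the unlink, the directory walk no longer finds the file, so a du-style sum drops to zero, while the full payload can still be read through the open handle. On a real system, the space held this way shows up in df until the handle is closed.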

Uwe

On January 21, 2020 2:17:33 PM UTC, Riccardo Tasso <[hidden email]> wrote:

>Hi,
> I'm running a lucene based application on a linux system.
>
>The application writes and read many lucene indexes under the same
>directory, which doesn't contain other data.
>
>We are monitoring the indexes directory and we noticed that the disk usage
>as calculated by the df util grows more rapidly than that calculated by the
>du util.
>
>When we terminate the application the disk usage calculated with the two
>utils is the same and it is the one calculated with du when the application
>is running.
>
>Can you figure out which is the reason?
>
>Thanks,
> Riccardo

--
Uwe Schindler
Achterdiek 19, 28357 Bremen
https://www.thetaphi.de