Searching while optimizing


Searching while optimizing

vsevel
Hi, I am using Lucene 2.9.1 to index a continuous flow of events. My server keeps an index writer open at all times and writes events in groups of a few hundred followed by a commit. While writing, users invoke my server to perform searches. I want those searches to return results as current as possible. Once a day I optimize the index, while writes and searches may still be happening. I adopted the following strategy:

For every search I open a new IndexSearcher on the reader obtained from the writer. I execute the search, fetch the documents, and finally close the searcher. Specifically, I never close the reader nor the writer.

Q: is that a reasonable strategy?

Every day I add 500,000 new events and remove just as many, keeping the index stable at 30 million docs.

Q: how often should I optimize? Should I only play with the mergeFactor? Is 5 a reasonable value?

I found that my 40 GB index grew to 200 GB while the number of docs stayed put at 30 million. I suspect that a search during the optimize caused this, as described in the IndexWriter javadoc (about refreshing readers during an optimize).

Q: is that the likely cause? Is getting a reader from the writer just as "bad" as refreshing a reader during an optimize? How can I avoid this behavior? Should I just deny searches while optimizing?

Side question: is there any way to interrupt a search that takes too long? For instance, by setting a boolean from another thread on the searcher currently performing the search.

thanks,
vincent

Re: Searching while optimizing

Michael McCandless-2
When you say "getting a reader of the writer" do you mean writer.getReader()? I.e., the new near real-time API in 2.9?

For that API (and in general, whenever you open a reader), you must close it. I think all your extra files are there because you're not closing your old readers.

Reopening readers during an optimize is fine, as long as you close the old reader each time. It will possibly tie up more transient disk space than if you had reopened at the end of the optimize, but if you have plenty of disk space it shouldn't be a problem.

Mike
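
A minimal sketch of this per-search pattern, assuming the Lucene 2.9 near real-time API (writer stands for the application's long-lived IndexWriter; query and the hit count are placeholders):

    IndexReader reader = writer.getReader();        // near real-time reader
    try {
        IndexSearcher searcher = new IndexSearcher(reader);
        try {
            TopDocs hits = searcher.search(query, 10);
            for (ScoreDoc sd : hits.scoreDocs) {
                Document doc = searcher.doc(sd.doc);
                // ... use the document ...
            }
        } finally {
            searcher.close();   // does NOT close a reader that was passed in
        }
    } finally {
        reader.close();         // this close is what releases the segment files
    }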


Re: Searching while optimizing

vsevel
1) Correct: I am using IndexWriter.getReader(). I guess I was assuming that was a privately owned object and I had no business dealing with its lifecycle. The API would be clearer if the operation were renamed createReader().

2) How much transient disk space should I expect? Isn't this pretty much what the IndexWriter javadoc says we should not do: "When running in this mode, be careful not to refresh your readers while optimize or segment merges are taking place as this can tie up substantial disk space."


Re: Searching while optimizing

Michael McCandless-2
On Tue, Nov 24, 2009 at 1:44 AM, vsevel <[hidden email]> wrote:
>
> 1) Correct: I am using IndexWriter.getReader(). I guess I was assuming
> that was a privately owned object and I had no business dealing with its
> lifecycle. The API would be clearer if the operation were renamed
> createReader().

I just committed an addition to the javadocs that the caller is
responsible for closing the returned reader.

I think createReader() isn't great either because it sounds more
expensive than it is -- under the hood, the returned reader is
typically sharing many subreaders with the last reader obtained.  That
sharing is what makes the reopen time fast.

> 2) How much transient disk space should I expect? Isn't this pretty much
> what the IndexWriter javadoc says we should not do: "When running in this
> mode, be careful not to refresh your readers while optimize or segment
> merges are taking place as this can tie up substantial disk space."

It is exactly what the javadoc says you should not do, but if you know
the risks, go for it ;)

How much space is tied up depends on how often you reopen and how
quickly you close the last reader.  If, e.g., you aggressively close the
last reader, so that effectively only one reader is open at once,
then I think worst case the index consumes 4X its "nominal" size
(vs 3X if you don't open a single reader).

Mike
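
A minimal sketch of the reopen pattern being discussed, assuming a single shared reader held in a field (synchronization around the swap is omitted):

    IndexReader newReader = currentReader.reopen();  // shares unchanged subreaders
    if (newReader != currentReader) {
        currentReader.close();      // release the old segments promptly
        currentReader = newReader;  // subsequent searches see the new segments
    }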


RE: Searching while optimizing

Uwe Schindler
How about newReader()?

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: [hidden email]



Re: Searching while optimizing

Michael McCandless-2
I don't really like that name, for the same reason ("create" and "new"
imply that an entirely new reader is being created, which is far more
costly than what normally happens).

Mike


Re: Searching while optimizing

vsevel
Hi, just to make sure I understand correctly... After an optimize, without any reader, my index takes 30 GB on disk. Are you saying that if I can ensure there is only one reader at a time, it could take up to 120 GB on disk if I search while an optimize is going on?

I did not get your 3X for when there is no reader. In that situation, isn't the index at its nominal size?

Different subject: I saw in 3.0.0 RC1 that interrupting a merging thread was being discussed. Couldn't you do something similar for searches? I let my users do full-text searches on documents with over 50 fields. With too many wildcards, a search can take a long time, and rather than restricting what users can do, I would rather let them cancel the search gracefully. Would that be feasible?

Thanks,
vincent


Re: Searching while optimizing

Michael McCandless-2

If your index takes 30 GB before optimizing, and you then open a writer,
start the optimize, and wait for it to finish, it can take up to 90
GB.  Once the optimize is done but before you commit, 60 GB will be
in use.  Once you commit/close, this drops back to 30 GB.

(These are all worst-case numbers -- in practice, an optimized index
is smaller than the original, sometimes by a lot, e.g. if there are
many pending deletions.)

If the reader was already opened before you opened the writer, then
there's no change to disk space requirements (because the reader has
opened a commit (the starting commit) that the writer will not delete
anyway).

But if you open a new reader while the optimize is underway, it's
possible to require 120 GB of space in total (30 GB for your index, 90 GB
transient), because the reader holds open temporary segments that
the writer wants to delete.  If you open more than one reader and
don't close the old ones, you can tie up even more disk space.

> Different subject: I saw in 3.0.0 RC1 that interrupting a merging thread
> was being discussed. Couldn't you do something similar for searches? I let
> my users do full-text searches on documents with over 50 fields. With too
> many wildcards, a search can take a long time, and rather than restricting
> what users can do, I would rather let them cancel the search gracefully.
> Would that be feasible?

IndexWriter is interruptible via Thread.interrupt(), but searching
currently is not.  However, TimeLimitingCollector can be used to set a
timeout for searches.

Mike
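
A minimal sketch of how TimeLimitingCollector can be used in 2.9 (the 1000 ms budget and hit count are placeholder values):

    TopScoreDocCollector topCollector = TopScoreDocCollector.create(10, true);
    Collector collector = new TimeLimitingCollector(topCollector, 1000); // msec budget
    try {
        searcher.search(query, collector);
    } catch (TimeLimitingCollector.TimeExceededException e) {
        // timed out: topCollector still holds the hits gathered so far
    }
    TopDocs hits = topCollector.topDocs();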


Re: Searching while optimizing

vsevel
Hi, this is good information. As I read your post I realized that I am supposed to commit after an optimize, which I do not currently do. That probably explains the extra disk space I saw being consumed. If this is correct, then the optimize javadoc could be improved to say that it needs to be followed by a commit or close, like any other write.
thanks for the help,
vincent


Re: Searching while optimizing

Michael McCandless-2
OK, I'll add that to the javadocs; thanks.

But the fact that you weren't closing the old readers was probably
also tying up lots of disk space...

Mike


Re: Searching while optimizing

vsevel
Hi, I have done some testing that I would like to share with you.

I am starting my tests with an unoptimized 40 MB index. I have three test cases:
1) open a writer, optimize, commit, close
2) open a writer, open a reader from the writer, optimize, commit, close
3) same as 2), except the reader is opened while the optimize runs in a different thread

During all the tests, I monitor the size of the index on the disk. The results are:
1) initial=41 MB, before end of optimize=122 MB, after optimize=81 MB,  after commit=40 MB,                             after writer close=40 MB
2) initial=41 MB, before end of optimize=122 MB, after optimize=104 MB, after commit=104 MB, after reader close=104 MB, after writer close=40 MB
3) initial=41 MB, before end of optimize=145 MB, after optimize=127 MB, after commit=103 MB, after reader close=103 MB, after writer close=40 MB

From your different posts I assumed that a commit would have the same effect as a close as far as reclaiming disk space is concerned. However, test cases 2 and 3 show that whether the reader is opened before or during the optimize, we end up after the commit with an index that is 2.5 times the nominal size. Closing the reader does not change anything; only closing the writer gets the index back to nominal.

Why does neither the commit nor closing the reader get us back to nominal?
Do you recommend closing and recreating the writer after an optimize?

thanks
vincent


Re: Searching while optimizing

Michael McCandless-2
Phew, thanks for testing!  It's all explainable...

When you have a reader open, it prevents the segments it had opened
from being deleted.

When you close that reader, the segments could be deleted; however,
that won't happen until the writer next tries to delete, which it does
only periodically (e.g., on flushing a new segment, committing a
merge, etc.).

Could you try closing your reader, then calling writer.commit() (which
is a no-op, since you had already committed, but it may tickle the
writer into attempting the deletions), and see if that frees up disk
space without closing the writer?

Mike
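
In code, the suggested ordering would be roughly (assuming the reader came from writer.getReader()):

    reader.close();    // release the segments the reader was holding open
    writer.commit();   // no new changes, but may trigger the writer's deferred deletions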


Re: Searching while optimizing

vsevel
Hi, thanks for the explanations. Though I had no luck...

I now close the reader before the commit. But still, only closing the writer gets us back to nominal. Here is the complete test:

import java.io.File;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Index;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriter.MaxFieldLength;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Assert;
import org.junit.Test;

    @Test
    public void optimize() throws Exception {
        final File dir = new File("lucene_work/optimize");
        dir.mkdirs();

        // start from an empty index directory
        for (File f : dir.listFiles()) {
            f.delete();
        }

        Assert.assertEquals(0, dir.listFiles().length);

        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_CURRENT);
        MaxFieldLength maxLength = IndexWriter.MaxFieldLength.UNLIMITED;
        IndexWriter writer = new IndexWriter(FSDirectory.open(dir), analyzer, true, maxLength);
        // monitorIndexSize(dir) and "log" are test helpers (not shown):
        // the monitor logs the on-disk index size once per second
        monitorIndexSize(dir);
        long time = 2000;

        log.info("writing...");
        for (int i = 0; i < 1000000; i++) {
            Document doc = new Document();
            doc.add(new Field("foo", "bar " + i, Store.YES, Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }

        writer.commit();
        log.info("done write");
        Thread.sleep(time);

        log.info("opening reader...");
        IndexReader reader = writer.getReader(); // near real-time reader
        log.info("done open reader");
        Thread.sleep(time);

        log.info("optimizing...");
        writer.optimize();
        log.info("done optimize");
        Thread.sleep(time);

        log.info("closing reader...");
        reader.close();
        log.info("done reader close");
        Thread.sleep(time);

        log.info("committing...");
        writer.commit();
        log.info("done commit");
        Thread.sleep(time);

        log.info("closing writer...");
        writer.close();
        log.info("done writer close");
        Thread.sleep(time);
    }

And an exec log:

15:58:46,875  INFO logserver.LuceneSystemTest     writing...
15:58:46,875  INFO logserver.LuceneSystemTest     size=0Mb
15:58:47,891  INFO logserver.LuceneSystemTest     size=1Mb
15:58:48,891  INFO logserver.LuceneSystemTest     size=3Mb
15:58:49,891  INFO logserver.LuceneSystemTest     size=5Mb
15:58:50,906  INFO logserver.LuceneSystemTest     size=8Mb
15:58:51,906  INFO logserver.LuceneSystemTest     size=9Mb
15:58:52,906  INFO logserver.LuceneSystemTest     size=12Mb
15:58:53,922  INFO logserver.LuceneSystemTest     size=14Mb
15:58:54,984  INFO logserver.LuceneSystemTest     size=15Mb
15:58:55,984  INFO logserver.LuceneSystemTest     size=18Mb
15:58:56,984  INFO logserver.LuceneSystemTest     size=20Mb
15:58:58,000  INFO logserver.LuceneSystemTest     size=21Mb
15:58:59,000  INFO logserver.LuceneSystemTest     size=25Mb
15:59:00,016  INFO logserver.LuceneSystemTest     size=27Mb
15:59:01,016  INFO logserver.LuceneSystemTest     size=29Mb
15:59:02,016  INFO logserver.LuceneSystemTest     size=52Mb
15:59:03,031  INFO logserver.LuceneSystemTest     size=52Mb
15:59:04,031  INFO logserver.LuceneSystemTest     size=32Mb
15:59:04,328  INFO logserver.LuceneSystemTest     done write
15:59:05,031  INFO logserver.LuceneSystemTest     size=32Mb
15:59:06,031  INFO logserver.LuceneSystemTest     size=32Mb
15:59:06,328  INFO logserver.LuceneSystemTest     opening reader...
15:59:06,453  INFO logserver.LuceneSystemTest     done open reader
15:59:07,031  INFO logserver.LuceneSystemTest     size=32Mb
15:59:08,031  INFO logserver.LuceneSystemTest     size=32Mb
15:59:08,453  INFO logserver.LuceneSystemTest     optimizing...
15:59:09,047  INFO logserver.LuceneSystemTest     size=34Mb
15:59:10,047  INFO logserver.LuceneSystemTest     size=37Mb
15:59:11,047  INFO logserver.LuceneSystemTest     size=40Mb
15:59:12,047  INFO logserver.LuceneSystemTest     size=42Mb
15:59:12,391  INFO logserver.LuceneSystemTest     done optimize
15:59:13,062  INFO logserver.LuceneSystemTest     size=55Mb
15:59:14,062  INFO logserver.LuceneSystemTest     size=55Mb
15:59:14,391  INFO logserver.LuceneSystemTest     closing reader...
15:59:14,406  INFO logserver.LuceneSystemTest     done reader close
15:59:15,062  INFO logserver.LuceneSystemTest     size=55Mb
15:59:16,062  INFO logserver.LuceneSystemTest     size=55Mb
15:59:16,406  INFO logserver.LuceneSystemTest     committing...
15:59:16,469  INFO logserver.LuceneSystemTest     done commit
15:59:17,062  INFO logserver.LuceneSystemTest     size=43Mb
15:59:18,062  INFO logserver.LuceneSystemTest     size=43Mb
15:59:18,469  INFO logserver.LuceneSystemTest     closing writer...
15:59:18,484  INFO logserver.LuceneSystemTest     done writer close
15:59:19,062  INFO logserver.LuceneSystemTest     size=32Mb
15:59:20,078  INFO logserver.LuceneSystemTest     size=32Mb

I guess I could close and reopen the writer if I really need to. But if there is a nicer, more natural solution, I would love to know about it.

thanks,
vincent


Re: Searching while optimizing

Michael McCandless-2
OK, I dug into this one... it's actually a bug in IndexWriter, when
used in near real-time mode *and* when CFS is enabled.  In that case,
IndexWriter internally holds open the wrong SegmentReader, tying
up more disk space than it should.

Functionally, the bug is harmless -- it's just tying up disk space.

I've boiled your example down to a test case.

Thanks for catching & reporting this! I'll open an issue.

If it's a problem, you can work around the bug either by turning off
CFS, or by using IndexReader.open (& reopen) to get your reader instead
of the near real-time writer.getReader() method.

Mike
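
Minimal sketches of the two workarounds (illustrative only; dir stands for the index Directory):

    // Workaround 1: turn off the compound file format on the writer.
    writer.setUseCompoundFile(false);

    // Workaround 2: skip writer.getReader() and open a read-only reader on the
    // directory instead, refreshing it via reopen() and closing the old one each time.
    IndexReader reader = IndexReader.open(dir, true);   // true = read-only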


Re: Searching while optimizing

Michael McCandless-2
OK, I opened https://issues.apache.org/jira/browse/LUCENE-2097 to track this.

Thanks v.sevel!

Mike
