how long should optimizing take


how long should optimizing take

Angelov, Rossen
Hi,

I'm having problems with Lucene optimization. Two of the indexes are
about 2 GB in size, and every day about 30 documents are added to each of
them. At the end of indexing, the IndexWriter optimize() method is
executed, and it takes about 30 minutes to finish the optimization for each
index.

The indexing happens through a web service. A servlet takes an HTTP request
and executes methods to index the new documents and optimize the indexes.

The problem is that the request takes too long to finish because of the
optimization and the web server doesn't return a response. The browser will
keep waiting forever.

Has anybody else experienced similar behavior with the optimization process?

Thanks,
Ross


"This communication is intended solely for the addressee and is
confidential and not for third party unauthorized distribution."


RE: how long should optimizing take

Angelov, Rossen
I would like to bring that issue up again as I haven't resolved it yet and
haven't found what's causing it.

Any help, ideas, or shared experience is welcome!

Thanks,
Ross

-----Original Message-----
From: Angelov, Rossen
Sent: Friday, May 27, 2005 10:42 AM
To: '[hidden email]'
Subject: how long should optimizing take




Re: how long should optimizing take

Dan Armbrust
I would run your optimize process in a separate thread, so that your web
client doesn't have to wait for it to return.

You may even want to set the optimize part up to run on a weekly
schedule, at a low load time.  I probably wouldn't reoptimize after
every 30 documents, on an index that size.

Optimizing takes a while on your index because it basically has to copy
the entire index to a new index, so it will take however long it takes
to copy 2 GB on your hardware, plus a small amount of overhead.

Dan
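
The separate-thread suggestion above could look roughly like the sketch
below. It is only an illustration: BackgroundOptimizer and its methods are
hypothetical names, and the Runnable stands in for whatever code opens the
IndexWriter, calls optimize() and closes it. The servlet hands the work to a
single worker thread and returns its response right away.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical helper: runs a long optimize step off the request thread. */
final class BackgroundOptimizer {
    // One worker thread, so optimize runs never overlap each other.
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final AtomicBoolean running = new AtomicBoolean(false);
    // Assumption: this task wraps the real IndexWriter.optimize() call.
    private final Runnable optimizeTask;

    BackgroundOptimizer(Runnable optimizeTask) {
        this.optimizeTask = optimizeTask;
    }

    /** Starts an optimize run; returns null if one is already in progress. */
    Future<?> requestOptimize() {
        if (!running.compareAndSet(false, true)) {
            return null;
        }
        return worker.submit(() -> {
            try {
                optimizeTask.run();
            } finally {
                running.set(false); // allow the next request once this run ends
            }
        });
    }

    boolean isRunning() {
        return running.get();
    }

    void shutdown() {
        worker.shutdown();
    }
}
```

In the servlet, requestOptimize() would be called after the new documents
are added, and the response would be written without waiting on the returned
Future. The index is still locked for writing while the task runs.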



Re: how long should optimizing take

jian chen
Hi,

optimize() merges all of the index segments into a single segment. In
your case, I guess the 2 GB index segment is quite large, so merging
it with any other small index segments will definitely be slow.

I think the performance should be ok without calling optimize().
Moreover, could you call optimize() every several days, say, every
week?

Cheers,

Jian

I think you don't need to call optimize() that often, given you only
have 30 documents each day to be added to the index.

On 6/2/05, Angelov, Rossen <[hidden email]> wrote:

> I would like to bring that issue up again as I haven't resolved it yet and
> haven't found what's causing it.
>
> Any help, ideas or sharing experience are welcome!
>
> Thanks,
> Ross


RE: how long should optimizing take

Angelov, Rossen
Thanks for the suggestion; Jian Chen's idea is very similar.
Optimizing that often is probably not necessary and not that critical for
speeding up the searches.

I'll try changing the indexing process so that it doesn't optimize at all,
and run the optimization independently of the indexing on a weekly basis.

Ross

-----Original Message-----
From: Dan Armbrust [mailto:[hidden email]]
Sent: Thursday, June 02, 2005 11:10 AM
To: [hidden email]
Subject: Re: how long should optimizing take




Re: how long should optimizing take

Dan Armbrust
You should be careful, however, not to end up with two VM instances each
trying to open an index writer at the same time - one of them is going
to fail.

That is, if someone using your web interface tries to add a new document to
the index while you have the optimizer running standalone, the web
interface is not going to be able to get a lock on the index to add the
documents.

Dan
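
One way to avoid surprising lock failures, if indexing and optimizing end up
in the same JVM, is to route both through a single gate. This is a sketch
under that assumption only: IndexWriteGate and tryWrite are hypothetical
names, and the Runnable stands in for the code that opens the IndexWriter.
It does not coordinate separate VM instances; those still depend on Lucene's
own index lock, as described above.

```java
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical gate: lets only one index-writing task run at a time in this JVM. */
final class IndexWriteGate {
    private final ReentrantLock writeLock = new ReentrantLock();

    /**
     * Runs the task if no other thread is writing; otherwise returns false
     * immediately so the caller can report "index busy" instead of blocking.
     */
    boolean tryWrite(Runnable task) {
        if (!writeLock.tryLock()) {
            return false;
        }
        try {
            task.run(); // assumption: opens the IndexWriter, adds or optimizes, closes it
            return true;
        } finally {
            writeLock.unlock();
        }
    }
}
```

Both the add-document path and the optimize job would call tryWrite(), so a
request that arrives during optimization gets a quick, explicit refusal.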



RE: how long should optimizing take

Angelov, Rossen
I'll make sure no indexing is started before the optimization is done.
Most likely Sunday will be the optimization day for the indexes and every
other night the documents will be added to the index.

Only searching will be available through the web service while optimizing,
but this should not be a problem as an IndexReader will be opened, not a
second IndexWriter.

Ross
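
A weekly Sunday run like the one described above could be scheduled inside
the application roughly as follows. This is a sketch, not a recommendation
over cron or another external scheduler: WeeklyOptimizeSchedule is a
hypothetical name, the 02:00 start time is an assumed low-load hour, and the
Runnable stands in for the code that calls IndexWriter.optimize().

```java
import java.time.DayOfWeek;
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.time.temporal.TemporalAdjusters;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: schedules the optimize task once a week. */
final class WeeklyOptimizeSchedule {

    /** Time from 'now' until the next occurrence of the given weekday and time. */
    static Duration delayUntilNext(LocalDateTime now, DayOfWeek day, LocalTime time) {
        LocalDateTime next = now.with(TemporalAdjusters.nextOrSame(day)).with(time);
        if (!next.isAfter(now)) {
            next = next.plusWeeks(1); // today's slot has already passed
        }
        return Duration.between(now, next);
    }

    /** Starts the weekly schedule; the caller should shut the scheduler down on exit. */
    static ScheduledExecutorService start(Runnable optimizeTask) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Assumed slot: Sunday 02:00 local time (daylight-saving shifts are ignored).
        long initialDelayMinutes =
                delayUntilNext(LocalDateTime.now(), DayOfWeek.SUNDAY, LocalTime.of(2, 0)).toMinutes();
        // Note: if the task throws, scheduleAtFixedRate cancels all later runs,
        // so the task should catch and log its own exceptions.
        scheduler.scheduleAtFixedRate(
                optimizeTask, initialDelayMinutes, Duration.ofDays(7).toMinutes(), TimeUnit.MINUTES);
        return scheduler;
    }
}
```

The nightly indexing job would then skip its own optimize() call and leave
that work to this schedule.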

-----Original Message-----
From: Dan Armbrust [mailto:[hidden email]]
Sent: Thursday, June 02, 2005 3:10 PM
To: [hidden email]
Subject: Re: how long should optimizing take

