Re: eternal optimize interrupted

hossman

(replying to a solr-user thread, moved over to solr-dev)

: > last evening we started an optimize over our solr index of 45GB. This morning
: > the optimize was still running, discs spinning like crazy and the index
: > directory had grown to 83GB.
:
: Hmmm, it was probably close to done given that 45*2=90.
: But with that size of an index, and given that solr/tomcat wasn't
: responsive, and that there was a lot of disk IO, perhaps the system
: was swapping?

random thought here, but for really big indexes, would iterative partial
optimizes result in less disk usage (and in theory: less swap) than doing a
full optimize?

With a full optimize, the original segment files have to remain until the
entire optimize is finished, hence the 2x disk usage ... but if you
continuously send partial optimize commands (with maxSegments set to one
less than the current number of segments) then on each iteration the old
segment files could be cleaned up.
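
To make the "one less each time" idea concrete, here is a minimal sketch that
only builds the sequence of update messages a client would send. The function
name and the idea of passing the current segment count in as a plain number are
assumptions for illustration; how you find that count and how you POST each
message to the update handler are left out.

```python
from typing import Iterator


def partial_optimize_commands(current_segments: int,
                              target_segments: int = 1) -> Iterator[str]:
    """Yield one <optimize/> message per step, lowering maxSegments by one.

    current_segments is supplied by the caller (hypothetical input); this
    sketch does not talk to Solr or inspect the index directory.
    """
    if current_segments < 1 or target_segments < 1:
        raise ValueError("segment counts must be at least 1")
    # Start at one less than the current count and stop at the target.
    for max_segments in range(current_segments - 1, target_segments - 1, -1):
        yield f'<optimize maxSegments="{max_segments}"/>'


if __name__ == "__main__":
    for message in partial_optimize_commands(4):
        print(message)
```

Each message would be sent only after the previous one has finished, so the
superseded segment files have a chance to be deleted between steps.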

If i remember correctly: a full optimize is just iteratively merging the
smallest two segments anyway, which means (unless i'm smoking crack)
iterative partial merges should take the same amount of time -- and use
less disk.
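
One way to sanity-check that reasoning is a toy model, sketched below. It is
not how Lucene actually schedules merges: it assumes no deleted documents (a
merged segment is exactly the sum of its inputs), that old segments are removed
only after the merged segment is fully written, and that a "full optimize" is a
single merge of everything. All names are made up for the sketch.

```python
from typing import List, Tuple


def full_optimize(sizes: List[int]) -> Tuple[int, int]:
    """Return (peak disk, bytes written) for one merge of all segments."""
    total = sum(sizes)
    # Old segments stay on disk while a full-size copy is written.
    return total + total, total


def iterative_optimize(sizes: List[int]) -> Tuple[int, int]:
    """Return (peak disk, bytes written) when repeatedly merging the two
    smallest segments and deleting the inputs after each step."""
    segments = sorted(sizes)
    total = sum(segments)
    peak = total
    written = 0
    while len(segments) > 1:
        merged = segments.pop(0) + segments.pop(0)
        # During the step, every existing segment plus the new one is on disk.
        peak = max(peak, total + merged)
        written += merged
        segments.append(merged)
        segments.sort()
    return peak, written


if __name__ == "__main__":
    sizes_gb = [20, 10, 10, 5]  # made-up segment sizes adding up to 45
    print("full:     ", full_optimize(sizes_gb))
    print("iterative:", iterative_optimize(sizes_gb))
```

In this toy model the early steps need only a little extra room, but the last
step still merges two segments into one the size of the whole index, so the
peak is set by that final merge, and the iterative version rewrites some data
more than once. Whether the real merge code behaves like this is exactly the
question for the experts.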

what do the segment merging experts think?  does this sound right?


which begs the question: should <optimize/> do this automatically for
people?  In a generic lucene app, a "full optimize" needs to work the way
it does so any other threads/apps trying to open the index get either the
original index or the new fully optimized index; but we don't really have
that limitation in Solr ... we could do the iteration ourselves, and just
hold off on firing any postOptimize or newSearcher events.
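
A rough sketch of that control flow, with nothing Solr-specific in it: the
merge hook and the listener list are hypothetical stand-ins, not real Solr
classes or callbacks. The point is only that the intermediate merges stay
silent and the events fire once at the end.

```python
from typing import Callable, List


def optimize_in_steps(segment_count: int,
                      merge_to: Callable[[int], None],
                      listeners: List[Callable[[], None]]) -> int:
    """Merge down to one segment a step at a time, then notify listeners once.

    merge_to(n) is a hypothetical hook that merges the index down to at most
    n segments and lets the superseded files be deleted. Returns the number
    of merge steps performed.
    """
    steps = 0
    for target in range(segment_count - 1, 0, -1):
        merge_to(target)  # intermediate states are never announced
        steps += 1
    # Only now would postOptimize / newSearcher style events be fired.
    for notify in listeners:
        notify()
    return steps


if __name__ == "__main__":
    merges: List[int] = []
    events: List[str] = []
    optimize_in_steps(
        4,
        merges.append,
        [lambda: events.append("postOptimize"),
         lambda: events.append("newSearcher")],
    )
    print(merges, events)
```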



-Hoss