IndexWriter forceOptimize() ?

IndexWriter forceOptimize() ?

Otis Gospodnetic-2
Hi,

What do people here think about adding forceOptimize() to IndexWriter?

  public synchronized void forceOptimize() throws IOException {
      flushRamSegments();
      int minSegment = segmentInfos.size() - mergeFactor;
      mergeSegments(minSegment < 0 ? 0 : minSegment);      
  }

I need it for https://issues.apache.org/jira/browse/LUCENE-741 (Field Norms Modifier), which I wrote to work with multi-file indices.  That means that if an index contains any CFS files, I need to expand those first.  Is there a better way to extract a CFS file from an index that may also contain some non-CFS segments?  There is now a CfsExtractor tool in https://issues.apache.org/jira/browse/LUCENE-770 ; maybe that will do for LUCENE-741, but I haven't tried it yet.

Otis




---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

Re: IndexWriter forceOptimize() ?

Doron Cohen
Otis Gospodnetic <[hidden email]> wrote on 11/01/2007 06:25:59:

> Hi,
>
> What do people here think about adding forceOptimize() to IndexWriter?
>
>   public synchronized void forceOptimize() throws IOException {
>       flushRamSegments();
>       int minSegment = segmentInfos.size() - mergeFactor;
>       mergeSegments(minSegment < 0 ? 0 : minSegment);
>   }
>
> I need it for https://issues.apache.org/jira/browse/LUCENE-741
> (Field Norms Modifier), which I wrote to work with multi-file
> indices, which means that if there are any CFS index files in an
> index, I need to expand those first.  Is there a better way to
> extract a CFS file in an index that may also contain some non-CFS
> segments?  There is a CfsExtractor tool in
> https://issues.apache.org/jira/browse/LUCENE-770 now, maybe that will
> do for LUCENE-741.... haven't tried it yet.
>
> Otis

I think one (non-performant) external way to move an index from CFS to
non-CFS is:
  1. open in non-CFS mode
  2. add one (empty) doc
  3. optimize
  4. (optionally) remove last doc and (optionally) optimize again
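
To see why the extra document matters, the steps above can be sketched with a toy model. This is not Lucene code: `ToyIndex`, `Segment`, and the skip rule in `needsOptimize()` are hypothetical names and a simplified assumption about the check optimize() makes before merging (only the segment count and the compound-file clause are modeled).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy stand-in for an index. NOT Lucene code; every name here is hypothetical. */
final class ToyIndex {

    /** One segment: its document count and whether it is stored as a compound (CFS) file. */
    static final class Segment {
        final int docs;
        final boolean compound;

        Segment(int docs, boolean compound) {
            this.docs = docs;
            this.compound = compound;
        }
    }

    final List<Segment> segments = new ArrayList<>();
    boolean useCompoundFile = true;

    /** Step 2: adding a document creates a new one-document segment in the current format. */
    void addDocument() {
        segments.add(new Segment(1, useCompoundFile));
    }

    /** Simplified skip rule (assumption): merge only if there are several segments,
     *  or a lone non-CFS segment while CFS mode is on. */
    boolean needsOptimize() {
        if (segments.size() > 1) {
            return true;
        }
        return segments.size() == 1 && useCompoundFile && !segments.get(0).compound;
    }

    /** Step 3: merge everything into one segment written in the current format. */
    void optimize() {
        if (!needsOptimize()) {
            return; // nothing to do according to the skip rule
        }
        int total = 0;
        for (Segment s : segments) {
            total += s.docs;
        }
        segments.clear();
        segments.add(new Segment(total, useCompoundFile));
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.segments.add(new Segment(100, true)); // an already-optimized CFS index
        index.useCompoundFile = false;              // step 1: open in non-CFS mode

        index.optimize();                           // skipped: one clean segment
        System.out.println("still compound: " + index.segments.get(0).compound);

        index.addDocument();                        // step 2: add one (empty) doc
        index.optimize();                           // step 3: now a real merge
        System.out.println("still compound: " + index.segments.get(0).compound);
    }
}
```

In this model the first optimize() is skipped and the index stays compound; only after the extra document does the merge run and produce a single non-compound segment.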



Re: IndexWriter forceOptimize() ?

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
Hi Doron,

Yeah, you are right: adding that (empty) Doc would force optimize() to actually merge.  I was trying to avoid doing that, and forceOptimize() looked cleaner, but I'm not sure others would agree.  Are there other situations where one would want to force index optimization even if none of those conditions in optimize() are true?

I'd actually appreciate it if you could look at https://issues.apache.org/jira/browse/LUCENE-741 .  The code can completely remove norms for a given field, but this assumes a pre-.nrm index structure (.fN field norms files).  I'm not sure yet how to deal with .nrm, so if you have a quick solution to plug into the code in LUCENE-741, that would be great.

Thanks,
Otis


Re: IndexWriter forceOptimize() ?

Chris Hostetter-3
In reply to this post by Otis Gospodnetic-2

: What do people here think about adding forceOptimize() to IndexWriter?

I like the idea, but i don't have any value add to offer to the discussion
of whether the implementation you suggest is "safe" ... in particular i
notice that the current optimize method is an iterative loop, presumably
to make sure that mergeSegments gets called as many times as it needs to
based on segmentInfos.size() .. your version doesn't seem to have that, so
does that mean your new version wouldn't always result in a single
segment?

another suggestion i have is with the API ... instead of calling it
"forceOptimize" perhaps the current noarg optimize method should be
deprecated, and replaced with a new optimize(boolean force) where
force==true means an optimize will be done, and force==false means an
optimize will be done if the IndexWriter feels it should be done ... this
would also address my above concern (assuming it's valid)...

/** @deprecated use optimize(false) */
public synchronized void optimize() throws IOException { optimize(false); }

public synchronized void optimize(boolean force) throws IOException {
  flushRamSegments();
  // force guarantees at least one merge pass; the flag is cleared after
  // that pass so the loop can end once the usual conditions are false
  boolean pending = force;
  while (pending ||
         (segmentInfos.size() > 1 ||
          (segmentInfos.size() == 1 &&
           (SegmentReader.hasDeletions(segmentInfos.info(0)) ||
            segmentInfos.info(0).dir != directory ||
            (useCompoundFile &&
             (!SegmentReader.usesCompoundFile(segmentInfos.info(0)) ||
               SegmentReader.hasSeparateNorms(segmentInfos.info(0)))))))) {
    pending = false;
    int minSegment = segmentInfos.size() - mergeFactor;
    mergeSegments(segmentInfos, minSegment < 0 ? 0 : minSegment, segmentInfos.size());
  }
}
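
As a toy illustration of the looping question (not Lucene code; `ToyMergeLoop` and its fields are hypothetical), the sketch below merges the last mergeFactor segments on each pass. It assumes that a force flag should guarantee one merge pass and then stop influencing the loop, so that the loop ends once a single segment is left.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the optimize loop. NOT Lucene code; all names are hypothetical. */
final class ToyMergeLoop {
    final List<Integer> segmentDocCounts = new ArrayList<>();
    final int mergeFactor;
    int mergePasses = 0;

    ToyMergeLoop(int mergeFactor, int... docCounts) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
        for (int n : docCounts) {
            segmentDocCounts.add(n);
        }
    }

    /** Replaces the segments from index minSegment to the end with one merged segment. */
    private void mergeSegments(int minSegment) {
        int total = 0;
        while (segmentDocCounts.size() > minSegment) {
            total += segmentDocCounts.remove(segmentDocCounts.size() - 1);
        }
        segmentDocCounts.add(total);
        mergePasses++;
    }

    /** Runs one pass when force is true, then keeps going only while several segments remain. */
    void optimize(boolean force) {
        if (segmentDocCounts.isEmpty()) {
            return; // nothing to merge
        }
        boolean pending = force; // guarantees one pass when force is true
        while (pending || segmentDocCounts.size() > 1) {
            pending = false;
            int minSegment = segmentDocCounts.size() - mergeFactor;
            mergeSegments(minSegment < 0 ? 0 : minSegment);
        }
    }

    public static void main(String[] args) {
        ToyMergeLoop index = new ToyMergeLoop(3, 5, 4, 3, 2, 1, 1, 1);
        index.optimize(false);
        System.out.println(index.segmentDocCounts + " after " + index.mergePasses + " passes");
    }
}
```

In this model, seven segments with a mergeFactor of 3 need three passes to reach a single segment, which is the case a single non-looping merge call would not cover.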


-Hoss



Re: IndexWriter forceOptimize() ?

Robert Engels
I agree with the boolean addition.

optimize(false) is a request to maybe optimize, optimize(true) always  
should optimize to a single segment

optimize(false) might check some parameter as to the maximum number
of segments allowed before an actual optimize is performed.





Re: IndexWriter forceOptimize() ?

Chris Hostetter-3

: optimize(false) is a request to maybe optimize, optimize(true) always
: should optimize to a single segment
:
: optimize(false) might check some parameter as to the maximum number
: of segments allowed before an actual optimize is performed.

maybe it should be optimize(int minSegmentCountToSkip), with
optimize(0) forcing an optimize even if there is only 1 segment, and
optimize() remaining undeprecated and using a "sensible default" (whatever
that may be ... 1 perhaps?)
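
One possible reading of this threshold idea, as a toy sketch rather than Lucene code: `ToyThresholdOptimizer` is a hypothetical class, and it assumes the merge is skipped whenever the index already has at most minSegmentCountToSkip segments. Under that assumption optimize(0) always merges, and the no-arg form with a default of 1 leaves a lone segment alone.

```java
/** Toy model of optimize(int). NOT Lucene code; all names are hypothetical. */
final class ToyThresholdOptimizer {
    private int segmentCount;
    private int merges = 0;

    ToyThresholdOptimizer(int segmentCount) {
        if (segmentCount < 0) {
            throw new IllegalArgumentException("segmentCount must not be negative");
        }
        this.segmentCount = segmentCount;
    }

    /** Skips the merge when the index already has at most minSegmentCountToSkip segments. */
    void optimize(int minSegmentCountToSkip) {
        if (segmentCount == 0 || segmentCount <= minSegmentCountToSkip) {
            return; // empty index, or few enough segments already
        }
        segmentCount = 1; // everything merged into one segment
        merges++;
    }

    /** No-arg form: uses 1 as the assumed "sensible default" threshold. */
    void optimize() {
        optimize(1);
    }

    int segmentCount() {
        return segmentCount;
    }

    int merges() {
        return merges;
    }

    public static void main(String[] args) {
        ToyThresholdOptimizer index = new ToyThresholdOptimizer(1);
        index.optimize();  // skipped: one segment, threshold 1
        index.optimize(0); // forced: threshold 0 never skips a non-empty index
        System.out.println("merges performed: " + index.merges());
    }
}
```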



-Hoss



Re: IndexWriter forceOptimize() ?

Doron Cohen
In reply to this post by Otis Gospodnetic-2
Otis Gospodnetic <[hidden email]> wrote on 11/01/2007 09:30:08:
>
> I'd actually appreciate it if you could look at
> https://issues.apache.org/jira/browse/LUCENE-741 .  The code can
> completely remove norms for a given field, but this assumes a
> pre-.nrm index structure (.fN field norms files).  I'm not sure yet
> how to deal with .nrm, so if you have a quick solution to plug into
> the code in LUCENE-741, that would be great.

Okay, sure, see new comments in lucene-741

>
> Thanks,
> Otis



Re: IndexWriter forceOptimize() ?

Yonik Seeley-2
In reply to this post by Chris Hostetter-3
On 1/11/07, Chris Hostetter <[hidden email]> wrote:
> maybe it should be optimize(int minSegmentCountToSkip), with
> optimize(0) forcing an optimize even if there is only 1 segment, and
> optimize() remaining undeprecated and using a "sensible default" (whatever
> that may be ... 1 perhaps?)

If we are going to expose that there are multiple segments, I wouldn't
mind a direct API to get the number of segments (for IndexReader and
IndexWriter).

-Yonik


Re: IndexWriter forceOptimize() ?

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
Yeah, I actually had:

public int segments() { return segmentInfos.size(); }

in my IndexReader, but then erased it precisely because I thought this was exposing too much about the impl.
I think the optimize(int) that Chris mentioned exposes too much.  I thought about having optimize(boolean force) in place of optimize(), but then we'd have to deprecate optimize(), so I opted for forceOptimize(), which I feel exposes a little less.
But I'm looking to hear what others think before committing LUCENE-741, which includes this forceOptimize() addition.

Otis



Re: IndexWriter forceOptimize() ?

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
Doron,

Maybe my browser is misbehaving, but I don't see your comments in http://issues.apache.org/jira/browse/LUCENE-741 .  Didn't see the JIRA email with them either...

Otis



Re: IndexWriter forceOptimize() ?

Chris Hostetter-3
In reply to this post by Otis Gospodnetic-2

: I think optimize(int) that Chris mentioned exposes too much.  I thought
: about having optimize(boolean force) in place of optimize(), but then
: we'd have to deprecate, so I opted for forceOptimize() that, I feel
: exposes a little less.

i have no strong feelings about exposing the number of segments, or having
optimize(int) ... but i would prefer optimize(boolean) over forceOptimize
.. because it saves apps (like Solr) from needing to have code like this
to drive their behavior...

   if (someBooleanValue) {
      writer.forceOptimize();
   } else {
      writer.optimize();
   }






-Hoss



Re: IndexWriter forceOptimize() ?

Otis Gospodnetic-2
In reply to this post by Otis Gospodnetic-2
The one day I read email in a different order, I miss replies like this.
If optimize(boolean force) looks more attractive than forceOptimize(), that's fine by me.  I just want to be able to force a CFS index to expand, even if it's already optimized.  Getting it down to a single segment is just a nice bonus for me here.

Regarding that while loop.... it looks like iteration is not needed to force reoptimization.  I've tested it with CFS and non-CFS indices, with optimized and unoptimized indices, with and without deletions, and after forced optimization I always ended up with a single segment:

        SegmentInfos sis = new SegmentInfos();
        sis.read(dir);
        System.out.println("SEGS: " + sis.size());

If nobody speaks up by the weekend, I'll add optimize(boolean force).  We can leave optimize() in place and make it call optimize(false).

Otis



Re: IndexWriter forceOptimize() ?

Doron Cohen
In reply to this post by Otis Gospodnetic-2

Otis Gospodnetic <[hidden email]> wrote on 11/01/2007 20:17:31:

> Doron,
>
> Maybe my browser is misbehaving, but I don't see your comments in
> http://issues.apache.org/jira/browse/LUCENE-741 .  Didn't see the
> JIRA email with them either...
>
> Otis

Otis, your browser is perfect, it's just that I was distracted with something else...
it is there now!



Re: IndexWriter forceOptimize() ?

Yonik Seeley-2
In reply to this post by Otis Gospodnetic-2
On 1/11/07, Otis Gospodnetic <[hidden email]> wrote:
> Yeah, I actually had:
>
> public int segments() { return segmentInfos.size(); }
>
> in my IndexReader, but then erased it precisely because I thought this was exposing too much about the impl.

That was my first instinct, but then again, we do expose mergeFactor
and maxMergeDocs, both of which make no sense w/o understanding the
underlying merge model and the fact that there are multiple segments.
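
To make that point concrete, here is a deliberately simplified merge model, not Lucene's actual merge code. `ToyMergeModel` is hypothetical, and it assumes that every added document starts as its own one-document segment and that the last mergeFactor segments are merged whenever they are equal in size and the merged segment would not exceed maxMergeDocs.

```java
import java.util.ArrayList;
import java.util.List;

/** Simplified merge model. NOT Lucene's merge code; all names and rules are assumptions. */
final class ToyMergeModel {
    final List<Integer> segmentDocCounts = new ArrayList<>();
    final int mergeFactor;
    final int maxMergeDocs;

    ToyMergeModel(int mergeFactor, int maxMergeDocs) {
        if (mergeFactor < 2) {
            throw new IllegalArgumentException("mergeFactor must be at least 2");
        }
        this.mergeFactor = mergeFactor;
        this.maxMergeDocs = maxMergeDocs;
    }

    /** Adds a one-document segment, then merges the tail for as long as the rule allows. */
    void addDocument() {
        segmentDocCounts.add(1);
        while (tailIsMergeable()) {
            mergeTail();
        }
    }

    /** True if the last mergeFactor segments are equal in size and small enough to merge. */
    private boolean tailIsMergeable() {
        int n = segmentDocCounts.size();
        if (n < mergeFactor) {
            return false;
        }
        int size = segmentDocCounts.get(n - 1);
        for (int i = n - mergeFactor; i < n; i++) {
            if (segmentDocCounts.get(i) != size) {
                return false;
            }
        }
        return (long) size * mergeFactor <= maxMergeDocs;
    }

    /** Replaces the last mergeFactor segments with one segment holding all their documents. */
    private void mergeTail() {
        int total = 0;
        for (int i = 0; i < mergeFactor; i++) {
            total += segmentDocCounts.remove(segmentDocCounts.size() - 1);
        }
        segmentDocCounts.add(total);
    }

    public static void main(String[] args) {
        ToyMergeModel index = new ToyMergeModel(10, Integer.MAX_VALUE);
        for (int i = 0; i < 25; i++) {
            index.addDocument();
        }
        System.out.println("segments: " + index.segmentDocCounts);
    }
}
```

Even in this small model, both settings can only be described in terms of how many segments exist and how large they are allowed to grow.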

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server
