Re: GData, updateable IndexSearcher


Re: GData, updateable IndexSearcher

jason rutherglen-2
This originated on the Solr mailing list.

> That's the way Lucene changes.

I was thinking you implied that you knew of someone who had customized their own, but that it was a closed-source solution, in which case you would know how that project fared.

It definitely sounds like an interesting project; it will take me several days to digest the design you described.  As this would be used with Solr, I wonder if there would be a good way to also update the Solr caches.  Wouldn't there also need to be a hack on the IndexWriter to keep track of new segments?

----- Original Message ----
From: Doug Cutting <[hidden email]>
To: [hidden email]
Sent: Wednesday, April 26, 2006 11:27:44 AM
Subject: Re: GData, updateable IndexSearcher

jason rutherglen wrote:
> Interesting, does this mean there is a plan for incrementally updateable IndexSearchers to become part of Lucene?

In general, there is no plan for Lucene.  If someone implements a
generally useful, efficient, feature in a back-compatible, easy to use,
manner, and submits it as a patch, then it becomes a part of Lucene.
That's the way Lucene changes.  Since we don't pay anyone, we can't make
plans and assign tasks.  So if you're particularly interested in this
feature, you might search the archives to find past efforts, or simply
try to implement it yourself.

I think a good approach would be to create a new IndexSearcher instance
based on an existing one, that shares IndexReaders.  Similarly, one
should be able to create a new IndexReader based on an existing one.
This would be a MultiReader that shares many of the same SegmentReaders.

Things get a little tricky after this.

Lucene caches filters based on the IndexReader.  So filters would need
to be re-created.  Ideally these could be incrementally re-created, but
that might be difficult.  What might be simpler would be to use a
MultiSearcher constructed with an IndexSearcher per SegmentReader,
avoiding the use of MultiReader.  Then the caches would still work.
This would require making a few things public that are not at present.
Perhaps adding a 'MultiReader.getSubReaders()' method, combined with a
'static IndexReader.reopen(IndexReader)' method.  The latter would
return a new MultiReader that shared SegmentReaders with the old
version.  Then one could use getSubReaders() on the new multi reader to
extract the current set to use when constructing a MultiSearcher.
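As a toy sketch of the reuse Doug describes (all names here are illustrative stand-ins; `getSubReaders()` and `reopen()` are only proposed APIs, not existing Lucene ones), readers for unchanged segments are shared and only new segments get fresh readers:

```java
import java.util.*;

// Toy model of the proposed reopen(): readers for segments that are still
// present are shared between the old and new reader sets; only segments
// that appeared since the last open get fresh readers.
public class ReopenSketch {
    static final class SegReader {           // stands in for SegmentReader
        final String segment;
        SegReader(String segment) { this.segment = segment; }
    }

    // Rebuild the reader list for 'currentSegments', reusing any reader
    // from 'old' whose segment name is still in the index.
    static List<SegReader> reopen(List<SegReader> old, List<String> currentSegments) {
        Map<String, SegReader> byName = new HashMap<>();
        for (SegReader r : old) byName.put(r.segment, r);
        List<SegReader> result = new ArrayList<>();
        for (String name : currentSegments) {
            SegReader existing = byName.get(name);
            result.add(existing != null ? existing : new SegReader(name)); // open only new segments
        }
        return result;
    }
}
```

Handing each element of the result to its own IndexSearcher inside a MultiSearcher is what would keep the per-reader filter caches valid: the shared reader instances are the same objects the caches were keyed on.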

Another tricky bit is figuring out when to close readers.

Does this make sense?  This discussion should probably move to the
lucene-dev list.

> Are there any negatives to updateable IndexSearchers?  

Not if implemented well!

Doug




Re: GData, updateable IndexSearcher

Yonik Seeley
On 4/26/06, jason rutherglen <[hidden email]> wrote:
> As this would be used with Solr I wonder if there would be a good way to also update the Solr caches.

Other than re-executing the queries that generated the results? Probably not.
The nice thing about knowing exactly when the view of an index changes
(and having it change only once in a while) is that you can do very
aggressive caching.

If you want an IndexSearcher whose view of the index changes every
second (for example), I don't think Solr's type of caching would be
useful at all (or even possible, if you have big caches and
autowarming).

-Yonik

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: GData, updateable IndexSearcher

Doug Cutting
In reply to this post by jason rutherglen-2
jason rutherglen wrote:
> I was thinking you implied that you knew of someone who had customized their own, but it was a closed source solution.  And if so then you would know how that project fared.

I don't recall the details, but I know folks have discussed this
previously, and probably even posted patches, but I don't think any of
the patches was ready to commit.

> Wouldn't there also need to be a hack on the IndexWriter to keep track of new segments?

I think the 'public static IndexReader.reopen(IndexReader old)' method I
proposed can easily compare the current list of segments for the
directory of old to those that old already has open, and determine which
can be reused and which new segments must be opened.  Deletions would be
a little tricky to track.  If a segment has had deletions, then a new
SegmentReader could be cloned from the old, sharing everything but the
deletions, which could be re-read from disk.  This would invalidate
cached filters for segments that had deletions.

You could even try to figure out what documents have been deleted, then
update filters incrementally.  That would be fastest, but more complicated.
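Doug's cloning idea can be sketched in toy form (names are illustrative, not Lucene's API): the clone shares the write-once segment data but carries a freshly re-read deletions snapshot, and because it is a new object, an identity-keyed filter cache misses for it, forcing filters to be rebuilt:

```java
import java.util.BitSet;
import java.util.IdentityHashMap;
import java.util.Map;

// Toy sketch: a reader shares the write-once segment data but owns its
// deletions snapshot. Cloning with new deletions yields a new identity,
// so caches keyed on the reader are naturally invalidated for it.
public class DeletionClone {
    static final class SegReader {
        final String segmentData;    // stands in for the shared, write-once files
        final BitSet deletions;      // per-reader snapshot of the .del file
        SegReader(String segmentData, BitSet deletions) {
            this.segmentData = segmentData;
            this.deletions = deletions;
        }
        // New reader identity, same segment data, freshly re-read deletions.
        SegReader cloneWithDeletions(BitSet fresh) {
            return new SegReader(segmentData, fresh);
        }
    }

    // Filters cached per reader, keyed by identity as Lucene's are in effect.
    static final Map<SegReader, String> filterCache = new IdentityHashMap<>();
}
```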

Doug



Re: GData, updateable IndexSearcher

Chuck Williams-2
If I'm following this correctly, it omits a related issue: the need to
periodically close and reopen the IndexWriter in order to flush its
internal RAMDirectory, and similarly for the IndexReader used for
deletes.  Is there any good solution to avoid these as well?

My app has an IndexManager class that is somewhat like the built-in
IndexModifier, except that it contains all index operations:
IndexSearcher, IndexReader for search, IndexReader for deletes,
IndexWriter, and IndexUpdater (my own class that covers many more use
cases than the contributed patch of the same name).  IndexManager tracks
whether or not the index has changed, and spawns a RefreshThread that
periodically commits all updates (closing and reopening whatever is open
of the IndexWriter, the IndexReader for deletes, and the IndexUpdater)
and then reopens the searcher and search reader.
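Chuck's IndexManager/RefreshThread pattern can be sketched in toy form (illustrative names only; a real version would wrap the actual Lucene writer and reader objects): track a dirty flag, and on each tick commit and reopen only if something changed.

```java
// Toy sketch of the periodic-refresh pattern: mutations mark the manager
// dirty, and a refresh thread calls refreshIfDirty() on a schedule. The
// generation counter stands in for closing and reopening the searcher.
public class RefreshManager {
    private boolean dirty = false;
    private int searcherGeneration = 0;

    synchronized void markChanged() { dirty = true; }

    // Called periodically by the refresh thread; returns true if a
    // commit/reopen cycle actually ran.
    synchronized boolean refreshIfDirty() {
        if (!dirty) return false;
        // ... close/reopen the writer and delete reader, then the searcher ...
        searcherGeneration++;
        dirty = false;
        return true;
    }

    synchronized int generation() { return searcherGeneration; }
}
```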

The notion of incrementally updating the searcher is great.  Is there
any way to also avoid closing the writer and delete reader?

Chuck




Re: GData, updateable IndexSearcher

jason rutherglen-2
In reply to this post by Doug Cutting
> I think the 'public static IndexReader.reopen(IndexReader old)' method I proposed can easily compare the current list of segments for the directory of old to those that old already has open, and determine which can be reused and which new segments must be opened.

This makes sense.  Could you describe how the new segments would be known, or where in the code they can be loaded?  Where in the design would synchronization blocks be needed?






RE: GData, updateable IndexSearcher

Robert Engels
In reply to this post by Doug Cutting
Doug, can you please elaborate on this?

I thought each segment maintained its own list of deleted documents
(since segments are WRITE ONCE), and when that segment is merged or
optimized it would "go away" anyway, as the deleted documents are removed.

In my reopen() implementation, I check to see if the existing segment name
is the same as an already open segment, and then just use the existing
SegmentInfo object (since it should still have reference to its deleted
documents).

For example,

The index has 3 segments (1-3).  A new document is written that causes a new
segment (4) to be created.  A reopen would retain the SegmentInfo for 1-3 and
create a new one for 4.

It would be no different if segment 2 had deleted documents when segment
4 is created; segment 2 is not modified in this case.

If adding the new document, which creates a new segment, caused a merge,
segment 2 would be rewritten (and the deletions processed), so the segment
name for 2 would no longer be valid anyway, and its SegmentInfo would not
be reused.

I've had this code in production for almost 2 years and have not seen any
problems; I'm trying to get a handle on the possibility that our code may
be "unstable".



Re: GData, updateable IndexSearcher

Yonik Seeley
On 4/27/06, Robert Engels <[hidden email]> wrote:
> I thought each segment maintained its own list of deleted documents

Right.

> (since segments are WRITE ONCE

Yes, but deletions are the exception to that rule.  Once written,
segment files never change, except for the file that tracks deleted
documents for that segment.

Hence, if the segment name is the same, you should be able to count on
everything being unchanged *except* for which documents are deleted.

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server



RE: GData, updateable IndexSearcher

Vanlerberghe, Luc
In reply to this post by jason rutherglen-2
Here are some remarks from what I learned by inspecting the code (quite
a while ago now, but the principle shouldn't have changed):

When an IndexReader opens the segments of an index it:
- grabs the commit lock,
- reads the "segments" file for the list of segment names,
- opens the files for each segment (except the .del one),
- *loads* the .del files associated with each segment (if present), and then
- releases the commit lock.

The segment files never change, and the .del files are loaded into memory,
so an open IndexReader will always have the same view of its segments,
even if the .del files are changed by another IndexReader.

So if you want to implement reopen() of a segment, you should be fine by
just reloading the .del file in memory for that segment (while holding
the commit lock of course).
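Luc's reload-under-the-commit-lock step can be sketched as follows (the lock object and loader interface are stand-ins, not Lucene's API): the lock guarantees no commit swaps files while the in-memory deletions snapshot is being replaced.

```java
import java.util.BitSet;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of reloading a segment's deletions while holding the commit lock.
public class DelReload {
    private final ReentrantLock commitLock = new ReentrantLock();
    private volatile BitSet deletions = new BitSet();

    interface DelFileLoader { BitSet load(String segment); }

    void reloadDeletions(String segment, DelFileLoader loader) {
        commitLock.lock();                    // no commit can run underneath us
        try {
            deletions = loader.load(segment); // re-read the .del snapshot into memory
        } finally {
            commitLock.unlock();
        }
    }

    boolean isDeleted(int doc) { return deletions.get(doc); }
}
```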

Luc



Re: GData, updateable IndexSearcher

Doug Cutting
Vanlerberghe, Luc wrote:
> So if you want to implement reopen() of a segment, you should be fine by
> just reloading the .del file in memory for that segment (while holding
> the commit lock of course).

Correct.  However, if Filters were cached based on the IndexReader (as is
standard), then those filters would now be invalid.  So it would be
safest to clone the IndexReader, reusing all data structures except the
deletions, which would be re-read.  This would force the filters to be
re-cached based on the new deletions.

Another complication is determining whether deletions have changed.  We
don't have a per-segment version number.  We could either add a version
to the segment (perhaps even in the deletions file) or store the
modification time of the deletions file when it is read and compare
that.  There are problems with file modification times (poor resolution
on some OSes, etc.), but that is the simplest approach.
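The "simplest approach" Doug mentions can be sketched with the standard NIO file API (the class and method names are illustrative): remember the .del file's timestamp at read time and treat a different timestamp as "deletions may have changed", with the caveat that coarse timestamp resolution can hide back-to-back updates.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

// Detect deletion changes by comparing the .del file's modification time
// against the time recorded when the deletions were last read. Coarse
// mtime resolution on some filesystems can miss rapid successive updates.
public class DelFileStamp {
    private FileTime stampAtRead;

    void recordRead(Path delFile) throws IOException {
        stampAtRead = Files.getLastModifiedTime(delFile);
    }

    boolean mayHaveChanged(Path delFile) throws IOException {
        return !Files.getLastModifiedTime(delFile).equals(stampAtRead);
    }
}
```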

Doug




Re: GData, updateable IndexSearcher

Yonik Seeley
On 4/28/06, Doug Cutting <[hidden email]> wrote:
> Another complication is determining whether deletions have changed.  We
> don't have a per-segment version number.  We could either add a version
> to the segment (perhaps even in the deletions file)

Deletions (.del) --> ByteCount,BitCount,Bits
ByteCount,BitCount --> Uint32
Bits --> <Byte>ByteCount

If ByteCount is a Uint32, its high bit will never be 1, so we could add
a flags field first that always has that bit set, to be able to tell
the difference between an old-style .del file and a new-style one.  The
flags would also leave room for future expansion (different
representations of the bit vector when the number of deleted documents
is very small, etc.).

Deletions (.del) --> DelFlags, DelVersion, ByteCount, BitCount, Bits
DelFlags,DelVersion,ByteCount,BitCount --> Uint32
Bits --> <Byte>ByteCount
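The flag trick can be sketched directly (illustrative names): a legacy file starts with ByteCount, whose high bit is never set, while a new-format file starts with a flags word whose high bit is always set, so reading the first Uint32 is enough to tell them apart.

```java
// Distinguish old- and new-format .del files by the high bit of the first
// 32-bit word, per Yonik's proposal. Names here are illustrative.
public class DelFormat {
    static final int NEW_FORMAT_BIT = 0x80000000;

    // A legacy ByteCount can never have the high bit set.
    static boolean isNewFormat(int firstInt) {
        return (firstInt & NEW_FORMAT_BIT) != 0;
    }

    // Build a flags word: high bit marks the new format, low bits are free
    // for future use (e.g. alternate bit-vector representations).
    static int makeFlags(int otherFlagBits) {
        return NEW_FORMAT_BIT | otherFlagBits;
    }
}
```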

-Yonik
http://incubator.apache.org/solr Solr, the open-source Lucene search server



Re: GData, updateable IndexSearcher

jason rutherglen-2
In reply to this post by jason rutherglen-2
I wanted to post a quick hack to see if it is along the correct lines.  One open question is whether to reuse the existing MultiReaders or simply strip out only the SegmentReaders.  I compare on the segment name, which I made public.  Thanks!


public static IndexReader reopen(IndexReader indexReader) throws IOException {
    if (indexReader instanceof MultiReader) {
      MultiReader multiReader = (MultiReader)indexReader;

      SegmentInfos segmentInfos = new SegmentInfos();
      segmentInfos.read(indexReader.directory());
      if (segmentInfos.size() == 1) {          // index is optimized
        return SegmentReader.get(segmentInfos, segmentInfos.info(0), false);
      }

      // map the currently open SegmentReaders by segment name
      IndexReader[] existingIndexReaders = multiReader.getSubReaders();
      Map<String,SegmentReader> existingSegmentMap = new HashMap<String,SegmentReader>();
      getSegmentReaders(existingIndexReaders, existingSegmentMap);

      // open a reader for each segment that is not already open
      List<SegmentReader> newSegmentReaders = new ArrayList<SegmentReader>();
      Iterator segmentInfosIterator = segmentInfos.iterator();
      while (segmentInfosIterator.hasNext()) {
        SegmentInfo segmentInfo = (SegmentInfo)segmentInfosIterator.next();
        if (!existingSegmentMap.containsKey(segmentInfo.name)) {
          newSegmentReaders.add(SegmentReader.get(segmentInfo));
        }
      }

      // combine the old reader with readers for the new segments
      List<IndexReader> allSegmentReaders = new ArrayList<IndexReader>();
      allSegmentReaders.add(multiReader);
      allSegmentReaders.addAll(newSegmentReaders);

      return new MultiReader(indexReader.directory(), segmentInfos, false,
          allSegmentReaders.toArray(new IndexReader[0]));
    }
    throw new RuntimeException("indexReader not supported at this time");
  }

  // recursively collect all SegmentReaders, keyed by segment name
  public static void getSegmentReaders(IndexReader[] indexReaders, Map<String,SegmentReader> map) {
    for (int x = 0; x < indexReaders.length; x++) {
      if (indexReaders[x] instanceof MultiReader) {
        getSegmentReaders(((MultiReader)indexReaders[x]).getSubReaders(), map);
      } else if (indexReaders[x] instanceof SegmentReader) {
        SegmentReader segmentReader = (SegmentReader)indexReaders[x];
        map.put(segmentReader.segment, segmentReader);
      }
    }
  }


RE: GData, updateable IndexSearcher

Robert Engels
fyi, using my reopen() implementation (which rereads the deletions)

on a 135mb index, with 5000 iterations

open & close time using new reader = 585609
open & close time using reopen = 27422

Almost 20x faster.  This matters for a highly interactive, incrementally
updating index.

-----Original Message-----
From: jason rutherglen [mailto:[hidden email]]
Sent: Monday, May 01, 2006 1:24 PM
To: [hidden email]
Subject: Re: GData, updateable IndexSearcher


I wanted to post a quick hack to see if it is along the correct lines.  A
few of the questions regard whether to resuse existing MultiReaders or
simply strip out only the SegmentReaders.  I do a compare on the segment
name and made it public.  Thanks!


public static IndexReader reopen(IndexReader indexReader) throws IOException {
    if (indexReader instanceof MultiReader) {
      MultiReader multiReader = (MultiReader)indexReader;

      SegmentInfos segmentInfos = new SegmentInfos();
      segmentInfos.read(indexReader.directory());
      if (segmentInfos.size() == 1) {          // index is optimized
        return SegmentReader.get(segmentInfos, segmentInfos.info(0), false);
      }

      // collect the already-open SegmentReaders, keyed by segment name
      IndexReader[] existingIndexReaders = multiReader.getSubReaders();
      Map<String,SegmentReader> existingSegmentMap =
          new HashMap<String,SegmentReader>();
      getSegmentReaders(existingIndexReaders, existingSegmentMap);

      // open readers only for segments that are not already open
      List<SegmentReader> newSegmentReaders = new ArrayList<SegmentReader>();
      Iterator segmentInfosIterator = segmentInfos.iterator();
      while (segmentInfosIterator.hasNext()) {
        SegmentInfo segmentInfo = (SegmentInfo)segmentInfosIterator.next();
        if (!existingSegmentMap.containsKey(segmentInfo.name)) {
          // it's new
          newSegmentReaders.add(SegmentReader.get(segmentInfo));
        }
      }

      List<IndexReader> allSegmentReaders = new ArrayList<IndexReader>();
      allSegmentReaders.add(multiReader);
      allSegmentReaders.addAll(newSegmentReaders);

      return new MultiReader(indexReader.directory(), segmentInfos, false,
          allSegmentReaders.toArray(new IndexReader[0]));
    }
    throw new RuntimeException("indexReader not supported at this time");
  }

  public static void getSegmentReaders(IndexReader[] indexReaders,
      Map<String,SegmentReader> map) {
    for (int x = 0; x < indexReaders.length; x++) {
      if (indexReaders[x] instanceof MultiReader) {
        // recurse into nested MultiReaders
        MultiReader multiReader = (MultiReader)indexReaders[x];
        getSegmentReaders(multiReader.getSubReaders(), map);
      } else if (indexReaders[x] instanceof SegmentReader) {
        SegmentReader segmentReader = (SegmentReader)indexReaders[x];
        map.put(segmentReader.segment, segmentReader);
      }
    }
  }
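The bookkeeping at the heart of this hack — reuse a reader when its segment name is still in the new SegmentInfos, open a reader only for names that are new — can be illustrated without Lucene. This is a minimal sketch; the `Reader` class is a hypothetical stand-in for SegmentReader, keyed by segment name:

```java
import java.util.*;

public class SegmentReuseSketch {
    // Hypothetical stand-in for a SegmentReader identified by segment name.
    static final class Reader {
        final String segment;
        Reader(String segment) { this.segment = segment; }
    }

    /**
     * Returns one reader per name in newSegments, reusing an existing
     * reader when one is already open for that segment name.
     */
    static List<Reader> reopen(Map<String, Reader> existing, List<String> newSegments) {
        List<Reader> result = new ArrayList<>();
        for (String name : newSegments) {
            Reader r = existing.get(name);
            // reuse if present, otherwise "open" a fresh reader
            result.add(r != null ? r : new Reader(name));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Reader> existing = new HashMap<>();
        Reader a = new Reader("_a");
        existing.put("_a", a);
        List<Reader> out = reopen(existing, Arrays.asList("_a", "_b"));
        System.out.println(out.get(0) == a);    // true: "_a" is reused
        System.out.println(out.get(1).segment); // _b: freshly opened
    }
}
```

Note the quick hack above reuses the whole old MultiReader rather than its individual SegmentReaders, so it never drops readers for segments that merges have removed; the per-segment diff shown here is the direction the later patches in this thread take.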



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: GData, updateable IndexSearcher

jason rutherglen-2
Can you post your code?

----- Original Message ----
From: Robert Engels <[hidden email]>
To: [hidden email]; jason rutherglen <[hidden email]>
Sent: Monday, May 1, 2006 11:33:06 AM
Subject: RE: GData, updateable IndexSearcher

fyi, using my reopen() implementation (which rereads the deletions)

on a 135mb index, with 5000 iterations

open & close time using new reader = 585609
open & close time using reopen = 27422

Almost 20x faster. Important in a highly interactive/incremental updating
index.
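(585609 / 27422 is roughly 21x.)  Numbers like these typically come from a loop of the following shape — a generic sketch, not Robert's actual harness; the two Runnables stand in for "open and close a fresh reader" and "reopen":

```java
public class ReopenBenchSketch {
    /** Times `iterations` runs of `task` and returns elapsed milliseconds. */
    static long timeMillis(int iterations, Runnable task) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run();
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // Stand-ins: a real harness would open/close an IndexReader here.
        Runnable freshOpen = () -> { /* new IndexReader, then close() */ };
        Runnable reopen    = () -> { /* reuse segments, reread deletions */ };

        long freshMs  = timeMillis(5000, freshOpen);
        long reopenMs = timeMillis(5000, reopen);
        System.out.println("fresh=" + freshMs + "ms reopen=" + reopenMs + "ms");
    }
}
```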


RE: GData, updateable IndexSearcher

Robert Engels
Attached.

It uses subclasses and instanceof, which is somewhat "hackish" - doing it
correctly would require changes to the base classes.




Attachments: MyMultiReader.java (697 bytes), IndexReaderUtils.java (5K), MySegmentReader.java (607 bytes)

Re: GData, updateable IndexSearcher

jason rutherglen-2
Thanks for the code and performance metric, Robert.  Have you had any issues with the deleted segments as Doug has been describing?

----- Original Message ----
From: Robert Engels <[hidden email]>
To: [hidden email]; jason rutherglen <[hidden email]>
Sent: Monday, May 1, 2006 11:49:41 AM
Subject: RE: GData, updateable IndexSearcher


-----Inline Attachment Follows-----

package org.apache.lucene.index;

import java.io.IOException;

import org.apache.lucene.store.Directory;

/**
 * Overridden to allow retrieval of the contained IndexReaders, to enable IndexReaderUtils.reopen().
 */
public class MyMultiReader extends MultiReader {

    private IndexReader[] readers;
   
    public MyMultiReader(Directory directory,SegmentInfos infos,IndexReader[] subReaders) throws IOException {
        super(directory,infos,true,subReaders);
        readers = subReaders;
    }
   
    public IndexReader[] getReaders() {
        return readers;
    }
   
    public void doCommit() throws IOException {
        super.doCommit();
    }
}



-----Inline Attachment Follows-----

package org.apache.lucene.index;

import java.io.IOException;
import java.util.*;

import org.apache.lucene.store.*;

public class IndexReaderUtils {
    private static Map segments = new WeakHashMap();
    static {
        // must use String class name, otherwise instantiation order will not allow the override to work
        System.setProperty("org.apache.lucene.SegmentReader.class","org.apache.lucene.index.MySegmentReader");
    }
   
    /**
     * reopens the IndexReader, possibly reusing the segments for greater efficiency. The original IndexReader instance
     * is closed, and the reference is no longer valid
     *
     * @return the new IndexReader
     */
    public static synchronized IndexReader reopen(IndexReader ir) throws IOException {
        final Directory directory = ir.directory();
       
        if(!(ir instanceof MyMultiReader)) {
            SegmentInfos infos = new SegmentInfos();
            infos.read(directory);
            IndexReader[] readers = new IndexReader[infos.size()];
            for(int i=0;i<infos.size();i++){
                readers[i] = MySegmentReader.get((SegmentInfo) infos.get(i));
            }
//            System.err.println("reopen, fresh reader with "+infos.size()+" segments");
            return new MyMultiReader(directory,infos,readers);
        }
       
        MyMultiReader mr = (MyMultiReader) ir;
       
        final IndexReader[] oldreaders = mr.getReaders();
        final boolean[] stayopen = new boolean[oldreaders.length];
       
        synchronized (directory) {            // in- & inter-process sync
            return (IndexReader)new Lock.With(
                directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
                IndexWriter.COMMIT_LOCK_TIMEOUT) {
                public Object doBody() throws IOException {
                  SegmentInfos infos = new SegmentInfos();
                  infos.read(directory);
                  if (infos.size() == 1) {        // index is optimized
//                      System.err.println("single segment during reopen");
                    return MySegmentReader.get(infos.info(0));
                  } else {
//                    System.err.println("reopen, has "+infos.size()+" segments");
                    IndexReader[] readers = new IndexReader[infos.size()];
                    for (int i = 0; i < infos.size(); i++) {
                        SegmentInfo newsi = (SegmentInfo) infos.get(i);
                        for(int j=0;j<oldreaders.length;j++) {
                            SegmentReader sr = (SegmentReader) oldreaders[j];
                            SegmentInfo si = (SegmentInfo) segments.get(sr);
                            if(si!=null && si.name.equals(newsi.name)) {
                                readers[i]=sr;
                                ((MySegmentReader)sr).reopen();
                                stayopen[j]=true;
//                                System.err.println("keeping "+si.name+" on reopen");
                            }
                        }
                       
                        if(readers[i]==null) {
                            readers[i] = MySegmentReader.get(newsi);
                            segments.put(readers[i],newsi);
                        }
                    }
                   
                    for(int i=0;i<stayopen.length;i++)
                        if(!stayopen[i])
                            oldreaders[i].close();
                       
                    return new MyMultiReader(directory,infos,readers);
                  }
                }
              }.run();
          }
    }

    public static synchronized IndexReader open(String path) throws IOException {
        Directory d = FSDirectory.getDirectory(path,false);
        return open(d,true);
    }
   
    private static IndexReader open(final Directory directory, final boolean closeDirectory) throws IOException {
        synchronized (directory) {            // in- & inter-process sync
          return (IndexReader)new Lock.With(
              directory.makeLock(IndexWriter.COMMIT_LOCK_NAME),
              IndexWriter.COMMIT_LOCK_TIMEOUT) {
              public Object doBody() throws IOException {
                SegmentInfos infos = new SegmentInfos();
                infos.read(directory);
                if (infos.size() == 1) {          // index is optimized
                  return MySegmentReader.get(infos.info(0));
                } else {
                  IndexReader[] readers = new IndexReader[infos.size()];
                  for (int i = 0; i < infos.size(); i++) {
                    SegmentInfo si = infos.info(i);
                    readers[i] = MySegmentReader.get(si);
                    segments.put(readers[i],si);
                  }
                  return new MyMultiReader(directory,infos,readers);
                }
              }
            }.run();
        }
    }
}



-----Inline Attachment Follows-----

package org.apache.lucene.index;

import java.io.IOException;

import org.apache.lucene.util.BitVector;

public class MySegmentReader extends SegmentReader {
    SegmentInfo si;

    public MySegmentReader() {
    }
   
    public void reopen() throws IOException {
        if (hasDeletions(si))
            deletedDocs = new BitVector(directory(), si.name + ".del");
    }
   
    public static SegmentReader get(SegmentInfo si) throws IOException {
        MySegmentReader reader = (MySegmentReader) SegmentReader.get(si);
        reader.si = si;
        return reader;
    }
}
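Taken together, the three attachments implement this reopen strategy: match old readers to the new segment list by name, refresh only the deletions on survivors, open readers for genuinely new segments, and close readers whose segments were merged away. A Lucene-free sketch of that control flow, where `Reader` and `refreshDeletions()` are illustrative stand-ins for MySegmentReader and its `.del`-file reread:

```java
import java.util.*;

public class ReopenStrategySketch {
    static final class Reader {
        final String segment;
        boolean open = true;
        int deletionsGeneration = 0;
        Reader(String segment) { this.segment = segment; }
        void refreshDeletions() { deletionsGeneration++; } // stand-in for rereading the .del file
        void close() { open = false; }
    }

    /** Reopen: reuse readers by segment name, refresh deletions, close vanished readers. */
    static List<Reader> reopen(List<Reader> oldReaders, List<String> newSegments) {
        Map<String, Reader> byName = new HashMap<>();
        for (Reader r : oldReaders) byName.put(r.segment, r);

        List<Reader> result = new ArrayList<>();
        for (String name : newSegments) {
            Reader r = byName.remove(name);
            if (r != null) {
                r.refreshDeletions();   // survivor: only reread its deletions
            } else {
                r = new Reader(name);   // new segment: open a fresh reader
            }
            result.add(r);
        }
        // anything left in byName was merged away: close it
        for (Reader stale : byName.values()) stale.close();
        return result;
    }

    public static void main(String[] args) {
        List<Reader> old = Arrays.asList(new Reader("_a"), new Reader("_b"));
        List<Reader> now = reopen(old, Arrays.asList("_a", "_c"));
        System.out.println(now.get(0) == old.get(0));       // true: _a reused
        System.out.println(now.get(0).deletionsGeneration); // 1: deletions refreshed
        System.out.println(old.get(1).open);                // false: _b closed
    }
}
```

The real code additionally takes the commit lock while rereading SegmentInfos, so the segment list and deletion files cannot change mid-reopen.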



RE: GData, updateable IndexSearcher

Robert Engels
I just sent an email covering that. The code I provided takes that into
account, but in re-reading the code, I do not think it is necessary.

