[jira] Created: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)


Nick Burch (Jira)
Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)
---------------------------------------------------------------------------------

         Key: LUCENE-565
         URL: http://issues.apache.org/jira/browse/LUCENE-565
     Project: Lucene - Java
        Type: Bug

  Components: Index  
    Reporter: Ning Li


Today, applications have to open/close an IndexWriter and open/close an
IndexReader directly or indirectly (via IndexModifier) in order to handle a
mix of inserts and deletes. This performs well when inserts and deletes
come in fairly large batches. However, the performance can degrade
dramatically when inserts and deletes are interleaved in small batches.
This is because the ramDirectory is flushed to disk whenever an IndexWriter
is closed, causing a lot of small segments to be created on disk, which
eventually need to be merged.

We would like to propose a small API change to eliminate this problem. We
are aware that this kind of change has come up in discussions before. See
http://www.gossamer-threads.com/lists/lucene/java-dev/23049?search_string=indexwriter%20delete;#23049
. The difference this time is that we have implemented the change and
tested its performance, as described below.

API Changes
-----------
We propose adding a "deleteDocuments(Term term)" method to IndexWriter.
Using this method, inserts and deletes can be interleaved using the same
IndexWriter.

Note that with this change, it would be very easy to add another method to
IndexWriter for updating documents, so that applications would not need a
separate delete and insert to update a document.

Also note that this change can co-exist with the existing APIs for deleting
documents using an IndexReader. But if our proposal is accepted, we think
those APIs should probably be deprecated.

Coding Changes
--------------
Coding changes are localized to IndexWriter. Internally, the new
deleteDocuments() method works by buffering the terms to be deleted.
Deletes are deferred until the ramDirectory is flushed to disk, either
because it becomes full or because the IndexWriter is closed. Using Java
synchronization, care is taken to ensure that an interleaved sequence of
inserts and deletes for the same document is properly serialized.
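
To make the buffering idea concrete, here is a minimal, self-contained sketch
in plain Java. It is not the attached patch and does not use Lucene classes:
the class name BufferedDeleteWriterSketch, the use of a String key in place of
a Term, and the trick of remembering how many documents were buffered when a
delete arrived are all assumptions made for illustration. It only shows how
deferred deletes can still respect the order of interleaved inserts and
deletes for the same document.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a writer that buffers inserts and deletes. Hypothetical; not Lucene code. */
class BufferedDeleteWriterSketch {
    // Buffered inserts; each document is reduced to its key (stand-in for a Term).
    private final List<String> ramDocs = new ArrayList<>();
    // Documents that have been flushed and are still live.
    private final List<String> diskDocs = new ArrayList<>();
    // key -> number of buffered documents at the time of the latest delete for that key.
    private final Map<String, Integer> bufferedDeletes = new HashMap<>();
    private final int maxBufferedDocs;
    private int flushCount = 0;

    BufferedDeleteWriterSketch(int maxBufferedDocs) {
        this.maxBufferedDocs = maxBufferedDocs;
    }

    synchronized void addDocument(String key) {
        ramDocs.add(key);
        if (ramDocs.size() >= maxBufferedDocs) {
            flush(); // buffer is full
        }
    }

    /** Buffers the delete; nothing is removed until the next flush. */
    synchronized void deleteDocuments(String key) {
        bufferedDeletes.put(key, ramDocs.size());
    }

    synchronized void flush() {
        if (ramDocs.isEmpty() && bufferedDeletes.isEmpty()) {
            return; // nothing buffered
        }
        // Every flushed document predates every buffered delete.
        diskDocs.removeIf(bufferedDeletes::containsKey);
        for (int i = 0; i < ramDocs.size(); i++) {
            String key = ramDocs.get(i);
            Integer limit = bufferedDeletes.get(key);
            // A buffered document is deleted only if it was added before the delete.
            boolean deleted = limit != null && i < limit;
            if (!deleted) {
                diskDocs.add(key);
            }
        }
        ramDocs.clear();
        bufferedDeletes.clear();
        flushCount++;
    }

    synchronized void close() {
        flush();
    }

    synchronized List<String> liveDocs() {
        return new ArrayList<>(diskDocs);
    }

    synchronized int flushCount() {
        return flushCount;
    }
}
```

In this sketch an update is simply deleteDocuments(key) followed by
addDocument(key) on the same writer: the re-inserted document survives the
flush because it was buffered after the delete was recorded.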

We have attached a modified version of IndexWriter from Release 1.9.1 with
these changes. Only a few hundred lines of coding changes are needed. All
changes are marked with a "CHANGE" comment. We have also attached a modified
version of an example from Chapter 2.2 of Lucene in Action.

Performance Results
-------------------
To test the performance of our proposed changes, we ran some experiments using
the TREC WT 10G dataset. The experiments were run on a dual 2.4 GHz Intel
Xeon server running Linux. The disk storage was configured as a RAID0 array
with 5 drives. Before indexes were built, the input documents were parsed
to remove the HTML from them (i.e., only the text was indexed). This was
done to minimize the impact of parsing on performance. A simple
WhitespaceAnalyzer was used during index build.

We experimented with three workloads:
  - Insert only. 1.6M documents were inserted and the final
    index size was 2.3GB.
  - Insert/delete (big batches). The same documents were
    inserted, but 25% were deleted. 1000 documents were
    deleted for every 4000 inserted.
  - Insert/delete (small batches). In this case, 5 documents
    were deleted for every 20 inserted.
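
As a rough illustration of why batch size matters, the sketch below counts
how often a writer would be closed, and so forced to flush, under the two
insert/delete workloads. It assumes that the existing approach closes the
writer once per insert batch in order to switch to deleting; that assumption,
the class name FlushCountSketch, and the method forcedFlushes are ours, not
measurements from the experiments.

```java
/** Back-of-the-envelope count of forced flushes. Hypothetical helper, not Lucene code. */
class FlushCountSketch {
    /** Writer closes needed if the writer is closed after every batch of inserts. */
    static long forcedFlushes(long totalInserts, long insertsPerBatch) {
        // Ceiling division: a partial final batch still needs a close.
        return (totalInserts + insertsPerBatch - 1) / insertsPerBatch;
    }

    public static void main(String[] args) {
        long totalInserts = 1_600_000L; // "1.6M documents" from the workload description
        System.out.println("big batches:   " + forcedFlushes(totalInserts, 4000));
        System.out.println("small batches: " + forcedFlushes(totalInserts, 20));
    }
}
```

Under that assumption the small-batch workload closes the writer 200 times
as often as the big-batch workload, each close leaving a small segment on
disk that must eventually be merged.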

                                current       current          new
Workload                      IndexWriter  IndexModifier   IndexWriter
-----------------------------------------------------------------------
Insert only                     116 min       119 min        116 min
Insert/delete (big batches)       --          135 min        125 min
Insert/delete (small batches)     --          338 min        134 min

As the experiments show, the proposed changes reduced indexing time by about
60% (from 338 to 134 minutes) when inserts and deletes were interleaved in
small batches.
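
For reference, the percentages can be recomputed from the table as the
relative reduction in elapsed time, comparing the current IndexModifier
column with the new IndexWriter column. The helper below is only a worked
calculation; the class and method names are made up for this example.

```java
/** Recomputes the relative time reduction from the table. Hypothetical helper. */
class ReductionSketch {
    /** Percentage by which elapsed time dropped, going from 'before' to 'after'. */
    static double reductionPercent(double beforeMinutes, double afterMinutes) {
        return 100.0 * (beforeMinutes - afterMinutes) / beforeMinutes;
    }

    public static void main(String[] args) {
        // Values taken from the table: current IndexModifier vs. new IndexWriter.
        System.out.printf("big batches:   %.1f%%%n", reductionPercent(135, 125));
        System.out.printf("small batches: %.1f%%%n", reductionPercent(338, 134));
    }
}
```

With the table's numbers this gives roughly 7.4% for big batches and 60.4%
for small batches, which matches the 60% figure quoted above.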


Regards,
Ning


Ning Li
Search Technologies
IBM Almaden Research Center
650 Harry Road
San Jose, CA 95120

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
   http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
   http://www.atlassian.com/software/jira


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


[jira] Updated: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-565?page=all ]

Ning Li updated LUCENE-565:
---------------------------

    Attachment: IndexWriter.java
                TestWriterDelete.java


[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12378557 ]

Doug Cutting commented on LUCENE-565:
-------------------------------------

Can you please attach diffs rather than complete files?  The diffs should not contain CHANGE comments.  To generate diffs, check Lucene out of Subversion, make your changes, then, from the Lucene trunk, run something like 'svn diff > my.patch'.  New files should first be added with 'svn add' so that they're included in the diff.  Thanks!




need HowToContribute on wiki

Doug Cutting
Doug Cutting (JIRA) wrote:
> Can you please attach diffs rather than complete files?  The diffs should not contain CHANGE comments.  To generate diffs, check Lucene out of Subversion, make your changes, then, from the Lucene trunk, run something like 'svn diff > my.patch'.  New files should first be added with 'svn add' so that they're included in the diff.  Thanks!

Memo to self: Lucene Java should really have a HowToContribute page like
those for Nutch & Hadoop.

http://wiki.apache.org/nutch/HowToContribute
http://wiki.apache.org/lucene-hadoop/HowToContribute

Doug


[jira] Updated: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
     [ http://issues.apache.org/jira/browse/LUCENE-565?page=all ]

Ning Li updated LUCENE-565:
---------------------------

    Attachment: IndexWriter.patch

Here is the diff file of IndexWriter.java.


[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12419396 ]

Otis Gospodnetic commented on LUCENE-565:
-----------------------------------------

I took a look at the patch and it looks good to me (has anyone else had a look?).
Unfortunately, I couldn't get the patch to apply :(

$ patch -F3 < IndexWriter.patch
(Stripping trailing CRs from patch.)
patching file IndexWriter.java
Hunk #1 succeeded at 58 with fuzz 1.
Hunk #2 succeeded at 112 (offset 2 lines).
Hunk #4 succeeded at 504 (offset 33 lines).
Hunk #6 succeeded at 605 with fuzz 2 (offset 57 lines).
missing header for unified diff at line 259 of patch
(Stripping trailing CRs from patch.)
can't find file to patch at input line 259
Perhaps you should have used the -p or --strip option?
The text leading up to this was:
...
...
...
File to patch: IndexWriter.java
patching file IndexWriter.java
Hunk #1 FAILED at 802.
Hunk #2 succeeded at 745 with fuzz 2 (offset -131 lines).
1 out of 2 hunks FAILED -- saving rejects to file IndexWriter.java.rej


Would it be possible for you to regenerate the patch against IndexWriter in HEAD?

Also, I noticed ^Ms in the patch, but I can take care of those easily (dos2unix).

Finally, I noticed in 2-3 places that the simple logging via "infoStream" variable was removed, for example:
-    if (infoStream != null) infoStream.print("merging segments");

Perhaps this was just an oversight?

Looking forward to the new patch. Thanks!


Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Robert Engels
I applied the patch, and made code changes to use it. It did not make  
any appreciable difference in performance over our current code  
(delete using IndexReader and then update the documents using  
IndexWriter - each document has a unique "key").

I attempted to evaluate the code on its own, but must admit that I  
got "lost" a bit.

Maybe if the submitter could provide a "design overview" of why this  
is more efficient, and in what cases it is (and possible degradation  
in others) it would be easier to evaluate.


On Jul 5, 2006, at 10:25 PM, Otis Gospodnetic (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/LUCENE-565?
> page=comments#action_12419396 ]
>
> Otis Gospodnetic commented on LUCENE-565:
> -----------------------------------------
>
> I took a look at the patch and it looks good to me (anyone else had  
> a look)?
> Unfortunately, I couldn't get the patch to apply :(
>
> $ patch -F3 < IndexWriter.patch
> (Stripping trailing CRs from patch.)
> patching file IndexWriter.java
> Hunk #1 succeeded at 58 with fuzz 1.
> Hunk #2 succeeded at 112 (offset 2 lines).
> Hunk #4 succeeded at 504 (offset 33 lines).
> Hunk #6 succeeded at 605 with fuzz 2 (offset 57 lines).
> missing header for unified diff at line 259 of patch
> (Stripping trailing CRs from patch.)
> can't find file to patch at input line 259
> Perhaps you should have used the -p or --strip option?
> The text leading up to this was:
> ...
> ...
> ...
> File to patch: IndexWriter.java
> patching file IndexWriter.java
> Hunk #1 FAILED at 802.
> Hunk #2 succeeded at 745 with fuzz 2 (offset -131 lines).
> 1 out of 2 hunks FAILED -- saving rejects to file IndexWriter.java.rej
>
>
> Would it be possible for you to regenerate the patch against  
> IndexWriter in HEAD?
>
> Also, I noticed ^Ms in the patch, but I can take care of those  
> easily (dos2unix).
>
> Finally, I noticed in 2-3 places that the simple logging via  
> "infoStream" variable was removed, for example:
> -    if (infoStream != null) infoStream.print("merging segments");
>
> Perhaps this was just an oversight?
>
> Looking forward to the new patch. Thanks!

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Otis Gospodnetic-2
Robert, it's better to put your comments in JIRA, where Ning Li is more likely to see them.

As for performance, it looks like the biggest gain is when one has small interleaving add/delete batches.  It sounds like your app doesn't have that and has fewer larger add/delete batches.

I do agree about the complexity there.  I couldn't follow everything either, but saw nothing wrong.  More comments would certainly help.

Otis

----- Original Message ----
From: robert engels <[hidden email]>
To: [hidden email]
Sent: Thursday, July 6, 2006 3:20:02 AM
Subject: Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

I applied the patch, and made code changes to use it. It did not make  
any appreciable difference in performance over our current code  
(delete using IndexReader and then update the documents using  
IndexWriter - each document has a unique "key").

I attempted to evaluate the code on its own, but must admit that I  
got "lost" a bit.

Maybe if the submitter could provide a "design overview" of why this  
is more efficient, and in what cases it is (and possible degradation  
in others) it would be easier to evaluate.


On Jul 5, 2006, at 10:25 PM, Otis Gospodnetic (JIRA) wrote:

>     [ http://issues.apache.org/jira/browse/LUCENE-565?
> page=comments#action_12419396 ]
>
> Otis Gospodnetic commented on LUCENE-565:
> -----------------------------------------
>
> I took a look at the patch and it looks good to me (anyone else had  
> a look)?
> Unfortunately, I couldn't get the patch to apply :(
>
> $ patch -F3 < IndexWriter.patch
> (Stripping trailing CRs from patch.)
> patching file IndexWriter.java
> Hunk #1 succeeded at 58 with fuzz 1.
> Hunk #2 succeeded at 112 (offset 2 lines).
> Hunk #4 succeeded at 504 (offset 33 lines).
> Hunk #6 succeeded at 605 with fuzz 2 (offset 57 lines).
> missing header for unified diff at line 259 of patch
> (Stripping trailing CRs from patch.)
> can't find file to patch at input line 259
> Perhaps you should have used the -p or --strip option?
> The text leading up to this was:
> ...
> ...
> ...
> File to patch: IndexWriter.java
> patching file IndexWriter.java
> Hunk #1 FAILED at 802.
> Hunk #2 succeeded at 745 with fuzz 2 (offset -131 lines).
> 1 out of 2 hunks FAILED -- saving rejects to file IndexWriter.java.rej
>
>
> Would it be possible for you to regenerate the patch against  
> IndexWriter in HEAD?
>
> Also, I noticed ^Ms in the patch, but I can take care of those  
> easily (dos2unix).
>
> Finally, I noticed in 2-3 places that the simple logging via  
> "infoStream" variable was removed, for example:
> -    if (infoStream != null) infoStream.print("merging segments");
>
> Perhaps this was just an oversight?
>
> Looking forward to the new patch. Thanks!

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Robert Engels
I don't like "mucking up" JIRA with "commentary". I thought emails
were more appropriate, and then update JIRA with more pertinent info.

Anyway, my test did exercise the small batches, in that in our  
incremental updates we delete the documents with the unique term, and  
then add the new (which is what I assumed this was improving), and I  
saw no appreciable difference.

I think a design overview for something as involved as this would be  
very beneficial - I know the submitter references a previous bug/
email but the provided implementation doesn't seem to match up with  
that - at least that I could tell.

It appears that maybe??? the performance gain is only realized when
the newly submitted documents were previously submitted within the
same update???

Maybe a test case that demonstrated the performance improvements?


On Jul 6, 2006, at 3:03 AM, Otis Gospodnetic wrote:

> Robert, it's better to put your comments in JIRA, where Ning Li is  
> more likely to see them.
>
> As for performance, it looks like the biggest gain is when one has  
> small interleaving add/delete batches.  It sounds like your app  
> doesn't have that and has fewer larger add/delete batches.
>
> I do agree about the complexity there.  I couldn't follow  
> everything either, but saw nothing wrong.  More comments would  
> certainly help.
>
> Otis

Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Yonik Seeley
In reply to this post by Robert Engels
On 7/6/06, robert engels <[hidden email]> wrote:
> I applied the patch, and made code changes to use it. It did not make
> any appreciable difference in performance over our current code

Yes, I'm not sure the performance comparisons made in the patch
description are apples-to-apples.  And hopefully the test wasn't
against IndexModifier, which isn't built for speed at all.

When one interleaves adds and deletes, it isn't the case that
indexreaders and indexwriters need to be opened and closed each
interleave.

> (delete using IndexReader and then update the documents using
> IndexWriter - each document has a unique "key").
>
> I attempted to evaluate the code on its own, but must admit that I
> got "lost" a bit.

I was left wondering if the extensive changes to IndexWriter were
worth it, or if it was best left to something at a higher level (like
a better IndexModifier, or something like what Solr does).

-Yonik


[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12419580 ]

Ning Li commented on LUCENE-565:
--------------------------------

For an overview of my changes, I'll repeat some of what I said in
my earlier e-mail (see http://www.gossamer-threads.com/lists/lucene/java-dev/35143),
then add more detail about specific coding changes.

Overview
--------
Today, applications have to open/close an IndexWriter and
open/close an IndexReader directly or indirectly (via IndexModifier)
in order to handle a mix of inserts and deletes. This performs well
when inserts and deletes come in fairly large batches. However, the
performance can degrade dramatically when inserts and deletes are
interleaved in small batches. This is because the ramDirectory is
flushed to disk whenever an IndexWriter is closed, causing a lot of
small segments to be created on disk, which eventually need to be
merged.

API Changes
-----------
We propose adding a "deleteDocuments(Term term)" method to
IndexWriter. Using this method, inserts and deletes can be
interleaved using the same IndexWriter.

Coding Changes
--------------
Coding changes are localized to IndexWriter. Internally, the new
deleteDocuments() method works by buffering the terms to be deleted.
Deletes are deferred until the ramDirectory is flushed to disk,
either because it becomes full or because the IndexWriter is closed.
Using Java synchronization, care is taken to ensure that an
interleaved sequence of inserts and deletes for the same document
is properly serialized.

For simplicity of explanation, let's assume the index resides in a
disk-based directory.

Changes to the IndexWriter variables:
  - segmentInfos used to store the info of all segments (on disk
    or in ram). Now it only stores the info of segments on disk.
  - ramSegmentInfos is a new variable which stores the info of just
    ram segments.
  - bufferedDeleteTerms is a new variable which buffers delete terms
    before they are applied.
  - maxBufferedDeleteTerms is similar to maxBufferedDocs. It controls
    the max number of delete terms that can be buffered before they
    must be flushed to disk.

Changes to IndexWriter methods:
  - addDocument()
    The info of the new ram segment is added to ramSegmentInfos.
  - deleteDocuments(), batchDeleteDocuments()
    The terms are added to bufferedDeleteTerms. bufferedDeleteTerms
    also records the current number of documents buffered in ram, so
    the delete terms can be applied to ram segments as well as
    the segments on disk.
  - flushRamSegments()
    Step 1: Apply buffered delete terms to all the segments on disk.
    Step 2: Merge all the ram segments into one segment on disk.
    Step 3: Apply buffered delete terms to the new segment appropriately,
            so that a delete term is only applied to the documents
            buffered before it, but not to those buffered after it.
    Step 4: Clean up and commit the change to the index (both the new
            segment and, where applicable, the .del files).
  - maybeMergeSegments()
    Before, a flush would be triggered only if enough documents were
    buffered. Now a flush is triggered if enough documents are
    buffered OR if enough delete terms are buffered.
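
The flush ordering listed above can be sketched with a toy model. This is
only an illustration under stated assumptions, not the code in the attached
patch: a document is reduced to a single key string, plain lists stand in
for the ram and disk segments, deleted documents are dropped instead of
being marked in .del files, and the class BufferedDeleteSketch and its
liveDocs() helper are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a "document" is just its key, and lists stand in for segments. */
class BufferedDeleteSketch {
    private final List<String> diskDocs = new ArrayList<>(); // documents in on-disk segments
    private final List<String> ramDocs = new ArrayList<>();  // documents buffered in ram
    // Delete term -> number of ram documents that were buffered when the delete arrived.
    private final Map<String, Integer> bufferedDeleteTerms = new HashMap<>();
    private final int maxBufferedDocs;
    private final int maxBufferedDeleteTerms;

    BufferedDeleteSketch(int maxBufferedDocs, int maxBufferedDeleteTerms) {
        this.maxBufferedDocs = maxBufferedDocs;
        this.maxBufferedDeleteTerms = maxBufferedDeleteTerms;
    }

    synchronized void addDocument(String key) {
        ramDocs.add(key);
        maybeFlush();
    }

    synchronized void deleteDocuments(String term) {
        // Recording the current ram count lets the flush spare later re-inserts.
        bufferedDeleteTerms.put(term, ramDocs.size());
        maybeFlush();
    }

    /** Flush when enough documents OR enough delete terms are buffered. */
    private void maybeFlush() {
        if (ramDocs.size() >= maxBufferedDocs
                || bufferedDeleteTerms.size() >= maxBufferedDeleteTerms) {
            flushRamSegments();
        }
    }

    synchronized void flushRamSegments() {
        // Step 1: apply buffered delete terms to everything already on disk.
        diskDocs.removeIf(bufferedDeleteTerms::containsKey);
        // Step 2: merge the ram documents into one new segment.
        List<String> newSegment = new ArrayList<>(ramDocs);
        // Step 3: a delete term only hits documents buffered before it;
        // the survivors of the new segment are what ends up on disk.
        for (int i = 0; i < newSegment.size(); i++) {
            Integer bufferedBefore = bufferedDeleteTerms.get(newSegment.get(i));
            boolean deleted = bufferedBefore != null && i < bufferedBefore;
            if (!deleted) {
                diskDocs.add(newSegment.get(i));
            }
        }
        // Step 4: clean up; the toy has nothing further to commit.
        ramDocs.clear();
        bufferedDeleteTerms.clear();
    }

    /** Snapshot of the documents currently on "disk". */
    synchronized List<String> liveDocs() {
        return new ArrayList<>(diskDocs);
    }
}
```

In this toy, adding "a" and "b", deleting "a", re-adding "a" and then
flushing leaves "b" and the re-inserted "a": the delete was recorded with a
ram count of 2, so it does not reach the document buffered after it.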



Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Ning Li-2
In reply to this post by Robert Engels
Hi Otis and Robert,

I added an overview of my changes in JIRA. Hope that helps.

> Anyway, my test did exercise the small batches, in that in our
> incremental updates we delete the documents with the unique term, and
> then add the new (which is what I assumed this was improving), and I
> saw no appreciable difference.

Robert, could you describe in a bit more detail how your test is set up?
A short code snippet would also help me explain.

Without the patch, when inserts and deletes are interleaved in small
batches, the performance can degrade dramatically because the ramDirectory
is flushed to disk whenever an IndexWriter is closed, causing a lot of
small segments to be created on disk, which eventually need to be merged.

Is this how your test is set up? And what are the maxBufferedDocs and
maxBufferedDeleteTerms settings in your test? You won't see a performance
improvement if they are about the same as the small batch size. The patch
works by internally buffering inserts and deletes into larger batches.
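
As a rough illustration of why the two thresholds matter, the sketch below
counts how many segment flushes a small-batch workload would cause. It is a
counting model under simplifying assumptions, not a benchmark and not part
of the patch: the class FlushCountSketch and both method names are
hypothetical, only writer-side flushes are counted, and the cost of deleting
through an IndexReader is ignored.

```java
/** Counts segment flushes for a toy interleaved workload; not a benchmark. */
class FlushCountSketch {

    /** The writer is closed after every insert batch, so each batch becomes one small segment. */
    static int flushesWhenClosingPerBatch(int totalInserts, int insertsPerBatch) {
        return (totalInserts + insertsPerBatch - 1) / insertsPerBatch; // ceiling division
    }

    /** One long-lived writer flushes only when a buffer threshold is reached. */
    static int flushesWhenBuffering(int totalInserts, int insertsPerBatch, int deletesPerBatch,
                                    int maxBufferedDocs, int maxBufferedDeleteTerms) {
        int flushes = 0;
        int bufferedDocs = 0;
        int bufferedDeletes = 0;
        int inserted = 0;
        while (inserted < totalInserts) {
            // Insert part of the batch.
            for (int i = 0; i < insertsPerBatch && inserted < totalInserts; i++) {
                inserted++;
                bufferedDocs++;
                if (bufferedDocs >= maxBufferedDocs) {
                    flushes++;
                    bufferedDocs = 0;
                    bufferedDeletes = 0; // a flush applies the buffered deletes too
                }
            }
            // Delete part of the batch.
            for (int d = 0; d < deletesPerBatch; d++) {
                bufferedDeletes++;
                if (bufferedDeletes >= maxBufferedDeleteTerms) {
                    flushes++;
                    bufferedDocs = 0;
                    bufferedDeletes = 0;
                }
            }
        }
        if (bufferedDocs > 0 || bufferedDeletes > 0) {
            flushes++; // final flush when the writer is closed
        }
        return flushes;
    }
}
```

With 2000 inserts in batches of 20 inserts and 5 deletes, closing the writer
per batch gives 100 flushes in this model. Buffering with both thresholds at
1000 gives 3, while setting maxBufferedDocs to the batch size of 20 gives
101, which is the case where no improvement should be expected.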

Regards,
Ning



Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Ning Li-2
In reply to this post by Yonik Seeley
Hi Yonik,

> When one interleaves adds and deletes, it isn't the case that
> indexreaders and indexwriters need to be opened and closed each
> interleave.

I'm not sure I understand this. Could you elaborate?

I thought IndexWriter acquires the write lock and holds it until
it is done. This will prevent IndexReader from making real changes
to the index...

Regards,
Ning



Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Robert Engels
In reply to this post by Ning Li-2
I guess we just chose a much simpler way to do this...

Even with your code changes, to see the modification made using the
IndexWriter, it must be closed, and a new IndexReader opened.

So a far simpler way is to get the collection of updates first, then

using opened indexreader,
for each doc in collection
       delete document using "key"
endfor

open indexwriter
for each doc in collection
       add document
endfor

open indexreader


I don't see how your way is any faster. You must always flush to disk  
and open the indexreader to see the changes.






Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Chuck Williams-2
robert engels wrote on 07/06/2006 12:24 PM:

> I guess we just chose a much simpler way to do this...
>
> Even with your code changes, to see the modification made using the
> IndexWriter, it must be closed, and a new IndexReader opened.
>
> So a far simpler way is to get the collection of updates first, then
>
> using opened indexreader,
> for each doc in collection
>       delete document using "key"
> endfor
>
> open indexwriter
> for each doc in collection
>       add document
> endfor
>
> open indexreader
>
>
> I don't see how your way is any faster. You must always flush to disk
> and open the indexreader to see the changes.

With the patch you can have ongoing writes and deletes happening
asynchronously with reads and searches.  Reopening the IndexReader to
refresh its view is an independent decision.  The IndexWriter need never
be closed.
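
A minimal sketch of that separation, under the assumption that flushed changes
become visible to any reader opened afterwards while readers that are already
open keep their old view. SnapshotSketch and its methods are hypothetical
names for illustration, not Lucene classes; documents are reduced to keys.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Toy model: the writer stays open; a reader is a frozen copy of what was flushed. */
class SnapshotSketch {
    // Operations buffered in the writer: {"add", key} or {"del", key}.
    private final List<String[]> pending = new ArrayList<>();
    // Keys that have been flushed and are visible to newly opened readers.
    private final List<String> published = new ArrayList<>();

    synchronized void add(String key)    { pending.add(new String[] {"add", key}); }
    synchronized void delete(String key) { pending.add(new String[] {"del", key}); }

    /** Publishes buffered operations in arrival order; the writer is not closed. */
    synchronized void flush() {
        for (String[] op : pending) {
            if (op[0].equals("add")) {
                published.add(op[1]);
            } else {
                published.removeIf(k -> k.equals(op[1]));
            }
        }
        pending.clear();
    }

    /** Opens a point-in-time view; later flushes do not change it. */
    synchronized List<String> openReader() {
        return Collections.unmodifiableList(new ArrayList<>(published));
    }
}
```

In this model the application decides when to call openReader() again, and
that decision does not interrupt the writer.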

Without the patch, you have to close the IndexWriter to do any deletes.
If the requirements of your app prohibit batching updates for very long,
this could be a frequent occurrence.

So, it seems to me the patch has benefit for apps that do frequent
updates and need reasonably quick access to those changes.

Bulk updates however require yet another approach.  Sorry to change
topics here, but I'm wondering if there was a final decision on the
question of java 1.5 in the core.  If I submitted a bulk update
capability that required java 1.5, would it be eligible for inclusion in
the core or not?

Chuck



Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Ning Li-2
In reply to this post by Robert Engels
> Even with your code changes, to see the modification made using the
> IndexWriter, it must be closed, and a new IndexReader opened.

That behaviour remains the same.


> So a far simpler way is to get the collection of updates first, then
> using opened indexreader,
> for each doc in collection
>        delete document using "key"
> endfor
> open indexwriter
> for each doc in collection
>        add document
> endfor
> open indexreader

So, you are buffering the updates into large batches. This patch
improves performance for small batches.

There are several advantages in supporting deletes in IndexWriter:

1 Applications don't have to worry about how each of them should buffer
  inserts/deletes into large batches. IndexWriter takes care of that.
2 deleteDocuments(Term)/batchDeleteDocuments(Terms[]) supported by
  IndexWriter will be as general as deleteDocuments(Term) supported by
  IndexReader. No concept of a "key" is necessary.
3 If an application reopens the index after your batched deletes but
  before your batched inserts, some previously available documents will
  "disappear" (see
http://www.gossamer-threads.com/lists/lucene/java-dev/23045?nohighlight=1#23045).
  Supporting deletes in IndexWriter will eliminate this problem.
4 When IndexWriter supports deletes, a concurrent merge thread is
  possible and makes sense. A concurrent merge thread means having a
  separate thread dedicated to merging segments. Today, when a merge
  of large segments (or a cascade of merges) is started, no documents
  can be inserted before the merge(s) finish(es). A concurrent merge
  thread will eliminate this problem. In addition, on a machine with
  sufficient CPU resources, this will improve the insert/delete
  performance not only for small insert/delete batches, but also for
  large batches. I have coded this and experiments have verified the
  claims. I will make it available if people are interested.
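
Point 3 can be illustrated with a small sketch. It is a toy model with
hypothetical names (UpdateVisibilitySketch, twoPhase, singleWriter), and it
assumes that a single flush publishes buffered deletes and inserts together.
Each method returns the document counts a reader would see at every point
where changes become visible.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model: the index is a set of keys; an update replaces the documents for those keys. */
class UpdateVisibilitySketch {
    /** Deletes via a reader, then inserts via a writer: two publish points. */
    static List<Integer> twoPhase(Set<String> index, List<String> keys) {
        List<Integer> visible = new ArrayList<>();
        index.removeAll(keys);
        visible.add(index.size()); // reader closed: deletes are visible, inserts are not
        index.addAll(keys);
        visible.add(index.size()); // writer closed: inserts are visible too
        return visible;
    }

    /** Deletes and inserts buffered in one writer: a single publish point. */
    static List<Integer> singleWriter(Set<String> index, List<String> keys) {
        List<Integer> visible = new ArrayList<>();
        Set<String> next = new HashSet<>(index);
        next.removeAll(keys); // buffered, not yet visible
        next.addAll(keys);    // buffered, not yet visible
        index.clear();
        index.addAll(next);
        visible.add(index.size()); // one flush publishes both changes
        return visible;
    }
}
```

In the two-phase variant, the first publish point shows fewer documents than
the index holds before or after the update; that dip is the window in which
documents "disappear" for a reader that reopens at the wrong moment.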


Regards,
Ning



Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Otis Gospodnetic-2
In reply to this post by Robert Engels
I think the patch is for a different scenario: the one where you can't wait to batch deletes and adds, and want or need to execute them more frequently and in the order they actually happen, without grouping them.

Otis




[jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Nick Burch (Jira)
In reply to this post by Nick Burch (Jira)
    [ http://issues.apache.org/jira/browse/LUCENE-565?page=comments#action_12419605 ]

Otis Gospodnetic commented on LUCENE-565:
-----------------------------------------

Thanks for all the information about the coding changes; that makes it easier to understand the diff.
Ideally this will become comments in the new diff, which I can commit.

Yonik mentioned this in email.  It does sound like a better place for this might be in a higher level class.  IndexWriter would really not be just a writer/appender once delete functionality is added to it, even if it's the IndexReaders behind the scenes doing the work.  So if you are going to be redoing the patch, consider this.

Perhaps IndexModifier methods should be deprecated and it should get a new/your API?




Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Robert Engels
In reply to this post by Otis Gospodnetic-2
I guess I don't see the difference...

You need the write lock to use the indexWriter, and you also need the  
write lock to perform a deletion, so if you just get the write lock  
you can perform the deletion and the add, then close the writer.

I have asked how this submission optimizes anything, and I still
can't seem to get an answer.





Re: [jira] Commented: (LUCENE-565) Supporting deleteDocuments in IndexWriter (Code and Performance Results Provided)

Ning Li-2
In reply to this post by Nick Burch (Jira)
> Yonik mentioned this in email.  It does sound like a better place for
> this might be in a higher level class.  IndexWriter would really not
> be just a writer/appender once delete functionality is added to it,
> even if it's the IndexReaders behind the scenes doing the work.  So
> if you are going to be redoing the patch, consider this.

Interesting. I thought a writer should support both inserts and deletes.

> Perhaps IndexModifier methods should be deprecated and it should get
> a new/your API?

Do you mean putting the patch in IndexModifier? I'm fine with that.
Is it ok if the new IndexModifier extends IndexWriter, since they do
share most of the APIs and implementations?


Regards,
Ning

