[jira] Updated: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments


JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Busch updated LUCENE-2324:
----------------------------------

    Attachment: lucene-2324.patch

Finally a new version of the patch! (Sorry for keeping you guys waiting...)

It's not done yet, but it compiles (against realtime branch!) and >95% of the core test cases pass.

Work done in addition to last patch:

- Added DocumentsWriterPerThread
- Reimplemented big parts of DocumentsWriter
- Added DocumentsWriterThreadPool, which is an extension point for different pool implementations.  The default impl is
  the ThreadAffinityDocumentsWriterThreadPool, which does what the old code did (it tries to always assign a DWPT to
  the same thread).  It should be easy now to add Document#getSourceID() and another pool that can assign threads
  based on the sourceID.
- Initial implementation of sequenceIDs.  Currently they're only used to keep track of deletes and not for
  e.g. NRT readers yet.
- Lots of other changes here and there.
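
To illustrate the thread-affinity idea behind the default pool, here is a minimal Java sketch.  It is not the code
from the patch: WriterSlot and AffinityPoolSketch are hypothetical names, and picking a thread's first slot
round-robin is an assumption.  Only the "same thread keeps getting the same writer" behavior comes from the
description above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for a per-thread writer; NOT Lucene's DocumentsWriterPerThread.
final class WriterSlot {
    final int id;
    int docCount; // guarded by this slot's monitor

    WriterSlot(int id) {
        this.id = id;
    }
}

// Hypothetical pool: binds a thread to a slot on first use, then always returns that slot.
final class AffinityPoolSketch {
    private final List<WriterSlot> slots = new ArrayList<>();
    private final Map<Thread, WriterSlot> bindings = new ConcurrentHashMap<>();
    private int next; // round-robin cursor, guarded by "this"

    AffinityPoolSketch(int maxSlots) {
        for (int i = 0; i < maxSlots; i++) {
            slots.add(new WriterSlot(i));
        }
    }

    // Returns the slot bound to the given thread, binding one if needed.
    WriterSlot slotFor(Thread t) {
        return bindings.computeIfAbsent(t, k -> pickNext());
    }

    // Assumption: new threads are spread over the slots round-robin.
    private synchronized WriterSlot pickNext() {
        WriterSlot s = slots.get(next);
        next = (next + 1) % slots.size();
        return s;
    }

    // Buffers a document in the calling thread's slot (here it only counts it).
    void addDocument(String doc) {
        WriterSlot s = slotFor(Thread.currentThread());
        synchronized (s) {
            s.docCount++;
        }
    }
}
```

A pool keyed on a source ID rather than on the thread could use that ID as the map key in slotFor().  The sketch
keeps strong references to Thread objects and never unbinds them, which a real pool would have to handle.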

TODOs:

- Implement flush-by-ram logic
- Implement logic to discard deletes from the deletes buffer
- Finish sequenceID handling: IW#commit() and IW#close() should return the last flushed sequenceID
- Maybe change delete logic:  currently deletes are applied when a segment is flushed.  Maybe we can keep it this way
  in the realtime-branch though, because that's most likely what we want to do once the RAM buffer is searchable and
  deletes are cheaper as they can then be done in-memory before flush
- Fix unit tests (mostly exception handling and thread safety)
- New test cases, e.g. for sequenceID testing
- Simplify code:  In some places I copied code around, which can probably be further simplified
- I started removing some of the old setters/getters in IW which are not in IndexWriterConfig - need to finish that,
  or revert those changes and use a different patch
- Fix nocommits
- Performance testing
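
As a rough illustration of how sequenceIDs can order deletes against adds, here is a small Java sketch.  It reflects
an assumption about the intended semantics, not the implementation in the patch: every operation takes the next ID
from a single counter, and at flush a delete only removes documents that were added with a smaller ID.
SequenceIdSketch and all of its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical buffer that stamps adds and deletes with increasing sequence IDs.
final class SequenceIdSketch {
    private static final class Entry {
        final long seqId;
        final String key;

        Entry(long seqId, String key) {
            this.seqId = seqId;
            this.key = key;
        }
    }

    private long lastSeqId; // guarded by "this"
    private final List<Entry> bufferedDocs = new ArrayList<>();
    private final List<Entry> bufferedDeletes = new ArrayList<>();

    // Buffers a document and returns the sequence ID assigned to the add.
    synchronized long addDocument(String key) {
        long id = ++lastSeqId;
        bufferedDocs.add(new Entry(id, key));
        return id;
    }

    // Buffers a delete and returns the sequence ID assigned to it.
    synchronized long deleteByKey(String key) {
        long id = ++lastSeqId;
        bufferedDeletes.add(new Entry(id, key));
        return id;
    }

    // Applies buffered deletes at flush time: a delete hits only docs added before it.
    synchronized List<String> flush() {
        List<String> live = new ArrayList<>();
        for (Entry doc : bufferedDocs) {
            boolean deleted = false;
            for (Entry del : bufferedDeletes) {
                if (del.key.equals(doc.key) && del.seqId > doc.seqId) {
                    deleted = true;
                    break;
                }
            }
            if (!deleted) {
                live.add(doc.key);
            }
        }
        bufferedDocs.clear();
        bufferedDeletes.clear();
        return live;
    }

    // The ID of the most recent operation, the kind of value commit() or close() could report.
    synchronized long lastSequenceId() {
        return lastSeqId;
    }
}
```

With this ordering, a document that is added again after a delete of the same key survives the flush, while the
earlier copy does not.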

I'm planning to commit this soon to the realtime branch, even though it's obviously not done yet.  But it's a big
patch and changes will be easier to track with an svn history.

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: 4.0
>
>         Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
