[jira] Commented: (LUCENE-2324) Per thread DocumentsWriters that write their own private segments


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12891228#action_12891228 ]

Michael Busch commented on LUCENE-2324:
---------------------------------------

Thanks, Mike - great feedback! (as always)

{quote}
I still see usage of docStoreOffset, but aren't we doing away with
shared doc stores with the cutover to DWPT?
{quote}

Do we want all segments that one DWPT writes to share the same
doc store, i.e. one doc store per DWPT, or remove doc stores
entirely?


{quote}
I think you can further simplify DocumentsWriterPerThread.DocWriter;
in fact I think you can remove it & all subclasses in consumers!
{quote}

I agree!  Now that a large number of test cases pass, it's less scary
to modify even more code :)  I'll do this next.


{quote}
Also, we don't need separate closeDocStore; it should just be closed
during flush.
{quote}

OK sounds good.


{quote}
I like the ThreadAffinityDocumentsWriterThreadPool; it's the default,
right (I see some tests explicitly setting it on IWC; not sure why)?
{quote}

Actually, only TestStressIndexing2 sets it explicitly, and it does so to
use a different maximum number of thread states than the default.
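
To make "thread affinity" and "max thread states" concrete, here is a minimal sketch of the idea, not the actual ThreadAffinityDocumentsWriterThreadPool code. The class name AffinityPoolSketch and the method stateFor are hypothetical, and the round-robin assignment of new threads is an assumption made only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; this is NOT Lucene's ThreadAffinityDocumentsWriterThreadPool.
final class AffinityPoolSketch {
    private final int maxThreadStates;
    // Remembers which state slot each thread id was bound to.
    private final Map<Long, Integer> bindings = new ConcurrentHashMap<>();
    private final AtomicInteger next = new AtomicInteger();

    AffinityPoolSketch(int maxThreadStates) {
        if (maxThreadStates < 1) {
            throw new IllegalArgumentException("maxThreadStates must be >= 1");
        }
        this.maxThreadStates = maxThreadStates;
    }

    /** Returns the slot for a thread id; the same id always maps to the same slot. */
    int stateFor(long threadId) {
        // Assumption: new threads are spread round-robin over the available slots.
        return bindings.computeIfAbsent(threadId,
            id -> Math.floorMod(next.getAndIncrement(), maxThreadStates));
    }
}
```

With two thread states, a third thread shares a slot with the first, and each thread keeps returning to the slot it was bound to. A test that wants more or less contention would only change the constructor argument.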


{quote}
We should make the in-RAM deletes impl somehow pluggable?
{quote}

Do you mean so that it's customizable how deletes are handled?
E.g. doing live deletes vs. lazy deletes on flush?
I think that's a good idea.  E.g. at Twitter we'll do live deletes always
to get the lowest latency (and we don't have too many deletes),
but that's probably not the best default for everyone.
So I agree that making this customizable is a good idea.
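
A rough sketch of what "pluggable" could look like, contrasting live deletes with lazy deletes on flush. Every name here (DeletePolicySketch, InRamSegmentSketch, LiveDeletes, LazyDeletes) is hypothetical and none of it is existing Lucene API; a set of strings stands in for the in-RAM segment.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical names throughout; none of this is Lucene API.
interface DeletePolicySketch {
    void delete(String term); // called when a delete-by-term arrives
    void onFlush();           // called when the in-RAM segment is flushed
}

// Stand-in for an in-RAM segment: just the set of terms that are still live.
final class InRamSegmentSketch {
    final Set<String> liveTerms = new HashSet<>();

    void apply(String term) {
        liveTerms.remove(term);
    }
}

// Live deletes: applied immediately, so they are visible with the lowest latency.
final class LiveDeletes implements DeletePolicySketch {
    private final InRamSegmentSketch segment;

    LiveDeletes(InRamSegmentSketch segment) {
        this.segment = segment;
    }

    public void delete(String term) {
        segment.apply(term);
    }

    public void onFlush() {
        // Nothing buffered, nothing to do.
    }
}

// Lazy deletes: buffered in RAM and applied only when the segment is flushed.
final class LazyDeletes implements DeletePolicySketch {
    private final InRamSegmentSketch segment;
    private final List<String> buffered = new ArrayList<>();

    LazyDeletes(InRamSegmentSketch segment) {
        this.segment = segment;
    }

    public void delete(String term) {
        buffered.add(term);
    }

    public void onFlush() {
        for (String term : buffered) {
            segment.apply(term);
        }
        buffered.clear();
    }
}
```

The writer would hold a DeletePolicySketch and never care which implementation is behind it, so the default could stay lazy while low-latency setups plug in the live variant.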

It'd also be nice to have a more efficient data structure to buffer the
deletes.  With many buffered deletes the Java HashMap approach
will not be very efficient.  Terms could be written into a byte pool,
but what should we do with queries?
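
For the term half of that question, a byte pool could look roughly like the sketch below: each delete term is appended to one growable byte[] as a 2-byte length prefix followed by its UTF-8 bytes, avoiding a HashMap entry and boxed key per term. DeleteTermPoolSketch and its methods are hypothetical, the length-prefix layout is an assumption, and buffered delete queries are deliberately left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch of buffering delete terms in a single byte pool.
final class DeleteTermPoolSketch {
    private byte[] pool = new byte[64];
    private int used = 0;  // bytes written so far
    private int count = 0; // number of buffered terms

    /** Appends a term as [2-byte big-endian length][UTF-8 bytes]. */
    void add(String term) {
        byte[] utf8 = term.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF) {
            throw new IllegalArgumentException("term too long for 2-byte length prefix");
        }
        int needed = used + 2 + utf8.length;
        if (needed > pool.length) {
            // Grow geometrically so appends stay amortized O(1).
            pool = Arrays.copyOf(pool, Math.max(needed, pool.length * 2));
        }
        pool[used++] = (byte) (utf8.length >>> 8);
        pool[used++] = (byte) utf8.length;
        System.arraycopy(utf8, 0, pool, used, utf8.length);
        used += utf8.length;
        count++;
    }

    int size() {
        return count;
    }

    int bytesUsed() {
        return used;
    }

    /** Decodes all buffered terms in insertion order, e.g. to apply them on flush. */
    String[] terms() {
        String[] out = new String[count];
        int pos = 0;
        for (int i = 0; i < count; i++) {
            int len = ((pool[pos] & 0xFF) << 8) | (pool[pos + 1] & 0xFF);
            pos += 2;
            out[i] = new String(pool, pos, len, StandardCharsets.UTF_8);
            pos += len;
        }
        return out;
    }
}
```

This layout only supports appending and sequential replay; it does not deduplicate terms or support lookup by term, which a real implementation might still need.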

> Per thread DocumentsWriters that write their own private segments
> -----------------------------------------------------------------
>
>                 Key: LUCENE-2324
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2324
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Michael Busch
>            Priority: Minor
>             Fix For: Realtime Branch
>
>         Attachments: lucene-2324.patch, lucene-2324.patch, LUCENE-2324.patch
>
>
> See LUCENE-2293 for motivation and more details.
> I'm copying here Mike's summary he posted on 2293:
> Change the approach for how we buffer in RAM to a more isolated
> approach, whereby IW has N fully independent RAM segments
> in-process and when a doc needs to be indexed it's added to one of
> them. Each segment would also write its own doc stores and
> "normal" segment merging (not the inefficient merge we now do on
> flush) would merge them. This should be a good simplification in
> the chain (eg maybe we can remove the *PerThread classes). The
> segments can flush independently, letting us make much better
> concurrent use of IO & CPU.
