[jira] Commented: (LUCENE-2573) Tiered flushing of DWPTs by RAM with low/high water marks

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/LUCENE-2573?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004184#comment-13004184 ]

Michael McCandless commented on LUCENE-2573:

bq. so you mean we always flush ALL DWPT once we reached the low watermark?

No, I mean: as soon as we pass low wm, pick biggest DWPT and flush it.
As soon as you mark that DWPT as flushPending, its RAM used is removed
from active pool and added to flushPending pool.

Then, if the active pool again crosses low wm, pick the biggest and
mark as flush pending, etc.

But if the flushing cannot keep up, and the sum of active +
flushPending pools crosses high wm, you hijack (stall) incoming
threads.

I think this may make a good "flush by RAM" policy, but I agree we should
test.  I think the fully tiered approach may be overly complex...
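To make the accounting above concrete, here is a minimal sketch of the two-pool scheme. It is not the patch's code: the class and method names (WatermarkFlushSketch, addBytes, finishFlush) are hypothetical, the byte counts are arbitrary, and real DWPT checkout and threading are left out.

```java
// Illustrative sketch only: these names are hypothetical, not Lucene APIs.
final class WatermarkFlushSketch {
    private final long lowWaterMark;       // bytes; limit for the active pool
    private final long highWaterMark;      // bytes; limit for active + flushPending
    private final long[] ramUsed;          // RAM currently held by each DWPT
    private final boolean[] flushPending;  // true once a DWPT is marked for flush
    private long activeBytes;
    private long flushPendingBytes;

    WatermarkFlushSketch(int numWriters, long lowWaterMark, long highWaterMark) {
        this.lowWaterMark = lowWaterMark;
        this.highWaterMark = highWaterMark;
        this.ramUsed = new long[numWriters];
        this.flushPending = new boolean[numWriters];
    }

    /** Accounts for newly buffered bytes; returns true if incoming threads should stall. */
    boolean addBytes(int writer, long bytes) {
        if (flushPending[writer]) {
            throw new IllegalStateException("writer " + writer + " is flush pending");
        }
        ramUsed[writer] += bytes;
        activeBytes += bytes;
        // Active pool passed the low water mark: pick the biggest DWPT and mark it.
        if (activeBytes > lowWaterMark) {
            markFlushPending(biggestActiveWriter());
        }
        return shouldStall();
    }

    /** Called when a pending flush has completed and the DWPT's RAM is released. */
    void finishFlush(int writer) {
        if (!flushPending[writer]) {
            throw new IllegalStateException("writer " + writer + " is not flush pending");
        }
        flushPendingBytes -= ramUsed[writer];
        ramUsed[writer] = 0;
        flushPending[writer] = false;
    }

    /** Stall when flushing cannot keep up: both pools together exceed the high water mark. */
    boolean shouldStall() {
        return activeBytes + flushPendingBytes > highWaterMark;
    }

    boolean isFlushPending(int writer) { return flushPending[writer]; }
    long activeBytes() { return activeBytes; }
    long flushPendingBytes() { return flushPendingBytes; }

    // Moves the DWPT's RAM from the active pool to the flushPending pool.
    private void markFlushPending(int writer) {
        flushPending[writer] = true;
        activeBytes -= ramUsed[writer];
        flushPendingBytes += ramUsed[writer];
    }

    // Index of the active (not yet pending) DWPT holding the most RAM.
    private int biggestActiveWriter() {
        int best = -1;
        for (int i = 0; i < ramUsed.length; i++) {
            if (!flushPending[i] && (best < 0 || ramUsed[i] > ramUsed[best])) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WatermarkFlushSketch policy = new WatermarkFlushSketch(3, 90, 110);
        policy.addBytes(0, 50);
        policy.addBytes(1, 45);                    // active pool passes low wm
        System.out.println(policy.isFlushPending(0));
        System.out.println(policy.addBytes(2, 20)); // both pools together pass high wm
        policy.finishFlush(0);
        System.out.println(policy.shouldStall());
    }
}
```

The sketch marks at most one DWPT per call, matching "pick the biggest and mark as flush pending" on each crossing; a policy that keeps marking until the active pool is back under the low water mark would be a different design choice.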

bq. for now this is internal only so even if we decide to I would shift that to a different issue.

OK sounds good.

Also, if the app really cares about this (I suspect none will) they
could make a custom FlushPolicy that they could directly query to find
out when threads get stalled.

Besides this, is it only getting flushing of deletes working
correctly that remains, before landing RT?

> Tiered flushing of DWPTs by RAM with low/high water marks
> ---------------------------------------------------------
>                 Key: LUCENE-2573
>                 URL: https://issues.apache.org/jira/browse/LUCENE-2573
>             Project: Lucene - Java
>          Issue Type: Improvement
>          Components: Index
>            Reporter: Michael Busch
>            Assignee: Simon Willnauer
>            Priority: Minor
>             Fix For: Realtime Branch
>         Attachments: LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch, LUCENE-2573.patch
>
> Now that we have DocumentsWriterPerThreads we need to track total consumed RAM across all DWPTs.
> A flushing strategy idea that was discussed in LUCENE-2324 was to use a tiered approach:
> - Flush the first DWPT at a low water mark (e.g. at 90% of allowed RAM)
> - Flush all DWPTs at a high water mark (e.g. at 110%)
> - Use linear steps in between high and low watermark:  E.g. when 5 DWPTs are used, flush at 90%, 95%, 100%, 105% and 110%.
> Should we allow the user to configure the low and high water mark values explicitly using total values (e.g. low water mark at 120MB, high water mark at 140MB)?  Or shall we keep for simplicity the single setRAMBufferSizeMB() config method and use something like 90% and 110% for the water marks?
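For reference, the linear steps in the issue description work out as in the following sketch. It assumes the 90% and 110% figures from the description and uses integer arithmetic on a budget in whole units (MB here); the class and method names are hypothetical and not part of any patch.

```java
// Illustrative sketch only: TieredWatermarks is a hypothetical name, not a Lucene class.
final class TieredWatermarks {
    /**
     * Returns one flush threshold per DWPT, spaced linearly from lowPercent
     * to highPercent of the RAM budget. With a single DWPT only the low
     * water mark is used.
     */
    static long[] thresholds(long ramBudget, int numWriters, int lowPercent, int highPercent) {
        if (numWriters < 1 || lowPercent > highPercent) {
            throw new IllegalArgumentException("need numWriters >= 1 and lowPercent <= highPercent");
        }
        long[] result = new long[numWriters];
        if (numWriters == 1) {
            result[0] = ramBudget * lowPercent / 100;
            return result;
        }
        int steps = numWriters - 1;
        for (int i = 0; i < numWriters; i++) {
            // low + i * (high - low) / steps, kept in integers to avoid rounding drift
            long scaledPercent = (long) lowPercent * steps + (long) i * (highPercent - lowPercent);
            result[i] = ramBudget * scaledPercent / (100L * steps);
        }
        return result;
    }

    public static void main(String[] args) {
        // 100 MB budget, 5 DWPTs, water marks at 90% and 110%
        for (long t : thresholds(100, 5, 90, 110)) {
            System.out.println(t + " MB");
        }
    }
}
```

With a 100 MB budget and 5 DWPTs this yields 90, 95, 100, 105 and 110 MB, matching the example in the description.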
