[ https://issues.apache.org/jira/browse/LUCENE-7976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16439785#comment-16439785 ]
Erick Erickson commented on LUCENE-7976:
Well, it didn't get simpler ;(...
[~mikemccand] The problem with tweaks to scoring is that the assumptions made in findForcedDeletesMerges and findForcedMerges now have to respect max segment size. Which really means that all three methods (including findMerges) are the same operation just with some different initial assumptions. findForcedMerges is particularly ugly in that it can have a segment count specified and that makes for some uglier code.
I think I want to defer your comments about findForcedMergeDeletes possibly having a bug and should do the same kind of round-tripping as findForcedMerges to another JIRA if necessary, this is already big enough.
> Despite all the nocommits and extraneous comments, I think it's pretty close to being functionally correct.
> I need to clean this up considerably as I've been concentrating on getting it structured, I'll leave it for a day or two and then look again.
> I'm not entirely sure I like the structure of the InfosStats class with the computeStats being sensitive to what _kind_ of merge is being called for. On the one hand it does centralize all the different considerations. On the other it concentrates the ugliness in one place. Moving tricky code from one place to the other isn't necessarily an improvement. Oh the other other hand, when the trickiness was in findMerges, findForcedMerges and findForcedDeletesMerges I had to pass a bunch of parameters to getSpec which was ugly too.
> I've hard-coded 20% as a threshold in indexDeletedPctAllowed and it does double-duty, both as a threshold for total index pct deleted before singleton merges are allowed _and_ the threshold for a singleton merge. I don't think this is something I particularly want to make into a tuning parameter right now, possibly leave that for another JIRA if at all. See the bit on perf below. With expungeDeletes and forceMerge now respecting max segment bytes, if someone really, really, really cares about this they can use those operations.
> singleton merges work, so if I have a massive segment it will gradually reduce in size over time. Max sized segments are also singleton-merged when the minimum deleted percentage threshold is reached and they're over 20% deleted docs.
> There's some reporting in this code that will disappear completely to measure bytes written. I compared this version to the original and I'm pleasantly surprised to see only about a 10% increase in bytes written with the new patch. For this testing, I indexed 10M docs with maxMergedSegmentMB=50. Each doc's ID was randomly generated between 0 and 10M and I ran through all 10M 25 times. I indexed in packets of 1,000 and sent the _same_ packet to the old and new versions. Of course I added the reporting to the old version as well. Mind you that was last night so I haven't analyzed in detail yet.
> This approach, especially the singleton merges, will certainly increase the I/O if the index has been optimized to 1 segment. I don't think that's something that should be addressed in this JIRA (or at all). Prior to this there was no way to recover from that situation except to wait until most of it was deleted docs.
> I think the critical bit here is that all these merges, including the singleton merges, run through the (unchanged) scoring mechanism, which I haven't changed at all. I'll re-run my test today with the new code changing reclaimDeletesWeight to 1.5 'cause I'm curious. And maybe 1.0 (no effect) and maybe 0.75 (reducing deletes weight).
Let me know what you think.
> Make TieredMergePolicy respect maxSegmentSizeMB and allow singleton merges of very large segments
> Key: LUCENE-7976
> URL: https://issues.apache.org/jira/browse/LUCENE-7976
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Erick Erickson
> Assignee: Erick Erickson
> Priority: Major
> Attachments: LUCENE-7976.patch, LUCENE-7976.patch, LUCENE-7976.patch
> We're seeing situations "in the wild" where there are very large indexes (on disk) handled quite easily in a single Lucene index. This is particularly true as features like docValues move data into MMapDirectory space. The current TMP algorithm allows on the order of 50% deleted documents as per a dev list conversation with Mike McCandless (and his blog here: https://www.elastic.co/blog/lucenes-handling-of-deleted-documents).
> Especially in the current era of very large indexes in aggregate, (think many TB) solutions like "you need to distribute your collection over more shards" become very costly. Additionally, the tempting "optimize" button exacerbates the issue since once you form, say, a 100G segment (by optimizing/forceMerging) it is not eligible for merging until 97.5G of the docs in it are deleted (current default 5G max segment size).
> The proposal here would be to add a new parameter to TMP, something like <maxAllowedPctDeletedInBigSegments> (no, that's not serious name, suggestions welcome) which would default to 100 (or the same behavior we have now).
> So if I set this parameter to, say, 20%, and the max segment size stays at 5G, the following would happen when segments were selected for merging:
> > any segment with > 20% deleted documents would be merged or rewritten NO MATTER HOW LARGE. There are two cases,
> >> the segment has < 5G "live" docs. In that case it would be merged with smaller segments to bring the resulting segment up to 5G. If no smaller segments exist, it would just be rewritten
> >> The segment has > 5G "live" docs (the result of a forceMerge or optimize). It would be rewritten into a single segment removing all deleted docs no matter how big it is to start. The 100G example above would be rewritten to an 80G segment for instance.
> Of course this would lead to potentially much more I/O which is why the default would be the same behavior we see now. As it stands now, though, there's no way to recover from an optimize/forceMerge except to re-index from scratch. We routinely see 200G-300G Lucene indexes at this point "in the wild" with 10s of shards replicated 3 or more times. And that doesn't even include having these over HDFS.
> Alternatives welcome! Something like the above seems minimally invasive. A new merge policy is certainly an alternative.
This message was sent by Atlassian JIRA
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]
|Free forum by Nabble||Edit this page|