Thinking about upgrading indexes to X+2


Thinking about upgrading indexes to X+2

Erick Erickson
So yet another iteration on the users list of going from X to X+2 got me to thinking (dangerous I know). I wanted to run this by folks to see if it’s worth a JIRA.

It _seems_ reasonable from a user’s perspective to create an index with, say, 6x, then upgrade to 7x and reindex all documents (without deleting the index first), then be able to upgrade to 8x and reindex all documents. Rinse, repeat.

The problem of course is that the 6x segments get merged by TMP and the 6x stamp is preserved. (BTW, I’m going from hearsay here rather than code knowledge, correct me if I’m wrong, I’ve assumed all along that these are on each _segment_, not global to the entire index).

I can think of a couple of options for, say, TMP that might work out to support the above (I’m not proposing both, and these are bad names…):
1 - onlyMergeSegmentsCreatedWithTheSameVersion
2 - neverMergeSegmentsCreatedWithAPriorVersion

Either of these would, if and only if _all_ docs were indeed indexed again, result in all the X-1 segments consisting entirely of deleted documents and being dropped. Now no segment has the X-1 marker and we could upgrade to X+1.
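
To make option 1 concrete, here is a minimal sketch in plain Java. It is a toy model, not Lucene code: `Segment`, `groupByCreatedMajor` and `oldestLiveMajor` are hypothetical names invented for illustration, and it takes the assumption above (a version stamp on each segment) at face value. It only shows the bookkeeping: merge candidates are partitioned by the major version that wrote them, and once every old segment is fully deleted, the oldest remaining stamp is the current version.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Toy model only: these are not Lucene classes. */
final class SameVersionMergeSketch {

    /** Hypothetical stand-in for a segment: the major version that wrote it, plus doc counts. */
    static final class Segment {
        final String name;
        final int createdMajor;
        final int maxDoc;
        final int deletedDocs;

        Segment(String name, int createdMajor, int maxDoc, int deletedDocs) {
            this.name = name;
            this.createdMajor = createdMajor;
            this.maxDoc = maxDoc;
            this.deletedDocs = deletedDocs;
        }

        boolean fullyDeleted() {
            return deletedDocs == maxDoc;
        }
    }

    /** Option 1: partition merge candidates so that no merge mixes major versions. */
    static Map<Integer, List<Segment>> groupByCreatedMajor(List<Segment> segments) {
        Map<Integer, List<Segment>> groups = new TreeMap<>();
        for (Segment s : segments) {
            groups.computeIfAbsent(s.createdMajor, k -> new ArrayList<>()).add(s);
        }
        return groups;
    }

    /** Ignore fully deleted segments (they can be dropped) and report the oldest stamp left. */
    static int oldestLiveMajor(List<Segment> segments) {
        int oldest = Integer.MAX_VALUE;
        for (Segment s : segments) {
            if (!s.fullyDeleted()) {
                oldest = Math.min(oldest, s.createdMajor);
            }
        }
        return oldest;
    }

    public static void main(String[] args) {
        // After a full reindex on 7x, every doc in the 6x segments has been replaced.
        List<Segment> index = Arrays.asList(
            new Segment("_0", 6, 100, 100),
            new Segment("_1", 6, 50, 50),
            new Segment("_2", 7, 150, 0));
        System.out.println(groupByCreatedMajor(index).keySet()); // [6, 7]
        System.out.println(oldestLiveMajor(index));              // 7
    }
}
```

If even one document in `_0` or `_1` were still live, `oldestLiveMajor` would return 6, which is the first edge case listed below.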

There are some edge cases of course:

- If even one X-1 doc wasn’t reindexed, it wouldn’t work. I can think of ways around this, e.g. a command deleteAllSegmentsCreatedWithPriorVersions, but since that’s indeterminate in terms of _which_ docs get deleted, I don’t like it at all. Handling this case sounds like a best-practice recommendation: people concerned about it index a marker field in each doc themselves (we could automate this) and then do a delete-by-query on the old values.

- Disk space issues. If we used <1> above, this wouldn’t be much different from what we have now in terms of wasted space. There’d be some extra wasted space, but not much. <2> would cause greater disk space waste. <2> would probably be easier, but I don’t think <1> is much work either.

- Is it worth the effort? People have to reindex every doc anyway.

- How to test?

- ???
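
The marker-field idea from the first edge case can be sketched the same way. Again this is a toy model in plain Java, not a Lucene or Solr API: a map stands in for the index, and `VersionMarkerSketch`, `staleIds` and `deleteOlderThan` are hypothetical names. The assumption is that every document carries a field recording the major version it was indexed with, so that stragglers can be found and removed deterministically.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: a map stands in for the index; nothing here is a Lucene or Solr API. */
final class VersionMarkerSketch {

    /** Doc id -> major version stored in a hypothetical per-document marker field. */
    private final Map<String, Integer> markerByDocId = new HashMap<>();

    /** Indexing an existing id replaces the old copy, as an update would. */
    void index(String id, int indexedWithMajor) {
        markerByDocId.put(id, indexedWithMajor);
    }

    /** Ids whose marker is older than currentMajor, i.e. docs that were never reindexed. */
    List<String> staleIds(int currentMajor) {
        List<String> stale = new ArrayList<>();
        for (Map.Entry<String, Integer> e : markerByDocId.entrySet()) {
            if (e.getValue() < currentMajor) {
                stale.add(e.getKey());
            }
        }
        Collections.sort(stale);
        return stale;
    }

    /** The delete-by-query step: remove every doc with an old marker and return the count. */
    int deleteOlderThan(int currentMajor) {
        int before = markerByDocId.size();
        markerByDocId.values().removeIf(major -> major < currentMajor);
        return before - markerByDocId.size();
    }

    int size() {
        return markerByDocId.size();
    }

    public static void main(String[] args) {
        VersionMarkerSketch idx = new VersionMarkerSketch();
        idx.index("a", 6);
        idx.index("b", 6);
        idx.index("c", 6);
        idx.index("a", 7); // reindexed on 7x
        idx.index("b", 7); // reindexed on 7x; "c" was missed
        System.out.println(idx.staleIds(7));        // [c]
        System.out.println(idx.deleteOlderThan(7)); // 1
    }
}
```

Unlike deleting whole segments by version, this makes it explicit which documents are lost: exactly the ones returned by `staleIds`.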

I think the question of whether to pursue this or not comes down to two questions:

1> Does it really help end users enough to be worth the effort? How many users can _guarantee_ that they reindex every document?

2> Would something along these lines work at all? Like I said, I’m going from hearsay rather than deep knowledge of the X-2 mechanism.

All I’m looking for here is whether it’s interesting enough for me to create a JIRA and discuss details there...
---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: Thinking about upgrading indexes to X+2

Uwe Schindler
Thanks for bringing this up again.

I tend to say: let us just allow IndexUpgrader to go beyond 2 versions as well! If somebody complains about incorrect offsets, oh man, it's their problem.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: [hidden email]


