Support for field removal


Support for field removal

Michael Kleen
Hello,

I am interested in implementing field removal from an index. I am a Lucene user and I have a basic understanding of the internals. I have a few initial questions and I hope this is the right place to ask them.

Is anyone already working on this topic?

Are there any real blockers for this feature?

From my basic understanding, field removal should work conceptually in the following way: the FieldInfo that is to be removed would first need to be recorded until a segment merge happens. When the segment merge happens, the field has to be excluded from the new segment. Is this correct, or am I missing something here?

Thank you for your time,

Michael
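
The merge-time exclusion described in the question above can be sketched as a toy model. This is only an illustration under stated assumptions: `FieldRemovalSketch`, `Segment`, `merge`, and `pendingRemovals` are hypothetical names invented here and are not Lucene classes or APIs. A "segment" is modelled as a plain list of documents, and a document as a map from field name to value. The set of fields marked for removal is kept aside, and the merge copies every field except those in the set.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model only: none of these types are Lucene classes. */
public class FieldRemovalSketch {

    /** A hypothetical "segment": a list of documents, each mapping field name to value. */
    static final class Segment {
        final List<Map<String, String>> docs = new ArrayList<>();

        Segment add(Map<String, String> doc) {
            docs.add(new LinkedHashMap<>(doc));
            return this;
        }
    }

    /**
     * Merges the given segments into a new one, skipping every field whose
     * name is in pendingRemovals. The source segments are left unchanged,
     * so the field stays present until a segment is actually merged.
     */
    static Segment merge(List<Segment> segments, Set<String> pendingRemovals) {
        Segment merged = new Segment();
        for (Segment segment : segments) {
            for (Map<String, String> doc : segment.docs) {
                Map<String, String> copy = new LinkedHashMap<>();
                for (Map.Entry<String, String> field : doc.entrySet()) {
                    if (!pendingRemovals.contains(field.getKey())) {
                        copy.put(field.getKey(), field.getValue());
                    }
                }
                merged.docs.add(copy);
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Segment s1 = new Segment().add(Map.of("id", "1", "title", "a", "legacy", "x"));
        Segment s2 = new Segment().add(Map.of("id", "2", "legacy", "y"));

        // "legacy" is the field marked for removal; it is dropped only in the merged segment.
        Segment merged = merge(List.of(s1, s2), Set.of("legacy"));
        System.out.println(merged.docs);
    }
}
```

In this model the removal only takes effect for segments that take part in a merge, which is why the set of pending removals has to be kept until every old segment has been rewritten.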

Re: Support for field removal

Erick Erickson
There’s an SIP (Solr Improvement Proposal) that would encompass this. It takes a slightly different approach by forcibly rewriting _all_ segments without exceeding the size limitations of Tiered Merge Policy. I can’t quite get to it at this point.

We’ve worked out a mechanism for this that allowed docValues to be added to a field that was indexed, but haven’t worked on various other “safe” operations, field removal being one of them.

 https://cwiki.apache.org/confluence/display/SOLR/SIP-2+Support+safe+index+transformations+without+reindexing

Best,
Erick

In reply to the post by Michael Kleen of Mar 9, 2020, at 11:53

Re: Support for field removal

david.w.smiley@gmail.com
In reply to this post by Michael Kleen

The main thing missing is that the stored fields data doesn't actually store what IDs are being used (!).  If that were in place, I could imagine a CLI tool to purge the IDs.

~ David Smiley
Apache Lucene/Solr Search Developer

