|
|
Hi All, I work on Lucene at MongoDB.
I would like to limit the number of fields in an index to prevent tenants from causing a mapping explosion.
Since IndexWriter.getFieldNames has been deprecated, there is no way to do this without using a reader (which comes with a set of problems regarding flush/commit rates).
I would love to add to Lucene the ability for IndexWriter to limit the number of fields. Curious to hear your thoughts.
|
|
I don't like the idea of IndexWriter limiting field names, but I do like the idea of un-deprecating that method, which appeared to have a trivial implementation. Try commenting on the issue for its deprecation, which has various watchers, to get their attention.
~ David Smiley
Apache Lucene/Solr Search Developer
|
|
I personally have pretty positive experience with what I call soft limits. At Elastic we use them all over the place to catch issues when a user likely misconfigures something or when there is likely an issue on the user's end. I think having an option on the IW that allows limiting the number of fields makes sense. We can even extract a general limits object with total num docs etc. if we want. We can still set stuff to unlimited by default.
WDYT?
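A minimal sketch of the "general limits object" idea, for illustration only. `IndexLimits`, `unlimited()`, `withMaxFields`, and `withMaxDocs` are hypothetical names, not Lucene APIs; the snippet only shows the "unlimited by default, configurable" shape discussed here.

```java
/** Hypothetical limits holder for illustration; not a Lucene class. */
final class IndexLimits {
    /** Sentinel meaning "no limit"; every setting defaults to it. */
    static final long UNLIMITED = Long.MAX_VALUE;

    private final long maxFields;
    private final long maxDocs;

    private IndexLimits(long maxFields, long maxDocs) {
        this.maxFields = maxFields;
        this.maxDocs = maxDocs;
    }

    /** Default configuration: nothing is limited. */
    static IndexLimits unlimited() {
        return new IndexLimits(UNLIMITED, UNLIMITED);
    }

    /** Returns a copy with a cap on the number of distinct field names. */
    IndexLimits withMaxFields(long max) {
        return new IndexLimits(requirePositive(max), maxDocs);
    }

    /** Returns a copy with a cap on the total number of documents. */
    IndexLimits withMaxDocs(long max) {
        return new IndexLimits(maxFields, requirePositive(max));
    }

    /** True if an index with this many fields stays within the limit. */
    boolean allowsFieldCount(long count) {
        return count <= maxFields;
    }

    /** True if an index with this many documents stays within the limit. */
    boolean allowsDocCount(long count) {
        return count <= maxDocs;
    }

    private static long requirePositive(long value) {
        if (value <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + value);
        }
        return value;
    }
}
```

Usage would look like `IndexLimits.unlimited().withMaxFields(1000)`, leaving the document count uncapped. How such an object would be wired into IndexWriter is left open here.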
|
|
(In reply to Simon Willnauer's message of Thu, Jan 14, 2021 at 12:16 AM.)
I like Oren's idea and Simon's proposal of unlimited by default but configurable.
Marcus
|
|
I think it makes sense to un-deprecate that API (why did we deprecate it?), but I'm not sure IW should be in the business of soft/hard limits on field count?
I agree such limits make sense if the integrity of the index is at risk, e.g. IW does enforce a max number of unique documents in one index.
But for the number of fields, as long as we expose the API, the layer above Lucene can handle soft/hard limits, notifying the user correctly, rejecting updates, etc.?
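A rough sketch of handling the limits in the layer above Lucene, assuming the application can obtain the current set of field names (for example from IndexWriter.getFieldNames(), if it is un-deprecated). `FieldLimitGuard` and `Decision` are hypothetical names, and the supplier stands in for the writer so the snippet runs without Lucene on the classpath.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/** Hypothetical application-level guard; it lives above Lucene, not inside IndexWriter. */
final class FieldLimitGuard {
    enum Decision { ACCEPT, WARN, REJECT }

    private final Supplier<Set<String>> knownFields; // assumed to reflect the index's current field names
    private final int softLimit;
    private final int hardLimit;

    FieldLimitGuard(Supplier<Set<String>> knownFields, int softLimit, int hardLimit) {
        if (softLimit > hardLimit) {
            throw new IllegalArgumentException("softLimit must not exceed hardLimit");
        }
        this.knownFields = knownFields;
        this.softLimit = softLimit;
        this.hardLimit = hardLimit;
    }

    /** Decides whether a document with the given field names may be indexed. */
    Decision check(Set<String> docFields) {
        // Count the distinct fields the index would have after adding this document.
        Set<String> union = new HashSet<>(knownFields.get());
        union.addAll(docFields);
        int total = union.size();
        if (total > hardLimit) {
            return Decision.REJECT; // caller rejects the update
        }
        if (total > softLimit) {
            return Decision.WARN;   // caller indexes the document but notifies the user
        }
        return Decision.ACCEPT;
    }
}
```

With a real writer the supplier would presumably be `writer::getFieldNames`. The check and the subsequent add are not atomic, so concurrent indexing threads could still overshoot the limit slightly; a stricter guard would need its own synchronization.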
|
|
Thanks for the responses and advice.
Un-deprecating sounds great: it solves our issue and gives us the flexibility to choose different strategies to deal with it (soft/hard limits etc.). I created LUCENE-9680 to track this, and I'll have a patch ready by the beginning of next week.
Best, Oren
P.S.: getFieldNames was deprecated after SOLR-12368 made in-place doc-values (DV) updates easier for fields that didn't exist.
(In reply to Michael McCandless's message of Tue, Jan 19, 2021 at 7:42 AM.)
|
|
Thanks in advance for taking a look.
Is anyone game to help me backport this to the upcoming minor version in 8.7?
Thank you, Oren
|
|