Add maxFields Option to IndexWriter

Oren Ovadia
Hi All,

I work on Lucene at MongoDB.

I would like to limit the number of fields in an index to prevent tenants from causing a mapping explosion.

Since IndexWriter.getFieldNames has been deprecated, there is no way to do this without using a reader (which comes with its own set of problems regarding flush/commit rates).

I would love to add to Lucene the ability for an IndexWriter to limit the number of fields. Curious to hear your thoughts.

Thanks,
Oren

Re: Add maxFields Option to IndexWriter

David Smiley
I don't like the idea of IndexWriter limiting field names, but I do like the idea of un-deprecating that method, which appeared to have a trivial implementation. Try commenting on the issue that deprecated it; it has various watchers, so that should get their attention.

~ David Smiley
Apache Lucene/Solr Search Developer



Re: Add maxFields Option to IndexWriter

Simon Willnauer-4
I personally have had pretty positive experience with what I call soft limits. At Elastic we use them all over the place to catch issues when a user has likely misconfigured something or when there is likely an issue on the user's end.
I think having an option on the IW that allows limiting the number of fields makes sense. We can even extract a general limits object with total num docs etc. if we want. We can still set stuff to unlimited by default.

WDYT
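
To make the "general limits object" idea concrete, here is a minimal sketch of what such an object might look like. Everything in it is an assumption for illustration: the class name WriterLimits, its methods, and the choice of Long.MAX_VALUE as the "unlimited" sentinel are hypothetical and are not part of Lucene.

```java
/**
 * Hypothetical limits holder, for illustration only; no such class exists in Lucene.
 * Every limit defaults to "unlimited" and can be tightened individually.
 */
final class WriterLimits {
    /** Sentinel meaning "no limit" (an assumption of this sketch). */
    static final long UNLIMITED = Long.MAX_VALUE;

    private final long maxFields;
    private final long maxDocs;

    private WriterLimits(long maxFields, long maxDocs) {
        this.maxFields = maxFields;
        this.maxDocs = maxDocs;
    }

    /** Starting point: nothing is limited. */
    static WriterLimits unlimited() {
        return new WriterLimits(UNLIMITED, UNLIMITED);
    }

    /** Returns a copy with the field limit changed; the doc limit is kept. */
    WriterLimits withMaxFields(long newMaxFields) {
        requireNonNegative(newMaxFields);
        return new WriterLimits(newMaxFields, maxDocs);
    }

    /** Returns a copy with the doc limit changed; the field limit is kept. */
    WriterLimits withMaxDocs(long newMaxDocs) {
        requireNonNegative(newMaxDocs);
        return new WriterLimits(maxFields, newMaxDocs);
    }

    boolean allowsFieldCount(long fieldCount) {
        return fieldCount <= maxFields;
    }

    boolean allowsDocCount(long docCount) {
        return docCount <= maxDocs;
    }

    private static void requireNonNegative(long value) {
        if (value < 0) {
            throw new IllegalArgumentException("limit must be >= 0, got " + value);
        }
    }
}
```

With this shape, WriterLimits.unlimited().withMaxFields(1000) would cap fields while leaving the document count unlimited, which matches the "unlimited by default but configurable" behaviour described above.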


Re: Add maxFields Option to IndexWriter

Marcus Eagan
I like Oren's idea and Simon's proposal of unlimited by default but configurable. 
Marcus 


Re: Add maxFields Option to IndexWriter

Michael McCandless-2
I think it makes sense to un-deprecate that API (why did we deprecate it?), but I'm not sure IW should be in the business of soft/hard limits on field count?

I agree such limits make sense if the integrity of the index is at risk, e.g. IW does enforce a max number of unique documents in one index.

But for number of fields, as long as we expose the API, then the layer above Lucene can handle soft/hard limits, notifying the user correctly, rejecting updates, etc.?
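
As a rough illustration of handling the limit in the layer above Lucene, the sketch below checks a document's field names against the names already known before the document is handed to the writer. The class FieldLimitGuard and its methods are hypothetical. The known field names are read through a Supplier so the example runs without Lucene on the classpath; in a real application the supplier would presumably be writer::getFieldNames, assuming that method stays available and returns a Set<String>.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

/**
 * Hypothetical application-level guard (not part of Lucene) that rejects
 * documents which would push the index past a configured number of fields.
 */
final class FieldLimitGuard {
    private final Supplier<Set<String>> knownFields;
    private final int maxFields;

    /**
     * @param knownFields source of the field names currently in the index
     *                    (assumed to be something like writer::getFieldNames)
     * @param maxFields   hard limit on the number of distinct field names
     */
    FieldLimitGuard(Supplier<Set<String>> knownFields, int maxFields) {
        if (maxFields < 0) {
            throw new IllegalArgumentException("maxFields must be >= 0");
        }
        this.knownFields = knownFields;
        this.maxFields = maxFields;
    }

    /** True if adding a document with these fields keeps the total within the limit. */
    boolean admits(Set<String> docFieldNames) {
        // Union of existing and incoming names: only genuinely new names grow the count.
        Set<String> union = new HashSet<>(knownFields.get());
        union.addAll(docFieldNames);
        return union.size() <= maxFields;
    }

    /** Hard-limit variant: throws instead of returning false. */
    void check(Set<String> docFieldNames) {
        if (!admits(docFieldNames)) {
            throw new IllegalStateException(
                "document would exceed the limit of " + maxFields + " fields");
        }
    }
}
```

A soft limit could call admits and only log or notify the user, while a hard limit could call check and reject the update. Note that the check and the subsequent add are not atomic, so with concurrent indexing threads the count can briefly overshoot unless the caller serializes the two steps.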


Re: Add maxFields Option to IndexWriter

Oren Ovadia
Thanks for the responses and advice.

Un-deprecating sounds great: it solves our issue and gives us the flexibility to choose different strategies to deal with it (soft/hard limits etc.).
I created LUCENE-9680 to track this; I'll have a patch ready by the beginning of next week.

Best,
Oren

P.S. getFieldNames was deprecated after SOLR-12368 made in-place DV updates easier for fields that didn't exist.


Re: Add maxFields Option to IndexWriter

Oren Ovadia
Thanks in advance for taking a look.

Is anyone game to help me backport this to the upcoming minor version in 8.7?

Thank you,
Oren

