Feature: Solr implicitly defined field types?

Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
While working on https://issues.apache.org/jira/browse/SOLR-12768, it occurred to me that it would be nice if Solr had implicitly defined field types. This would let you define a field in your schema that refers to a type that is not also in your schema, at least not explicitly: the type would not need to be put in your schema.xml if you use the classic schema, nor passed to the schema manipulation API if you use that. The idea is that these would be platform-provided field types that you need not define yourself.

There are multiple ways this loose idea might be turned into a concrete proposal.

(A) The main idea I'm kicking around right now is that Solr would _not_ throw an error when it reads a field definition whose type it doesn't find in your schema. Instead, it would recognize the name as a platform type (via some built-in, hard-coded registry) and register that type on the fly. So if you were to read the schema back, you'd see the type there. In this way, it's kind of a shortcut. Platform field types that you never refer to would never end up in your schema.

(B) A schema could pre-initialize with the platform/implicit types.  This is the simplest idea but I don't like it because you may not even need some of these types.  I'm not going to go down this path now but wanted to mention it.

I'm exploring (A) right now. I'm hoping to do this for at least a "_nest_path_" field in support of nested documents in 8.0, but conceivably the idea could be expanded to many of the types in our base schema today (int, str, etc.).
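
Option (A) could be modeled roughly as follows. This is only a sketch in Python, not Solr code: `PLATFORM_TYPES`, `Schema`, and `add_field` are hypothetical names, and the registry entries are illustrative values. It shows a field definition falling back to a built-in registry when its type is missing from the schema, with the type then registered so that reading the schema back shows it.

```python
# Hypothetical model of option (A); none of these names are Solr APIs.

# Built-in, hard-coded registry of platform field types (illustrative values).
PLATFORM_TYPES = {
    "int": {"class": "solr.IntPointField"},
    "_nest_path_": {"class": "solr.NestPathField"},
}


class Schema:
    def __init__(self):
        self.field_types = {}  # type name -> definition (explicit or registered on the fly)
        self.fields = {}       # field name -> type name

    def add_field_type(self, name, definition):
        """Explicitly define a type; platform names are not reserved."""
        self.field_types[name] = definition

    def add_field(self, name, type_name):
        if type_name not in self.field_types:
            if type_name not in PLATFORM_TYPES:
                raise ValueError(f"unknown field type: {type_name}")
            # Register the platform type on the fly instead of failing.
            self.field_types[type_name] = dict(PLATFORM_TYPES[type_name])
        self.fields[name] = type_name


schema = Schema()
schema.add_field("_nest_path_", "_nest_path_")
print(sorted(schema.field_types))  # only the type that was actually referenced
```

Because "int" is never referenced here, it never appears in `schema.field_types`, which is the difference from option (B), where every platform type would be present from the start.
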
--
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker

Re: Feature: Solr implicitly defined field types?

Jörn Franke
I think it is a good idea, but I see some potential complexity for the “deployment” of collections. For instance, in environments where Solr is used as a shared platform among several stakeholders, every time you deploy or modify a collection you need to make sure that the platform types exist. If a type exists in the test environment, then I need to make sure that it also exists in acceptance/production. The problem is that the platform type could have been defined by somebody else who has not yet updated the other environments (e.g. due to project/sprint delays). Another issue arises if I move to another Solr cluster in the same environment: then I have to make sure that all platform types move with me.

A (minor) issue is that platform types may change (for whatever reason), and then potentially all collections have to be reindexed, or we end up with different versions of the same platform type, which does not make things easier.

Currently we keep all our schema definitions in a version control system (we use the Schema API, but the JSON requests are kept there) so that projects can draw inspiration from each other. Needless to say, careful type engineering also requires some documentation of the technical design and may indeed be very collection-specific.

Another issue could be that a platform type may also imply a certain platform solrconfig.xml (e.g. lib directives).

I am not yet sure what the exact benefits are of referring to types of other collections in the Solr runtime itself, instead of having a version control system and letting projects decide whether they want to adopt types from other collections, but maybe I am overlooking something here.

On 28.12.2018 at 17:36, David Smiley <[hidden email]> wrote:

While working on https://issues.apache.org/jira/browse/SOLR-12768, it occurred to me that it would be nice if Solr had implicitly defined field types. This would let you define a field in your schema that refers to a type that is not also in your schema, at least not explicitly: the type would not need to be put in your schema.xml if you use the classic schema, nor passed to the schema manipulation API if you use that. The idea is that these would be platform-provided field types that you need not define yourself.


Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
Thanks for your thoughtful response Jörn!
...
On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <[hidden email]> wrote:
I think it is a good idea, but I see some potential complexity for the “deployment” of collections. For instance, in environments where Solr is used as a shared platform among several stakeholders, every time you deploy or modify a collection you need to make sure that the platform types exist. If a type exists in the test environment, then I need to make sure that it also exists in acceptance/production. The problem is that the platform type could have been defined by somebody else who has not yet updated the other environments (e.g. due to project/sprint delays). Another issue arises if I move to another Solr cluster in the same environment: then I have to make sure that all platform types move with me.

RE "the platform type could have been defined by somebody else": I'm not imagining it'd be configurable, so the "somebody else" would be the Solr project/committers.

Otherwise, I think I get your point, but perhaps I don't. It's the same point for any use of some new feature of Solr. If you use some new feature, you have to take care that all Solr instances you deploy your configuration to can handle that new feature. That's a fairly generic point that would apply to just about anything in Solr.

A (minor) issue is that platform types may change (for whatever reason), and then potentially all collections have to be reindexed, or we end up with different versions of the same platform type, which does not make things easier.

Yes, it's possible, though I think that point is separate from the feature I propose. You're saying that you might use an "int" field and then one day realize you want some newer/better definition of what an "int" is (e.g. trie -> points). Sure. That's true whether the field type is explicit or implicit. There's nothing stopping you from explicitly defining the field type if you want to; the names would not be reserved. If you want to keep your current index while running a new Solr version, you would leave luceneMatchVersion as it was, which would effectively retain the earlier interpretation of the implicit field types.
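
The two points in this reply, that an explicit definition wins and that luceneMatchVersion pins the interpretation of an implicit type, might be sketched like this. It is a Python model with hypothetical names (`IMPLICIT_INT_BY_VERSION`, `resolve_type`); the 7.0 cutoff and the class names are assumptions chosen for illustration, not a statement of how Solr would implement it.

```python
# Hypothetical sketch; not Solr's actual resolution logic.

# Interpretation of the implicit "int" type, keyed by the minimum
# luceneMatchVersion (major, minor) at which it applies. Assumed values.
IMPLICIT_INT_BY_VERSION = [
    ((7, 0), {"class": "solr.IntPointField"}),  # newer interpretation: points
    ((0, 0), {"class": "solr.TrieIntField"}),   # older interpretation: trie
]


def resolve_type(name, explicit_types, lucene_match_version):
    """Return the definition for `name`, preferring an explicit one."""
    if name in explicit_types:
        # Names are not reserved, so an explicit definition always wins.
        return explicit_types[name]
    if name == "int":
        for min_version, definition in IMPLICIT_INT_BY_VERSION:
            if lucene_match_version >= min_version:
                return definition
    raise KeyError(name)


old = resolve_type("int", {}, (6, 6))
new = resolve_type("int", {}, (8, 0))
custom = resolve_type("int", {"int": {"class": "my.CustomInt"}}, (8, 0))
print(old["class"], new["class"], custom["class"])
```

Keeping the match version at (6, 6) keeps the trie interpretation even on a newer release, while an explicit "int" definition bypasses the implicit lookup entirely.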
 
Currently we keep all our schema definitions in a version control system (we use the Schema API, but the JSON requests are kept there) so that projects can draw inspiration from each other. Needless to say, careful type engineering also requires some documentation of the technical design and may indeed be very collection-specific.

Another issue could be that a platform type may also imply a certain platform solrconfig.xml (e.g. lib directives).

I'm imagining platform types would be basic primitive types (int, boolean, etc., plus some special situations like the one in the issue I referenced). They would not depend on contrib libs, though I could imagine an evolution of this one day in which a contrib could somehow auto-add implicit field types.

I am not yet sure what the exact benefits are of referring to types of other collections in the Solr runtime itself, instead of having a version control system and letting projects decide whether they want to adopt types from other collections, but maybe I am overlooking something here.

The notion of implicit field types is not a cross-config (cross-collection) thing. Implicit field types are nothing more than built-in shortcuts.

I recall that one of my very early reactions to Solr's schema was surprise at seeing primitive types defined in it. Consider SQL DDL statements that refer to varchar and such: your DDL doesn't need to define what a varchar is!
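
The SQL analogy can be tried directly. The sketch below uses Python's built-in sqlite3 module only because it needs no setup; the table and column names are made up for illustration. SQLite is lenient about declared types, so this shows nothing more than the point being made: the DDL refers to varchar without defining it anywhere.

```python
import sqlite3

# In-memory database; no type definitions precede the table definition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title VARCHAR(100))")
conn.execute("INSERT INTO doc (id, title) VALUES (?, ?)", (1, "nested docs"))
row = conn.execute("SELECT title FROM doc WHERE id = 1").fetchone()
print(row[0])
conn.close()
```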

Happy New Year,
~ David


Re: Feature: Solr implicitly defined field types?

Jörn Franke
Hi David,

I now get the idea, and yes, this makes sense. It would require some tutorial or best practices, though. For example, overriding a platform data type may not make much sense: it may confuse developers who are new to an existing project and know Solr, but then encounter a platform type that does not have the default behavior.

Could platform types deal with different languages? For dates this does not seem to be a problem, because Solr expects only one specific date format, and values need to be converted to it beforehand (maybe that conversion could also be part of a platform type), but decimals and Boolean values are written differently in some languages.

On 30.12.2018 at 07:01, David Smiley <[hidden email]> wrote:

Thanks for your thoughtful response Jörn!
...

Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
Broadly, you are referring to "locale" issues. Solr deals with these today through the optional, configurable use of URPs (update request processors). The schemaless / data-driven mode enables some of them; you can see them in its solrconfig.xml, including many date formats. You can look into that for further info if you like. The primitive field types are not locale-sensitive.
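
The division of labor described here, with locale handling done before a value reaches a locale-insensitive field type, might look like this in miniature. It is a plain-Python stand-in for such a pre-indexing step, not a Solr URP; `normalize_decimal` and its separator table are hypothetical.

```python
# Hypothetical stand-in for a pre-indexing step; not a Solr API.

# (grouping separator, decimal separator) per locale tag; assumed sample data.
SEPARATORS = {
    "en-US": (",", "."),
    "de-DE": (".", ","),
}


def normalize_decimal(text, locale_tag):
    """Convert a locale-formatted decimal string to a float."""
    grouping, decimal = SEPARATORS[locale_tag]
    # Drop grouping separators first, then switch to a canonical decimal point.
    canonical = text.replace(grouping, "").replace(decimal, ".")
    return float(canonical)


german = normalize_decimal("1.234,5", "de-DE")
american = normalize_decimal("1,234.5", "en-US")
print(german, american)
```

Both inputs end up as the same number, so the field type that finally stores the value never has to know which locale it came from.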

Update: It looks like 8.0 will only use this implicit field type mechanism for _nest_path_, which probably won't be in the default schema. Assuming it isn't, the mechanism will only be documented in the context of that particular feature. It'd be nice to see the scope of fields expanded, and at that point it could/should be documented more broadly. That can wait until people have the energy to do it.

On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <[hidden email]> wrote:
Hi David,

I now get the idea, and yes, this makes sense. It would require some tutorial or best practices, though. For example, overriding a platform data type may not make much sense: it may confuse developers who are new to an existing project and know Solr, but then encounter a platform type that does not have the default behavior.

Could platform types deal with different languages? For dates this does not seem to be a problem, because Solr expects only one specific date format, and values need to be converted to it beforehand (maybe that conversion could also be part of a platform type), but decimals and Boolean values are written differently in some languages.


Re: Feature: Solr implicitly defined field types?

Gus Heck
I'm perhaps slightly conservative with respect to configuration, but I'm not fond of configuration that I can't see. What I don't like is looking at a config file and not seeing the full story. That means I have to read the config and ALSO go read some part of the documentation that I've failed to memorize, and probably need to google to find, to be fully aware of what's going on... (And no, I didn't like it when some standard stuff disappeared from solrconfig.xml a while back either.) Small changes of course seem reasonable, but the further we drift into implicit things, especially if we get a collection of several implicit things described in various disparate parts of the manual, the more cryptic the system becomes. That's my opinion, YMMV.

-Gus

On Thu, Jan 3, 2019 at 2:57 PM David Smiley <[hidden email]> wrote:
Broadly, you are referring to "locale" issues. Solr deals with these today through the optional, configurable use of URPs (update request processors). The schemaless / data-driven mode enables some of them; you can see them in its solrconfig.xml, including many date formats. You can look into that for further info if you like. The primitive field types are not locale-sensitive.

Update: It looks like 8.0 will only use this implicit field type mechanism for _nest_path_, which probably won't be in the default schema. Assuming it isn't, the mechanism will only be documented in the context of that particular feature. It'd be nice to see the scope of fields expanded, and at that point it could/should be documented more broadly. That can wait until people have the energy to do it.

Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
I'm thinking this feature would be used conservatively: just for primitive types that wouldn't have an interesting configuration to them, or for something you are really not expected to change (the nest path of nested docs).  So you wouldn't feel you had to go read the docs.  The schema might even have a comment listing the implicit field types (a one-line, comma-delimited list).

On Fri, Jan 4, 2019 at 10:34 AM Gus Heck <[hidden email]> wrote:
I'm perhaps slightly conservative with respect to configuration, but I'm not fond of hidden configuration that I can't see. What I don't like is looking at a config file and not seeing the full story. That means I have to read the config and ALSO go read some part of the documentation that I've failed to memorize, and probably need to google to find, to be fully aware of what's going on... (and no, I didn't like it when some standard stuff disappeared from solrconfig.xml a while back either). Small changes of course seem reasonable, but the further we drift into implicit things, especially if we get a collection of several implicit things described in various disparate parts of the manual, the more cryptic the system becomes. That's my opinion, YMMV.

-Gus

On Thu, Jan 3, 2019 at 2:57 PM David Smiley <[hidden email]> wrote:
Broadly, you refer to "locale" issues.  Solr's way of dealing with this today is with optional & configurable use of URPs (update request processors).  The schema-less / data-driven mode has some of these enabled; you can see them in the solrconfig.xml, including many date formats.  You can look into that for further info if you like.  The primitive field types are not locale sensitive.

Update: It's looking like 8.0 will only employ this implicit field type mechanism for _nest_path_, which probably won't be in the default schema.  Assuming it isn't, then it'll only be documented in the context of this particular feature.  It'd be nice to see the scope of fields expanded, and at that juncture it could/should be more broadly documented.  That can wait until people have the energy to do it.

On Sun, Dec 30, 2018 at 4:54 AM Jörn Franke <[hidden email]> wrote:
Hi David,

I now get the idea, and yes, this makes sense. It would, though, require some tutorial or best practices; e.g., overriding a platform data type may not make much sense, since it may confuse developers who know Solr and join an existing project, only to find a platform type that does not have the default behavior.

Could you deal with different languages in platform types? For dates it does not seem to be a problem, because Solr expects only one specific date format, so values need to be converted beforehand (maybe that conversion could also be part of a platform type), but decimals and Boolean values are written differently in some languages.

On 30.12.2018 at 07:01, David Smiley <[hidden email]> wrote:

Thanks for your thoughtful response Jörn!
...
On Sat, Dec 29, 2018 at 4:14 AM Jörn Franke <[hidden email]> wrote:
> I think it is a good idea, but I see some potential complexity for “deployment” of collections. For instance, in environments where Solr is used as a shared platform among several stakeholders, every time you deploy or modify a collection you need to take care that the platform types exist. If a type exists in the Test environment, then I need to make sure that it exists in acceptance/production as well. The problem is that the platform type could have been defined by somebody else who has not yet updated the other environments (e.g. due to project/sprint delays). Another issue arises if I move to another Solr cluster in the same environment: then I have to make sure that all platform types move with me.

RE "the platform type could have been defined by somebody else":  I'm not imagining it'd be configurable, thus the "somebody else" is the Solr project/committers.

Otherwise, I think I get your point, but perhaps I don't.  It's the same point for any use of some new feature of Solr.  If you use some new feature, you have to take care that all Solr instances you deploy your configuration to can handle that new feature.  That's a fairly generic point that would apply to just about anything in Solr.
 
> A (minor) issue is that platform types may change (for whatever reason), and then potentially all collections have to be reindexed, or we end up with different versions of the same platform type, which does not make things easier.

Yes, it's possible.  Though I think that point is apart from the feature I propose.  You're saying that you might want to use an "int" field and then one day realize you want some newer/better definition of what an "int" is (e.g. trie -> points).  Sure.  That's true whether the field type is explicit or implicit.  There's nothing stopping you from explicitly defining the field type if you want to; the names would not be reserved. If you want to stick with your current index while running the new Solr version, then you would keep luceneMatchVersion as it was, which would effectively retain the interpretation of the implicit field types.
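
To make the luceneMatchVersion point concrete, here is a minimal sketch in Python (not Solr code). The registry contents, the (7, 0) cut-off, and the function name resolve_implicit_type are assumptions made purely for illustration; the thread does not specify any of them. It only shows the idea that a pinned version keeps the old meaning of "int" and that an explicit definition always wins.

```python
# Hypothetical registry: each implicit type name maps to a list of
# (minimum luceneMatchVersion, implementing class), oldest first.
# The cut-off version (7, 0) is an assumption chosen for illustration only.
IMPLICIT_TYPES = {
    "int": [((0, 0), "solr.TrieIntField"), ((7, 0), "solr.IntPointField")],
}


def resolve_implicit_type(name, lucene_match_version, explicit_types=None):
    """Return the class for `name`, preferring an explicit schema definition."""
    explicit_types = explicit_types or {}
    if name in explicit_types:
        # Names are not reserved: an explicit definition always wins.
        return explicit_types[name]
    chosen = None
    for min_version, class_name in IMPLICIT_TYPES[name]:
        # Keep the newest entry whose minimum version is not above the pinned one.
        if lucene_match_version >= min_version:
            chosen = class_name
    return chosen


print(resolve_implicit_type("int", (6, 6)))  # config pinned to an older version
print(resolve_implicit_type("int", (8, 0)))  # config on the newer version
print(resolve_implicit_type("int", (8, 0), {"int": "solr.TrieIntField"}))
```

Under these assumptions, an index built with the old definition keeps working after an upgrade as long as luceneMatchVersion is left alone or the type is defined explicitly.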
 
> Currently we have all our schema definitions in a version management system (we use the Schema API, but the JSON requests are stored there) so that projects can draw inspiration from each other. Needless to say, careful type engineering also requires some documentation on technical design and may indeed be very collection-specific.

> Another issue could be that a platform type may also imply a certain platform solrconfig.xml (e.g. lib directives).

I'm imagining platform types would be basic primitive types (int, boolean, etc. and some special situations like in the issue I referenced).  They would not depend on contrib libs... though I could imagine one day an evolution of this in which a contrib could somehow auto-add implicit field types.
 
> I am not sure yet what the exact benefits are of referring to types of other collections in the Solr runtime itself, instead of having a version system and letting projects decide if they want to adapt types of other collections, but maybe I am overlooking something here.


Re: Feature: Solr implicitly defined field types?

Alexandre Rafalovitch
What if a system schema were loaded implicitly at startup? Then, if a
new schema is loaded and a type definition is missing, it is copied -
at that time - into the specific schema. So, on the first rewrite,
those types - and only those used - will be written out.
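
A minimal sketch of this copy-on-load idea, in Python rather than Solr's Java. The dict layout, the contents of SYSTEM_TYPES, and the function name load_schema are assumptions for illustration only; the point is that only the types a field actually refers to get copied into the specific schema.

```python
import copy

# Hypothetical system schema loaded once at startup (types only).
SYSTEM_TYPES = {
    "int": {"class": "solr.IntPointField"},
    "string": {"class": "solr.StrField"},
    "boolean": {"class": "solr.BoolField"},
}


def load_schema(schema):
    """Copy missing type definitions from the system schema.

    `schema` is a dict with "fieldTypes" (name -> definition) and
    "fields" (field name -> type name). Only types that a field actually
    refers to are copied, so unused system types never appear.
    """
    for field_name, type_name in schema["fields"].items():
        if type_name in schema["fieldTypes"]:
            continue  # explicitly defined in this schema; leave it alone
        if type_name not in SYSTEM_TYPES:
            raise ValueError(f"field {field_name!r}: unknown type {type_name!r}")
        schema["fieldTypes"][type_name] = copy.deepcopy(SYSTEM_TYPES[type_name])
    return schema


schema = load_schema({
    "fieldTypes": {"string": {"class": "solr.StrField", "sortMissingLast": True}},
    "fields": {"id": "string", "price": "int"},
})
print(sorted(schema["fieldTypes"]))  # "boolean" is absent because no field uses it
```

Because the copied definition is written into the specific schema, later changes to the system schema would not silently alter a collection that has already been loaded.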

This allows versioning the system types the same way we version a
normal schema. I agree with Gus that hidden configuration causes all
sorts of challenges.

And - for tooling purposes - there definitely needs to be a way to get
all definitions: explicit and implicit, used and just available.
That also points towards something that already has a self-describing
mechanism available (like the Schema API).

Regards,
   Alex.



---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: Feature: Solr implicitly defined field types?

Shawn Heisey-2
I'm jumping into this conversation a little bit late.  Sorry for any
problems that causes.

On 1/4/2019 9:52 AM, Alexandre Rafalovitch wrote:
> What about if a system schema was loaded at a startup implicitly.
> Then, if a new schema is loaded and type definition is missing, it is
> copied - at that time - into the specific schema. So, on the first
> rewrite those - and only those used - types will be written out.

Looking at what came before, my preference would have been implicitly
defined default types -- things like int, string, etc, defined in code. 
The only problem with that comes at Solr upgrade time ... what if we
decide for a later version (even if it's limited to a major release)
that IntPointField shouldn't be the implicit class for "int"?  Someone
who upgrades an index using that implicit type to the new version will
find that Solr will no longer work.  Which makes the idea unworkable.

A file-based system schema where implicit types are explicitly defined
is an interesting idea that I think would get around the problem
described above.  We would need to decide exactly what can be defined in
the system schema -- my initial bias would be to only allow types, not
fields or other schema config, to be defined there.  Probably a good
location for the system schema file would be the ZK chroot or the solr
home, depending on whether the system is in cloud mode.

Thanks,
Shawn



Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
On Fri, Jan 4, 2019 at 12:51 PM Shawn Heisey <[hidden email]> wrote:
> Looking at what came before, my preference would have been implicitly
> defined default types -- things like int, string, etc, defined in code.
> The only problem with that comes at Solr upgrade time ... what if we
> decide for a later version (even if it's limited to a major release)
> that IntPointField shouldn't be the implicit class for "int"?  Someone
> who upgrades an index using that implicit type to the new version will
> find that Solr will no longer work.  Which makes the idea unworkable.

I addressed this earlier -- search for "luceneMatchVersion" which is key.

RE a file-based system schema (what Alexandre suggested)... that sounds workable, but it is a more complex idea that would take more code & documentation -- at least relative to the very simple idea of some built-ins in the code (my proposal).  See the changes to IndexSchema in SOLR-12768.patch.

Re: Feature: Solr implicitly defined field types?

Gus Heck
To my mind the only types (or fields) that should get built-in are the ones that would break solr if they were changed. Anything else should show up in the config file. Your _nest_path_ probably falls into the "it would break solr if it changed" category. 

I notice in your initial post you say "So if you were to read the schema then you'd see it." If that implies that there would be a way to fetch the final_effective_schema.xml file from the server via the admin UI, that might make me feel better about this. Such a file should essentially be the schema.xml (or managed_schema.xml) with an "implicit generated types - do not edit" section. Comments etc. should be preserved from the original, and each implicitly added type should possibly carry a provenance comment (which fields rely on the implicit addition, so it's easy to spot an accidental usage of the implicit type).
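
As a rough illustration of what such an "effective schema" section could look like, here is a small Python sketch. The function name render_effective_types, the input dict shapes, and the exact comment wording are assumptions; nothing like this exists in Solr as far as this thread says. It emits explicit types first, then a marked section of implicit types, each preceded by a provenance comment naming the fields that rely on it.

```python
from xml.sax.saxutils import quoteattr


def render_effective_types(explicit, implicit, fields):
    """Render fieldType elements, marking implicit ones with provenance.

    explicit / implicit: dict of type name -> class name
    fields: dict of field name -> type name
    """
    lines = []
    for name, cls in sorted(explicit.items()):
        lines.append(f"<fieldType name={quoteattr(name)} class={quoteattr(cls)}/>")
    # Only implicit types that some field refers to (and that are not
    # explicitly overridden) end up in the generated section.
    used = {t for t in fields.values() if t in implicit and t not in explicit}
    if used:
        lines.append("<!-- implicit generated types - do not edit -->")
    for name in sorted(used):
        users = sorted(f for f, t in fields.items() if t == name)
        lines.append(f"<!-- used by: {', '.join(users)} -->")
        lines.append(f"<fieldType name={quoteattr(name)} class={quoteattr(implicit[name])}/>")
    return "\n".join(lines)


print(render_effective_types(
    explicit={"string": "solr.StrField"},
    implicit={"int": "solr.IntPointField", "boolean": "solr.BoolField"},
    fields={"id": "string", "price": "int", "qty": "int"},
))
```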

Simplicity of code and code maintenance is of course excellent. Simplicity for the person trying to troubleshoot a system they've just been hired to fix/improve is also excellent. I'd prefer to SEE what's going on than have to remember what's going on modulo some version matrix in my head. Hard enough remembering which admin commands are available on version X...



Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
You would see these types in the HTTP schema API, and thus you would also end up seeing it on the admin schema screen (which uses that API).
It would not be saved back to the XML file unless you're further manipulating your schema via the HTTP schema API (managed schema).  I ought to verify all this manually.  As I'm sure you already know, comments / formatting do not survive that round-trip.
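
For readers who want to check what the Schema API reports, here is a small Python sketch that lists the field type names for a collection. The host, port, and collection name are placeholders, and whether implicit types would actually show up in this listing is exactly the part that still needs manual verification; the helper names are made up for this example.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def fieldtypes_url(base_url, collection):
    """Build the Schema API URL that lists a collection's field types."""
    return f"{base_url.rstrip('/')}/solr/{quote(collection)}/schema/fieldtypes"


def field_type_names(response_body):
    """Extract the type names from a /schema/fieldtypes JSON response."""
    return [ft["name"] for ft in json.loads(response_body)["fieldTypes"]]


def fetch_field_type_names(base_url, collection):
    """Fetch and parse the list (requires a reachable Solr instance)."""
    with urlopen(fieldtypes_url(base_url, collection)) as resp:
        return field_type_names(resp.read().decode("utf-8"))


# Canned response, shaped like the Schema API's output, so this runs offline.
sample = (
    '{"responseHeader": {"status": 0}, '
    '"fieldTypes": [{"name": "string", "class": "solr.StrField"}]}'
)
print(fieldtypes_url("http://localhost:8983", "techproducts"))
print(field_type_names(sample))
```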

I'm a convention over configuration believer, and thus I prefer CoC over explicitness/verbosity.  I suppose all CoC arguments could be shot down with generic statements of perceived maintenance/understandability benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me ask you this: why is it okay for databases to not have definitions of what primitive types are, yet in Solr you would rather it always be explicit?  That analogy is the crux of it.  I'm not arguing for "text_general" or other analyzed text types to be implicit; who knows where to draw the line there.  I thought primitives would be a slam dunk.


Re: Feature: Solr implicitly defined field types?

Jan Høydahl / Cominvent
In some other thread or Jira that I cannot find now I proposed a new tag in schema to make this explicit. So instead of 50 tags defining all primitive types and dynamicFields, we could have one tag:

<primitiveFieldTypes enabled="true" dynamicMappings="true" lazy="true"/>

This is just a draft idea. This would give a way to disable these implicit primitive types if they are made default on. A lazy mode could delay adding them to the schema until first use, if that saves any resources.
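
A minimal sketch of how such a switch might be read, written in Python for brevity. The tag is only a draft proposal (spelled primitiveFieldTypes here), so the parsing function, the placement of the tag directly under the schema root, and the defaults applied when the tag or an attribute is missing are all assumptions.

```python
import xml.etree.ElementTree as ET


def parse_primitive_field_types(schema_xml):
    """Read the proposed <primitiveFieldTypes/> switch from a schema document.

    Returns a dict of the three boolean attributes from the draft idea.
    The defaults used when the tag or an attribute is absent are assumptions.
    """
    defaults = {"enabled": False, "dynamicMappings": False, "lazy": False}
    root = ET.fromstring(schema_xml)
    tag = root.find("primitiveFieldTypes")
    if tag is None:
        return defaults
    return {key: tag.get(key, str(default)).lower() == "true"
            for key, default in defaults.items()}


schema_xml = (
    '<schema name="example" version="1.6">'
    '<primitiveFieldTypes enabled="true" lazy="true"/>'
    '</schema>'
)
print(parse_primitive_field_types(schema_xml))
```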

Jan


Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
Hmmm.  My opinion is neutral on a <primitiveFieldTypes>.  It would have more implementation & documentation complexity to it IMO than an implicit primitive type as I've been pushing.  But still; it's alright.

Since I can't seem to convince anyone on the merits of implicit field types, I will back out this part of SOLR-12768.  Instead I suppose I will add a new field type for that particular issue's need.

~ David

On Sat, Jan 5, 2019 at 5:29 PM Jan Høydahl <[hidden email]> wrote:
In some other thread or Jira that I cannot find now I proposed a new tag in schema to make this explicit. So instead of 50 tags defining all primitive types and dynamicFields, we could have one tag:

<primitiveFieldTypes enabled="true" dynamicMappings="true" lazy="true"/>

This is just a draft idea. This would give a way to disable these implicit primitive types if they are made default on. A lazy mode could delay adding to the schema until first use if that saves any resources.

Jan

On 5 Jan 2019, at 21:29, David Smiley <[hidden email]> wrote:

You would see these types in the HTTP schema API, and thus you would also end up seeing it on the admin schema screen (which uses that API).
It would not be saved back to the XML file unless you're further manipulating your schema via the HTTP schema API (managed schema).  I ought to verify all this manually.  As I'm sure you already know, comments / formatting do not survive that round-trip.
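
For anyone who wants to check this by hand, here is a minimal Java sketch that builds a Schema API request for a single field type. The host, the collection name, and the assumption that the implicit type is registered under the name "_nest_path_" are all placeholders for illustration; only the /schema/fieldtypes/<name> path is the existing Schema API endpoint.

```java
import java.net.URI;
import java.net.http.HttpRequest;

final class SchemaApiSketch {
    // Builds (but does not send) a GET request for one field type.
    // solrBaseUrl and collection are placeholders supplied by the caller.
    static HttpRequest fieldTypeRequest(String solrBaseUrl, String collection, String typeName) {
        URI uri = URI.create(solrBaseUrl + "/" + collection + "/schema/fieldtypes/" + typeName);
        return HttpRequest.newBuilder(uri)
                .GET()
                .header("Accept", "application/json")
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host, collection, and type name.
        HttpRequest request = fieldTypeRequest("http://localhost:8983/solr", "mycollection", "_nest_path_");
        System.out.println(request.method() + " " + request.uri());
    }
}
```

Sending the request with java.net.http.HttpClient against a running node would show whether a type that was never declared in the schema file is reported anyway.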

I'm a convention over configuration believer, and thus I prefer CoC over explicitness/verbosity.  I suppose all CoC arguments could be shot down with generic statements of perceived maintenance/understandability benefits.  Shrug; yet surely there's a case for CoC in some cases?  Let me ask you this: why is it okay for databases to not have definitions of what primitive types are, yet in Solr you would rather it be explicit always?  That analogy is the crux of it.  I'm not arguing for "text_general" or other analyzed text types to be implicit; who knows where to draw the line there.  I thought primitives would be a slam dunk.

On Sat, Jan 5, 2019 at 3:07 PM Gus Heck <[hidden email]> wrote:
To my mind the only types (or fields) that should get built-in are the ones that would break solr if they were changed. Anything else should show up in the config file. Your _nest_path_ probably falls into the "it would break solr if it changed" category. 


Re: Feature: Solr implicitly defined field types?

Jan Høydahl / Cominvent
I'd really like to see these implicit types.
Whether they are defined in code or in an implicit-types.xml in the webapp is just an implementation detail. Also, a <primitiveFieldTypes> would only be necessary if there is ever a need to take more explicit control, but if the right defaults are established, I see only positive effects from shipping with implicit int, long, date, bool, float, double, etc. Perhaps you can sum up your final suggestion, and if you don't get any vetoes then go ahead :)

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com



Re: Feature: Solr implicitly defined field types?

david.w.smiley@gmail.com
I'm glad to hear at least somebody other than me likes the idea :-)

I started some manual experimentation with it.  After I got past one little bug, sure enough it worked and the type showed up in the admin screen.  It showed up because of the /admin/luke handler interacting with IndexSchema, not, as I said earlier, because of the HTTP schema API, which the admin screen doesn't use.  Either way, that works.  And after some schema manipulation (performed easily via Solr's admin screen, which has a form to add field types), I saw the schema get persisted, and, as I expected, the persisted schema included the field type definition.

But then I got to wondering whether that's actually a good thing, and I'm now thinking probably not.  (We could have implicit types with or without this behavior.)  Why not?
  (A) This field type was serialized incorrectly; there were no analyzers when there should have been some.  This has little to do with implicit field types; it's due to assumptions in our schema / field type serialization, which simply gives up unless it sees a TokenizerChain subclass of Analyzer, whereas in my code I chose to use Lucene's CustomAnalyzer utility.  I could "fix" this by using TokenizerChain instead, or by changing the serialization code, but either way it ought to be tested since it's a sneaky bug (it won't throw an error).  Alternatively, never persist implicit field types; though _that_ would need to be tested too.
  (B) It can sometimes thwart future changes we may choose for a type's definition.  Since it shows up, it's somewhat locked in at the time the schema is manipulated with the schema API (with whatever the implementation was for the Solr version / luceneMatchVersion in effect at that time).  After that point, if the user were to keep the config, delete all data, update luceneMatchVersion in solrconfig, and index again, the field type would still have the same definition as before, because it is explicitly defined at this point.  This isn't a huge deal since apps deploy/publish their configuration in different ways, and the most popular ways would be immune to this (to be affected, the app must manipulate the schema with the API).  Even apps that do manipulate the schema with the API might do major version upgrades in a from-scratch way instead of using the same config in-place.  And it's a hypothetical scenario: a future point in time where we eventually change our mind on what some particular implicit field type ought to do.
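
To make (A) concrete, here is a minimal Java sketch of the failure mode. The Analyzer, TokenizerChain, and OpaqueAnalyzer types are hypothetical stand-ins I made up for illustration, not the real Lucene/Solr classes, and the string output is not Solr's actual serialization format. The point is only the shape of the logic: a lenient serializer that returns nothing for an analyzer it doesn't recognize, next to a strict variant that fails loudly.

```java
import java.util.List;
import java.util.Optional;

final class AnalyzerSerializationSketch {
    // Hypothetical stand-in for an analyzer base type.
    interface Analyzer {}

    // Stand-in for the one analyzer shape the serializer understands.
    static final class TokenizerChain implements Analyzer {
        final String tokenizer;
        final List<String> filters;

        TokenizerChain(String tokenizer, List<String> filters) {
            this.tokenizer = tokenizer;
            this.filters = filters;
        }
    }

    // Stand-in for any analyzer built some other way.
    static final class OpaqueAnalyzer implements Analyzer {}

    // Lenient: silently yields nothing for unrecognized analyzers,
    // so the persisted type ends up with no analyzer and no error.
    static Optional<String> serializeLenient(Analyzer analyzer) {
        if (analyzer instanceof TokenizerChain) {
            TokenizerChain chain = (TokenizerChain) analyzer;
            return Optional.of("tokenizer=" + chain.tokenizer + ", filters=" + chain.filters);
        }
        return Optional.empty();
    }

    // Strict: same logic, but an unrecognized analyzer is an error.
    static String serializeStrict(Analyzer analyzer) {
        return serializeLenient(analyzer).orElseThrow(() ->
                new IllegalStateException("cannot serialize " + analyzer.getClass().getSimpleName()));
    }

    public static void main(String[] args) {
        Analyzer chain = new TokenizerChain("keyword", List.of("lowercase"));
        System.out.println(serializeLenient(chain));
        System.out.println(serializeLenient(new OpaqueAnalyzer()));
        try {
            serializeStrict(new OpaqueAnalyzer());
        } catch (IllegalStateException e) {
            System.out.println("strict mode rejected it: " + e.getMessage());
        }
    }
}
```

A test that round-trips each built-in type through persistence would catch the lenient case, which is the part that currently goes unnoticed.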

To have implicit field types not persisted, the simplest implementation would probably be to never save such implicit field types back into the IndexSchema's registry of field types, so that they are never iterated over when the schema is persisted.  'Course it'd need a test.
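
Roughly what I have in mind, as a self-contained Java sketch rather than the actual IndexSchema code. The class and method names are hypothetical, and the version cutover for "int" is purely illustrative, not a statement of Solr's policy. Implicit types are resolved on lookup from the configured luceneMatchVersion and are never stored, so only explicit types are ever candidates for persistence.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

final class ImplicitTypeRegistrySketch {
    // Only types the user declared explicitly live here.
    private final Map<String, String> explicitTypes = new LinkedHashMap<>();
    private final int luceneMatchMajor;

    ImplicitTypeRegistrySketch(int luceneMatchMajor) {
        this.luceneMatchMajor = luceneMatchMajor;
    }

    void addExplicit(String name, String className) {
        explicitTypes.put(name, className);
    }

    // Hypothetical hard-coded built-ins, keyed by name and match version.
    // The ">= 7" cutover is an illustrative assumption.
    static Optional<String> builtInClassFor(String name, int luceneMatchMajor) {
        if ("int".equals(name)) {
            return Optional.of(luceneMatchMajor >= 7 ? "solr.IntPointField" : "solr.TrieIntField");
        }
        if ("string".equals(name)) {
            return Optional.of("solr.StrField");
        }
        return Optional.empty();
    }

    // Explicit definitions win; otherwise fall back to a built-in
    // WITHOUT writing it into explicitTypes.
    Optional<String> resolve(String name) {
        String explicit = explicitTypes.get(name);
        if (explicit != null) {
            return Optional.of(explicit);
        }
        return builtInClassFor(name, luceneMatchMajor);
    }

    // What a persistence step would iterate over: explicit types only.
    Map<String, String> typesToPersist() {
        return Collections.unmodifiableMap(explicitTypes);
    }

    public static void main(String[] args) {
        ImplicitTypeRegistrySketch registry = new ImplicitTypeRegistrySketch(8);
        registry.addExplicit("text_general", "solr.TextField");
        System.out.println(registry.resolve("int"));
        System.out.println(registry.typesToPersist().keySet());
    }
}
```

With this shape, bumping luceneMatchVersion and reindexing picks up whatever the built-in resolves to for the new version, which addresses (B), and the serialization problem in (A) never arises for implicit types because they are never written out.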

~ David
--
Lucene/Solr Search Committer (PMC), Developer, Author, Speaker