[jira] [Commented] (SOLR-13131) Category Routed Aliases


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-13131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739731#comment-16739731 ]

Gus Heck commented on SOLR-13131:

h1. Functionality
h2. New Parameter Value

*router.name* would gain a new valid value of "category"
h2. New Params

This feature would need some safety valves to avoid runaway collection creation (similar in spirit to router.maxFutureMs for TRAs). To that end I suggest:
 # *router.maxCardinality* to place a limit on the total number of collections that can be created (maybe required?)
 # *router.mustMatch* to provide pattern matching for valid data and reject requests that would create an undesired collection (optional)
 # *router.dictionary* might also be added to provide a set of acceptable values (optional). This may or may not be implemented as part of this ticket.

With respect to router.dictionary, I could imagine there being a desire to have that dictionary used as a spell checker for segments of the values. One could break the value on _ (or something else) and make sure all the parts are spelled properly. One could also imagine the dictionary being applied to specific matching groups from router.mustMatch, but all of this dictionary based checking would be a future enhancement. I'm mentioning it here to get the idea out there for future reference.
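
To make the two proposed safety valves concrete, here is a minimal Python sketch of how an incoming value might be checked. It is illustrative only, not Solr code: the function name, the exception type, the whole-value (full match) semantics for router.mustMatch, and the order of the checks are all assumptions.

```python
import re
from typing import Optional, Set


class CategoryRejected(ValueError):
    """Raised when a value would create an undesired collection."""


def check_category(value: str, existing: Set[str], max_cardinality: int,
                   must_match: Optional[str] = None) -> None:
    """Raise CategoryRejected if routing `value` should be refused."""
    # router.mustMatch (optional): assume the whole value must fit the pattern.
    if must_match is not None and re.fullmatch(must_match, value) is None:
        raise CategoryRejected(f"{value!r} does not match {must_match!r}")
    # A value that already has a collection never creates a new one.
    if value in existing:
        return
    # router.maxCardinality: refuse to create a collection past the limit.
    if len(existing) >= max_cardinality:
        raise CategoryRejected(
            f"creating {value!r} would exceed maxCardinality={max_cardinality}")
```

With two existing categories and a limit of 2, a third distinct value would be rejected, while a document for either existing category would still be accepted.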
h2. Routed Field Constraints

The data in the field to be routed will need to be constrained in a couple of ways to make this work:
 # The routed field would need to be single valued, and encountering multiple values should throw an error.
 # The value in the routed field must be convertible to a valid collection name. This conversion will likely be done by replacing any invalid characters with '_', and it is the user's responsibility to ensure that the resulting names are unique and do not interfere with other collections in the system. A value that resolves to an existing collection that is not part of the alias will cause an error to be returned; the existing collection will remain unaffected and will not be added to the alias.
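
The two constraints above could be sketched roughly as follows. This is a Python illustration rather than the eventual implementation; the helper names are hypothetical, and the set of legal characters (letters, digits, '_', '.', '-') is an assumption about what counts as a valid collection name.

```python
import re
from typing import Collection, Sequence

# Assumed legal characters for a collection name: letters, digits, '_', '.', '-'.
_INVALID_CHARS = re.compile(r"[^A-Za-z0-9_.-]")


def to_collection_name(value: str) -> str:
    """Replace every character assumed to be invalid with '_'."""
    return _INVALID_CHARS.sub("_", value)


def resolve_target(values: Sequence[str], alias_members: Collection[str],
                   all_collections: Collection[str]) -> str:
    """Return the collection a document should be routed to, or raise."""
    # Constraint 1: the routed field must hold exactly one value.
    if len(values) != 1:
        raise ValueError(f"routed field must be single valued, got {len(values)}")
    name = to_collection_name(values[0])
    # Constraint 2: never adopt a collection that exists outside the alias.
    if name in all_collections and name not in alias_members:
        raise ValueError(f"collection {name!r} exists but is not part of the alias")
    return name
```

Note that "device type/A" and "device type A" both convert to "device_type_A" under this scheme, which is why keeping the resulting names unique has to remain the user's responsibility.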

h2. Validations

In addition to constraints on the values, the following validations will be enforced at the time the CategoryRoutedAlias is created:
 # The *collections* attribute is not set (it applies only to non-routed aliases)
 # None of the TimeRoutedAlias attributes are present
 # TimeRoutedAliases will also be modified to validate that *router.maxCardinality* and *router.mustMatch* are not set
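
As a rough illustration of these creation-time checks, the Python sketch below validates a flat map of alias parameters. The function name and error messages are hypothetical, and the set of time-routed parameters is illustrative rather than exhaustive.

```python
from typing import List, Mapping

# Illustrative, not exhaustive: parameters assumed to belong to one router type only.
TIME_ROUTED_ONLY = frozenset(
    {"router.start", "router.interval", "router.maxFutureMs"})
CATEGORY_ROUTED_ONLY = frozenset(
    {"router.maxCardinality", "router.mustMatch"})


def validate_create_alias(params: Mapping[str, str]) -> List[str]:
    """Return a list of validation errors; an empty list means the request is acceptable."""
    errors = []
    router = params.get("router.name")
    if router == "category":
        # Validation 1: 'collections' belongs to non-routed aliases only.
        if "collections" in params:
            errors.append("collections must not be set for a routed alias")
        # Validation 2: no TimeRoutedAlias attributes may be present.
        forbidden = TIME_ROUTED_ONLY
    elif router == "time":
        # Validation 3: TRAs reject the category-only attributes.
        forbidden = CATEGORY_ROUTED_ONLY
    else:
        forbidden = frozenset()
    for name in sorted(forbidden.intersection(params)):
        errors.append(f"{name} is not valid with router.name={router}")
    return errors
```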

h1. Implementation

The intention here is to first convert TimeRoutedAliasUpdateProcessor to RoutedAliasUpdateProcessor and move as much time-related functionality as possible into the TimeRoutedAlias class. If necessary, TimeRoutedAliasUpdateProcessor may still remain as a (hopefully skinny) subclass of RoutedAliasUpdateProcessor. I also hope to extract a RoutedAlias interface from TimeRoutedAlias, which will also be implemented by a new CategoryRoutedAlias class. Ideally I'd like to end up with a RoutedAliasUpdateProcessor and two concrete RoutedAlias implementations, though I'm not sure if that will really be possible. I'll break things down and make individual tickets for sub-parts after I play with the code a little.
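
The target shape (one processor, one interface, two implementations) can be sketched as below. This is a Python mock-up of the structure only; the real classes are Java, every method name here is hypothetical, and the daily bucket naming in the time-routed class is a simplified stand-in for the actual interval logic.

```python
import re
from abc import ABC, abstractmethod
from datetime import datetime


class RoutedAlias(ABC):
    """Hypothetical interface: map a document to its target collection."""

    @abstractmethod
    def target_collection(self, doc: dict) -> str:
        ...


class TimeRoutedAlias(RoutedAlias):
    def __init__(self, alias: str, field: str) -> None:
        self.alias = alias
        self.field = field

    def target_collection(self, doc: dict) -> str:
        # Simplified stand-in: one collection per day of the timestamp.
        ts: datetime = doc[self.field]
        return f"{self.alias}_{ts:%Y-%m-%d}"


class CategoryRoutedAlias(RoutedAlias):
    def __init__(self, field: str) -> None:
        self.field = field

    def target_collection(self, doc: dict) -> str:
        # Replace characters assumed to be invalid in a collection name.
        return re.sub(r"[^A-Za-z0-9_.-]", "_", str(doc[self.field]))


class RoutedAliasUpdateProcessor:
    """Router-agnostic processor: all routing decisions live in the alias."""

    def __init__(self, routed_alias: RoutedAlias) -> None:
        self.routed_alias = routed_alias

    def process_add(self, doc: dict) -> str:
        # A real processor would forward the document; here we only
        # return the name of the collection it would be sent to.
        return self.routed_alias.target_collection(doc)
```

The point of the split is that the processor never needs to know which kind of routing is in play, so a future alias type would only need a new RoutedAlias implementation.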

Both the v1 API and the v2 API will be supported.
h1. Documentation
 # The TimeRoutedAliases page will be converted to a RoutedAliases page with sections for TimeRoutedAliases and CategoryRoutedAliases
 # The CreateAliasCommand Documentation will be updated
 # The v2 API will return documentation for the new and modified attributes.


> Category Routed Aliases
> -----------------------
>                 Key: SOLR-13131
>                 URL: https://issues.apache.org/jira/browse/SOLR-13131
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Gus Heck
>            Assignee: Gus Heck
>            Priority: Major
> This ticket is to add a second type of routed alias in addition to the current time routed aliases. The new type of alias will allow data driven creation of collections based on the values of a field and automated organization of these collections under an alias that allows the collections to also be searched as a whole.
> The use case in mind at present is IoT device-type segregation, but I could also see this leading to the ability to direct updates to tenant-specific hardware (in cooperation with autoscaling).
> This ticket also looks forward to (but does not include) the creation of a Dimensionally Routed Alias, which would allow time routed data to also be segregated by device.
> Further design details to be added in comments.

This message was sent by Atlassian JIRA