[jira] Created: (SOLR-265) Make IndexSchema updateable in live system


[jira] Created: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org
Make IndexSchema updateable in live system
------------------------------------------

                 Key: SOLR-265
                 URL: https://issues.apache.org/jira/browse/SOLR-265
             Project: Solr
          Issue Type: Improvement
          Components: update
    Affects Versions: 1.3
            Reporter: Will Johnson
            Priority: Minor
             Fix For: 1.3


I've seen a few items on the mailing lists recently about updating a schema on the file system and not seeing the changes propagated.  While I think that automatically detecting schema changes on disk may be unrealistic, I do think it would be useful to be able to update the schema without having to bounce the webapp.  The forthcoming patch adds a method to SolrCore to do just that, as well as a request handler to call said method.

The patch as it exists is a straw man for discussion.  The one thing that concerned me was making IndexSchema schema non-final in SolrCore.  I'm not quite sure why it needs to be final to begin with, so perhaps someone can shed some light on the situation.  Also, I think it would be useful to be able to upload a schema through the admin GUI, have it persisted to disk and then call reloadSchema(), but that seemed like a good bit of effort for a patch that might not even be a good idea.

I'd also point out that this specific problem is one I've been trying to address recently, and while I think it could be solved with various dynamic fields, the combination of all the options for fields seemed to create too many variables to make useful dynamic name patterns.

Thoughts?

- will  


--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Will Johnson updated SOLR-265:
------------------------------

    Attachment: IndexSchemaReload.patch

updates to:

* solrconfig.xml to register handler
* IndexSchema to add reload() method that clears() all internal data structures and calls readconfig()
* a new o.a.s.handler.admin.IndexSchemaRequestHandler to trigger the updating




[jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505927 ]

Yonik Seeley commented on SOLR-265:
-----------------------------------

Doesn't this have thread-safety and unsolvable consistency issues?
It doesn't seem like a specific instance of IndexSchema should change in the middle of a request.

Perhaps it's better to create a new IndexSchema, and keep track of the current schema in the SolrCore (a dependency that wasn't there before, but someone needs to keep track of the current schema if it can change).
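
A minimal sketch of the approach suggested above, assuming an immutable schema object that is replaced wholesale rather than mutated in place. The names SchemaSnapshot, CoreSketch and reloadSchema are hypothetical stand-ins for illustration, not the actual IndexSchema or SolrCore API:

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for IndexSchema: immutable once constructed.
final class SchemaSnapshot {
    private final Map<String, String> fieldTypes; // field name -> type name

    SchemaSnapshot(Map<String, String> fieldTypes) {
        // Defensive copy so later changes to the caller's map are not visible here.
        this.fieldTypes = Collections.unmodifiableMap(new HashMap<>(fieldTypes));
    }

    String typeOf(String field) {
        return fieldTypes.get(field);
    }
}

// Hypothetical stand-in for the part of SolrCore that tracks the current schema.
final class CoreSketch {
    private final AtomicReference<SchemaSnapshot> current;

    CoreSketch(SchemaSnapshot initial) {
        current = new AtomicReference<>(initial);
    }

    // A request calls this once and uses the returned snapshot throughout.
    SchemaSnapshot getSchema() {
        return current.get();
    }

    // Build the replacement fully, then publish it with a single atomic write.
    void reloadSchema(Map<String, String> newFieldTypes) {
        current.set(new SchemaSnapshot(newFieldTypes));
    }
}
```

A request that fetched its snapshot before a reload keeps seeing the old, internally consistent schema; only requests that call getSchema() afterwards see the new one.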



[jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505938 ]

Will Johnson commented on SOLR-265:
-----------------------------------

After doing some more thinking about the issue after I submitted the
patch I agree that there probably are some threading issues to work out.
I was working on another approach that was much larger (only keep 1 copy
in SolrCore accessible by getSchema() and add protection there) but that
required a much larger code change than the one posted so I went with
the shorter to at least promote discussion.  If the single schema
getter() makes sense, I'll be happy to provide such a patch.

There do seem to be other alternatives though:

First is a ModifySchema handler that could support adding fields etc
which should be easier to defend against from a synchronization
standpoint. At least there are fewer times when fields.clear() has been
called but new values have not been added back.  As this is all I care
about at the moment I'd be happy, but I assume someone might want to do
something else more complex.
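
As a rough illustration of why an add-only handler is easier to protect, here is a sketch using a concurrent map where fields can be added but never cleared or replaced. AddOnlyFields and its methods are hypothetical names, not part of the posted patch or of IndexSchema:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

// Hypothetical add-only field registry; not the actual IndexSchema API.
final class AddOnlyFields {
    private final ConcurrentMap<String, String> fields = new ConcurrentHashMap<>();

    // Returns true if the field was added, false if the name was already defined.
    // Existing definitions are never replaced or removed, so readers never
    // observe an empty or half-rebuilt map.
    boolean addField(String name, String type) {
        return fields.putIfAbsent(name, type) == null;
    }

    // Returns the type name, or null if the field is not defined.
    String typeOf(String name) {
        return fields.get(name);
    }
}
```

Because there is no clear() step, there is no window in which a concurrent request could find previously defined fields missing.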

The second is to wrap up the clear/repopulate methods with some basic
protection but actually allow different schemas inside a single request.
This could be done by requiring all new schemas to be 'compatible' in
some defined way.  Since there doesn't seem to be any validation that
goes on if I stop the app, change the schema and then restart it,
compatible might just mean valid xml.  If field 'new_x' suddenly appears
during the middle of my post it shouldn't have any effect as my posted
data won't contain 'new_x.'  From a client's contractual perspective, if
you want new fields processed correctly you have to wait for
updateSchema to finish.

In any case, it seems to me that restarting a webapp and suffering
downtime is a heavy price to pay just to add a new field or even to just
change an existing field property.

- will







[jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12505950 ]

Ryan McKinley commented on SOLR-265:
------------------------------------

I haven't looked at the patch, but have a couple questions:
* What is the motivation/use case for editing the schema at runtime?  (I'm not suggesting there aren't good ones, just curious)
* Would changes be saved?
* Why not dynamic fields?

> it seems to me that restarting a webapp and suffering
> downtime is a heavy price to pay just to add a new field or even to just
> change an existing field property.

*adding* fields should be relatively straightforward -- the more I learn about Lucene indexing (indexes), the more it seems like most schema *changes* require the index to be rebuilt anyway.


RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

Will Johnson
>I haven't looked at the patch, but have a couple questions:
>* What is the motivation/use case for editing the schema at runtime?  (I'm not suggesting there aren't good ones, just curious)

to add new fields on the fly without having any search downtime

>* Would changes be saved?

the patch as is just re-reads the schema from the location it's originally set from.  all changes are 'saved'

> * Why not dynamic fields?

because the field names start to get too complex.  for example you could model the id field in the default schema as a dynamic field:

<field name="id" type="string" indexed="true" stored="true" required="true" />

becomes:

*_str_it_st_rt_mvf

working that out for all possible combinations seems a bit onerous.  the default dynamic fields cover most cases but i'm sure my product managers will want one that i don't have the day after we go live.  also, if i have extra info about a field like the fact that i don't want it stored i should be able to take advantage of that without having to bounce anything.
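
To make the combinatorics concrete, here is a small sketch that generates such patterns. It assumes the suffix in the example above encodes the type followed by indexed, stored, required and multiValued as t/f flags; that reading of the abbreviations, and the names DynamicSuffixes, suffix and allPatterns, are assumptions for illustration only:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical encoder for dynamic-field name patterns of the form
// *_<type>_i<t|f>_s<t|f>_r<t|f>_mv<t|f>.
final class DynamicSuffixes {
    // Assumed meanings: i = indexed, s = stored, r = required, mv = multiValued.
    private static final String[] FLAGS = {"i", "s", "r", "mv"};

    // values must contain one boolean per entry in FLAGS, in the same order.
    static String suffix(String typeAbbrev, boolean[] values) {
        StringBuilder sb = new StringBuilder("*_").append(typeAbbrev);
        for (int i = 0; i < FLAGS.length; i++) {
            sb.append('_').append(FLAGS[i]).append(values[i] ? 't' : 'f');
        }
        return sb.toString();
    }

    // Enumerates every type/flag combination: types * 2^(number of flags) patterns.
    static List<String> allPatterns(String[] typeAbbrevs) {
        List<String> out = new ArrayList<>();
        for (String type : typeAbbrevs) {
            for (int mask = 0; mask < (1 << FLAGS.length); mask++) {
                boolean[] values = new boolean[FLAGS.length];
                for (int i = 0; i < FLAGS.length; i++) {
                    values[i] = (mask & (1 << i)) != 0;
                }
                out.add(suffix(type, values));
            }
        }
        return out;
    }
}
```

With only four boolean options, each field type already needs 16 dynamic patterns, and every additional option doubles that number.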


> it seems to me that restarting a webapp and suffering
> downtime is a heavy price to pay just to add a new field or even to just
> change an existing field property.

>*adding* fields should be relatively straightforward -- the more I learn about lucene indexing(indexes), it seems like most schema *changes* require the index to be rebuilt anyway.

correct, i'm fine if we want to restrict the schema 'changes' to only allow the addition of new fields, but the index schema also reflects things like default query parsing options and copy fields, which shouldn't require any index changes at all, which is why i went for a more loose approach to start.

- will

 


RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

Chris Hostetter-3

: >* What is the motivation/use case for editing the schema at runtime?
: (I'm not suggesting there aren't good ones, just curious)
:
: to add new fields on the fly without having any search downtime

i haven't read anything in the jira issue this references, but in instances
where reliability and uptime are of high concern, you'll typically have a
master/multi-slave setup with the slaves sitting behind a load balancer
-- in that configuration, you can deploy any change to your schema by:

1) stop rsyncd on the master
2) stop solr on master, change schema.xml on master, start solr on master
3) rebuild index from scratch on master (if needed)
4) disable snappuller on all slaves (can overlap step#3)
5) start rsyncd on master
6) take half your slaves out of your load balancers rotation
7) enable snappuller on the slaves out of rotation
8) do step#2 for the slaves out of rotation
9) swap which half of the slaves are in rotation
10) repeat steps 7 and 8

this process results in 0 downtime for any schema.xml change, regardless
whether the changes require rebuilding your index.


: things like default query parsing options and copy fields which
: shouldn't require any index changes at all which is why i went for a more
: loose approach to start.

if you change/add a copyField declaration, you'll need to reindex ...
copyField is evaluated when a document is being indexed.



-Hoss


RE: [jira] Commented: (SOLR-265) Make IndexSchema updateable in live system

Will Johnson
>i haven't read anything in the jira issue this references, but in instances
>where reliability and uptime are of high concern, you'll typically have a
>master/multi-slave setup with the slaves sitting behind a load balancer
>-- in that configuration, you can deploy any change to your schema by:

>this process results in 0 downtime for any schema.xml change, regardless
>whether the changes require rebuilding your index.

True, but that implies indexing downtime which is also bad.  Also, the
master/slave setups kill indexing latency which is my primary concern
and the reason I went with solr to begin with.  Also, while your
suggested steps work they're a bit heavy on the operations side compared
to a client's ability to add a field by hitting a url.

>if you change/add a copyField declaration, you'll need to reindex ...
>copyField is evaluated when a document is being indexed.

True, but not if you haven't fed any data into that copy field yet, i.e.
'from now on' I want all data from field x copied into field y.
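
A toy model of the behaviour under discussion, assuming copy rules are applied once as each document enters the index. CopyAtIndexTime and its methods are hypothetical and do not reflect Solr's actual indexing code:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of index-time copying; not Solr's actual indexing code.
final class CopyAtIndexTime {
    private final Map<String, String> copyRules = new HashMap<>(); // source -> dest
    private final List<Map<String, String>> indexed = new ArrayList<>();

    void addCopyRule(String source, String dest) {
        copyRules.put(source, dest);
    }

    // Copy rules are applied here, once, as the document is indexed.
    void index(Map<String, String> doc) {
        Map<String, String> stored = new HashMap<>(doc);
        for (Map.Entry<String, String> rule : copyRules.entrySet()) {
            String value = doc.get(rule.getKey());
            if (value != null) {
                stored.put(rule.getValue(), value);
            }
        }
        indexed.add(stored);
    }

    // Returns the document as it was stored at position i.
    Map<String, String> get(int i) {
        return indexed.get(i);
    }
}
```

In this model, documents indexed before a rule is added never receive the copied field unless they are reindexed, while documents indexed afterwards do, which matches the 'from now on' reading.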

- will


[jira] Resolved: (SOLR-265) Make IndexSchema updateable in live system

JIRA jira@apache.org

     [ https://issues.apache.org/jira/browse/SOLR-265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryan McKinley resolved SOLR-265.
--------------------------------

    Resolution: Fixed

SOLR-350 lets you "RELOAD" a core.  This will also reload the IndexSchema.
