need for re-indexing when using managed schema

Joseph Lorenzini
Hi all,

I have a question about the managed schema functionality.  According to the
docs, "All changes to a collection’s schema require reindexing". This would
imply that if you use a managed schema and you use the schema API to update
the schema, then doing a full re-index is necessary each time.

Is this accurate or can a full re-index be avoided?

Thanks,
Joe

Re: need for re-indexing when using managed schema

Erick Erickson
That’s a little overstated. A full explanation of what’s safe and what’s not runs to several pages and depends on what you mean by “safe”.

Any modification to a schema, even if it doesn’t cause something to outright break, may leave the index in an inconsistent state. For instance, remember that Lucene and Solr really don’t care if doc1 doesn’t have a particular field X and doc2 does. If you do something as “safe” as adding a new field, only documents indexed after that change will have the field. Your index will continue to function with no errors in that case, but searches on the new field won’t return any docs indexed before the change until the older docs are re-indexed.
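To make the add-a-field case concrete, here is a toy in-memory sketch. It is not Solr code: each "document" is just a Python dict, and the field name `category` and the helper functions are hypothetical, chosen only for illustration.

```python
# Toy in-memory model, not Solr: each "document" is just a dict of fields.
index = [
    {"id": "doc1", "title": "indexed before the schema change"},
    {"id": "doc2", "title": "also indexed before the change"},
]

# The schema gains a hypothetical field "category". Existing docs are untouched;
# only documents indexed from now on carry the new field.
index.append({"id": "doc3", "title": "indexed after the change", "category": "news"})


def search(docs, field, value):
    """Return ids of docs whose `field` equals `value`; docs lacking the field never match."""
    return [d["id"] for d in docs if d.get(field) == value]


def missing_field(docs, field):
    """Return ids of docs that still need re-indexing to pick up `field`."""
    return [d["id"] for d in docs if field not in d]


print(search(index, "category", "news"))   # only doc3 can match
print(missing_field(index, "category"))    # doc1 and doc2 predate the field
```

Nothing errors out, which is the point: the older docs are silently absent from any query on the new field until they are re-indexed.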

So you can see where this is going. If you add a field _and then reindex all your documents_, it’s perfectly safe. However, between the time you add the field and the time re-indexing completes, your results may be inconsistent.

On the other hand, if you change, say, a DocValues field from multiValued="true" to multiValued="false", the results are undefined _even if you reindex all your docs_.

On the other, other hand, if you delete a field, the metadata is still in your index. The only way to get rid of it is to delete your index and re-index, or to index to a new collection. Searches may also still return docs on the deleted field if it was created with a dynamic field definition that’s still in the schema.
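For reference, the three kinds of change described above correspond to the Schema API commands `add-field`, `replace-field` and `delete-field`, sent as JSON to a collection's `/schema` endpoint. The sketch below only builds the request bodies and does not contact a server; the helper functions, the field names (`category`, `tags`, `old_field`), the collection name and the `string` field type are assumptions for illustration.

```python
import json


def add_field(name, field_type, **props):
    """Body for the Schema API `add-field` command."""
    return {"add-field": {"name": name, "type": field_type, **props}}


def replace_field(name, field_type, **props):
    """Body for `replace-field`, which redefines an existing field in full."""
    return {"replace-field": {"name": name, "type": field_type, **props}}


def delete_field(name):
    """Body for `delete-field`; this removes the definition, not the indexed data."""
    return {"delete-field": {"name": name}}


def schema_url(base_url, collection):
    """Schema endpoint for one collection, e.g. http://localhost:8983/solr/<collection>/schema."""
    return f"{base_url.rstrip('/')}/{collection}/schema"


# Hypothetical names throughout; adjust to your own collection and schema.
print(schema_url("http://localhost:8983/solr", "mycollection"))
print(json.dumps(add_field("category", "string", stored=True)))
print(json.dumps(replace_field("tags", "string", docValues=True, multiValued=False)))
print(json.dumps(delete_field("old_field")))
```

None of these commands touches documents that are already indexed. They only change the schema definition, which is why each case above still ends with re-indexing.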

On the other, other, other hand… the list goes on and on.

Since even something as non-breaking as adding a new field requires you to re-index all your older docs anyway to get back to a consistent state, it’s just easiest to plan on re-indexing all your docs whenever you change the schema. And, I’d also advise, index to a new collection…
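As a rough sketch of what "re-index into a new collection" can look like from a client, the code below batches documents from the original data source into JSON update requests aimed at a fresh collection. It builds the requests without sending them. The collection name `products_v2`, the batch size, the sample documents and both helper functions are assumptions, and it presumes the documents can be re-read from a system of record outside Solr.

```python
import json
import urllib.request


def batches(docs, size):
    """Yield successive lists of at most `size` documents."""
    batch = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def update_requests(base_url, collection, docs, batch_size=500):
    """Build (but do not send) one JSON update request per batch."""
    url = f"{base_url.rstrip('/')}/{collection}/update"
    for batch in batches(docs, batch_size):
        yield urllib.request.Request(
            url,
            data=json.dumps(batch).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )


# Hypothetical source documents, re-read from the system of record.
source_docs = ({"id": f"doc{i}", "category": "news"} for i in range(1200))
reqs = list(update_requests("http://localhost:8983/solr", "products_v2", source_docs))
print(len(reqs), reqs[0].full_url)

# To actually index, send each request with urllib.request.urlopen(req)
# and issue a commit once all batches are in.
```

Because the old collection is never modified, searches against it stay consistent while the new one is being filled.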

Best,
Erick

> On Dec 16, 2019, at 12:57 PM, Joseph Lorenzini <[hidden email]> wrote:
>
> Hi all,
>
> I have a question about the managed schema functionality.  According to the
> docs, "All changes to a collection’s schema require reindexing". This would
> imply that if you use a managed schema and you use the schema API to update
> the schema, then doing a full re-index is necessary each time.
>
> Is this accurate or can a full re-index be avoided?
>
> Thanks,
> Joe