In-place update vs Atomic updates

In-place update vs Atomic updates

kshitij tyagi
Hi,

What are the major differences between atomic and in-place updates? I have
gone through the documentation, but it does not give detailed internal
information.

1. Does an in-place update prevent Solr's caches from being invalidated,
and what are the benefits of using in-place updates?

I want to update one field of a document without invalidating my caches.

What is the best approach to achieve this?

Thanks,
Kshitij

Re: In-place update vs Atomic updates

Shawn Heisey-2
On 1/8/2018 4:05 AM, kshitij tyagi wrote:
> What are the major differences between atomic and in-place updates? I have
> gone through the documentation, but it does not give detailed internal
> information.

Atomic updates are nearly identical to simple indexing, except that the
existing document is read from the index to populate a new document
along with whatever updates were requested, then the new document is
indexed and the old one is deleted.
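As a rough illustration of that read, merge, reindex cycle, here is a minimal Python sketch. It is a toy model, not Solr code: the `index` dict and the `apply_atomic_update` helper are hypothetical, and only the `set` and `inc` modifiers are handled.

```python
def apply_atomic_update(stored_doc, update):
    """Toy model: build a whole new document from the stored one plus modifiers."""
    new_doc = dict(stored_doc)  # start from a copy of every existing field
    for field, modifiers in update.items():
        if field == "id":
            continue  # the id only identifies which document to update
        for op, operand in modifiers.items():
            if op == "set":
                new_doc[field] = operand
            elif op == "inc":
                new_doc[field] = new_doc.get(field, 0) + operand
            else:
                raise ValueError(f"modifier not handled in this sketch: {op}")
    return new_doc


# Hypothetical one-document "index", keyed by id.
index = {"doc1": {"id": "doc1", "title": "Widget", "price": 10}}

# Update request in the atomic-update style: a field maps to {modifier: value}.
update = {"id": "doc1", "price": {"set": 12}}

# Assigning to the same key stands in for "index the new doc, delete the old one".
index[update["id"]] = apply_atomic_update(index[update["id"]], update)
print(index["doc1"])
```

The point of the sketch is that the untouched `title` field is copied into the new document, which is why the old document has to be readable from the index in the first place.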

> 1. Does an in-place update prevent Solr's caches from being invalidated,
> and what are the benefits of using in-place updates?

In-place updates are only possible on a single-valued numeric field where
only docValues is true.  The settings for indexed and stored must both be
false.
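A quick way to reason about eligibility is a small check over the field attributes. The sketch below is only an illustration: the dict layout and the `numeric` key are assumptions made for this example, not Solr's schema API. It also includes the single-valued and numeric conditions listed in the Solr Reference Guide.

```python
def eligible_for_in_place_update(field):
    """Return True if a field definition (a plain dict here) meets the conditions."""
    return (
        field.get("docValues", False) is True
        and field.get("indexed", True) is False      # must be explicitly non-indexed
        and field.get("stored", True) is False       # must be explicitly non-stored
        and field.get("multiValued", False) is False
        and field.get("numeric", False) is True      # hypothetical flag for this sketch
    )


# Hypothetical field definitions.
popularity = {"name": "popularity", "numeric": True,
              "docValues": True, "indexed": False, "stored": False}
title = {"name": "title", "numeric": False,
         "docValues": False, "indexed": True, "stored": True}

print(eligible_for_in_place_update(popularity))
print(eligible_for_in_place_update(title))
```

If any field in an update request fails this kind of check, the update is handled as a regular atomic update instead.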

An in-place update finds the segment containing the document and writes
a whole new file containing the value of every document in the segment
for the updated field.  If the segment contains ten million documents,
then information for ten million values will be written for a single
document update.
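To make the cost concrete, here is a toy Python model of one field's docValues in one segment. The `Segment` class is hypothetical and ignores any batching of updates that Lucene may do before writing; it only shows that changing one value means writing out a value for every document in the segment.

```python
class Segment:
    """Toy model of a single field's docValues within one segment."""

    def __init__(self, values):
        # Each list stands in for one docValues file written for the field.
        self.generations = [list(values)]

    def update_in_place(self, doc_index, new_value):
        # A new "file" is written with the value of every document;
        # the previous one is left as it was.
        column = list(self.generations[-1])
        column[doc_index] = new_value
        self.generations.append(column)
        return len(column)  # number of values written for this one update


segment = Segment(range(1000))
written = segment.update_in_place(3, -1)
print(written, len(segment.generations))
```

With a ten-million-document segment the same single update would write ten million values in this model.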

> I want to update one field of a document without invalidating my caches.

When the index changes for ANY reason, no matter how the change is
accomplished, caches must be thrown away when a new searcher is built.
Lucene and Solr have no way of knowing that a change doesn't affect some
cache entries, so the only thing they can do is assume that all the
information in the cache is now invalid.  What you are asking for here
is not possible at the moment, and chances are that if code were written
to do it, it would be far slower than simply invalidating the
caches and doing autowarming.
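The relationship between searchers, caches, and autowarming can be sketched with a toy model. Everything here is hypothetical (the `Searcher` class, the `reopen` helper, the price query); it is not how Solr implements its caches, only the general idea that a cache belongs to one searcher and a new searcher re-runs a few recent queries.

```python
class Searcher:
    """Toy searcher: a snapshot of the index plus a cache tied to that snapshot."""

    def __init__(self, docs, warm_queries=()):
        self.docs = dict(docs)  # point-in-time view of the index
        self.cache = {}
        for min_price in warm_queries:
            self.search(min_price)  # autowarming: re-run queries on the new view

    def search(self, min_price):
        if min_price not in self.cache:
            self.cache[min_price] = sorted(
                doc_id for doc_id, price in self.docs.items() if price >= min_price
            )
        return self.cache[min_price]


def reopen(old_searcher, docs, autowarm_count):
    # The old cache is discarded wholesale; only its most recent keys are replayed.
    recent = list(old_searcher.cache)[-autowarm_count:] if autowarm_count else []
    return Searcher(docs, warm_queries=recent)


docs = {"a": 5, "b": 20}
s1 = Searcher(docs)
before = s1.search(10)

docs["a"] = 15                 # the index changes
stale = s1.search(10)          # the old searcher still answers from its snapshot
s2 = reopen(s1, docs, autowarm_count=1)
after = s2.search(10)
print(before, stale, after)
```

The new searcher does not try to work out which cached entries survived the change; it simply recomputes the warmed queries against the new view.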

Thanks,
Shawn

Re: In-place update vs Atomic updates

kshitij tyagi
Hi Shawn,

Thanks for the information.

1. Do in-place updates open a new searcher by themselves or not?
2. As the entire segment is rewritten, does that mean frequent in-place
updates are expensive, since each in-place update will rewrite the entire
segment again? Correct me if my understanding is wrong.

Thanks,
Kshitij


Re: In-place update vs Atomic updates

Shawn Heisey-2
On 1/8/2018 10:17 PM, kshitij tyagi wrote:
> 1. Do in-place updates open a new searcher by themselves or not?
> 2. As the entire segment is rewritten, does that mean frequent in-place
> updates are expensive, since each in-place update will rewrite the entire
> segment again? Correct me if my understanding is wrong.

Opening a new searcher is not related to the update.  It's something
that happens at commit time, if the commit has openSearcher=true (which
is the default setting).
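For example, a commit can be requested without opening a new searcher by passing openSearcher=false on the update request. The small Python sketch below only builds the URL; the host, port, and collection name are placeholders for illustration, and `commit_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def commit_url(base_url, open_searcher):
    """Build an /update URL that asks for a commit, with openSearcher set explicitly."""
    params = {
        "commit": "true",
        "openSearcher": "true" if open_searcher else "false",
    }
    return f"{base_url}/update?{urlencode(params)}"


# Placeholder base URL; adjust to your own installation.
base = "http://localhost:8983/solr/mycollection"
print(commit_url(base, open_searcher=False))
```

A commit made this way makes the changes durable, but searches keep using the existing searcher and its caches until a later commit opens a new one.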

In-place updates don't rewrite the entire segment; they only rewrite
part of the docValues information for the segment -- only the portion
for the fields that got updated.  The information is written into a new
file, and the original file is untouched.

If there are multiple fields with docValues and not all of them are
updated, then it would not be possible to delete the old file until the
segment gets merged.  I am not sure about what happens if *every* field
with docValues is eligible for in-place updates and all of them get
updated.  If that were the case, then it would be possible to have an
optimization that removes the old docValues file, but I have no idea
whether Lucene actually has that as an optimization.  I would not expect
most indexes to be eligible for the optimization even if Lucene can do it.

Yes, frequent in-place updates can be expensive, and can make the index
larger, because the values in the updated field for every document in
the segment will be written to a new file.  If you never optimize the
index and mostly update recently added documents, then the segments
involved will probably be small, and performance would be pretty good.

Thanks,
Shawn