Atomic updates to increase single field bulk updates?

Sebastian Riemer
Dear solr users,

when updating documents in bulk (e.g. 40,000 documents at once) and only changing the value of a single Boolean flag, I currently re-index all 40,000 objects in full. However, obtaining all the relevant information for each object from the database is relatively expensive.

I now wonder whether, in this situation, it would be a good idea to implement a single-field update routine using atomic updates. In that case I could skip the lookups in the relational database entirely, since the only information needed would be the new value of the Boolean flag and the list of the 40,000 document ids.

I am aware of the requirements for using atomic updates, but as I understand it, they would not have a big impact on performance and would cause only a slight increase in index size?

What is your opinion on that?

Thanks for your input, have a nice evening!

Sebastian


RE: Atomic updates to increase single field bulk updates?

Markus Jelsma-2
Hello Sebastian,

Apart from the requirement to have all fields stored, there is not much difference, from Solr/Lucene's point of view, between indexing a partial update and indexing a complete document. Under the hood a partial update is a complete document anyway. With partial updates you save a little bandwidth at the expense of additional stored fields.

If your backend is the bottleneck, it would probably be very beneficial for you to switch to atomic updates: decrease stress on your database and decrease reindexing time.
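
A minimal sketch of what the client side of such a routine could look like, assuming the unique key field is called `id` and the flag field is called `is_active` (both names are hypothetical). It only builds the JSON payloads using the atomic-update `set` modifier; sending them to the collection's update handler and committing is left out, and the batch size of 1000 is an arbitrary choice.

```python
import json
from typing import Iterable, Iterator, List


def atomic_flag_updates(doc_ids: Iterable[str], field: str, value: bool) -> List[dict]:
    """Build one atomic-update document per id, touching only `field`."""
    # {"set": value} is the atomic-update modifier that replaces a field's value.
    return [{"id": doc_id, field: {"set": value}} for doc_id in doc_ids]


def batched(items: List[dict], size: int) -> Iterator[List[dict]]:
    """Yield consecutive slices of at most `size` update documents."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


# Hypothetical ids and field name; in practice the ids come from your own list,
# with no lookup in the relational database needed.
ids = [str(n) for n in range(1, 40001)]
updates = atomic_flag_updates(ids, "is_active", True)

# Each payload is a JSON array of update documents, ready to be POSTed.
payloads = [json.dumps(batch) for batch in batched(updates, 1000)]
```

Batching keeps each request small; whether 1000 documents per request is a good size depends on your setup.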

Regards,
Markus

-----Original message-----

> From:Sebastian Riemer <[hidden email]>
> Sent: Wednesday 15th February 2017 19:31
> To: [hidden email]
> Subject: Atomic updates to increase single field bulk updates?
>
> Dear solr users,
>
> when updating documents in bulk (i.e. 40.000 documents at once), and only changing the value of a single Boolean-Flag, I currently re-index all whole 40.000 objects. However, the process of obtaining all relevant information for each object from the database is one of relatively high cost.
>
> I now wonder, if in this situation it would be a good idea to implement a single-field update routine using atomic updates? In that case, I could skip any necessary lookups in the relational database, since the only information would be the new value for that Boolean-Flag, and the list of those 40.000 document ids.
>
> I am aware of the requirements to use atomic updates, but as I understood, those would not have a big impact on performance and only a slight increase in index size?
>
> What is your opinion on that?
>
> Thanks for your input, have a nice evening!
>
> Sebastian
>
>

RE: Atomic updates to increase single field bulk updates?

Chris Hostetter-3

: partial update or a complete document. Under the hood a partial update
: is a complete object anyway. Using partial updates you gain a little
: bandwidth at the expense of additional stored fields.

FWIW: once SOLR-5944 lands in a released version, that won't always be
true -- atomic updates on numeric fields that are docValues="true" and
nothing else (stored=false, indexed=false) will use updatable docvalues
under the covers and should be much more efficient than either reindexing
the entire document, or the default atomic update codepath of re-indexing
all fields from stored values.
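
As a rough illustration, the conditions listed above can be written down as a small check. This is only a sketch of the rule as stated in this post, not Solr's actual logic, and the released feature may have further requirements. The field descriptions are plain dictionaries with hypothetical names, and the Boolean flag is assumed to be modelled as a numeric 0/1 field so that it is numeric at all.

```python
def uses_in_place_update(field: dict) -> bool:
    """Return True if `field` meets the conditions described in the post:
    numeric, docValues="true", and neither stored nor indexed."""
    return bool(
        field.get("numeric", False)
        and field.get("docValues", False)
        and not field.get("stored", True)
        and not field.get("indexed", True)
    )


# Hypothetical field definitions, loosely mirroring schema attributes.
flag_as_int = {"name": "is_active_i", "numeric": True,
               "docValues": True, "stored": False, "indexed": False}
stored_flag = {"name": "is_active", "numeric": False,
               "docValues": False, "stored": True, "indexed": True}

# The update document itself looks the same as any other atomic update.
update_doc = {"id": "1", "is_active_i": {"set": 1}}
```

Under these assumptions only `flag_as_int` would qualify; an update to `stored_flag` would go through the default path of re-indexing all fields from stored values.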



-Hoss
http://www.lucidworks.com/

Re: Atomic updates to increase single field bulk updates?

Bram Van Dam
In reply to this post by Sebastian Riemer
> I am aware of the requirements to use atomic updates, but as I understood, those would not have a big impact on performance and only a slight increase in index size?

AFAIK there won't be a difference in index size between atomic updates
and full updates, as the end result is the same.

But you will probably see a performance increase because you'll only
have to send 40000 boolean flags instead of 40000 full documents.

Using atomic updates sounds like a good idea to me.

 - Bram


Re: Atomic updates to increase single field bulk updates?

Erick Erickson
Well, "it depends". An atomic update first has to go out to disk and
decompress the original stored fields in 16K blocks, then overlay the
atomic update on the uncompressed doc, then re-index the doc. That
happens 40K times in your example.
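
The overlay step can be pictured with a toy model. This is a conceptual sketch only, not Solr's code: the stored document is a plain dictionary, only the `set` modifier is handled, and the function and field names are made up for illustration.

```python
import copy


def apply_atomic_update(stored_doc: dict, update: dict, key: str = "id") -> dict:
    """Overlay an atomic update on a copy of the stored document.

    The result is a complete document, which is what then gets re-indexed.
    """
    new_doc = copy.deepcopy(stored_doc)
    for field, change in update.items():
        if field == key:
            continue  # the unique key identifies the doc; it is not modified
        if isinstance(change, dict) and "set" in change:
            new_doc[field] = change["set"]
        else:
            raise ValueError(f"unsupported update for field {field!r}")
    return new_doc


# Hypothetical stored document as reconstructed from stored fields.
stored = {"id": "1", "title": "Some title", "is_active": False}
merged = apply_atomic_update(stored, {"id": "1", "is_active": {"set": True}})
```

The point of the sketch is that the merged result contains every field, not just the flag, so the indexing work per document is that of a full document.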

So yes, the stream going to Solr will be smaller if you do atomic
updates, but the processing on Solr will be heavier.

Plus, if you're not storing all the fields anyway, storing them just
for atomic updates adds some load to the system: the index on disk
is bigger, so merges take more I/O and the like.

However, you state that "the process of obtaining all relevant
information for each object from the database is one of relatively
high cost", so the extra work on Solr's part is likely worth it to
you.

Best,
Erick

On Fri, Feb 17, 2017 at 2:36 AM, Bram Van Dam <[hidden email]> wrote:

>> I am aware of the requirements to use atomic updates, but as I understood, those would not have a big impact on performance and only a slight increase in index size?
>
> AFAIK there won't be a difference in index size between atomic updates
> and full updates, as the end result is the same.
>
> But you will probably see a performance increase because you'll only
> have to send 40000 boolean flags instead of 40000 full documents.
>
> Using atomic updates sounds like a good idea to me.
>
>  - Bram
>