DocValues and in-place updates


Brian Yee-2
Last week I asked here about fast inventory updates, and the recommendation was to use docValues with partial in-place updates. I think this will work well, but there is one problem I can't find a good solution for.

Consider this scenario:
InStock = 1 for a product.
InStock changes to 0, which triggers a fast in-place update with docValues.
The same change also triggers a slow update that rebuilds the entire document. Let's say that takes 10 minutes because we do updates in batches.
While the slow update is still running, InStock changes back to 1, which triggers another fast update to Solr. Solr now has InStock=1, which is correct.
The slow update finishes and overwrites the document with InStock=0, which is incorrect.

How can we deal with this situation?
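
The scenario above can be replayed as a toy timeline. This is only an illustrative sketch: a plain dict stands in for the Solr index, and the document id "sku-1" and the "name" field are made up for the example.

```python
# Toy timeline of the race described above; a plain dict stands in for Solr.
index = {"sku-1": {"InStock": 1, "name": "Widget"}}

# t0: InStock flips to 0. The fast path updates only that one field.
index["sku-1"]["InStock"] = 0

# t0: the slow batch job snapshots its source data to rebuild the document.
slow_snapshot = {"InStock": 0, "name": "Widget"}

# t1: while the batch is still running, InStock flips back to 1 (fast path).
index["sku-1"]["InStock"] = 1

# t2: the batch finishes and blindly overwrites the whole document.
index["sku-1"] = dict(slow_snapshot)

# The stale value from the snapshot wins over the newer fast update.
print(index["sku-1"]["InStock"])
```

The last write wins regardless of which data is newer, which is exactly the lost update the question describes.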

Re: DocValues and in-place updates

Erick Erickson
"But it also triggers a slow update that will rebuild the entire document..."

Why do you think this? The whole _point_ of in-place updates is that
they don't have to re-index the whole document. Rebuilding the whole
document inside Solr would only work if all the fields were stored,
and that is not a requirement for in-place updates.
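
For illustration, the request body of such a single-field update might be built as below. This is a sketch under assumptions: the unique key field is taken to be "id", the document id "sku-1" is made up, and the helper name build_set_update is hypothetical. Whether Solr actually applies the change in place depends on the schema (roughly, a single-valued numeric docValues field that is neither indexed nor stored); otherwise the same request is handled as a regular atomic update.

```python
import json


def build_set_update(doc_id, field, value):
    """Build the JSON body for an atomic 'set' on one field of one document."""
    # "id" as the unique key field is an assumption about the schema.
    return json.dumps([{"id": doc_id, field: {"set": value}}])


# Body that would be POSTed to the collection's update handler.
body = build_set_update("sku-1", "InStock", 0)
print(body)
```

Only the changed field travels over the wire; the rest of the document is not sent at all.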

Best,
Erick

On Mon, Feb 12, 2018 at 8:02 AM, Brian Yee <[hidden email]> wrote:

> I asked a question here about fast inventory updates last week and I was recommended to use docValues with partial in-place updates. I think this will work well, but there is a problem I can't think of a good solution for.
>
> Consider this scenario:
> InStock = 1 for a product.
> InStock changes to 0 which triggers a fast in-place update with docValues.
> But it also triggers a slow update that will rebuild the entire document. Let's say that takes 10 minutes because we do updates in batches.
> During that 5 minutes, InStock changes again to 1 which triggers a fast update to solr. So in Solr InStock=1 which is correct.
> The slow update finishes and overwrites InStock=0 which is incorrect.
>
> How can we deal with this situation?

RE: DocValues and in-place updates

Brian Yee-2
True, I could remove the trigger that rebuilds the entire document. But what if a different field changes and that triggers an update of the whole document? Then we have the same problem.

-----Original Message-----
From: Erick Erickson [mailto:[hidden email]]
Sent: Monday, February 12, 2018 11:17 AM
To: solr-user <[hidden email]>
Subject: Re: DocValues and in-place updates

"But it also triggers a slow update that will rebuild the entire document..."

Why do you think this? The whole _point_ of in-place updates is that they don't have to re-index the whole document.... And the only way to do that effectively would be if all the fields are stored, which is not a requirement for in-place updates.

Best,
Erick


Re: DocValues and in-place updates

Charlie Hull-3
In reply to this post by Brian Yee-2
It's a slightly crazy idea, but in the past we've solved a similar
problem by building a custom Lucene codec that is backed by a Redis
database. You change the stock value in Redis, and Lucene never
notices the change or has to re-index.
http://www.flax.co.uk/blog/2012/06/22/updating-individual-fields-in-lucene-with-a-redis-backed-codec/

Not sure if this is a better way than DocValues, it was quite a while
ago and Lucene has moved on a bit since then....

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk

RE: DocValues and in-place updates

Chris Hostetter-3
In reply to this post by Brian Yee-2

: True, I could remove the trigger to rebuild the entire document. But
: what if a different field changes and the whole document is triggered
: for update for a different field. We have the same problem.

At a high level, your concern is really completely orthogonal to the
question of in-place updates. It's a broader question of having two different
systems that might want to modify the same document in Solr, where one
system is "slower" than the other (because it has to fetch more external
data, or only operates in batches, etc.).

This is where things like optimistic concurrency are really powerful.

When you trigger your "slow" updates (or any updates, for that matter),
record the current (aka "expected") _version_ field of the Solr
document when your updater starts processing, and pass that value along
with the new update. Solr will reject an update if the specified
_version_ doesn't match what's in the index.

https://lucene.apache.org/solr/guide/updating-parts-of-documents.html#optimistic-concurrency

So imagine the current instock=1 version of your product is 42, and you
start a "slow" update to change the "name" field. While that's in
progress, a "fast" update sets instock=0, and now the document has a new
_version_=666. When the "slow" updater is done building up the entire
document and sends it to Solr along with the _version_=42 assumption,
Solr will reject the update with a "Conflict (409)" HTTP status, and your
slow update code can say "ok ... I must have stale data, let's try again".
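
The retry pattern described above might look roughly like the sketch below. Everything in it is an assumption made for illustration: FakeIndex is a hypothetical in-memory stand-in for a Solr collection (not a Solr or SolrJ API), VersionConflict stands in for the 409 response, and the names slow_rebuild and build_doc are invented. To keep it short, the rebuild reads InStock back from the index snapshot, whereas a real slow updater would re-fetch it from its own source data.

```python
import itertools


class VersionConflict(Exception):
    """Stands in for the HTTP 409 that Solr returns on a version mismatch."""


class FakeIndex:
    """Hypothetical in-memory stand-in for a Solr collection."""

    def __init__(self):
        self._docs = {}
        self._versions = itertools.count(1)

    def get(self, doc_id):
        return dict(self._docs[doc_id])

    def set_fields(self, doc_id, fields):
        """Fast path: change some fields, keep the rest, bump _version_."""
        doc = {**self._docs.get(doc_id, {}), **fields}
        doc["_version_"] = next(self._versions)
        self._docs[doc_id] = doc

    def replace(self, doc_id, doc, expected_version):
        """Slow path: overwrite the document only if the version still matches."""
        if self._docs[doc_id]["_version_"] != expected_version:
            raise VersionConflict(doc_id)
        self._docs[doc_id] = {**doc, "_version_": next(self._versions)}


def slow_rebuild(index, doc_id, build_doc, max_attempts=3):
    """Rebuild a document, retrying when a faster writer got there first."""
    for _ in range(max_attempts):
        seen = index.get(doc_id)  # remember the version we started from
        doc = build_doc(seen)     # the slow part happens here
        try:
            index.replace(doc_id, doc, expected_version=seen["_version_"])
            return
        except VersionConflict:
            continue  # stale data, let's try again
    raise RuntimeError(f"gave up on {doc_id} after {max_attempts} attempts")


index = FakeIndex()
index.set_fields("sku-1", {"InStock": 1, "name": "Widget"})
attempts = []


def build_doc(seen):
    if not attempts:
        # Simulate a fast update landing while the first rebuild is running.
        index.set_fields("sku-1", {"InStock": 0})
    attempts.append(seen["_version_"])
    return {"InStock": seen["InStock"], "name": "Widget v2"}


slow_rebuild(index, "sku-1", build_doc)
final = index.get("sku-1")
print(final["InStock"], final["name"], len(attempts))
```

The first attempt is rejected because the fast update changed the version mid-rebuild; the second attempt starts from fresh data, so the final document carries both the new name and the current InStock value.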




-Hoss
http://www.lucidworks.com/