solrj - Batching and Optimistic Concurrency

lstusr 5u93n4
Hi All,

I'm trying to enable batching on the solrj client, and I want to understand
how that interacts with Optimistic Concurrency.

From what I can tell, if I pass a list of SolrInputDocument to my solr
client, and a document somewhere in that list contains a `_version_` field
that would cause the Optimistic Concurrency check to fail:
 - all documents in the list before the conflicting doc get saved correctly.
 - no documents in the list after the conflicting doc get saved.
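
To make the behaviour described above concrete, here is a toy in-memory model in plain Java. It is not SolrJ code and does not talk to Solr: the class `FailFastBatchModel`, its `Doc` type and its methods are all hypothetical names invented for this sketch. It applies the documented `_version_` rules (greater than 1 must match the stored version, 1 means the document must exist, a negative value means it must not exist, 0 means no check) and stops at the first conflict, which is the behaviour observed in the post.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy in-memory model of the _version_ check; this is not SolrJ code. */
final class FailFastBatchModel {

    /** A document reduced to its id and the _version_ value the client sends. */
    static final class Doc {
        final String id;
        final long version;

        Doc(String id, long version) {
            this.id = id;
            this.version = version;
        }
    }

    // Stand-in for the index: document id -> stored version.
    private final Map<String, Long> index = new LinkedHashMap<>();
    private long nextVersion = 1000L;

    /** Pretends the document was indexed earlier. */
    void seed(String id) {
        index.put(id, nextVersion++);
    }

    boolean contains(String id) {
        return index.containsKey(id);
    }

    /** Applies the documented _version_ rules to a single update. */
    boolean versionCheckPasses(Doc doc) {
        Long stored = index.get(doc.id);
        if (doc.version > 1) {
            return stored != null && stored.longValue() == doc.version; // must match exactly
        }
        if (doc.version == 1) {
            return stored != null; // document must already exist
        }
        if (doc.version < 0) {
            return stored == null; // document must not exist
        }
        return true; // 0: no check at all
    }

    /** Adds documents in order and aborts at the first conflict. */
    void addBatch(List<Doc> batch) {
        for (Doc doc : batch) {
            if (!versionCheckPasses(doc)) {
                throw new IllegalStateException("version conflict for " + doc.id);
            }
            index.put(doc.id, nextVersion++);
        }
    }

    public static void main(String[] args) {
        FailFastBatchModel solr = new FailFastBatchModel();
        solr.seed("doc2"); // doc2 already exists, so sending it with -1 conflicts
        List<Doc> batch = List.of(new Doc("doc1", -1), new Doc("doc2", -1), new Doc("doc3", -1));
        try {
            solr.addBatch(batch);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
        System.out.println("doc1 saved: " + solr.contains("doc1"));
        System.out.println("doc3 saved: " + solr.contains("doc3"));
    }
}
```

In this model doc1 is stored before the conflict on doc2 is raised, and doc3 is never attempted. A real SolrCloud setup routes documents per shard, so the ordering seen by a client may be less tidy than this sketch suggests.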

What I would really like is to "send a list of documents to solr, set the
_version_ on all of these documents to -1 so that they don't save if they
already exist, and have solr save all of the "new" documents in the list".

So three questions related to this:

1) Is Optimistic Concurrency the best mechanism for this, or is there some
other "don't overwrite" flag I can set that would work better?

2) If Optimistic Concurrency is the right way to go, is there a mode that I
can set that would allow ALL non-conflicting documents in a batch to be
saved?

3) If neither 1 nor 2 is possible, I could trap the resulting
RouteException with a 409 code and remove the offending document from the
list. But:
  a) can I safely remove ALL documents in the list before the offending
one, assuming they've been saved?
  b) is there a better way to get the ID of the offending document besides
parsing the 'Error from server at
http://my.solr.instance:8983/solr/test_shard1_replica_n1: version conflict
for doc2' string from the exception?
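
For reference, the string-parsing fallback mentioned in 3b could look roughly like the sketch below. It uses only the JDK; the class name `VersionConflictParser` and the method `extractDocId` are hypothetical, and the pattern is based solely on the message fragment quoted above. The message text is not a stable contract, so treat this as a last resort rather than a recommended approach. It also assumes document ids contain no whitespace.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Pulls a document id out of a "version conflict for <id>" error message. */
final class VersionConflictParser {

    // Assumption: the id is the first whitespace-free token after the phrase.
    private static final Pattern CONFLICT = Pattern.compile("version conflict for (\\S+)");

    /** Returns the id if the message contains the conflict phrase, else empty. */
    static Optional<String> extractDocId(String message) {
        if (message == null) {
            return Optional.empty();
        }
        Matcher m = CONFLICT.matcher(message);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        String message = "Error from server at "
                + "http://my.solr.instance:8983/solr/test_shard1_replica_n1: "
                + "version conflict for doc2";
        System.out.println(extractDocId(message).orElse("<no id found>"));
    }
}
```

Returning an `Optional` keeps the caller honest about the case where the message does not match, for example when the 409 was raised for some other reason.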

Thanks!

Kyle

Re: solrj - Batching and Optimistic Concurrency

Erick Erickson
You can add, say, a ScriptUpdateProcessor that checks this for you
pretty easily.

Have you looked at the overwrite=false option (assuming you're not
assigning _version_ yourself)?

Best,
Erick
On Mon, Dec 3, 2018 at 11:57 AM lstusr 5u93n4 <[hidden email]> wrote:

> What I would really like is to "send a list of documents to solr, set the
> _version_ on all of these documents to -1 so that they don't save if they
> already exist, and have solr save all of the "new" documents in the list".

Re: solrj - Batching and Optimistic Concurrency

Erick Erickson
I forgot to mention TolerantUpdateProcessor, which might be another approach.
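
As a rough picture of what a tolerant mode changes, here is a toy in-memory model in plain Java. It is not Solr or SolrJ code: `TolerantBatchModel`, `Doc` and `addBatch` are hypothetical names for this sketch, and only the "negative `_version_` means the document must not exist" rule is modelled. The idea it illustrates is the one TolerantUpdateProcessorFactory documents: individual failures are recorded and the rest of the batch continues, up to a `maxErrors` limit, with -1 meaning unlimited.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of tolerant batch handling; this is not Solr or SolrJ code. */
final class TolerantBatchModel {

    /** A document reduced to its id and the _version_ value the client sends. */
    static final class Doc {
        final String id;
        final long version;

        Doc(String id, long version) {
            this.id = id;
            this.version = version;
        }
    }

    // Stand-in for the index: document id -> stored version.
    private final Map<String, Long> index = new HashMap<>();
    private long nextVersion = 1000L;

    /** Pretends the document was indexed earlier. */
    void seed(String id) {
        index.put(id, nextVersion++);
    }

    boolean contains(String id) {
        return index.containsKey(id);
    }

    /**
     * Adds every non-conflicting document and returns the ids that were skipped.
     * Throws once the number of conflicts exceeds maxErrors (-1 means unlimited).
     */
    List<String> addBatch(List<Doc> batch, int maxErrors) {
        List<String> skippedIds = new ArrayList<>();
        for (Doc doc : batch) {
            boolean conflict = doc.version < 0 && index.containsKey(doc.id);
            if (conflict) {
                skippedIds.add(doc.id);
                if (maxErrors >= 0 && skippedIds.size() > maxErrors) {
                    // Too many failures: report the first one and stop processing.
                    throw new IllegalStateException("version conflict for " + skippedIds.get(0));
                }
                continue; // tolerate this failure and move on to the next document
            }
            index.put(doc.id, nextVersion++);
        }
        return skippedIds;
    }

    public static void main(String[] args) {
        TolerantBatchModel solr = new TolerantBatchModel();
        solr.seed("doc2");
        List<Doc> batch = List.of(new Doc("doc1", -1), new Doc("doc2", -1), new Doc("doc3", -1));
        List<String> skipped = solr.addBatch(batch, -1);
        System.out.println("skipped: " + skipped);
        System.out.println("doc3 saved: " + solr.contains("doc3"));
    }
}
```

The returned list plays the role of a per-document error report, so the caller learns which ids were rejected without parsing an exception message.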

On Mon, Dec 3, 2018 at 12:57 PM Erick Erickson <[hidden email]> wrote:

>
> You can add, say, a ScriptUpdateProcessor that checks this for you
> pretty easily.
>
> Have you looked at the Overwrite=false option (assuming you're not
> assigning _version_ yourself)?

Re: solrj - Batching and Optimistic Concurrency

lstusr 5u93n4
Hi Erick,

Looks like TolerantUpdateProcessor is exactly what I need. Thanks!

Kyle.

P.S. I can find the doc for TolerantUpdateProcessorFactory here:
http://lucene.apache.org/solr/7_5_0/solr-core/org/apache/solr/update/processor/TolerantUpdateProcessor.html
but it seems to be missing from the guide at
https://lucene.apache.org/solr/guide/7_5/update-request-processors.html
Not sure if that's something the Solr maintainers want to add; just thought
I'd point it out for future searchers following this thread.

On Mon, 3 Dec 2018 at 16:05, Erick Erickson <[hidden email]> wrote:

> And I forgot to mention TolerantUpdateProcessor, might be another approach.