Atomic solrj update

Atomic solrj update

Prem
I am trying to partially update 50M documents in a collection from a CSV file
using an atomic update script (SolrJ), but it is taking 2 hours per 1M records.
Is there any way I can speed up the update?
I am using HTTPClient to establish the connection. I also check whether each
document exists in the collection before updating it.




--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html
Re: Atomic solrj update

Jörn Franke
We would need to see the code or learn more about your design. Do you reuse the HTTPClient, or do you create a new one for every request?
How often do you commit?
Do you send updates in parallel from the client (multiple threads)?
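
To make the questions above concrete, here is a minimal sketch of sending batches from several threads through one shared sender. It uses only the JDK. The names `ParallelBatchUpdater`, `partition`, `sendAll` and the `sender` callback are hypothetical and not part of SolrJ; in a real program the callback would post each batch through a single long-lived client instead of opening a new connection per request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

class ParallelBatchUpdater {

    /** Splits items into consecutive batches of at most batchSize elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    /**
     * Hands every batch to the one shared sender on a fixed pool of threads,
     * waits for all of them, and returns the number of batches sent.
     */
    static <T> int sendAll(List<T> items, int batchSize, int threads,
                           Consumer<List<T>> sender) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> pending = new ArrayList<>();
            for (List<T> batch : partition(items, batchSize)) {
                pending.add(pool.submit(() -> sender.accept(batch)));
            }
            for (Future<?> f : pending) {
                f.get(); // surfaces any failure raised in a worker thread
            }
            return pending.size();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("batch update failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < 2500; i++) {
            ids.add(i);
        }
        AtomicInteger docsSent = new AtomicInteger();
        // Stand-in sender (assumption): it only counts documents.
        int batches = sendAll(ids, 1000, 4, b -> docsSent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + docsSent.get() + " documents");
    }
}
```

The sender in this sketch never commits. Whether to rely on the server's automatic commit settings or to issue one commit at the end is a separate decision, but committing after every request is usually among the first things to rule out.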

> On 13.12.2019 at 06:56, Prem <[hidden email]> wrote:
>
> I am trying to partially update of 50M data in a collection from CSV using
> Atomic script(solrj).But it is taking 2 hrs for 1M records.is there anyway i
> can speed up my update.
> Using HTTPClient to establish connection and also i am validating whether
> the particular document is available in collection or not and after that
> updating the document.
Re: Atomic solrj update

Shawn Heisey-2
In reply to this post by Prem
On 12/12/2019 10:00 PM, Prem wrote:
> I am trying to partially update of 50M data in a collection from CSV using
> Atomic script(solrj).But it is taking 2 hrs for 1M records.is there anyway i
> can speed up my update.

How many documents are you sending in one request?
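
As an illustration of why the number of documents per request matters, the sketch below builds a single JSON body carrying many atomic "set" updates, so that one HTTP round trip covers a whole batch rather than one document. It uses only the JDK. The class `AtomicUpdatePayload`, its methods and the `status` field are hypothetical, and values are sent as JSON strings for simplicity. The body would be posted to the collection's `/update` handler with content type `application/json`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AtomicUpdatePayload {

    /** Wraps s in double quotes, escaping characters JSON does not allow raw. */
    static String quote(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    /** One atomic update: {"id":"...","<field>":{"set":"<value>"}}. */
    static String atomicSet(String id, String field, String value) {
        return "{\"id\":" + quote(id) + "," + quote(field)
                + ":{\"set\":" + quote(value) + "}}";
    }

    /** A JSON array with one atomic "set" update per (id, value) entry. */
    static String batch(String field, Map<String, String> valuesById) {
        List<String> docs = new ArrayList<>();
        for (Map.Entry<String, String> e : valuesById.entrySet()) {
            docs.add(atomicSet(e.getKey(), field, e.getValue()));
        }
        return "[" + String.join(",", docs) + "]";
    }

    public static void main(String[] args) {
        // Two CSV rows (id, new value) for a hypothetical "status" field.
        Map<String, String> rows = new LinkedHashMap<>();
        rows.put("1", "active");
        rows.put("2", "closed");
        System.out.println(batch("status", rows));
        // prints [{"id":"1","status":{"set":"active"}},{"id":"2","status":{"set":"closed"}}]
    }
}
```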

> Using HTTPClient to establish connection and also i am validating whether
> the particular document is available in collection or not and after that
> updating the document.

I thought you were using SolrJ ... but here you say you're using HTTPClient.

Can you share your code?  What Solr server version? If you're using
SolrJ, what version of that?

If your program checks whether every single document already exists
before sending an update, that is going to be quite slow.

Thanks,
Shawn
Re: Atomic solrj update

Paras Lehana
Hi Prem,

> Using HTTPClient to establish connection and also i am *validating* whether
> the particular document is *available* in collection or not and after that
> updating the document.


Why do you need to validate the particular document before updating?
Atomic updates either update the document if it's already available or
create the document if it's not. I guess you don't want to create the
document if it doesn't exist, right?
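
If the goal is only to avoid creating missing documents, one option that may remove the separate lookup is Solr's optimistic concurrency: an update that carries `_version_` set to 1 is accepted only when the document already exists and is rejected otherwise. The sketch below builds such an update as JSON using only the JDK. It assumes the collection has the update log and the `_version_` field enabled; the class `MustExistUpdate` and the `status` field are hypothetical.

```java
class MustExistUpdate {

    /** Wraps s in double quotes, escaping quotes and backslashes. */
    static String quote(String s) {
        return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }

    /**
     * Atomic "set" update that Solr should apply only to an existing document:
     * "_version_":1 means "the document must exist", with no version matching.
     */
    static String build(String id, String field, String value) {
        return "{\"id\":" + quote(id)
                + ",\"_version_\":1,"
                + quote(field) + ":{\"set\":" + quote(value) + "}}";
    }

    public static void main(String[] args) {
        System.out.println(build("42", "status", "active"));
        // prints {"id":"42","_version_":1,"status":{"set":"active"}}
    }
}
```

How a rejected document is reported when it is part of a multi-document request can depend on the Solr version, so that behaviour is worth testing on a small batch first.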



On Fri, 13 Dec 2019 at 11:42, Shawn Heisey <[hidden email]> wrote:

> On 12/12/2019 10:00 PM, Prem wrote:
> > I am trying to partially update of 50M data in a collection from CSV using
> > Atomic script(solrj).But it is taking 2 hrs for 1M records.is there anyway i
> > can speed up my update.
>
> How many documents are you sending in one request?
>
> > Using HTTPClient to establish connection and also i am validating whether
> > the particular document is available in collection or not and after that
> > updating the document.
>
> I thought you were using SolrJ ... but here you say you're using HTTPClient.
>
> Can you share your code?  What Solr server version? If you're using
> SolrJ, what version of that?
>
> If your program checks whether every single document already exists
> before sending an update, that is going to be quite slow.
>
> Thanks,
> Shawn
>

