Copying data

Copying data

Jayadevan Maymala
Hi all,

I have a 3-node Solr cluster in production (with ZooKeeper). In dev, I
have a single-node Solr instance with no ZooKeeper. What is the best way to
copy the production Solr data over to dev?
Operating system is CentOS 7.7, Solr Version 7.3
Collection size is in the 40-50 GB range.

Regards,
Jayadevan

Re: Copying data

Erick Erickson
It’s not at all clear what the problem is. If you have a single-shard collection, just
1> create the stand-alone core
2> shut down the Solr instance
3> replace the stand-alone core's data dir with one from any of your prod machines.
4> start Solr
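
The data-dir swap in step 3 can be sketched roughly as below. This is only an illustration, not an official procedure: the function name and the paths are hypothetical, it assumes Solr on dev is already stopped, and it assumes a copy of a prod replica's data dir has already been transferred to the dev machine.

```python
import shutil
from pathlib import Path


def replace_core_data_dir(prod_data_copy: Path, dev_core_dir: Path) -> Path:
    """Swap a stopped core's data dir for a copy taken from a prod replica.

    prod_data_copy: local copy of a prod replica's data dir (hypothetical path).
    dev_core_dir:   instance dir of the stand-alone core on dev (hypothetical path).
    The old data dir is kept as data.bak so the swap can be undone.
    """
    target = dev_core_dir / "data"
    backup = dev_core_dir / "data.bak"
    if backup.exists():
        shutil.rmtree(backup)      # drop any stale backup from an earlier run
    if target.exists():
        target.rename(backup)      # keep the previous index around
    shutil.copytree(prod_data_copy, target)
    return target


# Example call (hypothetical paths; run only while Solr is shut down):
# replace_core_data_dir(Path("/tmp/prod_replica/data"),
#                       Path("/var/solr/data/mycore"))
```

Keeping the old directory as data.bak is just a convenience for rolling back; it is not something Solr requires.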

An alternative is to use the replication API to replace the index on your stand-alone core with one from one of the prod machines, see: https://lucene.apache.org/solr/guide/7_7/index-replication.html. You have to specify the masterUrl parameter on the fetchindex request and shouldn’t need to do anything with the configuration.
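
As a rough sketch of that replication call, the snippet below only builds the fetchindex URL; it does not send it. The host names and core names are made up for illustration, and it assumes the prod replica's replication handler is reachable from dev at the usual /replication path.

```python
from urllib.parse import urlencode


def fetchindex_url(dev_solr_base: str, dev_core: str, prod_core_url: str) -> str:
    """Build a one-off fetchindex request against the dev core.

    dev_solr_base: e.g. "http://dev-host:8983/solr" (hypothetical host)
    dev_core:      name of the stand-alone core on dev
    prod_core_url: URL of one prod replica core, e.g.
                   "http://prod1:8983/solr/mycoll_shard1_replica_n1" (hypothetical)
    """
    params = {
        "command": "fetchindex",
        # masterUrl points at the replication handler of the source core
        "masterUrl": f"{prod_core_url}/replication",
    }
    return f"{dev_solr_base}/{dev_core}/replication?{urlencode(params)}"


# Example (hypothetical names); pass the result to curl or urllib.request.urlopen:
# fetchindex_url("http://dev-host:8983/solr", "mycore",
#                "http://prod1:8983/solr/mycoll_shard1_replica_n1")
```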

But assuming you have 3 shards:

First, it’s easy enough to create a three-shard collection on your dev machine, either using embedded ZK or a separate ZK instance on the dev machine, so that’s one option. The advantage there is it’s the same environment. To do that, just create the 3-shard collection.

Alternatively, you can use the core admin API MERGEINDEXES command to combine the three shards into a single stand-alone core. What you’ll do is

1> create your core on your dev machine
2> copy one of the data dirs from one of the prod machines to the data dir of your new core.
3> copy the other two data dirs somewhere on the dev machine
4> use MERGEINDEXES, see: https://lucene.apache.org/solr/guide/7_4/coreadmin-api.html
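
Step 4 might look something like the sketch below, which only assembles the MERGEINDEXES request URL. The core name and directory paths are hypothetical; each indexDir is assumed to be the index subdirectory of one of the copied data dirs, readable by the Solr process on dev.

```python
from urllib.parse import urlencode


def mergeindexes_url(solr_base: str, target_core: str, index_dirs: list) -> str:
    """Build a core admin MERGEINDEXES request for the given target core.

    solr_base:   e.g. "http://dev-host:8983/solr" (hypothetical host)
    target_core: the dev core that already holds the first shard's index
    index_dirs:  index directories of the other shards, local to the dev machine
    """
    params = [("action", "MERGEINDEXES"), ("core", target_core)]
    # indexDir may be repeated, once per index to merge in
    params += [("indexDir", d) for d in index_dirs]
    return f"{solr_base}/admin/cores?{urlencode(params)}"


# Example (hypothetical paths):
# mergeindexes_url("http://dev-host:8983/solr", "mycore",
#                  ["/tmp/shard2/data/index", "/tmp/shard3/data/index"])
```

After the merge, a commit on the target core is needed before the merged documents become visible to searches.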

Best,
Erick



Re: Copying data

Jayadevan Maymala
On Mon, Mar 16, 2020 at 5:53 PM Erick Erickson <[hidden email]>
wrote:

> It’s not at all clear what the problem is. If you have a single-shard
> collection, just
> 1> create the stand-alone core
> 2> shut down the Solr instance
> 3> replace the stand-alone core's data dir with one from any of your prod
> machines.
> 4> start Solr
>

I did not know that just copying the data directory would work. I will try
that. Thanks.