Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

kevinc
Hi all,

I'm sure I've done this before but this seems to be falling down a bit and I
was wondering if anyone had any helpful ideas.

I have a large index (51GB) that exists in a 4 node Solr Cloud instance. The
reprocessing for this takes a long time and so we normally reindex on a
secondary cluster and swap them out.

I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
cluster with 1 shard and replication factor of 3.

I want to copy over the index and have it replicate to the rest of the
cluster. I have taken a copy of the data directory from the reprocessed core
and copied it into the leader's data directory. This shows up correctly as
having a 51GB index and the documents are searchable.

I have tried the following curl commands to kick off replication:

curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type: text/xml" --data-binary @test.xml
curl "http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E"

I've tried this a few times and had a few different results:
* The index gets wiped to 0 documents and contains only the single record I committed
* A timestamped index directory gets created (index.201904082111232) and index.properties then points to that
* I hit an issue with the IndexWriter being closed
* The index stays consistent and doesn't replicate
I've tried copying the index to both the leader and one other node to see if
that helps but I'm faced with similar results as above.

Does anyone have any advice on how I can get this index moved and replicated
onto this new cluster?

Thanks a lot!
Kevin.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

Erick Erickson
Here’s what I’d do:

1> Just spin up a _one_ node cluster, copy in the index from your offline process, and start Solr. I’d do the copy with Solr down.
2> Use the ADDREPLICA command to build out that cluster. The index copy associated with ADDREPLICA is robust. I’d wait until each replica shows green before adding the next one if you have any concerns about saturating your network; if you add the replicas all at once, you’ll have N simultaneous copies of the 50G index.
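A sketch of step 2> via the Collections API. The collection, shard, and node names here are hypothetical; adjust them to what your Admin UI or CLUSTERSTATUS reports.

```shell
# Build the ADDREPLICA request for one node. Issue it once per node, waiting
# for the new replica to show "active" before adding the next.
SOLR="http://localhost:8983/solr"
URL="${SOLR}/admin/collections?action=ADDREPLICA&collection=solrCollection1&shard=shard1&node=node2:8983_solr"
echo "$URL"   # against a live cluster: curl "$URL"
```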

I’m not quite sure what’s happening in your situation; there are a lot of possibilities. The above should avoid most of the places where something could go wrong with your process.

Best,
Erick

> On Apr 8, 2019, at 7:59 AM, kevinc <[hidden email]> wrote:
>
> I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
> cluster with 1 shard and replication factor of 3. [...]


Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

Shawn Heisey-2
In reply to this post by kevinc
On 4/8/2019 8:59 AM, kevinc wrote:

> I have reindexed to a single Solr 6.6.0 index and spun up a new 3 node Solr
> cluster with 1 shard and replication factor of 3.
>
> I want to copy over the index and have it replicate to the rest of the
> cluster. I have taken a copy of the data directory from the reprocessed core
> and copied it into the leader's data directory. This shows up correctly as
> having a 51GB index and the documents are searchable.
>
> I have tried the following curl commands to kick off replication:
>
> curl http://localhost:8983/solr/solrCollection1/update -H "Content-Type:
> text/xml" --data-binary @test.xml
> curl
> http://localhost:8983/solr/solrCollection1/update?stream.body=%3Ccommit/%3E

I think the following is probably what you're going to want to do in
order to transplant an existing index into a new cloud:

* Make sure you have a copy of the source index directory.
* Do not copy the tlog directory from the source.
* Create the collection in the target cloud.
* Shut down the target cloud completely.
* Delete all the index directories in the cloud.
* Copy the source index directory to one of the cloud nodes.
* Start that cloud node up.  Make sure it is all working.
* Start up the other nodes.

Once the other nodes are started, they will automatically notice that
they don't have an index directory and will copy the index from the leader.
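The file operations in those steps can be sketched as below, using throwaway paths (every path here is a made-up example; substitute your real core directories, and only run the rm/cp while the target cloud is fully shut down):

```shell
# Demonstrate the transplant file operations with throwaway stand-in paths.
SRC=/tmp/standalone/solrCollection1/data
DST=/tmp/node1/solrCollection1_shard1_replica1/data
mkdir -p "$SRC/index" "$SRC/tlog" "$DST/index" "$DST/tlog"
touch "$SRC/index/_0.cfs"          # stand-in for the 51GB of source segments

rm -rf "${DST:?}"/*                # clear *everything* in the core's data dir
cp -r "$SRC/index" "$DST/"         # copy only index/, never tlog/
ls "$DST"                          # prints: index
```

The `${DST:?}` guard makes the rm fail loudly if the variable is unset, rather than expanding to `/*`.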

These instructions assume a single shard in both the source and the
target.  If you are changing the number of shards, it will be a lot
easier to simply reindex into the new cloud.

Erick's message indicates another way you could go ... create the new
index with a single replica, get that working, and then use ADDREPLICA
(part of the Collections API) to add more replicas.

Thanks,
Shawn

Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

Shawn Heisey-2
On 4/8/2019 10:06 AM, Shawn Heisey wrote:
> * Make sure you have a copy of the source index directory.
> * Do not copy the tlog directory from the source.
> * Create the collection in the target cloud.
> * Shut down the target cloud completely.
> * Delete all the index directories in the cloud.
> * Copy the source index directory to one of the cloud nodes.
> * Start that cloud node up.  Make sure it is all working.
> * Start up the other nodes.

At the "delete all the index directories in the cloud" step, I should
have written "delete the contents of all data directories for the
collection in the cloud" ... everything in data should be deleted, not
just the index directory.  Don't want it replaying transaction logs when
Solr starts!

Thanks,
Shawn

Re: Moving index from stand-alone Solr 6.6.0 to 3 node Solr Cloud 6.6.0 with Zookeeper

Erick Erickson
Glad to hear it. Now, if you want to be really bold, here’s another option (I haven’t verified it, but it _should_ work).

Rather than copying the index, try this:

1> spin up a one-replica empty collection
2> use the REPLICATION API to copy the index from the re-indexed source.
3> ADDREPLICAs as before.

<2> looks something like:
http://_slave_host:port_/solr/_core_name_/replication?command=fetchindex&masterUrl=http://solr_with_new_index:port/solr/_core_name_/replication

_core_name_ in this case is something like collection1_shard1_replica1, i.e. what shows up in the “cores” dropdown.

The replication API is still used by SolrCloud for “full sync” and has been around forever, so it’s well-tested. Again, though, I don’t use this regularly so no guarantees…..

See: https://lucene.apache.org/solr/guide/7_5/index-replication.html
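Filling in that URL template with hypothetical hosts and core names (note that fetchindex is issued against the *receiving* cloud core, with masterUrl pointing at the core holding the freshly built index):

```shell
# Build the fetchindex request from the template above; all names are examples.
TARGET="http://cloudnode1:8983/solr/solrCollection1_shard1_replica1"
SOURCE="http://standalone:8983/solr/solrCollection1"
URL="${TARGET}/replication?command=fetchindex&masterUrl=${SOURCE}/replication"
echo "$URL"   # against a live cluster: curl "$URL"
```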

Best,
Erick

> On Apr 9, 2019, at 12:38 AM, kevinc <[hidden email]> wrote:
>
> Thanks so much - your approaches worked a treat!
>
> Best,
> Kevin.
>