Move index directory to another partition

Mahmoud Almokadem
Hello,

I have a SolrCloud cluster of four instances on Amazon, and the EBS volumes
that contain the data on every node are about to fill up. Unfortunately,
Amazon doesn't support expanding EBS volumes, so I'll attach larger EBS
volumes and move the index to them.

I can stop updates to the index, but I'm afraid that using the "cp" command
will copy files that are in the middle of a merge operation.

The copy operation may take several hours.

How can I move the data directory without stopping the instance?

Thanks,
Mahmoud

Re: Move index directory to another partition

Shawn Heisey-2
On 7/31/2017 12:28 PM, Mahmoud Almokadem wrote:

> I have a SolrCloud cluster of four instances on Amazon, and the EBS volumes
> that contain the data on every node are about to fill up. Unfortunately,
> Amazon doesn't support expanding EBS volumes, so I'll attach larger EBS
> volumes and move the index to them.
>
> I can stop updates to the index, but I'm afraid that using the "cp" command
> will copy files that are in the middle of a merge operation.
>
> The copy operation may take several hours.
>
> How can I move the data directory without stopping the instance?

Use rsync to do the copy.  Do an initial copy while Solr is running,
then do a second copy, which should be pretty fast because rsync will
see the data from the first copy.  Then shut Solr down and do a third
rsync which will only copy a VERY small changeset.  Reconfigure Solr
and/or the OS to use the new location, and start Solr back up.  Because
you mentioned "cp", I am assuming that you're NOT on Windows, and that
the OS will most likely allow you to do anything you need with index
files while Solr has them open.

If you have set up your replicas with SolrCloud properly, then your
collections will not go offline when one Solr instance is shut down, and
that instance will be brought back into sync with the rest of the
cluster when it starts back up.  Using multiple passes with rsync should
mean that Solr will not need to be shut down for very long.

The options I typically use for this kind of copy with rsync are "-avH
--delete".  I would recommend that you research rsync options so that
you fully understand what I have suggested.
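For reference, here is a minimal Python sketch of a single copy pass using the options Shawn mentions. The paths /var/solr/ and /mnt/newvol/ are hypothetical stand-ins for the old data location and the new volume's mount point; the flag comments summarize the rsync man page and are worth double-checking there.

```python
import shlex
import subprocess

# Hypothetical locations: the old volume's Solr directory and the new volume's mount point.
OLD_DIR = "/var/solr/"
NEW_DIR = "/mnt/newvol/"


def rsync_command(src: str, dst: str) -> list[str]:
    """Build the "-avH --delete" rsync invocation for one copy pass."""
    # -a: archive mode (recursive; keeps permissions, timestamps, symlinks and,
    #     when run as root, ownership)
    # -v: verbose, list each file as it is transferred
    # -H: preserve hard links
    # --delete: remove files from dst that no longer exist in src, such as old
    #           segment files that a merge has deleted since the last pass
    if not src.endswith("/"):
        src += "/"  # trailing slash: copy the contents, not the directory itself
    return ["rsync", "-avH", "--delete", src, dst]


def run_pass(src: str = OLD_DIR, dst: str = NEW_DIR) -> None:
    """Run one pass; raises CalledProcessError if rsync exits nonzero."""
    cmd = rsync_command(src, dst)
    print("running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Calling run_pass() once or twice while Solr is up, then once more after stopping Solr, gives the multi-pass sequence described above.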

Thanks,
Shawn

Re: Move index directory to another partition

Walter Underwood
Way back in the 1.x days, replication was done with shell scripts and rsync, right?

wunder
Walter Underwood
[hidden email]
http://observer.wunderwood.org/  (my blog)

> On Aug 1, 2017, at 2:45 PM, Shawn Heisey <[hidden email]> wrote:

Re: Move index directory to another partition

Mahmoud Almokadem
In reply to this post by Shawn Heisey-2
Thanks Shawn,

I'm using Ubuntu and I'll try the rsync command. Unfortunately, I'm using a
replication factor of one, but I think the downtime will be less than five
minutes if I follow your steps.

But how do I start a Solr backup, and why should I run one if I've already
copied the index and changed the path?

And what do you mean by "Using multiple passes with rsync"?

Thanks,
Mahmoud

On Tuesday, August 1, 2017, Shawn Heisey <[hidden email]> wrote:

Re: Move index directory to another partition

Shawn Heisey-2
On 8/1/2017 4:00 PM, Mahmoud Almokadem wrote:
> I'm using Ubuntu and I'll try the rsync command. Unfortunately, I'm using a
> replication factor of one, but I think the downtime will be less than five
> minutes if I follow your steps.
>
> But how do I start a Solr backup, and why should I run one if I've already
> copied the index and changed the path?
>
> And what do you mean by "Using multiple passes with rsync"?

The first time you copy the data, which you could do with cp if you
want, the time required will be limited by the size of the data and the
speed of the disks.  Depending on the size, it could take several hours
like you estimated.  I would suggest using rsync for the first copy just
because you're going to need the same command again for the later passes.

Doing a second pass with rsync should go very quickly.  How fast will
depend on the rate at which the index data is changing.  You might need to
repeat this step more than once, with each pass faster than the last, in
preparation for the final pass.

A final pass with rsync might only take a few seconds, and if Solr is
stopped before that final copy is started, then there's no way the index
data can change.
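The "repeat until fast" part can be sketched as a loop. This is a minimal illustration, not anything Solr or rsync provides: run_pass stands for any zero-argument function that performs one rsync pass (for example, one wrapping "rsync -avH --delete OLD/ NEW/"), and the 30-second threshold and five-pass cap are arbitrary, hypothetical values.

```python
import time
from typing import Callable


def converge(run_pass: Callable[[], None],
             fast_enough_seconds: float = 30.0,
             max_passes: int = 5) -> list[float]:
    """Repeat rsync passes while Solr is still running until one is quick.

    Returns the duration of each pass, in seconds.
    """
    durations = []
    for _ in range(max_passes):
        start = time.monotonic()
        run_pass()
        elapsed = time.monotonic() - start
        durations.append(elapsed)
        if elapsed <= fast_enough_seconds:
            break  # the remaining changeset is small; time for the final pass
    return durations
```

When converge() returns, stop Solr and run one last pass; with Solr stopped, the index cannot change during that copy.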

Thanks,
Shawn

Re: Move index directory to another partition

Erick Erickson
WARNING: what I currently understand about the limitations of AWS
could fill volumes so I might be completely out to lunch.

If you ADDREPLICA with the new replica's data residing on the new EBS
volume, then wait for it to sync (which it'll do all by itself), then
DELETEREPLICA on the original, you'll be all set.

In recent Solr versions, there's also the MOVENODE Collections API call.
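As a rough sketch of what those Collections API calls look like, the Python below only builds the request URLs. The host, collection, shard, node names and replica name are all hypothetical; real node names come from live_nodes and real replica names (core_nodeN) from CLUSTERSTATUS. ADDREPLICA also accepts an optional dataDir parameter for placing the new index on a specific path.

```python
from urllib.parse import urlencode

BASE = "http://localhost:8983/solr/admin/collections"  # hypothetical host and port


def collections_url(action: str, **params: str) -> str:
    """Build a Collections API request URL; parameters are passed through as-is."""
    return BASE + "?" + urlencode({"action": action, **params})


# Hypothetical names: collection "mycoll", shard "shard1", node names as they
# appear in live_nodes, and the old replica's name as reported by CLUSTERSTATUS.
add = collections_url("ADDREPLICA", collection="mycoll", shard="shard1",
                      node="10.0.0.12:8983_solr")
delete = collections_url("DELETEREPLICA", collection="mycoll", shard="shard1",
                         replica="core_node1")
move = collections_url("MOVENODE", sourceNode="10.0.0.11:8983_solr",
                       targetNode="10.0.0.12:8983_solr")
print(add, delete, move, sep="\n")
```

The DELETEREPLICA request should only be sent after the new replica reports itself active.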

Best,
Erick

On Tue, Aug 1, 2017 at 6:03 PM, Shawn Heisey <[hidden email]> wrote:

Re: Move index directory to another partition

Shawn Heisey-2
On 8/1/2017 7:09 PM, Erick Erickson wrote:
> WARNING: what I currently understand about the limitations of AWS
> could fill volumes so I might be completely out to lunch.
>
> If you ADDREPLICA with the new replica's data residing on the new EBS
> volume, then wait for it to sync (which it'll do all by itself), then
> DELETEREPLICA on the original, you'll be all set.
>
> In recent Solr versions, there's also the MOVENODE Collections API call.

I did consider mentioning that as a possible way forward, but I hate to
rely on special configurations with core.properties, particularly if the
newly built replica core instanceDirs aren't in the solr home (or
coreRootDirectory) at all.  I didn't want to try and explain the precise
steps required to get that plan to work.  I would expect to need some
arcane Collections API work or manual ZK modification to reach a correct
state -- steps that would be prone to error.

The idea I mentioned seemed to me to be the way forward that would
require the least specialized knowledge.  Here's a simplified list of
the steps:

* Mount the new volume somewhere.
* Use multiple rsync passes to get the data copied.
* Stop Solr.
* Do a final rsync pass.
* Unmount the original volume.
* Remount the new volume in the original location.
* Start Solr.
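Shawn's list translates to something like the command sequence below, generated (not executed) by a small Python function. The device name /dev/xvdg, the mount points, and the service name "solr" are assumptions; anything else using the old volume, such as a ZooKeeper instance, would also need to be stopped before the unmount, and /etc/fstab would need updating so the new volume is mounted at boot.

```python
def migration_plan(new_dev: str, data_mount: str, staging: str,
                   warm_passes: int = 2) -> list[str]:
    """Return the migration steps as shell commands, in order (not executed)."""
    sync = f"rsync -avH --delete {data_mount}/ {staging}/"
    # Mount the new volume somewhere.
    plan = [f"mkdir -p {staging}", f"mount {new_dev} {staging}"]
    # Multiple rsync passes while Solr is still running.
    plan += [sync] * warm_passes
    # Stop Solr, then do the final pass.
    plan += ["service solr stop", sync]
    plan += [
        f"umount {data_mount}",           # unmount the original volume
        f"umount {staging}",
        f"mount {new_dev} {data_mount}",  # remount the new volume in the original location
        "service solr start",
    ]
    return plan


for cmd in migration_plan("/dev/xvdg", "/var/solr", "/mnt/newvol"):
    print(cmd)
```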

Thanks,
Shawn

Re: Move index directory to another partition

David Hastings
To add to this: I'm not sure if SolrCloud uses it, but you're going to want to delete the write.lock file as well.
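If you do follow this suggestion, a cautious sketch is to list the lock files first and delete them only with Solr stopped. Lucene names its per-index lock file write.lock; the data_root argument and the dry-run default are illustrative choices, not part of any Solr tooling. With Solr's default native lock type, a leftover write.lock file is normally harmless, so deleting it is a precaution rather than a requirement.

```python
from pathlib import Path


def remove_write_locks(data_root: str, dry_run: bool = True) -> list[Path]:
    """Find (and, if dry_run is False, delete) write.lock files under data_root.

    Only run this with Solr stopped. data_root is a hypothetical path, such as
    the new volume's mount point after the final rsync pass.
    """
    locks = sorted(Path(data_root).rglob("write.lock"))
    if not dry_run:
        for lock in locks:
            lock.unlink()
    return locks
```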

> On Aug 1, 2017, at 9:31 PM, Shawn Heisey <[hidden email]> wrote:

Re: Move index directory to another partition

Erick Erickson
Shawn:

Not entirely sure about AWS intricacies, but getting a new replica to
use a particular index directory in the general case is just
specifying dataDir=some_directory on the ADDREPLICA command. Copying the
index just needs an HTTP connection (it uses the old replication process),
so nothing huge there. Then DELETEREPLICA for the old one. There's
nothing that ZK has to know about to make this work, it's all local to
the Solr instance.

Or I'm completely out in the weeds.
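To know when the new replica has finished syncing, one option is to poll the Collections API CLUSTERSTATUS action and check the replica's state. A minimal sketch over an already-fetched response follows; the sample JSON is trimmed and hypothetical, and a real check should also confirm that the replica's node appears in live_nodes, since the recorded state can be stale for a node that is down.

```python
def replica_states(cluster_status: dict, collection: str, shard: str) -> dict[str, str]:
    """Map replica name -> state ("active", "recovering", "down", ...) for one shard."""
    replicas = (cluster_status["cluster"]["collections"][collection]
                ["shards"][shard]["replicas"])
    return {name: info.get("state", "unknown") for name, info in replicas.items()}


def ready_to_delete_old(cluster_status: dict, collection: str, shard: str,
                        new_replica: str) -> bool:
    """True once the newly added replica has finished recovery and is active."""
    return replica_states(cluster_status, collection, shard).get(new_replica) == "active"


# Trimmed, hypothetical CLUSTERSTATUS response while the new replica recovers.
sample = {"cluster": {"collections": {"mycoll": {"shards": {"shard1": {"replicas": {
    "core_node1": {"node_name": "10.0.0.11:8983_solr", "state": "active"},
    "core_node2": {"node_name": "10.0.0.11:8983_solr", "state": "recovering"},
}}}}}}}
print(ready_to_delete_old(sample, "mycoll", "shard1", "core_node2"))
```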

Best,
Erick

On Tue, Aug 1, 2017 at 7:52 PM, Dave <[hidden email]> wrote:

Re: Move index directory to another partition

Shawn Heisey-2
On 8/2/2017 9:17 AM, Erick Erickson wrote:
> Not entirely sure about AWS intricacies, but getting a new replica to
> use a particular index directory in the general case is just
> specifying dataDir=some_directory on the ADDREPLICA command. The index
> just needs an HTTP connection (uses the old replication process) so
> nothing huge there. Then DELETEREPLICA for the old one. There's
> nothing that ZK has to know about to make this work, it's all local to
> the Solr instance.

I was envisioning a scenario where the entire solr home is on the old
volume that's going away.  If I were setting up a Solr install where the
large/fast storage was a separate filesystem, I would put the solr home
(or possibly even the entire install) under that mount point.  It would
be a lot easier than setting dataDir in core.properties for every core,
especially in a cloud install.

If the dataDir property is already in use to relocate index data, then
ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
expect most SolrCloud users to use that method.
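For checking where an existing core keeps its index, here is a small sketch that reads core.properties. It assumes the behavior Erick describes later in the thread: without dataDir, the data directory is "data" next to core.properties, a relative dataDir is resolved against the core's directory, and an absolute one is used as-is. The parser is simplified and does not handle every .properties feature (escapes, line continuations), and /var/solr/data is a hypothetical Solr home.

```python
from pathlib import Path


def resolve_data_dir(core_properties: Path) -> Path:
    """Return the directory holding a core's index data, per core.properties."""
    props = {}
    for raw in core_properties.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Simplified parsing: skip blanks, comments, and non key=value lines.
        if not line or line.startswith(("#", "!")) or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    instance_dir = core_properties.parent  # the core's directory
    # Path("/x") / "/abs" yields "/abs", so an absolute dataDir is used as-is.
    return instance_dir / props.get("dataDir", "data")


for cp in sorted(Path("/var/solr/data").glob("*/core.properties")):
    print(cp.parent.name, "->", resolve_data_dir(cp))
```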

Thanks,
Shawn

Re: Move index directory to another partition

Erick Erickson
bq: I was envisioning a scenario where the entire solr home is on the old
volume that's going away.  If I were setting up a Solr install where the
large/fast storage was a separate filesystem, I would put the solr home
(or possibly even the entire install) under that mount point.  It would
be a lot easier than setting dataDir in core.properties for every core,
especially in a cloud install.

Agreed. Nothing in what I said precludes this. If you don't specify dataDir,
then the index for a new replica goes in the default place, usually under
your install directory; in your case, under your new mount point. I usually
don't recommend trying to take control of where dataDir points; just let it
default. I only mentioned it so you'd be aware it exists. So if your new
install is associated with a bigger/better/larger EBS volume, it's all
automatic.

bq: If the dataDir property is already in use to relocate index data, then
ADDREPLICA and DELETEREPLICA would be a great way to go.  I would not
expect most SolrCloud users to use that method.

I really don't understand this. Each Solr replica has an associated
dataDir whether you specified it or not (the default is relative to
the core.properties file). ADDREPLICA creates a new replica in a new
place; initially its data directory and index are empty. The new
replica goes into recovery and uses the standard replication process
to copy the index via HTTP from a healthy replica and write it to its
data directory. Once that's done, the replica becomes live. There's
nothing about dataDir already being in use here at all.

When you start Solr, there's a default place where Solr expects to find the
replicas. This is not necessarily where Solr is executing from; see
the "-s" option in "bin/solr start -s ...".
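For example, starting Solr in cloud mode against a Solr home on the new volume might look like the command built below. The Solr home path and ZooKeeper connect string are hypothetical; -c selects SolrCloud mode, -z gives the ZooKeeper hosts, and -s sets the Solr home.

```python
def solr_start_command(solr_home: str, zk_host: str,
                       solr_bin: str = "bin/solr") -> list[str]:
    """Build a bin/solr start command for SolrCloud mode with an explicit Solr home."""
    # -c: SolrCloud mode; -z: ZooKeeper connect string;
    # -s: Solr home, where Solr looks for solr.xml and the core directories
    return [solr_bin, "start", "-c", "-z", zk_host, "-s", solr_home]


print(" ".join(solr_start_command("/mnt/newvol/solr", "zk1:2181,zk2:2181,zk3:2181")))
```

The list can be passed to subprocess.run(cmd, check=True) from the Solr install directory.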

If you're talking about using dataDir to point to an existing index,
yes that would be a problem and not something I meant to imply at all.

Why wouldn't most SolrCloud users use ADDREPLICA/DELETEREPLICA? It's
commonly used to move replicas around a cluster.

Best,
Erick

On Fri, Aug 4, 2017 at 11:15 AM, Shawn Heisey <[hidden email]> wrote:

Re: Move index directory to another partition

Mahmoud Almokadem
Thanks, all, for your comments.

I followed Shawn's steps (rsync) because everything was on that volume
(ZooKeeper, Solr home and data), and everything went great.

Thanks again,
Mahmoud

On Sun, Aug 6, 2017 at 12:47 AM, Erick Erickson <[hidden email]> wrote: