daily SolrCloud collection wipes


Werner Detter
Hi,
 
I've got a SolrCloud instance with two collections running (Solr 7.7.2) on
Debian Stretch VMs. Every morning at around 03:3* am the collection gets
reset by $something, and I have no clue what causes this or how to prevent
it, as there are no log entries at all in SolrCloud (even with an increased log level).

It seems like it's some internal trigger. For the sake of testing I
completely recreated [1] the collections from scratch yesterday, but they've
been reset again.

Maybe somebody else has experienced something similar and can give a hint on how to
track down the source of the collection resets.

Thanks,
Werner


[1]
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=$SOLR_COLLECTION"
/bin/su - solr -c "/opt/solr/bin/solr create -c $SOLR_COLLECTION -s 1 -rf 2"
/bin/bash /opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /configs/$SOLR_COLLECTION/schema.xml /root/$SOLR_COLLECTION/conf/schema.xml
/bin/bash /opt/solr/server/scripts/cloud-scripts/zkcli.sh -zkhost localhost:2181 -cmd putfile /configs/$SOLR_COLLECTION/solrconfig.xml /root/$SOLR_COLLECTION/conf/solrconfig.xml
/usr/bin/curl "http://localhost:8983/solr/admin/collections?action=MODIFYCOLLECTION&collection=$SOLR_COLLECTION&maxShardsPerNode=1"

/usr/share/zookeeper/bin/zkCli.sh
rmr /configs/$SOLR_COLLECTION/managed-schema

Re: daily SolrCloud collection wipes

Shawn Heisey-2
On 11/14/2019 12:28 AM, Werner Detter wrote:
> I've got a SolrCloud instance with two collections running (Solr 7.7.2) on
> Debian Stretch VMs. Every morning at around 03:3* am the collection gets
> reset by $something, and I have no clue what causes this or how to prevent
> it, as there are no log entries at all in SolrCloud (even with an increased log level).

There is only one thing I know of that can delete data from an index
without an external trigger.  It is the document expiration feature.

https://lucidworks.com/post/document-expiration/

Without some kind of action or intentional config, Solr will never
delete anything automatically.  Solr does NOT contain any kind of
scheduling capability, and it might never get that functionality,
because ALL modern operating systems have something built in which can
schedule operations.
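
One way to rule the expiration feature in or out is to look for its update processor in the collection's solrconfig.xml. The following is a minimal sketch, assuming the feature would be enabled through `DocExpirationUpdateProcessorFactory` inside an update request processor chain; the function name `find_doc_expiration` is hypothetical and only for illustration.

```python
import xml.etree.ElementTree as ET


def find_doc_expiration(solrconfig_xml: str):
    """Return the settings of every document-expiration processor found.

    Assumption: the feature is configured as a <processor> element whose
    class attribute names DocExpirationUpdateProcessorFactory.
    """
    root = ET.fromstring(solrconfig_xml)
    found = []
    for processor in root.iter("processor"):
        if "DocExpirationUpdateProcessorFactory" not in processor.get("class", ""):
            continue
        # Collect child settings such as autoDeletePeriodSeconds or ttlFieldName.
        settings = {
            child.get("name"): (child.text or "").strip()
            for child in processor
            if child.get("name")
        }
        found.append(settings)
    return found
```

An empty result for the solrconfig.xml that was uploaded to ZooKeeper would suggest that expiration is not what removes the documents.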

What precisely do you mean by "reset" in the above?  Is the collection
still there but empty, or is the collection gone?

Can you grab and share the entire solr.log shortly after this happens,
and the previous logfile as well, which will most likely be named
solr.log.1?
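
Once the logs are available, they can be scanned for delete-by-query updates. This is a rough sketch under the assumption that, at the default log level, update requests are summarised with a fragment like `{deleteByQuery=<query> (<version>)}`; the exact format may differ between versions, and `find_delete_by_query` is a hypothetical helper name.

```python
import re

# Assumed fragment in an update log line: "{deleteByQuery=*:* (-1234567890)}".
# The optional group skips the version number that may follow the query.
DBQ_PATTERN = re.compile(r"deleteByQuery=(?P<query>.+?)(?: \(-?\d+\))?[,}]")


def find_delete_by_query(lines):
    """Return (line_number, query) for every line that logs a delete-by-query."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DBQ_PATTERN.search(line)
        if match:
            hits.append((number, match.group("query")))
    return hits
```

Passing the lines of solr.log and solr.log.1 to this function gives the line numbers to inspect; the timestamps on those lines can then be compared with the time of the wipe.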

Thanks,
Shawn

Re: daily SolrCloud collection wipes

Werner Detter
Hi Shawn,

> There is only one thing I know of that can delete data from an index
> without an external trigger.  It is the document expiration feature.
>
> https://lucidworks.com/post/document-expiration/
>
> Without some kind of action or intentional config, Solr will never
> delete anything automatically.  Solr does NOT contain any kind of
> scheduling capability, and it might never get that functionality,
> because ALL modern operating systems have something built in which can
> schedule operations.
>
> What precisely do you mean by "reset" in the above?  Is the collection
> still there but empty, or is the collection gone?
>
> Can you grab and share the entire solr.log shortly after this happens,
> and the previous logfile as well, which will most likely be named
> solr.log.1?

First, thanks for your response. By "reset" I mean: the collection still exists
but the documents have been dropped (from currently around 50k to 0). It happened
twice within the same timeframe early in the morning over the last two days, so I
was wondering if something within Solr like this:

".scheduled_maintenance":{
      "name":".scheduled_maintenance",
      "event":"scheduled",
      "startTime":"NOW",
      "every":"+1DAY",
      "enabled":true,
      "actions":[
        {
          "name":"inactive_shard_plan",
          "class":"solr.InactiveShardPlanAction"},
        {
          "name":"inactive_markers_plan",
          "class":"solr.InactiveMarkersPlanAction"},
        {
          "name":"execute_plan",
          "class":"solr.ExecutePlanAction"}]}},

could be the reason for the resets due to $something =) But I'm not sure about those
Solr maintenance things, which is why I initially asked on the mailing list here. But
you said Solr doesn't contain any internal scheduling capability, which means this
is probably something else. There are no cron jobs on the operating system itself that do
any kind of Solr maintenance.

Currently logging is disabled for performance reasons on the live setup. But tonight, before
this happens, we'll enable logging and hopefully see something that lets us track down the
source of the document deletion in the collection.

Thanks,
Werner



Re: daily SolrCloud collection wipes

Shawn Heisey-2
On 11/14/2019 9:17 AM, Werner Detter wrote:

> First, thanks for your response. By "reset" I mean: the collection still exists
> but the documents have been dropped (from currently around 50k to 0). It happened
> twice within the same timeframe early in the morning over the last two days, so I
> was wondering if something within Solr like this:
>
> ".scheduled_maintenance":{
>        "name":".scheduled_maintenance",
>        "event":"scheduled",
>        "startTime":"NOW",
>        "every":"+1DAY",
>        "enabled":true,
>        "actions":[
>          {
>            "name":"inactive_shard_plan",
>            "class":"solr.InactiveShardPlanAction"},
>          {
>            "name":"inactive_markers_plan",
>            "class":"solr.InactiveMarkersPlanAction"},
>          {
>            "name":"execute_plan",
>            "class":"solr.ExecutePlanAction"}]}},
>
> could be the reason for the resets due to $something =) But I'm not sure about those
> Solr maintenance things, which is why I initially asked on the mailing list here. But
> you said Solr doesn't contain any internal scheduling capability, which means this
> is probably something else. There are no cron jobs on the operating system itself that do
> any kind of Solr maintenance.

I was unaware of that config.  Had to look it up.  I have never looked
at the autoscaling feature.  I'm not even sure what that config will
actually do.  To me, it doesn't look like it's configured to do much.

Someone who is familiar with that feature will need to chime in and
confirm/refute my thoughts, but as far as I know, it is only capable of
things like adding or removing replicas, not deleting the data or the index.

Seeing the logs, with them set to the defaults that Solr ships, might
reveal something.

Thanks,
Shawn

Re: daily SolrCloud collection wipes

Werner Detter
In reply to this post by Werner Detter
Hi,

> Currently logging is disabled for performance reasons on the live setup. But tonight, before
> this happens, we'll enable logging and hopefully see something that lets us track down the
> source of the document deletion in the collection.

tcpdump and the logs revealed the source of the problem: the collections were indeed
cleared by a delete *:* query.
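
For anyone inspecting captured update requests in a similar situation, a delete-all can be recognised from the request body. The sketch below assumes the two common update syntaxes, XML (`<delete><query>*:*</query></delete>`) and JSON (`{"delete": {"query": "*:*"}}`); the helper name `is_delete_all` is hypothetical, and other request forms (for example javabin) are not covered.

```python
import json
import xml.etree.ElementTree as ET


def is_delete_all(body: str) -> bool:
    """Return True if an update request body deletes every document (*:*)."""
    text = body.strip()
    if text.startswith("<"):
        try:
            root = ET.fromstring(text)
        except ET.ParseError:
            return False
        # XML syntax: any <delete> element containing <query>*:*</query>.
        return any(
            (query.text or "").strip() == "*:*"
            for delete in root.iter("delete")
            for query in delete.iter("query")
        )
    try:
        doc = json.loads(text)
    except ValueError:
        return False
    # JSON syntax: {"delete": {"query": "*:*"}}.
    delete = doc.get("delete") if isinstance(doc, dict) else None
    return isinstance(delete, dict) and str(delete.get("query", "")).strip() == "*:*"
```

The source address of the packet carrying such a body then identifies the client that sent it.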

Cheers,
Werner


Re: daily SolrCloud collection wipes

Erick Erickson
Werner:

Who is sending the delete-by-query? What I’m really wondering is if it’s done by something internal to Solr (in which case I’d like to track it down) or something outside Solr in which case we don’t need to be concerned...

Thanks,
Erick

> On Nov 15, 2019, at 1:38 AM, Werner Detter <[hidden email]> wrote:
>
> Hi,
>
>> Currently logging is disabled for performance reasons on the live setup. But tonight, before
>> this happens, we'll enable logging and hopefully see something that lets us track down the
>> source of the document deletion in the collection.
>
> tcpdump and the logs revealed the source of the problem: the collections were indeed
> cleared by a delete *:* query.
>
> Cheers,
> Werner
>


Re: daily SolrCloud collection wipes

Andrzej Białecki-2
In reply to this post by Shawn Heisey-2
This default autoscaling config helps to keep some aspects of SolrCloud clean - specifically:
* Inactive shard plan: it periodically checks whether there are old shards in INACTIVE state that can be removed. Shards in this state are left-over parent shards remaining after a *successful* SPLITSHARD operation (i.e. the SPLITSHARD has completed successfully and the new sub-shards are ACTIVE and in use, and the parent shards are no longer in use). That’s likely not your case.
* Inactive markers plan: this has to do with Overseer state recovery when an Overseer leader crashes. Again, this likely has nothing to do with your case.
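
To see which actions each enabled trigger would run, the autoscaling configuration can be summarised as below. This is only a sketch: it assumes the configuration has been fetched as JSON and that triggers sit under a top-level `triggers` key, as the fragment quoted in this thread suggests; `list_trigger_actions` is a hypothetical helper name.

```python
def list_trigger_actions(autoscaling_config: dict):
    """Map each enabled trigger name to the list of action classes it runs.

    Assumption: triggers live under the top-level "triggers" key and each
    trigger carries an "actions" list of {"name": ..., "class": ...} entries.
    """
    actions = {}
    for name, trigger in autoscaling_config.get("triggers", {}).items():
        if not trigger.get("enabled", True):
            continue  # disabled triggers never fire
        actions[name] = [action.get("class") for action in trigger.get("actions", [])]
    return actions
```

None of the action classes in the default `.scheduled_maintenance` trigger issues document deletes, which matches the explanation above.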

As Shawn said, logs should be able to tell you what’s really happening. For example, there could be some wild external process in your setup that periodically cleans up the collections :)

> On 14 Nov 2019, at 18:25, Shawn Heisey <[hidden email]> wrote:
>
> On 11/14/2019 9:17 AM, Werner Detter wrote:
>> First, thanks for your response. By "reset" I mean: the collection still exists
>> but the documents have been dropped (from currently around 50k to 0). It happened
>> twice within the same timeframe early in the morning over the last two days, so I
>> was wondering if something within Solr like this:
>> ".scheduled_maintenance":{
>>       "name":".scheduled_maintenance",
>>       "event":"scheduled",
>>       "startTime":"NOW",
>>       "every":"+1DAY",
>>       "enabled":true,
>>       "actions":[
>>         {
>>           "name":"inactive_shard_plan",
>>           "class":"solr.InactiveShardPlanAction"},
>>         {
>>           "name":"inactive_markers_plan",
>>           "class":"solr.InactiveMarkersPlanAction"},
>>         {
>>           "name":"execute_plan",
>>           "class":"solr.ExecutePlanAction"}]}},
>> could be the reason for the resets due to $something =) But I'm not sure about those
>> Solr maintenance things, which is why I initially asked on the mailing list here. But
>> you said Solr doesn't contain any internal scheduling capability, which means this
>> is probably something else. There are no cron jobs on the operating system itself that do
>> any kind of Solr maintenance.
>
> I was unaware of that config.  Had to look it up.  I have never looked at the autoscaling feature.  I'm not even sure what that config will actually do.  To me, it doesn't look like it's configured to do much.
>
> Someone who is familiar with that feature will need to chime in and confirm/refute my thoughts, but as far as I know, it is only capable of things like adding or removing replicas, not deleting the data or the index.
>
> Seeing the logs, with them set to the defaults that Solr ships, might reveal something.
>
> Thanks,
> Shawn
>