Collection out of disk space, commit problem

WebsterHomer
Over the weekend one of our dev SolrCloud clusters ran out of disk space.
Examining the problem, we found one collection that had 2 months of
uncommitted tlog files. Unfortunately the Solr logs rolled over, so I cannot
see the commit behavior during the last time data was loaded to it.

The solrconfig.xml has both autoCommit and autoSoftCommit enabled.
     <autoCommit>
       <maxTime>${solr.autoCommit.maxTime:60000}</maxTime>
       <openSearcher>false</openSearcher>
     </autoCommit>

solr.autoCommit.maxTime is set to 60000

     <autoSoftCommit>
       <maxTime>${solr.autoSoftCommit.maxTime:5000}</maxTime>
     </autoSoftCommit>

solr.autoSoftCommit.maxTime is set to 3000

I found tlog files dated to Feb. 27. There is an automated job that reloads
the data once a week. It looks like no commits occurred from Feb 27 onward.
Once the disk got full solr got very unhappy.
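
A quick way to see which cores are holding on to old transaction logs is to
walk the Solr data directory and report each tlog directory's file count,
total size, and oldest file age. The sketch below is illustrative only: the
function name `summarize_tlogs` is hypothetical, and it assumes the default
layout in which each core keeps its transaction logs in a directory named
`tlog`.

```python
import os
import time
from pathlib import Path


def summarize_tlogs(solr_data_root, now=None):
    """Summarize every directory named 'tlog' found under solr_data_root.

    Assumption: transaction logs live in '<core>/data/tlog', the default
    Solr layout. Returns a list of dicts, largest directory first.
    """
    now = time.time() if now is None else now
    reports = []
    for dirpath, _dirnames, filenames in os.walk(solr_data_root):
        if Path(dirpath).name != "tlog":
            continue
        # Stat only regular files; ignore anything else in the directory.
        stats = [
            (Path(dirpath) / name).stat()
            for name in filenames
            if (Path(dirpath) / name).is_file()
        ]
        if not stats:
            continue
        oldest_mtime = min(s.st_mtime for s in stats)
        reports.append({
            "dir": dirpath,
            "files": len(stats),
            "bytes": sum(s.st_size for s in stats),
            "oldest_age_days": (now - oldest_mtime) / 86400.0,
        })
    return sorted(reports, key=lambda r: r["bytes"], reverse=True)
```

Calling `summarize_tlogs("/var/solr/data")` (the path is an assumption;
substitute your own Solr data directory) returns the largest tlog directories
first. With a 60-second hard autoCommit, a directory whose oldest file is
weeks old suggests that something other than the commit settings is keeping
the logs around.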

This solrcloud has 2 shards and one replica per shard.

We have a second development solrcloud which has the same collections with
identical configurations except that these collections have 2 shards and 2
replicas per shard. That one doesn't seem to have the tlog files
accumulating.

I have long suspected that autoCommit is not reliable, and this seems to
indicate that it is not.

We have several collections that share the same configuration, and have
similar ETL jobs loading them. This is the second time that this particular
collection has had this problem.

--


This message and any attachment are confidential and may be privileged or
otherwise protected from disclosure. If you are not the intended recipient,
you must not copy this message or attachment or disclose the contents to
any other person. If you have received this transmission in error, please
notify the sender immediately and delete the message and any attachment
from your system. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not accept liability for any omissions or errors in this
message which may arise as a result of E-Mail-transmission or for damages
resulting from any unauthorized changes of the content of this message and
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
subsidiaries do not guarantee that this message is free of viruses and does
not accept liability for any damages caused by any virus transmitted
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French,
Spanish and Portuguese versions of this disclaimer.

Re: Collection out of disk space, commit problem

Erick Erickson
Webster:

Do you by any chance have CDCR configured? If so, ensure that
buffering is disabled. Buffering was intended to be enabled
_temporarily_ during, say, a maintenance window and was conceived
before the bootstrapping capability was added to CDCR.

But I don't recall your other e-mails mentioning CDCR, so I mention this
on the off chance...
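
For readers who want to check this on their own cluster, here is a minimal
sketch of building the CDCR API requests that report the buffer state and
disable buffering. It assumes the CDCR request handler is registered at
`/cdcr` in solrconfig.xml; the helper names `cdcr_url` and `call_cdcr`, the
base URL, and the collection name are hypothetical.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

# CDCR API actions used here: STATUS reports the process and buffer state,
# QUEUES reports queue and tlog statistics, DISABLEBUFFER turns buffering off.
CDCR_ACTIONS = {"STATUS", "QUEUES", "DISABLEBUFFER"}


def cdcr_url(base_url, collection, action):
    """Build a CDCR API URL, e.g. .../solr/<collection>/cdcr?action=STATUS."""
    if action not in CDCR_ACTIONS:
        raise ValueError(f"unsupported CDCR action: {action}")
    query = urlencode({"action": action, "wt": "json"})
    return f"{base_url.rstrip('/')}/{quote(collection, safe='')}/cdcr?{query}"


def call_cdcr(base_url, collection, action, timeout=10):
    """Send the request and return the decoded JSON response."""
    with urlopen(cdcr_url(base_url, collection, action), timeout=timeout) as resp:
        return json.load(resp)
```

For example, `call_cdcr("http://localhost:8983/solr", "mycollection",
"STATUS")` shows whether the buffer is enabled, and the same call with
`"DISABLEBUFFER"` turns it off. The buffer's initial state can also be set in
the handler's `buffer` section in solrconfig.xml through `defaultState`.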

Best,
Erick

On Mon, Apr 2, 2018 at 10:35 AM, Webster Homer <[hidden email]> wrote:

> Over the weekend one of our dev SolrCloud clusters ran out of disk space.
> Examining the problem, we found one collection that had 2 months of
> uncommitted tlog files. Unfortunately the Solr logs rolled over, so I cannot
> see the commit behavior during the last time data was loaded to it.

Re: Collection out of disk space, commit problem

WebsterHomer
Erick,

Thanks. Normally our dev environment does not use CDCR, except when we're
doing active development on it. As it happens, the collection in question
was one we used to test CDCR. Or rather the configuration for it was, as
the specific collection has been deleted and created many times. Even
though we had CDCR turned off, it seems that buffering got set to "enabled",
which seems to be the default, and it is a really bad default!

Because it's dev and we don't do CDCR there, I might not have thought to
look at that. So thank you for that.

Web

On Mon, Apr 2, 2018 at 1:10 PM, Erick Erickson <[hidden email]>
wrote:

> Webster:
>
> Do you by any chance have CDCR configured? If so, ensure that
> buffering is disabled. Buffering was intended to be enabled
> _temporarily_ during, say, a maintenance window and was conceived
> before the bootstrapping capability was added to CDCR.

Re: Collection out of disk space, commit problem

Erick Erickson
Homer:

Yeah, the buffering bits are trappy, and in fact buffering is being removed
from CDCR going forward.

Too bad you fell into that trap; there's hope going forward though...

Erick

On Mon, Apr 2, 2018 at 11:42 AM, Webster Homer <[hidden email]> wrote:

> Erick,
>
> Thanks. Normally our dev environment does not use CDCR, except when we're
> doing active development on it. As it happens, the collection in question
> was one we used to test CDCR. Or rather the configuration for it was, as
> the specific collection has been deleted and created many times. Even
> though we had CDCR turned off, it seems that buffering got set to "enabled",
> which seems to be the default, and it is a really bad default!