SolrCloud shows cluster still healthy even the node data directory is deleted


Amy Bai-2
Hi community,

I found that SolrCloud does not check I/O status as long as the SolrCloud process is alive.
For example, if I delete the SolrCloud data directory, no errors are reported, and I can still log in to the SolrCloud Admin UI to create and query collections.
Is this reasonable?
Can someone explain why Solr handles it this way?
Thanks so much.


Regards,
Amy

Re: SolrCloud shows cluster still healthy even the node data directory is deleted

Erick Erickson
Depends. *nix systems have delete-on-close semantics; that is, as
long as there’s a single file handle open, the file will still be
available to the process using it. Only when the last file handle is
closed will the file actually be deleted.
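A minimal Python sketch of the delete-on-close behavior described above, assuming a POSIX system (Linux or macOS; on Windows, deleting an open file normally fails instead). `read_after_unlink` is a hypothetical helper written only for this illustration: it deletes a file while a descriptor is still open, checks that the name is gone, and then reads the data back through the old descriptor.

```python
import os
import tempfile


def read_after_unlink(payload: bytes):
    """Hypothetical demo helper (POSIX assumed).

    Writes payload to a temp file, unlinks the file while its descriptor
    is still open, then reads the data back through that descriptor.
    Returns (name_still_exists, data_read).
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
        os.unlink(path)                   # removes the directory entry only
        still_listed = os.path.exists(path)  # False: the name is gone
        os.lseek(fd, 0, os.SEEK_SET)      # rewind the still-open descriptor
        data = os.read(fd, len(payload))  # the data is still reachable
        return still_listed, data
    finally:
        os.close(fd)  # last reference dropped; the kernel may now free the file
```

For example, `read_after_unlink(b"segment data")` returns `(False, b"segment data")`: the name has vanished from the directory, but the contents are still readable through the open handle, which is the situation a running node is in right after its data directory is removed.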

Solr (Lucene actually) has a file handle open to every file in the index
all the time.

These files aren’t visible when you do a directory listing. So if you
stop Solr, are the files gone? NOTE: When you start Solr again, if
there are existing replicas that are healthy then the entire index
should be copied from another replica…
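On Linux, one way to see those "invisible" files is through `/proc/<pid>/fd`: each open descriptor appears there as a symlink, and the kernel appends ` (deleted)` to the link target of a file that has been unlinked (`lsof +L1` reports similar information). The sketch below is a hypothetical, Linux-only helper that lists such files for one process; pointing it at the Solr JVM's PID would require running as the same user or as root.

```python
import os

DELETED_SUFFIX = " (deleted)"


def deleted_open_files(pid="self"):
    """Hypothetical helper (Linux-only).

    Returns the sorted paths that process `pid` still holds open even
    though they have been unlinked, based on /proc/<pid>/fd symlinks.
    """
    fd_dir = f"/proc/{pid}/fd"
    found = []
    for name in os.listdir(fd_dir):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # descriptor was closed between listdir and readlink
        if target.endswith(DELETED_SUFFIX):
            found.append(target[: -len(DELETED_SUFFIX)])
    return sorted(found)
```

Usage: `deleted_open_files(12345)`, where `12345` stands in for the Solr process ID, would list index files that are gone from the directory listing but still held open.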

Best,
Erick

> On Nov 9, 2020, at 3:30 AM, Amy Bai <[hidden email]> wrote:


Re: SolrCloud shows cluster still healthy even the node data directory is deleted

Amy Bai-2
Hi Erick,

Thanks for your kind reply.
There are two things that confuse me:

1. Index/search queries keep failing because one node's data directory is gone, but the node is not marked as down.

2. The replicas on the failed node are not working, but index/search queries did not fail over to the other, healthy replicas.
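As an illustration of the gap in point 1: to my understanding, SolrCloud considers a node live based on its ZooKeeper session (ephemeral entries under `/live_nodes`), and that session stays valid while the JVM runs, regardless of disk state. A disk-aware check therefore has to come from somewhere else, for example an external monitor. The function below is a hypothetical sketch of such a probe, not part of Solr: it reports whether a data directory still exists and accepts a small, synced write. What to do on failure (for example, stopping the node so its ZooKeeper session ends and it drops out of `/live_nodes`) is an operator decision and an assumption here.

```python
import os
import uuid


def data_dir_healthy(data_dir):
    """Hypothetical disk probe, not a Solr API.

    Returns True if data_dir exists and a small probe file can be
    written, fsynced, and removed; False otherwise. Unlike a
    process/session liveness check, this notices a deleted or
    read-only data directory.
    """
    if not os.path.isdir(data_dir):
        return False
    probe = os.path.join(data_dir, f".probe-{uuid.uuid4().hex}")
    try:
        with open(probe, "wb") as f:
            f.write(b"ok")
            f.flush()
            os.fsync(f.fileno())  # force the write down to the device
        os.remove(probe)          # leave the directory as we found it
        return True
    except OSError:
        return False
```

A monitor would call this periodically on each node's data directory; a probe file name with a random suffix avoids clashing with index files.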

Regards,
Amy
________________________________
From: Erick Erickson <[hidden email]>
Sent: Monday, November 9, 2020 8:43 PM
To: [hidden email] <[hidden email]>
Subject: Re: SolrCloud shows cluster still healthy even the node data directory is deleted



Re: SolrCloud shows cluster still healthy even the node data directory is deleted

Radar Lei-2
Hi Erick,

I understand this is how file handles work.

But SolrCloud users did not see the expected replica failover happen, so we cannot say SolrCloud is fully HA-enabled. Is there a plan to handle HA for disk failures? Thanks.
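Until the server side marks a broken replica as down, failover for search requests can only happen if the broken replica fails loudly and the client then tries another copy. The sketch below shows that generic retry-across-replicas pattern; it is not SolrJ's or SolrCloud's own routing logic. The replica URLs and the `send` function are hypothetical placeholders supplied by the caller, and the pattern is meant for read (search) requests, where retrying on another replica is safe.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def query_with_failover(replicas: Sequence[str], send: Callable[[str], T]) -> T:
    """Try each replica in order; return the first successful response.

    `send` is a hypothetical, caller-supplied function that queries one
    replica URL and raises on any error, including I/O errors caused by
    missing index files.
    """
    errors = []
    for url in replicas:
        try:
            return send(url)
        except Exception as exc:  # any failure: move on to the next replica
            errors.append((url, exc))
    raise RuntimeError(f"all {len(replicas)} replicas failed: {errors}")
```

Note that this only helps when the damaged replica returns an error; a replica that answers with empty or partial results would not trigger a retry.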

Regards,
Radar

From: Amy Bai <[hidden email]>
Date: Wednesday, November 11, 2020 at 8:19 PM
To: [hidden email] <[hidden email]>
Subject: Re: SolrCloud shows cluster still healthy even the node data directory is deleted