hdfs - documents missing after hard poweroff

hdfs - documents missing after hard poweroff

Kyle Fransham
Hi,

Sometimes after a full poweroff of the solr cloud nodes, we see missing
documents from the index. Is there anything about our setup or our recovery
procedure that could cause this? Details are below:

We see the following (somewhat random) behaviour:

 - add 10 documents to index. Commit.
 - query for all documents - 10 documents returned.
 - restart all solr nodes and reset the collection (procedure is below).
 - query for all documents - 10 documents returned.
 - restart+reset all again - sometimes 7, 8, 9, or 10 documents returned.
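
The count check in the steps above could be scripted so that every restart is compared the same way. This is only a minimal Python sketch, assuming a node reachable at a hypothetical base URL and a hypothetical collection named `test`. It uses the standard `/select` handler with `q=*:*` and `rows=0` and reads `numFound` from the JSON response. Passing `distrib=False` and a core name restricts the count to that one core, which may help narrow down which shard is short.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def count_url(base_url, name, distrib=True):
    """Build a match-all query URL that returns only the hit count.

    `name` is a collection name, or a core name when distrib=False.
    """
    params = {"q": "*:*", "rows": 0, "wt": "json"}
    if not distrib:
        # Ask only the core that receives the request; no fan-out to other shards.
        params["distrib"] = "false"
    return f"{base_url.rstrip('/')}/{name}/select?{urlencode(params)}"


def num_found(response_body):
    """Extract numFound from a Solr JSON select response."""
    return json.loads(response_body)["response"]["numFound"]


def count_documents(base_url, name, distrib=True):
    """Run the query against a live node (needs network access)."""
    with urlopen(count_url(base_url, name, distrib)) as resp:
        return num_found(resp.read().decode("utf-8"))
```

Calling `count_documents("http://localhost:8983/solr", "test")` after each restart and logging the result would give a record of when the count drops. The host, port, and collection name here are placeholders.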

To summarize, after a full reboot of all the solr nodes, we are finding
that (sometimes) not all documents are in the index. This situation doesn't
remedy itself by waiting. Restarting all will sometimes re-add them,
sometimes not.

Our procedure for recovering from a hard poweroff is:
 - manually delete all *.lock files from the index folders on hdfs.
 - fully delete the znode from zookeeper.
 - re-add an empty znode in zookeeper.
 - start up all solr nodes.
 - re-add the configset.
 - re-issue the collection create command.
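
Two of the steps above (finding the lock files and re-issuing the create command) can be sketched in Python. This is an illustration under assumptions, not the exact commands used here: the paths, base URL, collection name, and configset name are all hypothetical, and only the `action=CREATE`, `name`, `numShards`, and `collection.configName` parameters of the Collections API are used.

```python
from fnmatch import fnmatch
from urllib.parse import urlencode


def find_lock_files(paths):
    """Return the paths whose file name matches *.lock."""
    return [p for p in paths if fnmatch(p.rsplit("/", 1)[-1], "*.lock")]


def create_collection_url(base_url, name, num_shards, config_name):
    """Build a Collections API CREATE URL for an existing configset."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "collection.configName": config_name,
    }
    return f"{base_url.rstrip('/')}/admin/collections?{urlencode(params)}"
```

For example, `create_collection_url("http://localhost:8983/solr", "test", 18, "testconf")` builds the request for an 18-shard collection. Any other create parameters used in the real setup would need to be added.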

After doing the above, we find that we are able to see all of the documents
in the index about 60% of the time. Other times, we are missing some
documents.

Some other things about our environment:
 - we're doing this test with 1 collection that has 18 shards distributed
across 3 solr cloud nodes.
 - solr version 7.5.0
 - hdfs is not running on the solr nodes, and is not being restarted.

Any thoughts or tips are greatly appreciated,

Kyle

--
CONFIDENTIALITY NOTICE: The information contained in this email is
privileged and confidential and intended only for the use of the individual
or entity to whom it is addressed.   If you receive this message in error,
please notify the sender immediately at 613-729-1100 and destroy the
original message and all copies. Thank you.

Re: hdfs - documents missing after hard poweroff

Kevin Risden-3
So I'm definitely curious what is going on here.

Are you still able to reproduce this? Can you check if files have been
modified on HDFS? I'd be curious if tlogs or the index is changing
underneath for the different restarts. Since there is no new indexing I
would guess not but something to check.
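
One way to check for changes is to save the output of `hdfs dfs -ls -R` on the collection's data directory before and after a restart and compare the two. A rough Python sketch, assuming the usual eight-column listing format (permissions, replication, owner, group, size, date, time, path); the function names are made up for illustration:

```python
def parse_listing(ls_output):
    """Map path -> (size, date, time) for every regular file in the listing."""
    files = {}
    for line in ls_output.splitlines():
        # Expected columns: perms, replication, owner, group, size, date, time, path
        fields = line.split(None, 7)
        if len(fields) != 8 or fields[0].startswith("d"):
            continue  # skip directories, "Found N items" headers, blank lines
        files[fields[7]] = (int(fields[4]), fields[5], fields[6])
    return files


def diff_listings(before, after):
    """Report files that were added, removed, or changed in size or timestamp."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(p for p in set(before) & set(after) if before[p] != after[p])
    return {"added": added, "removed": removed, "changed": changed}
```

If the tlog or index files show up as added, removed, or changed between restarts even though nothing was indexed, that would point at something rewriting them during startup.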

Can you run CheckIndex on the index to make sure it's not corrupt when you
don't get the full result set?

Kevin Risden


In reply to Kyle Fransham's message of Tue, Oct 16, 2018 at 10:23 AM.

Re: hdfs - documents missing after hard poweroff

Kevin Risden-3
Also, do you have autoAddReplicas turned on for these collections over
HDFS?
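
If it helps, the setting can be read from the Collections API `CLUSTERSTATUS` response (`/solr/admin/collections?action=CLUSTERSTATUS&wt=json`). A small Python sketch, assuming the response lists collections under `cluster.collections` and reports `autoAddReplicas` as a "true"/"false" string per collection; treat that layout as an assumption and check it against your own output:

```python
import json


def auto_add_replicas(clusterstatus_body):
    """Map collection name -> True/False for the autoAddReplicas property."""
    collections = json.loads(clusterstatus_body)["cluster"]["collections"]
    return {
        # A missing property is treated as "false" (assumed default).
        name: str(state.get("autoAddReplicas", "false")).lower() == "true"
        for name, state in collections.items()
    }
```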

Kevin Risden


In reply to Kevin Risden's message of Wed, Oct 31, 2018 at 8:20 PM.