Solr node is out of sync (looks Healthy)


Solr node is out of sync (looks Healthy)

Daniel Carrasco
Hello,

We're using Solr to manage product data for our shop, and last week some
customers called us saying that the price in the shop differs from the price
in the shopping basket. After researching a bit, I noticed that it sometimes
happens on page refresh.
After disabling all caches, I queried all the Solr instances to see whether
the data was correct, and I saw that one of them returns a different price
for the product, so it looks like that instance doesn't have the updated data.

   - How is it possible that a node in a cluster has different data?
   - How can I check whether the data is in sync? The cluster looks all
   healthy in the admin UI, and the node is active and OK.
   - Is there any way to detect this error, and how can I force a resync?

After restarting the node it got synced, so the data is OK now, but I can't
restart the nodes every time to check that the data is right (it takes a lot
of time to sync again).
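One way to approach the "is the data in sync?" question is to ask every node for its local document count with distrib=false and compare the results. A rough sketch in Python; the node hostnames and collection name are placeholders, not the ones from this cluster:

```python
import json
from urllib.request import urlopen

NODES = ["solr1", "solr2", "solr3"]   # placeholder hostnames
COLLECTION = "products"               # placeholder collection name

def replica_counts(fetch=lambda url: urlopen(url).read()):
    """Return {node: numFound}; fetch(url) must return the JSON response body."""
    counts = {}
    for node in NODES:
        # distrib=false makes the node answer from its local core only,
        # bypassing SolrCloud routing, so each count reflects that node's index.
        url = (f"http://{node}:8983/solr/{COLLECTION}/select"
               f"?q=*:*&rows=0&distrib=false&wt=json")
        counts[node] = json.loads(fetch(url))["response"]["numFound"]
    return counts

def out_of_sync(counts):
    """Nodes whose count differs from the most common value."""
    values = list(counts.values())
    majority = max(set(values), key=values.count)
    return sorted(node for node, c in counts.items() if c != majority)
```

Note that equal counts do not prove the indexes are identical (a document can exist on both nodes with stale fields), so for the price case it may be better to fetch a known document's price field from each node the same way, with distrib=false. Counts can also legitimately differ for a short while right after an import, so a node is only suspicious if it stays behind after a hard commit.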

My configuration is: 8 Solr nodes running v7.1.0 and ZooKeeper v3.4.11. All
nodes are standalone (I'm not using Hadoop).

Thanks and greetings!
--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________

Re: Solr node is out of sync (looks Healthy)

Emir Arnautović
Hi Daniel,
Can you tell us more about your document update process? How do you commit changes? Since it got fixed after a restart, it seems to me that on that one node the index searcher was not reopened after updates. Do you see any errors/warnings on that node?

Also, what do you mean by “All nodes are standalone”?

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/





Re: Solr node is out of sync (looks Healthy)

Daniel Carrasco
Hello, thanks for your help.

I answer below.

Greetings!!

2018-02-12 11:31 GMT+01:00 Emir Arnautović <[hidden email]>:

> Hi Daniel,
> Can you tell us more about your document update process. How do you commit
> changes? Since it got fixed after restart, it seems to me that on that one
> node index searcher was not reopened after updates. Do you see any
> errors/warnings on that node?
>

I've asked the programmers, and it looks like they trigger the collections'
dataimport handler using curl. I think the data is imported from a Microsoft
SQL Server using a Solr plugin.


> Also, what do you mean by “All nodes are standalone”?
>

I mean that the nodes don't share a filesystem (I'm planning to use Hadoop,
but I have to learn to create and maintain that cluster first). All nodes
have their own data drive and are connected to the cluster through ZooKeeper.




--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________

Re: Solr node is out of sync (looks Healthy)

Emir Arnautović
Hi Daniel,
Maybe it is Monday and I am still not warmed up, but your details seem a bit imprecise to me. Maybe not directly related to your problem, but just to rule out a strange Solr setup, here is my understanding: you are running a single SolrCloud cluster with 8 nodes. It has a single collection with X shards and Y replicas. You use DIH to index data, and you use curl to interact with Solr and start the DIH process. You see some replicas of some shards having less data, and after a node restart everything ends up OK.

Is this right? If it is, what are X and Y? Do you have autocommit set up, or do you commit explicitly? Did you check the logs on the node with less data, and did you see any errors/warnings? Do you do full imports or incremental imports?

Not related to the issue, but just a note that Solr does not guarantee consistency at any given time - it has something called "eventual consistency": once updates stop, all replicas will (should) end up in the same state. Having said that, using Solr results directly in your UI would require you to either cache used documents in the UI/middle layer, implement some sort of stickiness, or retrieve only the ID from Solr and load the data from primary storage. If you have static data and you update the index once a day, you can use aliases and switch between the new and old index, so you suffer from this issue only at switch time.

Regards,
Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/





Re: Solr node is out of sync (looks Healthy)

Daniel Carrasco
Hello,

2018-02-12 12:32 GMT+01:00 Emir Arnautović <[hidden email]>:

> Hi Daniel,
> Maybe it is Monday and I am still not warmed up, but your details seems a
> bit imprecise to me. Maybe not directly related to your problem, but just
> to exclude that you are having some strange Solr setup, here is my
> understanding: You are running a single SolrCloud cluster with 8 nodes. It
> has a single collection with X shards and Y replicas. You use DIH to index
> data and you use curl to interact with Solr and start DIH process. You see
> some of replicas for some of shards having less data and after node restart
> it ends up being ok.


> Is this right? If it is, what is X and Y?


Near to reality:

   - I have a SolrCloud cluster with 8 nodes, but it has multiple collections.
   - Every collection has only one shard, for performance reasons (I did
   some tests splitting shards and the queries were slower).
   - Every collection has 8 replicas (one per node).
   - After restarting a node, it starts to recover the collections. I don't
   know if Solr serves data directly in that state or gets the data from other
   nodes before serving it, but even while it is recovering, the data looks OK.



> Do you have autocommit set up or you commit explicitly?


I'm not sure about that. How can I check it?

The curl command doesn't specify it, but it will be true by default, right?



> Did you check logs on node with less data and did you see any
> errors/warnings?


I'm not sure when it failed, and the cluster logs a lot of warnings and
errors all the time (maybe related to queries from the shop), so it is hard
to determine whether an import error exists and which errors are related to
the import. It's like looking for a needle in a haystack.
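To narrow that haystack, one option is to filter the logs down to dataimport-related errors and warnings first. A sketch; the keyword substrings are assumptions about typical log wording, not guaranteed matches for every Solr version:

```python
def dih_problems(log_lines):
    """Keep lines that mention the dataimport handler AND an error/warning.

    The substrings below are guesses at typical solr.log wording; adjust
    them to whatever your log actually contains.
    """
    problems = []
    for line in log_lines:
        low = line.lower()
        if "dataimport" in low and any(w in low for w in ("error", "warn", "exception")):
            problems.append(line)
    return problems
```

Run against the node with stale data around the time of the last import, this at least separates import failures from the noise of slow-query warnings.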


> Do you do full imports or incremental imports?
>

I've checked the curl command, and it looks like it is doing full imports
without cleaning the data:
http://' . $solr_ip .
':8983/solr/descriptions/dataimport?command=full-import&clean=false&entity=description_'.$idm[$j].'_lastupdate
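Rather than relying on the default commit behaviour, it can be spelled out explicitly in that URL. A hypothetical helper (the host, collection, and entity names are illustrative, not the shop's actual code):

```python
from urllib.parse import urlencode

def dataimport_url(host, collection, entity, clean=False, commit=True):
    """Build a DIH full-import URL with clean/commit made explicit."""
    params = {
        "command": "full-import",
        "clean": str(clean).lower(),    # false: keep existing documents
        "commit": str(commit).lower(),  # true: hard-commit when the import ends
        "entity": entity,
    }
    return f"http://{host}:8983/solr/{collection}/dataimport?{urlencode(params)}"
```

Making commit=true explicit removes the "will it be true by default?" doubt from the import script itself.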


>
> Not related to issue, but just a note that Solr does not guaranty
> consistency at any time - it has something called “eventual consistency” -
> once updates stop all replicas will (should) end up in the same state.
> Having said that, using Solr results directly in your UI would either
> require you to cache used documents on UI/middle layer or implement some
> sort of stickiness or retrieve only ID from Solr and load data from primary
> storage. If you have static data, and you update index once a day, you can
> use aliases and switch between new and old index and you will suffer from
> this issue only at the time when doing switches.
>

But is it normal for the data to be inconsistent for such a long time?
Because it looks like the data has been inconsistent for about a week...

Another question: with HDFS, will the data be consistent? With HDFS the data
will be shared between the nodes, so updates will be available on all nodes
at the same time, right?

Thanks!!



--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________

Re: Solr node is out of sync (looks Healthy)

Emir Arnautović
Hi Daniel,
Please see inline comments.
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 12 Feb 2018, at 13:13, Daniel Carrasco <[hidden email]> wrote:
>
> Hello,
>
> 2018-02-12 12:32 GMT+01:00 Emir Arnautović <[hidden email]>:
>
>> Hi Daniel,
>> Maybe it is Monday and I am still not warmed up, but your details seems a
>> bit imprecise to me. Maybe not directly related to your problem, but just
>> to exclude that you are having some strange Solr setup, here is my
>> understanding: You are running a single SolrCloud cluster with 8 nodes. It
>> has a single collection with X shards and Y replicas. You use DIH to index
>> data and you use curl to interact with Solr and start DIH process. You see
>> some of replicas for some of shards having less data and after node restart
>> it ends up being ok.
>
>
>> Is this right? If it is, what is X and Y?
>
>
> Near to reality:
>
>   - I've a SolrCloud cluster with 8 nodes but has multiple collections.
>   - Every collection has only one shard for performance purpose (I did
>   some test splitting shards and queries were slower).
A distributed request comes with an overhead, and if the collection is small, that overhead can be larger than the benefit of parallelising the search.

>   - Every collection has 8 replicas (one by node)
I would compare all shards on all nodes (64 Solr cores) vs. having just one replica (16 Solr cores).

>   - After restart the node it start to recover the collections. I don't
>   know if Solr serve data directly on that state or get the data from other
>   nodes before serve it, but even while is recovering, the data looks OK.
Recovery can be from the transaction logs (the logs can tell), and that would mean there was no hard commit after some updates.

>
>
>
>> Do you have autocommit set up or you commit explicitly?
>
>
> I'm not sure about that. How I can check it?
It is part of solrconfig.xml (the <autoCommit> and <autoSoftCommit> sections), not solr.xml.
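For reference, the autocommit settings live in solrconfig.xml under the update handler; a sketch of what such a section looks like (the intervals are illustrative, not a recommendation for this cluster):

```xml
<!-- Hard commit: flushes segments to disk and truncates the transaction
     log; with openSearcher=false it does NOT make changes visible. -->
<autoCommit>
  <maxTime>15000</maxTime>            <!-- at most every 15 s -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: opens a new searcher, which is what makes recently
     indexed changes visible to queries. -->
<autoSoftCommit>
  <maxTime>60000</maxTime>            <!-- visibility lag of up to 60 s -->
</autoSoftCommit>
```

If neither section is present, changes only become visible when something issues an explicit commit (for DIH, typically commit=true on the import request).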

>
> On curl command is not specified, but will be true by default, right?
I think it defaults to true for a full import.

>
>
>
>> Did you check logs on node with less data and did you see any
>> errors/warnings?
>
>
> I'm not sure when it failed and the cluster has a lot warnings and error
> every time (maybe related with queries from shop), so is hard to determine
> if import error exists and what's the error related to the import. Is like
> search a needle on a haystack
Not with some logging solution in place - one such is Sematext’s Logsene: https://sematext.com/logsene/

>
>
>> Do you do full imports or incremental imports?
>>
>
> I've checked the curl command and looks like is doing full imports without
> clean data:
> http://' . $solr_ip .
> ':8983/solr/descriptions/dataimport?command=full-import&clean=false&entity=description_'.$idm[$j].'_lastupdate
This is not a good strategy, since Solr does not have real updates - an update is a delete + insert, and deleted documents are only purged on segment merges. Also, this will not remove documents that were deleted in the primary storage. It is much better to index into a new collection and use aliases to point to the collection in use. That way you can even roll back if you are not happy with the new index.

>
>
>>
>> Not related to issue, but just a note that Solr does not guaranty
>> consistency at any time - it has something called “eventual consistency” -
>> once updates stop all replicas will (should) end up in the same state.
>> Having said that, using Solr results directly in your UI would either
>> require you to cache used documents on UI/middle layer or implement some
>> sort of stickiness or retrieve only ID from Solr and load data from primary
>> storage. If you have static data, and you update index once a day, you can
>> use aliases and switch between new and old index and you will suffer from
>> this issue only at the time when doing switches.
>>
>
> But is normal that data will be inconsistent for a very long time?, because
> looks like the data is inconsistent from about a week…
It will become consistent once all changes are committed and the searchers are reopened.

>
> Another question: With HDFS, data will be consistent?. With HDFS the data
> will be shared between nodes and then updates will be avaible on all nodes
> at same time, right?
I am not too familiar with running Solr on HDFS, but I doubt it works the way you expect. You might be able to have multiple Solr instances (not part of the same cluster - they can be standalone Solr) reading from the same HDFS directory and one updating it, but you would probably have to reload the core on the read instances to make them aware of changes. I am not sure you would get much out of it - it is just a different replication mechanism. But it is late here and I have never used Solr on HDFS, so take this with a grain of salt.



Re: Solr node is out of sync (looks Healthy)

Daniel Carrasco
Hello,

I answer inline ;)

2018-02-12 23:56 GMT+01:00 Emir Arnautović <[hidden email]>:

> Hi Daniel,
> Please see inline comments.
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
>
> > On 12 Feb 2018, at 13:13, Daniel Carrasco <[hidden email]> wrote:
> >
> > Hello,
> >
> > 2018-02-12 12:32 GMT+01:00 Emir Arnautović <[hidden email]>:
> >
> >> Hi Daniel,
> >> Maybe it is Monday and I am still not warmed up, but your details seems
> a
> >> bit imprecise to me. Maybe not directly related to your problem, but
> just
> >> to exclude that you are having some strange Solr setup, here is my
> >> understanding: You are running a single SolrCloud cluster with 8 nodes.
> It
> >> has a single collection with X shards and Y replicas. You use DIH to
> index
> >> data and you use curl to interact with Solr and start DIH process. You
> see
> >> some of replicas for some of shards having less data and after node
> restart
> >> it ends up being ok.
> >
> >
> >> Is this right? If it is, what is X and Y?
> >
> >
> > Near to reality:
> >
> >   - I've a SolrCloud cluster with 8 nodes but has multiple collections.
> >   - Every collection has only one shard for performance purpose (I did
> >   some test splitting shards and queries were slower).
> Distributed request comes with an overhead and if collection is small,
> that overhead can be larger than benefit of parallelising search.
>
> >   - Every collection has 8 replicas (one by node)
> I would compare all shards on all nodes (64 Solr cores) v.s. having just
> one replica (16 Solr cores)
>

We have all shards on all nodes because we want to avoid the overhead of
sending data between nodes (latency, network traffic). The page gets a lot
of requests per second, and we want the fastest response possible, with HA
if some nodes fail.

Also, we have 8 collections: five are small (less than 15 MB), and three of
them are a few GB in size (the biggest is about 10 GB).
This is the first SolrCloud cluster I've created, and I decided on this
architecture to avoid what I mentioned above: the overhead of sending data
between nodes when the client asks another node for data. Maybe it is better
to have a replication factor of 3-4, for example, and create shards for the
big collections?


>
> >   - After restart the node it start to recover the collections. I don't
> >   know if Solr serve data directly on that state or get the data from
> other
> >   nodes before serve it, but even while is recovering, the data looks OK.
> Recovery can be from transaction logs (logs can tell) and that would mean
> that there was no hard commit after some updates.
>
> >
> >
> >
> >> Do you have autocommit set up or you commit explicitly?
> >
> >
> > I'm not sure about that. How I can check it?
> It is part of solr.xml
>

I've checked the file, and it looks like there's no autocommit configuration.
I'll read a bit about how it works to see if it can help.


>
> >
> > On curl command is not specified, but will be true by default, right?
> I think it is for full import.
>
> >
> >
> >
> >> Did you check logs on node with less data and did you see any
> >> errors/warnings?
> >
> >
> > I'm not sure when it failed and the cluster has a lot warnings and error
> > every time (maybe related with queries from shop), so is hard to
> determine
> > if import error exists and what's the error related to the import. Is
> like
> > search a needle on a haystack
> Not with some logging solution - one such is Sematext’s Logsene:
> https://sematext.com/logsene/
>
> >
> >
> >> Do you do full imports or incremental imports?
> >>
> >
> > I've checked the curl command and looks like is doing full imports
> without
> > clean data:
> > http://' . $solr_ip .
> > ':8983/solr/descriptions/dataimport?command=full-import&clean=false&entity=description_'.$idm[$j].'_lastupdate
> This is not a good strategy since Solr does not have real updates - it is
> delete + insert and deleted documents are purged on segment merges. Also
> this will not eliminate documents that are deleted in the primary storage.
> It is much better to index it in new collection and use aliases to point to
> used collection. This way you can even roll back if not happy with new
> index.
>
>
But if you update three products, for example, and you create a new
collection with those updates, how do you point those three products back to
the original collection? Or do you have to reindex the whole collection into
a new collection and then create an alias? I don't know if it is a good idea
to reindex a whole 10 GB collection every 5 minutes just to create another
collection with the updates. Also, you have to manage deleting the old
collections to avoid filling the disk.
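To illustrate the alias workflow Emir describes (all names below are hypothetical): after full-importing into a fresh collection and committing, the switch and the cleanup are two Collections API calls, sketched here as URL builders:

```python
from urllib.parse import urlencode

def collections_api_url(host, **params):
    """Build a Solr Collections API URL; the host is a placeholder."""
    return f"http://{host}:8983/solr/admin/collections?{urlencode(params)}"

def swap_alias_plan(host, alias, new_collection, old_collection):
    """The calls, in order, for an alias switch plus cleanup:
    1. CREATEALIAS re-points an existing alias atomically, so queries
       against the alias start hitting the new collection;
    2. DELETE reclaims disk space once the switch is verified."""
    return [
        collections_api_url(host, action="CREATEALIAS",
                            name=alias, collections=new_collection),
        collections_api_url(host, action="DELETE", name=old_collection),
    ]
```

This pattern fits periodic full rebuilds; for small updates every few minutes, indexing into the live collection with suitable autocommit settings is the more common approach, so the 10 GB collection would not be rebuilt each time.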



> >
> >
> >>
> >> Not related to issue, but just a note that Solr does not guaranty
> >> consistency at any time - it has something called “eventual
> consistency” -
> >> once updates stop all replicas will (should) end up in the same state.
> >> Having said that, using Solr results directly in your UI would either
> >> require you to cache used documents on UI/middle layer or implement some
> >> sort of stickiness or retrieve only ID from Solr and load data from
> primary
> >> storage. If you have static data, and you update index once a day, you
> can
> >> use aliases and switch between new and old index and you will suffer
> from
> >> this issue only at the time when doing switches.
> >>
> >
> > But is normal that data will be inconsistent for a very long time?,
> because
> > looks like the data is inconsistent from about a week…
> It will become consistent once all changes are committed and searchers
> reopened.
>
> >
> > Another question: With HDFS, data will be consistent?. With HDFS the data
> > will be shared between nodes and then updates will be avaible on all
> nodes
> > at same time, right?
> I am not too familiar with running Solr on HDFS, but I doubt that it is
> working the way you expect it to work. You might be able to have multiple
> Solr instances (not part of the same cluster - can be standalone Solr)
> reading from the same HDFS directory and one updating it, but you would
> probably have to reload core on read instances to be aware of changes. Not
> sure if you would get much out of it - just different replication
> mechanism. But it is late here and I never used Solr on HDFS so take this
> with a grain of salt.
>

Today I've read a comment saying that it works like a standalone server and
creates a copy of the data for every node that holds a replica. Maybe that's
wrong, but it's not what I want.
I want to keep a SolrCloud cluster like the current one, but sharing data
through HDFS and with the ability to autoscale when there's too much load
(AWS and GCP autoscaling groups). Maybe that's like hunting unicorns...


> >
> > Thanks!!
> >
> >
> >>
> >> Regards,
> >> Emir
> >> --
> >> Monitoring - Log Management - Alerting - Anomaly Detection
> >> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
> >>
> >>
> >>
> >>> On 12 Feb 2018, at 12:00, Daniel Carrasco <[hidden email]>
> wrote:
> >>>
> >>> Hello, thanks for your help.
> >>>
> >>> I answer bellow.
> >>>
> >>> Greetings!!
> >>>
> >>> 2018-02-12 11:31 GMT+01:00 Emir Arnautović <[hidden email]>:
> >>>
> >>>> Hi Daniel,
> >>>> Can you tell us more about your document update process. How do you
> >> commit
> >>>> changes? Since it got fixed after restart, it seems to me that on that
> >> one
> >>>> node index searcher was not reopened after updates. Do you see any
> >>>> errors/warnings on that node?
> >>>>
> >>>
> >>> i've asked to the programmers and looks like they are using the
> >> collections
> >>> dataimport using curl. I think the data is imported from a Microsoft
> SQL
> >>> server using a solr plugin.
> >>>
> >>>
> >>>> Also, what do you mean by “All nodes are standalone”?
> >>>>
> >>>
> >>> I mean that nodes don't share filesystem (I'm planning to use Hadoop,
> but
> >>> I've to learn to create and maintain the cluster first). All nodes has
> >> its
> >>> own data drive and are connected to the cluster using zookeeper.
> >>>
> >>>
> >>>>
> >>>> Regards,
> >>>> Emir
> >>>> --
> >>>> Monitoring - Log Management - Alerting - Anomaly Detection
> >>>> Solr & Elasticsearch Consulting Support Training -
> http://sematext.com/
> >>>>
> >>>>
> >>>>
> >>>>> On 12 Feb 2018, at 11:16, Daniel Carrasco <[hidden email]>
> >> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> We're using Solr to manage products data on our shop and the last
> week
> >>>> some
> >>>>> customers called us telling that price between shop and shopping
> basket
> >>>>> differs. After research a bit I've noticed that it happens sometimes
> on
> >>>>> page refresh.
> >>>>> After disabling all cache I've queried all solr instances to see if
> >> data
> >>>> is
> >>>>> correct and I've seen that one of them give a different price for the
> >>>>> product, so looks like the instance has not got the updated data.
> >>>>>
> >>>>> - How is possible that a node on a cluster have different data?
> >>>>> - How i can check if data is in sync?, because the cluster looks al
> >>>>> healthy on admin, and the node is active and OK.
> >>>>> - Is there any way to detect this error? and How I can force resyncs?
> >>>>>
> >>>>> After restart the node it got synced, so the data now is OK, but I
> >> can't
> >>>>> restart the nodes every time to see if data is right (it tooks a lot
> of
> >>>>> time to be synced again).
> >>>>>
> >>>>> My configuration is: 8 Solr nodes using v7.1.0 and zookeeper v3.4.11.
> >> All
> >>>>> nodes are standalone (I'm not using hadoop).
> >>>>>
> >>>>> Thanks and greetings!
> >>>>> --
> >>>>> _________________________________________
> >>>>>
> >>>>>    Daniel Carrasco Marín
> >>>>>    Ingeniería para la Innovación i2TIC, S.L.
> >>>>>    Tlf:  +34 911 12 32 84 Ext: 223
> >>>>>    www.i2tic.com
> >>>>> _________________________________________
> >>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> _________________________________________
> >>>
> >>>     Daniel Carrasco Marín
> >>>     Ingeniería para la Innovación i2TIC, S.L.
> >>>     Tlf:  +34 911 12 32 84 Ext: 223
> >>>     www.i2tic.com
> >>> _________________________________________
> >>
> >>
> >
> >
> > --
> > _________________________________________
> >
> >      Daniel Carrasco Marín
> >      Ingeniería para la Innovación i2TIC, S.L.
> >      Tlf:  +34 911 12 32 84 Ext: 223
> >      www.i2tic.com
> > _________________________________________
>
>
Thanks!!

--
_________________________________________

      Daniel Carrasco Marín
      Ingeniería para la Innovación i2TIC, S.L.
      Tlf:  +34 911 12 32 84 Ext: 223
      www.i2tic.com
_________________________________________

Re: Solr node is out of sync (looks Healthy)

Emir Arnautović
Hi Daniel,
Back to your original question. What is the difference in document counts between the replicas - a few docs or a large number? My assumption is that you don't have autocommit enabled, that you commit explicitly when indexing is done, and that on some replica(s) the commit is somehow processed before all docs are indexed.
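One way to compare the per-replica counts directly, without restarting anything, is to query each node with distrib=false so the request is answered only by the local core. A sketch; the node hostnames and the collection name "products" are placeholders for your own:

```shell
# Compare document counts per node, bypassing distributed search.
for node in solr1 solr2 solr3 solr4 solr5 solr6 solr7 solr8; do
  count=$(curl -s "http://${node}:8983/solr/products/select?q=*:*&rows=0&distrib=false&wt=json" \
            | grep -o '"numFound":[0-9]*')
  echo "${node} -> ${count}"
done
```

A replica whose count keeps lagging after updates have stopped is the one to investigate.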
Some inline comments.

Emir
--
Monitoring - Log Management - Alerting - Anomaly Detection
Solr & Elasticsearch Consulting Support Training - http://sematext.com/



> On 13 Feb 2018, at 10:22, Daniel Carrasco <[hidden email]> wrote:
>
> Hello,
>
> I answer inline ;)
>
> 2018-02-12 23:56 GMT+01:00 Emir Arnautović <[hidden email]>:
>
>> Hi Daniel,
>> Please see inline comments.
>> --
>> Monitoring - Log Management - Alerting - Anomaly Detection
>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>
>>
>>
>>> On 12 Feb 2018, at 13:13, Daniel Carrasco <[hidden email]> wrote:
>>>
>>> Hello,
>>>
>>> 2018-02-12 12:32 GMT+01:00 Emir Arnautović <[hidden email]>:
>>>
>>>> Hi Daniel,
>>>> Maybe it is Monday and I am still not warmed up, but your details seems
>> a
>>>> bit imprecise to me. Maybe not directly related to your problem, but
>> just
>>>> to exclude that you are having some strange Solr setup, here is my
>>>> understanding: You are running a single SolrCloud cluster with 8 nodes.
>> It
>>>> has a single collection with X shards and Y replicas. You use DIH to
>> index
>>>> data and you use curl to interact with Solr and start DIH process. You
>> see
>>>> some of replicas for some of shards having less data and after node
>> restart
>>>> it ends up being ok.
>>>
>>>
>>>> Is this right? If it is, what is X and Y?
>>>
>>>
>>> Near to reality:
>>>
>>>  - I've a SolrCloud cluster with 8 nodes but has multiple collections.
>>>  - Every collection has only one shard for performance purpose (I did
>>>  some test splitting shards and queries were slower).
>> Distributed request comes with an overhead and if collection is small,
>> that overhead can be larger than benefit of parallelising search.
>>
>>>  - Every collection has 8 replicas (one by node)
>> I would compare all shards on all nodes (64 Solr cores) v.s. having just
>> one replica (16 Solr cores)
>>
>
> We've all shards on all nodes because we want to avoid the overhead of send
> data between nodes (latency, network traffic). The page has a lot of
> petitions per second and we want the fastest response posible with HA if
> some nodes fails.
If you are using SolrJ in your middle layer, you can initialize it with ZooKeeper; it will then know which nodes host which collections and send requests directly to the nodes holding the shards.

>
> Also we've 8 collections: five are small (less than 15Mb), and three of
> then has some GB (the bigger is about 10Gb).
> Is the first SolrCloud I've created and I'd decided this architecture to
> avoid what I say above, the overhead of sent the data between nodes when
> the client ask for data to another node. Maybe is better to have a
> replication factor of 3-4 for example and create shards for big collections?
Only a test can tell whether splitting the large collection will bring any benefit. If you are happy with your query latency, then you don't have to split.
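If you do decide to test sharding, the Collections API can split an existing shard in place. A sketch; the collection and shard names are placeholders:

```shell
# Split shard1 of the "descriptions" collection into two sub-shards,
# then measure query latency before committing to the new layout.
curl "http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=descriptions&shard=shard1"
```

The parent shard stays active until the sub-shards are built, so this can be tried without downtime and discarded if latency gets worse.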

>
>
>>
>>>  - After restart the node it start to recover the collections. I don't
>>>  know if Solr serve data directly on that state or get the data from
>> other
>>>  nodes before serve it, but even while is recovering, the data looks OK.
>> Recovery can be from transaction logs (logs can tell) and that would mean
>> that there was no hard commit after some updates.
>>
>>>
>>>
>>>
>>>> Do you have autocommit set up or you commit explicitly?
>>>
>>>
>>> I'm not sure about that. How I can check it?
>> It is part of solr.xml
>>
>
> I've checked the file and looks like there's no configuration about
> autocommit. I’ll search a bit about how works to see if can help.
>
I might have pointed you to the wrong file - it is solrconfig.xml. Under updateHandler you should find autoCommit and autoSoftCommit.
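For reference, the relevant section of solrconfig.xml looks roughly like this; the intervals are illustrative, not recommendations:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Hard commit: flushes the transaction log to the index on disk.
       With openSearcher=false it does NOT make changes visible to searches. -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- Soft commit: reopens the searcher so new documents become visible. -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>
```

If neither element is present, nothing becomes visible until something sends an explicit commit, which would match the symptom of one replica not reopening its searcher.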

>
>>
>>>
>>> On curl command is not specified, but will be true by default, right?
>> I think it is for full import.
>>
>>>
>>>
>>>
>>>> Did you check logs on node with less data and did you see any
>>>> errors/warnings?
>>>
>>>
>>> I'm not sure when it failed and the cluster has a lot warnings and error
>>> every time (maybe related with queries from shop), so is hard to
>> determine
>>> if import error exists and what's the error related to the import. Is
>> like
>>> search a needle on a haystack
>> Not with some logging solution - one such is Sematext’s Logsene:
>> https://sematext.com/logsene/
>>
>>>
>>>
>>>> Do you do full imports or incremental imports?
>>>>
>>>
>>> I've checked the curl command and looks like is doing full imports
>> without
>>> clean data:
>>> http://' . $solr_ip . ':8983/solr/descriptions/dataimport?command=full-import&clean=false&entity=description_'.$idm[$j].'_lastupdate
>> This is not a good strategy since Solr does not have real updates - it is
>> delete + insert and deleted documents are purged on segment merges. Also
>> this will not eliminate documents that are deleted in the primary storage.
>> It is much better to index it in new collection and use aliases to point to
>> used collection. This way you can even roll back if not happy with new
>> index.
>>
>>
> But if you update three products for example and you create a new
> collection with that updates, how you point that three products to original
> collection?, or you've to reindex the whole collection on a new collection
> and then create an alias?,
You create the alias first, point it at some existing collection, and reconfigure your app to use the alias. Then, when doing a full import, you create a new collection, run the import, verify the results, and point the alias at the new collection. You can keep the old collection so you can switch back to it, or delete it. Note that after the initial reconfiguration of your app, this switching is transparent to it.
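The switch itself is a couple of Collections API calls. A sketch, where "products" is the alias the application queries and the dated collection names are placeholders:

```shell
# 1. Build the new index in a fresh collection.
curl "http://localhost:8983/solr/admin/collections?action=CREATE&name=products_20180213&numShards=1&replicationFactor=8"
#    ...run the full import into products_20180213 and verify the results...

# 2. Repoint the alias the application uses; queries switch atomically.
curl "http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=products&collections=products_20180213"

# 3. Drop the previous collection once you are sure you won't roll back.
curl "http://localhost:8983/solr/admin/collections?action=DELETE&name=products_20180206"
```

CREATEALIAS overwrites an existing alias of the same name, which is what makes the rollback cheap: just point the alias back at the old collection.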

> because also I don't know if is a good idea to
> reindex a whole collection of 10Gb every 5 minutes to create another
> collection with the updates.
I got the impression that this is the only way you do updates. Do you do direct updates as well?

> Also you've to manage the way to delete the
> old collections, to avoid to fill the disk.
You can delete the old collection at any moment once the alias is pointing to the new one.

>
>
>
>>>
>>>
>>>>
>>>> Not related to issue, but just a note that Solr does not guarantee
>>>> consistency at any time - it has something called “eventual
>> consistency” -
>>>> once updates stop all replicas will (should) end up in the same state.
>>>> Having said that, using Solr results directly in your UI would either
>>>> require you to cache used documents on UI/middle layer or implement some
>>>> sort of stickiness or retrieve only ID from Solr and load data from
>> primary
>>>> storage. If you have static data, and you update index once a day, you
>> can
>>>> use aliases and switch between new and old index and you will suffer
>> from
>>>> this issue only at the time when doing switches.
>>>>
>>>
>>> But is normal that data will be inconsistent for a very long time?,
>> because
>>> looks like the data is inconsistent from about a week…
>> It will become consistent once all changes are committed and searchers
>> reopened.
>>
>>>
>>> Another question: With HDFS, data will be consistent?. With HDFS the data
>>> will be shared between nodes and then updates will be avaible on all
>> nodes
>>> at same time, right?
>> I am not too familiar with running Solr on HDFS, but I doubt that it is
>> working the way you expect it to work. You might be able to have multiple
>> Solr instances (not part of the same cluster - can be standalone Solr)
>> reading from the same HDFS directory and one updating it, but you would
>> probably have to reload core on read instances to be aware of changes. Not
>> sure if you would get much out of it - just different replication
>> mechanism. But it is late here and I never used Solr on HDFS so take this
>> with a grain of salt.
>>
>
> Today I've read a comment that said is like standalone server and it
> creates a copy of the data for every node that has replica. Maybe is wrong,
> but is not what I want.
> I want to have a SolrCluster like now but sharing data through HDFS and
> with the ability of autoscaling if there's too much load (AWS and GCP
> autoscaling group), but maybe is like search unicorns…
It is probably best to start a new thread for the HDFS questions.

>
>
>>>
>>> Thanks!!
>>>
>>>
>>>>
>>>> Regards,
>>>> Emir
>>>> --
>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>>>>
>>>>
>>>>
>>>>> On 12 Feb 2018, at 12:00, Daniel Carrasco <[hidden email]>
>> wrote:
>>>>>
>>>>> Hello, thanks for your help.
>>>>>
>>>>> I answer bellow.
>>>>>
>>>>> Greetings!!
>>>>>
>>>>> 2018-02-12 11:31 GMT+01:00 Emir Arnautović <[hidden email]>:
>>>>>
>>>>>> Hi Daniel,
>>>>>> Can you tell us more about your document update process. How do you
>>>> commit
>>>>>> changes? Since it got fixed after restart, it seems to me that on that
>>>> one
>>>>>> node index searcher was not reopened after updates. Do you see any
>>>>>> errors/warnings on that node?
>>>>>>
>>>>>
>>>>> i've asked to the programmers and looks like they are using the
>>>> collections
>>>>> dataimport using curl. I think the data is imported from a Microsoft
>> SQL
>>>>> server using a solr plugin.
>>>>>
>>>>>
>>>>>> Also, what do you mean by “All nodes are standalone”?
>>>>>>
>>>>>
>>>>> I mean that nodes don't share filesystem (I'm planning to use Hadoop,
>> but
>>>>> I've to learn to create and maintain the cluster first). All nodes has
>>>> its
>>>>> own data drive and are connected to the cluster using zookeeper.
>>>>>
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>> Emir
>>>>>> --
>>>>>> Monitoring - Log Management - Alerting - Anomaly Detection
>>>>>> Solr & Elasticsearch Consulting Support Training -
>> http://sematext.com/
>>>>>>
>>>>>>
>>>>>>
>>>>>>> On 12 Feb 2018, at 11:16, Daniel Carrasco <[hidden email]>
>>>> wrote:
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> We're using Solr to manage products data on our shop and the last
>> week
>>>>>> some
>>>>>>> customers called us telling that price between shop and shopping
>> basket
>>>>>>> differs. After research a bit I've noticed that it happens sometimes
>> on
>>>>>>> page refresh.
>>>>>>> After disabling all cache I've queried all solr instances to see if
>>>> data
>>>>>> is
>>>>>>> correct and I've seen that one of them give a different price for the
>>>>>>> product, so looks like the instance has not got the updated data.
>>>>>>>
>>>>>>> - How is possible that a node on a cluster have different data?
>>>>>>> - How i can check if data is in sync?, because the cluster looks al
>>>>>>> healthy on admin, and the node is active and OK.
>>>>>>> - Is there any way to detect this error? and How I can force resyncs?
>>>>>>>
>>>>>>> After restart the node it got synced, so the data now is OK, but I
>>>> can't
>>>>>>> restart the nodes every time to see if data is right (it tooks a lot
>> of
>>>>>>> time to be synced again).
>>>>>>>
>>>>>>> My configuration is: 8 Solr nodes using v7.1.0 and zookeeper v3.4.11.
>>>> All
>>>>>>> nodes are standalone (I'm not using hadoop).
>>>>>>>
>>>>>>> Thanks and greetings!
>>>>>>> --
>>>>>>> _________________________________________
>>>>>>>
>>>>>>>   Daniel Carrasco Marín
>>>>>>>   Ingeniería para la Innovación i2TIC, S.L.
>>>>>>>   Tlf:  +34 911 12 32 84 Ext: 223
>>>>>>>   www.i2tic.com
>>>>>>> _________________________________________
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> _________________________________________
>>>>>
>>>>>    Daniel Carrasco Marín
>>>>>    Ingeniería para la Innovación i2TIC, S.L.
>>>>>    Tlf:  +34 911 12 32 84 Ext: 223
>>>>>    www.i2tic.com
>>>>> _________________________________________
>>>>
>>>>
>>>
>>>
>>> --
>>> _________________________________________
>>>
>>>     Daniel Carrasco Marín
>>>     Ingeniería para la Innovación i2TIC, S.L.
>>>     Tlf:  +34 911 12 32 84 Ext: 223
>>>     www.i2tic.com
>>> _________________________________________
>>
>>
> Thanks!!
>
> --
> _________________________________________
>
>      Daniel Carrasco Marín
>      Ingeniería para la Innovación i2TIC, S.L.
>      Tlf:  +34 911 12 32 84 Ext: 223
>      www.i2tic.com
> _________________________________________