/replication?command=details does not show infos for all replicas on the core


Arturas Mazeika
Hi Solr-Team,

I am benchmarking Solr with the German Wikipedia pages on 4 nodes (running
on ports 9999, 9998, 9997 and 9996), with 4 shards and replication factor 2:

"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p 9999 -s
"F:\solr_server\solr-7.2.1\example\cloud\node1\solr"
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p 9998 -s
"F:\solr_server\solr-7.2.1\example\cloud\node2\solr" -z localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p 9997 -s
"F:\solr_server\solr-7.2.1\example\cloud\node3\solr" -z localhost:10999
"F:\solr_server\solr-7.2.1\bin\solr.cmd" start -m 3g -cloud -p 9996 -s
"F:\solr_server\solr-7.2.1\example\cloud\node4\solr" -z localhost:10999

The collection was created with:

http://localhost:9999/solr/admin/collections?action=CREATE&name=de_wiki_man&numShards=4&replicationFactor=2&maxShardsPerNode=2&wt=xml

Then I inserted 40 GB of data into the system and was curious how large the
index got. The core status query shows two de_wiki_man cores on the node
running on port 9996:

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/admin/cores?action=STATUS | grep
"size\|numDocs\|name" | sed "s/}},/\n/g"
      "name":"de_wiki_all_shard1_replica_n2",
        "numDocs":671396,
        "sizeInBytes":3781265902,
        "size":"3.52 GB"

      "name":"de_wiki_all_shard3_replica_n10",
        "numDocs":670564,
        "sizeInBytes":3874165653,
        "size":"3.61 GB"

      "name":"de_wiki_man_shard2_replica_n4",
        "numDocs":670498,
        "sizeInBytes":11936390483,
        "size":"11.12 GB"

      "name":"de_wiki_man_shard4_replica_n12",
        "numDocs":671484,
        "sizeInBytes":16153375004,
        "size":"15.04 GB"

      "name":"trans_shard1_replica_n1",
        "numDocs":0,
        "sizeInBytes":69,
        "size":"69 bytes"}}}}

but the query reports infos on only one shard:

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
    "indexSize":"15.04 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node4\\solr\\de_wiki_man_shard4_replica_n12\\data\\index/",

I wonder why the infos for the second replica are not shown. Comments?

Cheers,
Arturas









Additional infos:


F:\solr_server\solr-7.2.1>curl -s
http://localhost:9999/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
    "indexSize":"16.73 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node1\\solr\\de_wiki_man_shard1_replica_n1\\data\\index.20180629092013755",
        "indexSize":"15.32 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node2\\solr\\de_wiki_man_shard1_replica_n2\\data\\index/",

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9998/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
    "indexSize":"15.32 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node2\\solr\\de_wiki_man_shard1_replica_n2\\data\\index/",

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9997/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
    "indexSize":"16.51 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node3\\solr\\de_wiki_man_shard2_replica_n6\\data\\index.20180629063901343",
        "indexSize":"11.12 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node4\\solr\\de_wiki_man_shard2_replica_n4\\data\\index/",

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
"indexPath\|indexSize"
    "indexSize":"11.12 GB",

"indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node4\\solr\\de_wiki_man_shard2_replica_n4\\data\\index/",








F:\solr_server\solr-7.2.1>curl -s
http://localhost:9999/solr/admin/cores?action=STATUS | grep
"size\|numDocs\|name" | sed "s/}},/\n/g"
      "name":"de_wiki_all_shard1_replica_n1",
        "numDocs":671396,
        "sizeInBytes":3815456445,
        "size":"3.55 GB"

      "name":"de_wiki_all_shard3_replica_n8",
        "numDocs":670564,
        "sizeInBytes":3821193139,
        "size":"3.56 GB"

      "name":"de_wiki_man_shard1_replica_n1",
        "numDocs":1141843,
        "sizeInBytes":17967817775,
        "size":"16.73 GB"

      "name":"de_wiki_man_shard3_replica_n8",
        "numDocs":670823,
        "sizeInBytes":11625124732,
        "size":"10.83 GB"}}}}

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9998/solr/admin/cores?action=STATUS | grep
"size\|numDocs\|name" | sed "s/}},/\n/g"
      "name":"de_wiki_all_shard2_replica_n6",
        "numDocs":670221,
        "sizeInBytes":3828566867,
        "size":"3.57 GB"

      "name":"de_wiki_all_shard4_replica_n14",
        "numDocs":669221,
        "sizeInBytes":3772631249,
        "size":"3.51 GB"

      "name":"de_wiki_man_shard1_replica_n2",
        "numDocs":668807,
        "sizeInBytes":16449833639,
        "size":"15.32 GB"

      "name":"de_wiki_man_shard3_replica_n10",
        "numDocs":670823,
        "sizeInBytes":15987092480,
        "size":"14.89 GB"

      "name":"tph_shard1_replica_n1",
        "numDocs":978,
        "sizeInBytes":221949466,
        "size":"211.67 MB"}}}}

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9997/solr/admin/cores?action=STATUS | grep
"size\|numDocs\|name" | sed "s/}},/\n/g"
      "name":"de_wiki_all_shard2_replica_n4",
        "numDocs":670221,
        "sizeInBytes":3800346469,
        "size":"3.54 GB"

      "name":"de_wiki_all_shard4_replica_n12",
        "numDocs":669221,
        "sizeInBytes":3766456764,
        "size":"3.51 GB"

      "name":"de_wiki_man_shard2_replica_n6",
        "numDocs":670498,
        "sizeInBytes":17728524151,
        "size":"16.51 GB"

      "name":"de_wiki_man_shard4_replica_n14",
        "numDocs":671484,
        "sizeInBytes":12720635597,
        "size":"11.85 GB"}}}}

F:\solr_server\solr-7.2.1>curl -s
http://localhost:9996/solr/admin/cores?action=STATUS | grep
"size\|numDocs\|name" | sed "s/}},/\n/g"
      "name":"de_wiki_all_shard1_replica_n2",
        "numDocs":671396,
        "sizeInBytes":3781265902,
        "size":"3.52 GB"

      "name":"de_wiki_all_shard3_replica_n10",
        "numDocs":670564,
        "sizeInBytes":3874165653,
        "size":"3.61 GB"

      "name":"de_wiki_man_shard2_replica_n4",
        "numDocs":670498,
        "sizeInBytes":11936390483,
        "size":"11.12 GB"

      "name":"de_wiki_man_shard4_replica_n12",
        "numDocs":671484,
        "sizeInBytes":16153375004,
        "size":"15.04 GB"

      "name":"trans_shard1_replica_n1",
        "numDocs":0,
        "sizeInBytes":69,
        "size":"69 bytes"}}}}
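
The per-node STATUS output above can also be summarized programmatically instead of with grep and sed. The following is a minimal Python sketch, assuming the CoreAdmin STATUS response is JSON with a top-level "status" object keyed by core name, each holding an "index" object with "numDocs" and "sizeInBytes" (the fields visible in the output above). The function name and the prefix-matching rule are illustrative assumptions, not part of Solr.

```python
import json


def summarize_cores(status_json, collection):
    """Return (core_name, numDocs, sizeInBytes) for the cores of one collection.

    Assumes core names start with "<collection>_shard", as in the output above.
    """
    status = json.loads(status_json).get("status", {})
    rows = []
    for name, info in sorted(status.items()):
        if not name.startswith(collection + "_shard"):
            continue  # core belongs to another collection on the same node
        index = info.get("index", {})
        rows.append((name, index.get("numDocs"), index.get("sizeInBytes")))
    return rows


# Sample trimmed from the port 9996 output in this post.
sample = json.dumps({
    "status": {
        "de_wiki_man_shard2_replica_n4": {
            "index": {"numDocs": 670498, "sizeInBytes": 11936390483},
        },
        "de_wiki_man_shard4_replica_n12": {
            "index": {"numDocs": 671484, "sizeInBytes": 16153375004},
        },
        "trans_shard1_replica_n1": {
            "index": {"numDocs": 0, "sizeInBytes": 69},
        },
    }
})

for row in summarize_cores(sample, "de_wiki_man"):
    print(row)
```

Running this once per node and concatenating the rows would give the per-replica sizes for the whole collection, under the assumption that every node is reachable.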

Re: /replication?command=details does not show infos for all replicas on the core

Shawn Heisey-2
On 6/29/2018 7:53 AM, Arturas Mazeika wrote:

> but the query reports infos on only one shard:
>
> F:\solr_server\solr-7.2.1>curl -s
> http://localhost:9996/solr/de_wiki_man/replication?command=details | grep
> "indexPath\|indexSize"
>      "indexSize":"15.04 GB",
>
> "indexPath":"F:\\solr_server\\solr-7.2.1\\example\\cloud\\node4\\solr\\de_wiki_man_shard4_replica_n12\\data\\index/",
>
> I wonder why the infos for the second replica are not shown. Comments?

SolrCloud is aware of (and uses) the replication feature, but the
replication feature is not cloud-aware.  It is a core-level feature (not
a cloud-specific feature) and is only aware of that one specific core
(shard replica).  This is not likely to ever change.
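
To make the core-level scope concrete: since the replication handler only knows about one core, details for each replica would have to be requested per core. A small Python sketch follows, assuming a core can be addressed directly by its core name under the node's /solr base URL. The core names and port are taken from the first post; the helper name is hypothetical.

```python
from urllib.parse import urlencode


def replication_details_url(base_url, core_name):
    """Build the core-level replication details URL for a single core."""
    query = urlencode({"command": "details", "wt": "json"})
    return f"{base_url.rstrip('/')}/{core_name}/replication?{query}"


# The two de_wiki_man cores reported by the STATUS query on port 9996.
cores_on_9996 = [
    "de_wiki_man_shard2_replica_n4",
    "de_wiki_man_shard4_replica_n12",
]

for core in cores_on_9996:
    print(replication_details_url("http://localhost:9996/solr", core))
```

Addressing the collection name instead of a core name, as in the original query, leaves the choice of core to Solr, which would be consistent with only one core being reported.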

Thanks,
Shawn


Re: /replication?command=details does not show infos for all replicas on the core

Arturas Mazeika
Hi Shawn et al,

Thanks a lot for the clarification. It makes a lot of sense and explains
which functionality needs to be used to get the infos :-).

Out of curiosity: some cores give infos for both shards (through the
replication query) and some only for one (if you are still able to see the
previous post). I wonder why.

Cheers,
Arturas


Re: /replication?command=details does not show infos for all replicas on the core

Erick Erickson
Arturas:

Please make yourself a promise, "Only use the collections commands" ;)
At least for a while.

Trying to mix collection-level commands and core-level commands is
extremely confusing at the start. Under the covers, the Collections
API _uses_ the Core API, but in a very precise manner. Any seemingly
innocent mistake will be hard to untangle.

For your first question: "I wonder why the infos for the second
replica are not shown..." the answer is that you are using a
core-level API which does not "understand" anything about SolrCloud;
it's all purely local to that instance. So it's doing exactly what you
ask it to: reporting on the status of cores (replicas) _on that
particular Solr instance_. The _Collections_ API _is_ cloud/ZooKeeper
aware and will report them all. What it does is fire the core-level
command out to all live Solr nodes and assemble the response into a
single cluster-wide report.
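
As an illustration of the cluster-wide view, the Collections API CLUSTERSTATUS action returns the replica layout of a collection. The sketch below is a hedged Python example that walks such a response after it has been parsed from JSON. It assumes the layout cluster -> collections -> <name> -> shards -> <shard> -> replicas, with "core", "base_url" and "state" fields per replica; check this against the response of your own Solr version. The sample data, including the core_node names and base URLs, is hypothetical.

```python
def list_replicas(cluster_status, collection):
    """Return sorted (shard, core, base_url, state) tuples for every replica."""
    shards = cluster_status["cluster"]["collections"][collection]["shards"]
    replicas = []
    for shard_name, shard in shards.items():
        for replica in shard["replicas"].values():
            replicas.append((
                shard_name,
                replica["core"],
                replica["base_url"],
                replica.get("state"),
            ))
    return sorted(replicas)


# Hypothetical, heavily trimmed CLUSTERSTATUS response for one shard.
sample_status = {
    "cluster": {
        "collections": {
            "de_wiki_man": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node3": {
                                "core": "de_wiki_man_shard1_replica_n1",
                                "base_url": "http://localhost:9999/solr",
                                "state": "active",
                            },
                            "core_node5": {
                                "core": "de_wiki_man_shard1_replica_n2",
                                "base_url": "http://localhost:9998/solr",
                                "state": "active",
                            },
                        }
                    }
                }
            }
        }
    }
}

for shard, core, base_url, state in list_replicas(sample_status, "de_wiki_man"):
    print(shard, core, base_url, state)
```

Each (base_url, core) pair identifies where a core-level request for that replica would have to be sent.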

Second, the core-level "replication" command is all about old-style
master/slave index replication, and I have no idea what it's reporting
on when you ask for replication status in SolrCloud. It has nothing to
do with, say, "replication factor" or anything else cloud-related, as
Shawn indicates. Old-style master/slave replication is used in SolrCloud
under the covers for "full sync", and perhaps that has happened at some
point (although ideally it won't happen at all unless something goes
wrong with normal indexing and the only option is to copy the entire
index from the leader). The takeaway is that the replication command is
probably not doing what you think it is.

Best,
Erick


Re: /replication?command=details does not show infos for all replicas on the core

Shawn Heisey
On 6/29/2018 8:47 AM, Arturas Mazeika wrote:
> Out of curiosity: some cores give infos for both shards (through
> replication query) and some only for one (if you still be able to see the
> prev post). I wonder why..

Adding to what Erick said:

If SolrCloud has initiated a replication on that core at some point
since that Solr instance started, then you might see both the master and
slave side of that replication reported by the replication handler.  If
a replication has never been initiated, then you will only see info
about the local core.

The replication handler is used by SolrCloud for two things:

1) Index recovery when a replica gets too far out of sync.
2) Replicating data to TLOG and PULL replica types (new in 7.x).
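
Because the details response may or may not contain a second (master-side) entry, a script that reads it should not assume a fixed number of entries. The Python sketch below walks a parsed response recursively and collects every object that carries both "indexSize" and "indexPath", which is roughly what the grep in the first post does. The nesting key names in the sample ("slave", "masterDetails") are assumptions for illustration only; the sizes and paths are taken from the port 9999 output in the first post.

```python
def find_index_entries(node, path=()):
    """Yield (path, indexSize, indexPath) for every dict that has both keys."""
    if isinstance(node, dict):
        if "indexPath" in node and "indexSize" in node:
            yield path, node["indexSize"], node["indexPath"]
        for key, value in node.items():
            yield from find_index_entries(value, path + (key,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from find_index_entries(value, path + (i,))


# Sample shaped like the port 9999 output; the nesting keys are hypothetical.
sample_details = {
    "details": {
        "indexSize": "16.73 GB",
        "indexPath": r"F:\solr_server\solr-7.2.1\example\cloud\node1\solr\de_wiki_man_shard1_replica_n1\data\index.20180629092013755",
        "slave": {
            "masterDetails": {
                "indexSize": "15.32 GB",
                "indexPath": r"F:\solr_server\solr-7.2.1\example\cloud\node2\solr\de_wiki_man_shard1_replica_n2\data\index/",
            }
        },
    }
}

for path, size, index_path in find_index_entries(sample_details):
    print("/".join(str(p) for p in path), size, index_path)
```

A response from a core on which no replication has been initiated would simply yield a single entry.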

Thanks,
Shawn


Re: /replication?command=details does not show infos for all replicas on the core

Arturas Mazeika
Hi Shawn,
hi Erick,
hi et al.,

Very nice clarifications indeed. I also looked at the index replication
section. In addition to the clarifications in this thread, this shed
quite some light on the area (and shows that I need to read the SolrCloud
part of the manual more extensively). Thanks a lot indeed!

Cheers,
Arturas

