[CDCR]Unable to locate core

Tim
I'm trying to set up CDCR, but I'm running into an issue where one or two of
the six cores (shards/replicas) are not replicated while the rest are.

The only error that appears in the logs is: "Unable to locate core".

Occasionally, restarting the instance fixes this, but the issue then
recurs the next time the source collection is updated. It does not
necessarily affect the same core each time.

Has anyone run into an error such as this before?




--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: [CDCR]Unable to locate core

Tim
After some more investigation, it seems that we're running into the same bug
found here: <https://issues.apache.org/jira/browse/SOLR-11724>.

However, if my understanding is correct, that bug was patched in 7.3.
Unfortunately, we're seeing the same behavior in 7.5.

CDCR is replicating successfully to the leader node but is not replicating
to the followers.




Re: [CDCR]Unable to locate core

Erick Erickson
CDCR does _not_ replicate to followers, it is a leader<->leader replication
of the raw document.

Once the document has been forwarded to the target's leader, then the
leader on the target system should forward it to followers on that
system just like any other update.

The Solr JIRA is unlikely to be the problem, from what you describe.

1> are you sure you are _committing_ on the target system?
2> "unable to locate core" comes from where? The source? Target?
   CDCR?
3> is your target collection properly set up? Because it sounds
   a bit like your target cluster isn't running in SolrCloud mode.

Best,
Erick

On Fri, Feb 1, 2019 at 12:48 PM Tim <[hidden email]> wrote:

>
> After some more investigation it seems that we're running into the  same bug
> found here <https://issues.apache.org/jira/browse/SOLR-11724>  .
>
> However if my understanding is correct that bug in 7.3 was patched out.
> Unfortunately we're running into the same behavior in 7.5
>
> CDCR is replicating successfully to the leader node but is not replicating
> to the followers.

Re: [CDCR]Unable to locate core

Tim
Thank you for the reply. Sorry I did not include more information in the
first post.

There may have been some confusion on my end. Both the target and source
clusters are running in cloud mode, so I think you're correct that it is a
different issue. It looks like replication from the source leader to the
target leader succeeds, but the target leader then fails to replicate to its
followers.

The "unable to locate core" message originally comes from the target
cluster.

*Here are the logs generated on the source, for reference:*
2019-02-02 20:10:19.551 INFO
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager CDCR
bootstrap successful in 3 seconds
2019-02-02 20:10:19.564 INFO
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Create
new update log reader for target testcollection with checkpoint
1624389130873995265 @ testcollection:shard3
2019-02-02 20:10:19.568 ERROR
(cdcr-bootstrap-status-81-thread-1-processing-n:sourcehost001.com:30100_solr
x:testcollection_shard3_replica_n10 c:testcollection s:shard3 r:core_node12)
[c:testcollection s:shard3 r:core_node12
x:testcollection_shard3_replica_n10] o.a.s.h.CdcrReplicatorManager Unable to
bootstrap the target collection testcollection shard: shard3
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://targethost001.com:30100/solr: Unable to locate core
testcollection_shard2_replica_n4
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_192]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_192]
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_192]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_192]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]
2019-02-02 20:10:19.602 INFO  (cdcr-bootstrap-status-86-thread-1) [   ]
o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds
2019-02-02 20:10:19.608 INFO  (cdcr-bootstrap-status-86-thread-1) [   ]
o.a.s.h.CdcrReplicatorManager Create new update log reader for target
testcollection with checkpoint 1624389130873995265 @ testcollection:shard2
2019-02-02 20:10:19.610 ERROR (cdcr-bootstrap-status-86-thread-1) [   ]
o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection
testcollection shard: shard2
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error
from server at http://targethost001.com:30100/solr: Unable to locate core
testcollection_shard2_replica_n4
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:817)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollower(CdcrReplicatorManager.java:439)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager.sendRequestRecoveryToFollowers(CdcrReplicatorManager.java:428)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager.access$300(CdcrReplicatorManager.java:63)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
org.apache.solr.handler.CdcrReplicatorManager$BootstrapStatusRunnable.run(CdcrReplicatorManager.java:306)
~[solr-core-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:55]
        at
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
~[?:1.8.0_192]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
~[?:1.8.0_192]
        at
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
~[solr-solrj-7.5.0.jar:7.5.0 b5bf70b7e32d7ddd9742cc821d471c5fabd4e3df -
jimczi - 2018-09-18 13:07:58]
        at
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
[?:1.8.0_192]
        at
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
[?:1.8.0_192]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_192]

*Here are the logs from the target:*
2019-02-02 20:10:19.566 INFO  (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.h.a.CoreAdminOperation It has been
requested that we recover: core=testcollection_shard2_replica_n4
2019-02-02 20:10:19.567 ERROR (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Unable to locate core
testcollection_shard2_replica_n4
        at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
        at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
        at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:531)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
        at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
        at java.lang.Thread.run(Thread.java:748)

2019-02-02 20:10:19.567 INFO  (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.s.HttpSolrCall [admin] webapp=null
path=/admin/cores
params={core=testcollection_shard2_replica_n4&action=REQUESTRECOVERY&wt=javabin&version=2}
status=400 QTime=0
2019-02-02 20:10:19.607 INFO  (qtp1571967156-133) [c:testcollection s:shard1
r:core_node5 x:testcollection_shard1_replica_n2] o.a.s.c.S.Request
[testcollection_shard1_replica_n2]  webapp=/solr path=/cdcr
params={_stateVer_=testcollection:6&action=COLLECTIONCHECKPOINT&wt=javabin&version=2}
status=0 QTime=4
2019-02-02 20:10:19.609 INFO  (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.h.a.CoreAdminOperation It has been
requested that we recover: core=testcollection_shard2_replica_n4
2019-02-02 20:10:19.609 ERROR (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Unable to locate core
testcollection_shard2_replica_n4
        at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
        at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
        at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:531)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
        at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
        at java.lang.Thread.run(Thread.java:748)

2019-02-02 20:10:19.609 INFO  (qtp1571967156-14) [  
x:testcollection_shard2_replica_n4] o.a.s.s.HttpSolrCall [admin] webapp=null
path=/admin/cores
params={core=testcollection_shard2_replica_n4&action=REQUESTRECOVERY&wt=javabin&version=2}
status=400 QTime=0
2019-02-02 20:10:19.701 INFO  (qtp1571967156-135) [  
x:testcollection_shard2_replica_n4] o.a.s.h.a.CoreAdminOperation It has been
requested that we recover: core=testcollection_shard2_replica_n4
2019-02-02 20:10:19.702 ERROR (qtp1571967156-135) [  
x:testcollection_shard2_replica_n4] o.a.s.h.RequestHandlerBase
org.apache.solr.common.SolrException: Unable to locate core
testcollection_shard2_replica_n4
        at
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:167)
        at
org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:360)
        at
org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:395)
        at
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:180)
        at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:199)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdmin(HttpSolrCall.java:734)
        at
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:715)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:496)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:377)
        at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:323)
        at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1634)
        at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:533)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:146)
        at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:257)
        at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1595)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:255)
        at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1317)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:203)
        at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:473)
        at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1564)
        at
org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:201)
        at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1219)
        at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:144)
        at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:219)
        at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:126)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at
org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:335)
        at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:132)
        at org.eclipse.jetty.server.Server.handle(Server.java:531)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:352)
        at
org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:260)
        at
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:281)
        at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:102)
        at
org.eclipse.jetty.io.ChannelEndPoint$2.run(ChannelEndPoint.java:118)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:333)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:310)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:168)
        at
org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:126)
        at
org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:366)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:762)
        at
org.eclipse.jetty.util.thread.QueuedThreadPool$2.run(QueuedThreadPool.java:680)
        at java.lang.Thread.run(Thread.java:748)

2019-02-02 20:10:19.702 INFO  (qtp1571967156-135) [  
x:testcollection_shard2_replica_n4] o.a.s.s.HttpSolrCall [admin] webapp=null
path=/admin/cores
params={core=testcollection_shard2_replica_n4&action=REQUESTRECOVERY&wt=javabin&version=2}
status=400 QTime=0


So the leader-to-leader replication is successful, but the target leader is
not able to replicate to the follower.

After some more investigation, I found that the call to recover the core is
what fails. The source makes a call to
/solr/admin/cores?core=testcollection_shard2_replica_n4&action=REQUESTRECOVERY
and that is when the target reports that it is unable to locate the core. The
call is actually being sent to the wrong host.
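
To spot these rejected calls quickly in the target's request log, something like the following may help. It is only a sketch: the class and method names are made up for illustration and are not part of Solr, and it assumes each log entry has already been joined back into a single line (the entries above were wrapped by the mail client). It pulls the `core`, `action` and `status` values out of an HttpSolrCall line and flags recovery requests that logged a non-zero status.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: extracts fields from one request-log line.
class RecoveryLogCheck {
  private static final Pattern PARAMS = Pattern.compile("params=\\{([^}]*)\\}");
  private static final Pattern STATUS = Pattern.compile("status=(\\d+)");

  // Splits the params={a=1&b=2} section into a map; returns an empty map if it is absent.
  static Map<String, String> parseParams(String line) {
    Map<String, String> out = new LinkedHashMap<>();
    Matcher m = PARAMS.matcher(line);
    if (!m.find()) {
      return out;
    }
    for (String pair : m.group(1).split("&")) {
      int eq = pair.indexOf('=');
      if (eq > 0) {
        out.put(pair.substring(0, eq), pair.substring(eq + 1));
      }
    }
    return out;
  }

  // Returns the status value logged on the line, or -1 if the line has none.
  static int parseStatus(String line) {
    Matcher m = STATUS.matcher(line);
    return m.find() ? Integer.parseInt(m.group(1)) : -1;
  }

  // True when the line records a REQUESTRECOVERY call that logged a non-zero status.
  static boolean isFailedRecovery(String line) {
    boolean recovery = "REQUESTRECOVERY".equals(parseParams(line).get("action"));
    return recovery && parseStatus(line) > 0;
  }

  public static void main(String[] args) {
    String line = "o.a.s.s.HttpSolrCall [admin] webapp=null path=/admin/cores "
        + "params={core=testcollection_shard2_replica_n4&action=REQUESTRECOVERY&wt=javabin&version=2} "
        + "status=400 QTime=0";
    System.out.println(parseParams(line).get("core") + " failed=" + isFailedRecovery(line));
  }
}
```

Run against the target log above, this would flag each of the three status=400 entries for testcollection_shard2_replica_n4 and ignore the successful COLLECTIONCHECKPOINT request.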

Our setup is as follows: numShards: 3, replicationFactor: 2,
maxShardsPerNode: 2.

I wanted to retest it, so I deleted and recreated the collection. The
resulting layout looked like this (s2r4 means shard 2, replica 4):
targetServer1: s2r4, s3r10
targetServer2: s1r1, s3r8
targetServer3: s2r6, s3r10

sourceServer1: s2r4, s3r10
sourceServer2: s1r2, s3r8
sourceServer3: s1r1, s2r6
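
For reference, the shorthand in these lists is just an abbreviation of the full core names that appear in the logs, for example testcollection_shard2_replica_n4 for s2r4. A small sketch of that mapping follows. The class name and the regular expression are assumptions made for illustration, and they only cover names of the form `<collection>_shard<N>_replica_n<M>` seen in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: converts a full core name into the sNrM shorthand.
class CoreShorthand {
  // Matches e.g. "testcollection_shard2_replica_n4": collection, shard number, replica number.
  private static final Pattern CORE = Pattern.compile("^(.+)_shard(\\d+)_replica_n(\\d+)$");

  private static Matcher match(String coreName) {
    Matcher m = CORE.matcher(coreName);
    if (!m.matches()) {
      throw new IllegalArgumentException("Unexpected core name: " + coreName);
    }
    return m;
  }

  // "testcollection_shard2_replica_n4" becomes "s2r4".
  static String toShorthand(String coreName) {
    Matcher m = match(coreName);
    return "s" + m.group(2) + "r" + m.group(3);
  }

  // Returns the collection part of the core name.
  static String collectionOf(String coreName) {
    return match(coreName).group(1);
  }

  public static void main(String[] args) {
    System.out.println(toShorthand("testcollection_shard2_replica_n4")); // prints s2r4
  }
}
```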

When running CDCR once again, the replication to the leaders was successful.
The leaders on both clusters were s2r4, s3r10, and s1r1.
After CDCR ran, the source cluster made a call to recover s2r6. The source sent
this call to targetServer1, which hosts s2r4 and s3r10. As s2r6 is on
targetServer3, the call failed because s2r6 could not be located on
targetServer1.
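
To make the failure concrete, here is a toy model of it in plain Java. Nothing in it is Solr code: the `HOSTED` map mirrors only the cores from the target layout that matter for this example, and `requestRecovery` is a stand-in for the core admin call that returns 400 when the node it is sent to does not host the core.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Toy model of the target cluster; core names use the shorthand from this post.
class MisroutedRecovery {
  static final Map<String, Set<String>> HOSTED = new HashMap<>();

  static {
    HOSTED.put("targetServer1", new HashSet<>(Arrays.asList("s2r4", "s3r10")));
    HOSTED.put("targetServer3", new HashSet<>(Arrays.asList("s2r6")));
  }

  // Stand-in for REQUESTRECOVERY: 0 if the node hosts the core, 400 otherwise.
  static int requestRecovery(String node, String core) {
    Set<String> cores = HOSTED.get(node);
    return (cores != null && cores.contains(core)) ? 0 : 400;
  }

  // Returns the node that hosts the core, or null if no node does.
  static String nodeHosting(String core) {
    for (Map.Entry<String, Set<String>> e : HOSTED.entrySet()) {
      if (e.getValue().contains(core)) {
        return e.getKey();
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // Sent to the node holding the shard leader (s2r4): the core is not there.
    System.out.println(requestRecovery("targetServer1", "s2r6")); // prints 400
    // Sent to the node that actually hosts the follower.
    System.out.println(requestRecovery(nodeHosting("s2r6"), "s2r6")); // prints 0
  }
}
```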

Unsure of a fix at this time.






Re: [CDCR]Unable to locate core

Tim
It looks like I'm having an issue with this fix:
https://issues.apache.org/jira/browse/SOLR-11724

I've experimented with this for a while, and every time the leader-to-leader
replication portion works fine, but the recovery portion (implemented as
part of the fix above) fails.

I've run a few tests, and every time the recovery portion kicks off, it sends
the recovery command to the node that hosts the shard's leader instead of
the node that hosts the follower.
I've recreated the collection several times so that the replicas land on
different nodes, with the same result each time. The code seems to assume that
the follower is on the same Solr node as the leader.

For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
while the follower s3r8 is on node2, then the core recovery command meant
for s3r8 is sent to node1 instead of node2.






Re: [CDCR]Unable to locate core

Natarajan, Rajeswari
I am also facing this issue. If anyone has found a resolution, please post an update. Thanks.

On 2/7/19, 10:42 AM, "Tim" <[hidden email]> wrote:

    So it looks like I'm having an issue with this fix:
    https://issues.apache.org/jira/browse/SOLR-11724
   
    So I've messed around with this for a while and every time the leader to
    leader replica portion works fine. But the Recovery portion (implemented as
    part of the fix above) fails.
   
    I've run a few tests and every time the recovery portion kicks off, it sends
    the recovery command to the node which has the leader for a given replica
    instead of the follower.
    I've recreated the collection several times so that replicas are on
    different nodes with the same results each time. It seems to be assumed that
    the follower is on the same solr node as the leader.
     
    For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
    while the follower s3r8 is on node2, then the core recovery command meant
    for s3r8 is being sent to node1 instead of node2.
   
   
   
   
   
   


Re: [CDCR]Unable to locate core

Natarajan, Rajeswari
In reply to this post by Tim
Hi,

We are using Solr 7.6 and trying out bidirectional CDCR, and I also hit this issue.

Stack trace:

INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager CDCR bootstrap successful in 3 seconds                                                                              
INFO  (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager Create new update log reader for target abcd_ta with checkpoint -1 @ abcd_ta:shard1                                
ERROR (cdcr-bootstrap-status-17-thread-1) [   ] o.a.s.h.CdcrReplicatorManager Unable to bootstrap the target collection abcd_ta shard: shard1                                                    
olrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://10.169.50.182:8983/solr: Unable to locate core kanna_ta_shard1_replica_n1                                                
lr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]        
lr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:483) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:413) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1107) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]
lr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:884) ~[solr-solrj-7.6.0.jar:7.6.0 719cde97f84640faa1e3525690d262946571245f - nknize - 2018-12-07 14:47:53]


I stepped through the code:

// Asks a single named core to start recovery. REQUESTRECOVERY is a core admin
// command, so it has to reach the node that actually hosts coreName.
private NamedList sendRequestRecoveryToFollower(SolrClient client, String coreName) throws SolrServerException, IOException {
    CoreAdminRequest.RequestRecovery recoverRequestCmd = new CoreAdminRequest.RequestRecovery();
    recoverRequestCmd.setAction(CoreAdminParams.CoreAdminAction.REQUESTRECOVERY);
    recoverRequestCmd.setCoreName(coreName);
    return client.request(recoverRequestCmd);
}

In the method above, the recovery request is a core admin command, so it is specific to a single core. The SolrClient.request logic fetches the live servers and tries the command against them in a loop, but because this is an admin command it is non-retryable. Depending on which live server the code picks and where the core actually lives, the recovery request may succeed or fail. So I think the problem is that this code sends a core-level command to whichever live servers are available; I guess it should instead find the server that hosts the core and send the request there.
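To make the failure mode above concrete, here is a minimal sketch in plain Java. It does not use SolrJ: the class CoreCommandRouting, the node URLs, the core names and the helper methods are all hypothetical, and they only model the idea that a core admin command succeeds on the node hosting the core and fails everywhere else.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical model (not SolrJ): a core-specific admin command sent to one node. */
final class CoreCommandRouting {

    /** Node base URL -> names of the cores hosted on that node (illustrative data). */
    static Map<String, Set<String>> sampleCluster() {
        Map<String, Set<String>> cluster = new LinkedHashMap<>();
        cluster.put("http://node1:8983/solr", Set.of("coll_shard1_replica_n1"));
        cluster.put("http://node2:8983/solr", Set.of("coll_shard1_replica_n2"));
        return cluster;
    }

    /** Models a non-retryable core command: it fails unless the node hosts the core. */
    static String sendCoreCommand(Map<String, Set<String>> cluster, String nodeUrl, String coreName) {
        Set<String> cores = cluster.get(nodeUrl);
        if (cores == null || !cores.contains(coreName)) {
            throw new IllegalStateException("Unable to locate core " + coreName);
        }
        return "recovery requested on " + nodeUrl;
    }

    /** Models "use whichever live server comes first", with no knowledge of core placement. */
    static String sendToFirstLiveServer(Map<String, Set<String>> cluster,
                                        List<String> liveServers, String coreName) {
        return sendCoreCommand(cluster, liveServers.get(0), coreName);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> cluster = sampleCluster();
        String follower = "coll_shard1_replica_n2";

        // Same command, same cluster; only the order of the live-server list differs.
        System.out.println(sendToFirstLiveServer(cluster,
                List.of("http://node2:8983/solr", "http://node1:8983/solr"), follower));
        try {
            sendToFirstLiveServer(cluster,
                    List.of("http://node1:8983/solr", "http://node2:8983/solr"), follower);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }
    }
}
```

In this model the outcome depends only on the order of the live-server list, which would match the intermittent behaviour reported in this thread.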

Regards,
Rajeswari

On 5/15/19, 10:59 AM, "Natarajan, Rajeswari" <[hidden email]> wrote:

    I am also facing this issue. Any resolution found on this issue, Please update. Thanks
   
    On 2/7/19, 10:42 AM, "Tim" <[hidden email]> wrote:
   
        So it looks like I'm having an issue with this fix:
        https://issues.apache.org/jira/browse/SOLR-11724
       
        So I've messed around with this for a while and every time the leader to
        leader replica portion works fine. But the Recovery portion (implemented as
        part of the fix above) fails.
       
        I've run a few tests and every time the recovery portion kicks off, it sends
        the recovery command to the node which has the leader for a given replica
        instead of the follower.
        I've recreated the collection several times so that replicas are on
        different nodes with the same results each time. It seems to be assumed that
        the follower is on the same solr node as the leader.
         
        For example, if s3r10 (shard 3, replica 10) is the leader and is on node1,
        while the follower s3r8 is on node2, then the core recovery command meant
        for s3r8 is being sent to node1 instead of node2.
       
       
       
       
       
       
   
   


Re: [CDCR]Unable to locate core

Natarajan, Rajeswari
Here is my closer analysis:


The SolrClient request ends up in the "request" method of LBHttpSolrClient.java, shown below.
It has a for loop that tries the different live servers, but when doRequest (called inside request) throws an exception there is no catch, so the next server is never tried. To solve this, there should be a catch around doRequest so that the loop can continue and retry the request against the correct server. With multiple live servers, though, the request might also time out. This needs to be fixed to make CDCR bootstrap work reliably; otherwise it will sometimes work and sometimes not. I can work on a patch if this is agreed.


public Rsp request(Req req) throws SolrServerException, IOException {
    Rsp rsp = new Rsp();
    Exception ex = null;
    boolean isNonRetryable = req.request instanceof IsUpdateRequest || ADMIN_PATHS.contains(req.request.getPath());
    List<ServerWrapper> skipped = null;

    final Integer numServersToTry = req.getNumServersToTry();
    int numServersTried = 0;

    boolean timeAllowedExceeded = false;
    long timeAllowedNano = getTimeAllowedInNanos(req.getRequest());
    long timeOutTime = System.nanoTime() + timeAllowedNano;
    for (String serverStr : req.getServers()) {
      if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
        break;
      }
     
      serverStr = normalize(serverStr);
      // if the server is currently a zombie, just skip to the next one
      ServerWrapper wrapper = zombieServers.get(serverStr);
      if (wrapper != null) {
        // System.out.println("ZOMBIE SERVER QUERIED: " + serverStr);
        final int numDeadServersToTry = req.getNumDeadServersToTry();
        if (numDeadServersToTry > 0) {
          if (skipped == null) {
            skipped = new ArrayList<>(numDeadServersToTry);
            skipped.add(wrapper);
          }
          else if (skipped.size() < numDeadServersToTry) {
            skipped.add(wrapper);
          }
        }
        continue;
      }
      try {
        MDC.put("LBHttpSolrClient.url", serverStr);

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        HttpSolrClient client = makeSolrClient(serverStr);

        ++numServersTried;
        ex = doRequest(client, req, rsp, isNonRetryable, false, null);
        if (ex == null) {
          return rsp; // SUCCESS
        }
        // NOTE (my annotation): no catch here, so an exception thrown by doRequest ends the loop
      } finally {
        MDC.remove("LBHttpSolrClient.url");
      }
    }

    // try the servers we previously skipped
    if (skipped != null) {
      for (ServerWrapper wrapper : skipped) {
        if (timeAllowedExceeded = isTimeExceeded(timeAllowedNano, timeOutTime)) {
          break;
        }

        if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
          break;
        }

        try {
          MDC.put("LBHttpSolrClient.url", wrapper.client.getBaseURL());
          ++numServersTried;
          ex = doRequest(wrapper.client, req, rsp, isNonRetryable, true, wrapper.getKey());
          if (ex == null) {
            return rsp; // SUCCESS
          }
        } finally {
          MDC.remove("LBHttpSolrClient.url");
        }
      }
    }


    final String solrServerExceptionMessage;
    if (timeAllowedExceeded) {
      solrServerExceptionMessage = "Time allowed to handle this request exceeded";
    } else {
      if (numServersToTry != null && numServersTried > numServersToTry.intValue()) {
        solrServerExceptionMessage = "No live SolrServers available to handle this request:"
            + " numServersTried="+numServersTried
            + " numServersToTry="+numServersToTry.intValue();
      } else {
        solrServerExceptionMessage = "No live SolrServers available to handle this request";
      }
    }
    if (ex == null) {
      throw new SolrServerException(solrServerExceptionMessage);
    } else {
      throw new SolrServerException(solrServerExceptionMessage+":" + zombieServers.keySet(), ex);
    }

  }
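The effect described above can be sketched in plain Java. This is not the real LBHttpSolrClient: the class RetryLoopSketch, its methods and the node names are hypothetical, and the sketch assumes (as the analysis above describes, since the body of doRequest is not shown in this thread) that a failure of a non-retryable request is rethrown instead of being returned to the loop.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model (not the real LBHttpSolrClient) of a loop over candidate servers. */
final class RetryLoopSketch {

    /**
     * Tries each server in order and records the ones contacted in {@code tried}.
     * When nonRetryable is true the first failure is rethrown at once, which models
     * the behaviour described above for admin requests.
     */
    static String request(List<String> servers, Function<String, String> call,
                          boolean nonRetryable, List<String> tried) {
        RuntimeException last = null;
        for (String server : servers) {
            tried.add(server);
            try {
                return call.apply(server); // success ends the loop
            } catch (RuntimeException e) {
                if (nonRetryable) {
                    throw e; // the remaining servers are never contacted
                }
                last = e; // remember the failure and move on to the next server
            }
        }
        throw new IllegalStateException("No live servers available to handle this request", last);
    }

    /** Stand-in for a core admin call that only succeeds on the node hosting the core. */
    static String recoverOnNode2(String server) {
        if (!server.equals("node2")) {
            throw new IllegalStateException("Unable to locate core on " + server);
        }
        return "ok from " + server;
    }

    public static void main(String[] args) {
        List<String> servers = List.of("node1", "node2");

        List<String> triedA = new ArrayList<>();
        try {
            request(servers, RetryLoopSketch::recoverOnNode2, true, triedA);
        } catch (IllegalStateException e) {
            System.out.println("non-retryable: " + e.getMessage() + ", tried " + triedA);
        }

        List<String> triedB = new ArrayList<>();
        String result = request(servers, RetryLoopSketch::recoverOnNode2, false, triedB);
        System.out.println("retryable: " + result + ", tried " + triedB);
    }
}
```

Retrying does reach the right node in this model, but only by contacting every wrong node first, which is where the timeout concern mentioned above comes from.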


Thanks,
Rajeswari


On 5/19/19, 9:39 AM, "Natarajan, Rajeswari" <[hidden email]> wrote:

    Hi
   
    We are using solr 7.6 and trying out bidirectional CDCR and I also hit this issue.
   


[CDCR]Unable to locate core

Amrit Sarkar
>
> Thanks Natarajan,
>
> Solid analysis. I have seen this issue reported by multiple users in the
> past few months, and unfortunately the code I contributed was incomplete.
>
> I think the correct way to solve this is to identify the base URL of the
> core that REQUESTRECOVERY must be sent to, and to create a local
> HttpSolrClient for it instead of using the CloudSolrClient from
> CdcrReplicatorState. This avoids retries, which would be redundant in
> our case.
>
> I put together a small patch a few weeks back and will upload it to SOLR-11724
> <https://issues.apache.org/jira/browse/SOLR-11724>.
>
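A rough sketch of the lookup proposed above, in plain Java rather than SolrJ. ReplicaLookupSketch, ReplicaInfo and the node URLs are hypothetical stand-ins for whatever the cluster state provides; the shard layout reuses the s3r10/s3r8 example given earlier in this thread. This is not the actual patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical model (not SolrJ) of picking the node that hosts a given core. */
final class ReplicaLookupSketch {

    /** Minimal stand-in for one replica entry in the cluster state. */
    static final class ReplicaInfo {
        final String coreName;
        final String baseUrl;
        final boolean leader;

        ReplicaInfo(String coreName, String baseUrl, boolean leader) {
            this.coreName = coreName;
            this.baseUrl = baseUrl;
            this.leader = leader;
        }
    }

    /** Illustrative shard layout: leader s3r10 on node1, follower s3r8 on node2. */
    static List<ReplicaInfo> sampleShard() {
        List<ReplicaInfo> replicas = new ArrayList<>();
        replicas.add(new ReplicaInfo("s3r10", "http://node1:8983/solr", true));
        replicas.add(new ReplicaInfo("s3r8", "http://node2:8983/solr", false));
        return replicas;
    }

    /** Returns the base URL of the node hosting coreName, if the core is known. */
    static Optional<String> baseUrlForCore(List<ReplicaInfo> replicas, String coreName) {
        return replicas.stream()
                .filter(r -> r.coreName.equals(coreName))
                .map(r -> r.baseUrl)
                .findFirst();
    }

    public static void main(String[] args) {
        List<ReplicaInfo> shard = sampleShard();
        // Each follower gets its own target URL instead of sharing the leader's.
        for (ReplicaInfo replica : shard) {
            if (!replica.leader) {
                String url = baseUrlForCore(shard, replica.coreName)
                        .orElseThrow(() -> new IllegalStateException("unknown core " + replica.coreName));
                System.out.println("REQUESTRECOVERY for " + replica.coreName + " -> " + url);
            }
        }
    }
}
```

With the target resolved per core, the command for s3r8 goes to node2 directly, so no retry across live servers is needed.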

Re: [CDCR]Unable to locate core

Natarajan, Rajeswari
Thanks Amrit for creating a patch. But the code in LBHttpSolrClient.java needs to be fixed too, if the for loop is to work as intended.
Regards
Rajeswari


On 5/19/19, 3:12 PM, "Amrit Sarkar" <[hidden email]> wrote:

    >
    > Thanks Natrajan,
    >
    > Solid analysis and I saw the issue being reported by multiple users in
    > past few months and unfortunately I baked an incomplete code.
    >
    > I think the correct way of solving this issue is to identify the correct
    > base-url for the respective core we need to trigger REQUESTRECOVERY to and
    > create a local HttpSolrClient instead of using CloudSolrClient from
    > CdcrReplicatorState. This will avoid unnecessary retry which will be
    > redundant in our case.
    >
    > I baked a small patch few weeks back and will upload it on the SOLR-11724
    > <https://issues.apache.org/jira/browse/SOLR-11724>.
    >
   


Re: [CDCR]Unable to locate core

Amrit Sarkar
Sounds legit to me.

Could you create a Jira issue and write up the problem statement and the
proposed design there? I am confident it will attract committers' attention,
and they can review the design and provide feedback.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


On Mon, May 20, 2019 at 3:59 AM Natarajan, Rajeswari <
[hidden email]> wrote:

> Thanks Amrith for creating a patch. But the code in the
> LBHttpSolrClient.java needs to be fixed too, if the for loop  to work as
> intended.
> Regards
> Rajeswari
>

Re: [CDCR]Unable to locate core

Natarajan, Rajeswari
Thanks Amrit. I created a bug:
https://issues.apache.org/jira/browse/SOLR-13481

Regards,
Rajeswari

On 5/19/19, 3:44 PM, "Amrit Sarkar" <[hidden email]> wrote:

    Sounds legit to me.
   
    Can you create a Jira and list down the problem statement and design
    solution there. I am confident it will attract committers' attention and
    they can review the design and provide feedback.
   