Error Adding a Replica to SOLR Cloud 8.2.0

Joe Lerner
Hi,

We have a simple SOLR architecture: one collection, one shard, and three ZooKeeper nodes. Currently we have two SOLR nodes, each with a replica. Both show as active in CLUSTERSTATUS, so all good.

We are trying to add a third node/replica. Adding the node to the cluster works, and ADDREPLICA starts fine. I am monitoring the size of the new replica's files with du: it climbs steadily, reaches around 52G after about 10 minutes, and then the operation fails. du on the active/working SOLR nodes shows 51G, so it seems the copy is almost finished when it fails!
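For reference, here is a minimal sketch of how the two Collections API calls mentioned above (ADDREPLICA and CLUSTERSTATUS) can be assembled. The collection, shard, and node names are taken from the log lines in this post; the base URL `http://solr-host:8984/solr` and the helper name `collections_api_url` are assumptions for illustration only, not our actual setup.

```python
from urllib.parse import urlencode


def collections_api_url(base_url, action, **params):
    """Build a Solr Collections API URL for the given action and parameters."""
    query = urlencode({"action": action, **params})
    return f"{base_url}/admin/collections?{query}"


# Hypothetical base URL of a node that is already in the cluster.
BASE = "http://solr-host:8984/solr"

# Ask Solr to create a new replica of shard1 on the new node.
add_replica = collections_api_url(
    BASE,
    "ADDREPLICA",
    collection="ourapp-rprod-active",
    shard="shard1",
    node="11.22.33.187:8984_solr",
)

# Check the replica states afterwards.
cluster_status = collections_api_url(
    BASE, "CLUSTERSTATUS", collection="ourapp-rprod-active"
)

print(add_replica)
print(cluster_status)
```

The URLs can then be requested with curl or any HTTP client; note that the colon in the node name is percent-encoded by `urlencode`.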

I am 99.99% sure we are not out of disk space: I am monitoring it closely, and we just doubled the available space, yet the operation still failed at the same point.
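The kind of disk-space check I mean can be sketched as follows. This is only an illustration: the helper name `has_room`, the 10% headroom, and the path `"."` are assumptions (on the real node the path would be the SOLR data directory), and 52G is the size du reported just before the failure.

```python
import shutil


def has_room(path, required_bytes, headroom=0.10):
    """Return True if the filesystem holding `path` has at least
    required_bytes free, plus a fractional safety margin."""
    free = shutil.disk_usage(path).free
    return free >= required_bytes * (1 + headroom)


# Roughly the size du showed before the failure (52G).
INDEX_BYTES = 52 * 1024**3

# "." is a placeholder; point this at the replica's data directory.
print(has_room(".", INDEX_BYTES))
```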

The relevant errors seem to be:
2021-01-07 16:12:27.641 WARN  (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.h.IndexFetcher No content received for file: tlog.0000000000000010685.1688244814485651456
2021-01-07 16:12:27.642 ERROR (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.h.IndexFetcher Error fetching file, doing one retry...:org.apache.solr.common.SolrException: Unable to download tlog.0000000000000010685.1688244814485651456 completely. Downloaded 0!=1527958
	at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1802)
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1682)
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1652)
	at org.apache.solr.handler.IndexFetcher.downloadTlogFiles(IndexFetcher.java:992)
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:578)
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
	at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)
	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:250)
	at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:662)
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336)
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317)
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

2021-01-07 16:12:27.642 WARN  (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.h.IndexFetcher No content received for file: tlog.0000000000000010685.1688244814485651456
2021-01-07 16:12:27.643 ERROR (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.h.IndexFetcher Error deleting file: tlog.0000000000000010685.1688244814485651456 => java.nio.file.NoSuchFileException: /opt/solr/server/solr/ourapp-rprod-active_shard1_replica_n19/data/tlog/tlog.20210107160916591/tlog.0000000000000010685.1688244814485651456
	at sun.nio.fs.UnixException.translateToIOException(Unknown Source)
java.nio.file.NoSuchFileException: /opt/solr/server/solr/ourapp-rprod-active_shard1_replica_n19/data/tlog/tlog.20210107160916591/tlog.0000000000000010685.1688244814485651456
	at sun.nio.fs.UnixException.translateToIOException(Unknown Source) ~[?:1.8.0_251]
	at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) ~[?:1.8.0_251]
	at sun.nio.fs.UnixException.rethrowAsIOException(Unknown Source) ~[?:1.8.0_251]
	at sun.nio.fs.UnixFileSystemProvider.implDelete(Unknown Source) ~[?:1.8.0_251]
	at sun.nio.fs.AbstractFileSystemProvider.delete(Unknown Source) ~[?:1.8.0_251]
	at java.nio.file.Files.delete(Unknown Source) ~[?:1.8.0_251]
	at org.apache.solr.handler.IndexFetcher$LocalFsFile.delete(IndexFetcher.java:1935) ~[?:?]
	at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1796) ~[?:?]
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1682) ~[?:?]
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1657) ~[?:?]
	at org.apache.solr.handler.IndexFetcher.downloadTlogFiles(IndexFetcher.java:992) ~[?:?]
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:578) ~[?:?]
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351) ~[?:?]
	at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:250) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:662) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336) ~[?:?]
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317) ~[?:?]
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) ~[metrics-core-4.0.5.jar:4.0.5]
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source) ~[?:1.8.0_251]
	at java.util.concurrent.FutureTask.run(Unknown Source) ~[?:1.8.0_251]
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209) ~[?:?]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source) ~[?:1.8.0_251]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source) ~[?:1.8.0_251]
	at java.lang.Thread.run(Unknown Source) [?:1.8.0_251]
2021-01-07 16:12:27.644 INFO  (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.u.DefaultSolrCoreState New IndexWriter is ready to be used.
2021-01-07 16:12:36.399 ERROR (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.h.ReplicationHandler Index fetch failed :org.apache.solr.common.SolrException: Unable to download tlog.0000000000000010685.1688244814485651456 completely. Downloaded 0!=1527958
	at org.apache.solr.handler.IndexFetcher$FileFetcher.cleanup(IndexFetcher.java:1802)
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetch(IndexFetcher.java:1682)
	at org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1657)
	at org.apache.solr.handler.IndexFetcher.downloadTlogFiles(IndexFetcher.java:992)
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:578)
	at org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:351)
	at org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:424)
	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:250)
	at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:662)
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336)
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317)
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

2021-01-07 16:12:36.399 ERROR (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.c.RecoveryStrategy Error while trying to recover:org.apache.solr.common.SolrException: Replication for recovery failed.
	at org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:253)
	at org.apache.solr.cloud.RecoveryStrategy.doSyncOrReplicateRecovery(RecoveryStrategy.java:662)
	at org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:336)
	at org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:317)
	at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)

2021-01-07 16:12:36.399 ERROR (recoveryExecutor-7-thread-1-processing-n:11.22.33.187:8984_solr x:ourapp-rprod-active_shard1_replica_n19 c:ourapp-rprod-active s:shard1 r:core_node20) [c:ourapp-rprod-active s:shard1 r:core_node20 x:ourapp-rprod-active_shard1_replica_n19] o.a.s.c.RecoveryStrategy Recovery failed - trying again... (0)
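To summarize what the log shows: the stack traces point at IndexFetcher.downloadTlogFiles, and the message "Downloaded 0!=1527958" appears to compare the bytes received (0) with the bytes expected (1527958) for a single transaction log file. A minimal sketch for pulling those values out of a log, assuming the message format stays exactly as it appears above (the pattern and the function name `parse_incomplete_download` are my own, not part of SOLR):

```python
import re

# Matches the "Unable to download <file> completely. Downloaded <got>!=<expected>"
# message as it appears in the log excerpt above.
PATTERN = re.compile(
    r"Unable to download (?P<file>\S+) completely\. "
    r"Downloaded (?P<got>\d+)!=(?P<expected>\d+)"
)


def parse_incomplete_download(line):
    """Return (file_name, bytes_received, bytes_expected), or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    return m.group("file"), int(m.group("got")), int(m.group("expected"))


line = (
    "org.apache.solr.common.SolrException: Unable to download "
    "tlog.0000000000000010685.1688244814485651456 completely. "
    "Downloaded 0!=1527958"
)
print(parse_incomplete_download(line))
```

Running this over the full log would show whether every failure involves the same tlog file or a different one on each retry.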

I've attached our full log here:

solrlog-2.txt


Thanks for any help!

Joe