async BACKUP under Solr8.3

async BACKUP under Solr8.3

Oakley, Craig (NIH/NLM/NCBI) [C]
For Solr 8.3, when I issue a command of the form

host:port/solr/admin/collections?action=BACKUP&name=snapshot1&collection=col1&location=/tmp&async=bug

and then run /solr/admin/collections?action=REQUESTSTATUS&requestid=bug, I get "msg":"found [bug] in failed tasks".
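The submit-then-poll workflow described above can be sketched in Python. This is a minimal sketch, not an official client: the base URL `http://localhost:8983/solr/admin/collections` and the helper names `backup_url`, `status_url` and `wait_for` are hypothetical, and the parsing assumes REQUESTSTATUS returns a JSON `status` object with `state` and `msg` fields, matching the `"msg":"found [bug] in failed tasks"` output quoted here. The HTTP fetch is passed in as a function so the polling logic can be tried without a live node.

```python
import json
import time
from typing import Callable
from urllib.parse import urlencode

# Hypothetical base URL: replace localhost:8983 with a real Solr node.
BASE = "http://localhost:8983/solr/admin/collections"


def backup_url(name: str, collection: str, location: str, request_id: str) -> str:
    """Build an async BACKUP URL using the parameters from this thread."""
    params = {"action": "BACKUP", "name": name, "collection": collection,
              "location": location, "async": request_id, "wt": "json"}
    return BASE + "?" + urlencode(params)


def status_url(request_id: str) -> str:
    """Build the REQUESTSTATUS URL for a previously submitted async id."""
    return BASE + "?" + urlencode({"action": "REQUESTSTATUS",
                                   "requestid": request_id, "wt": "json"})


def wait_for(request_id: str, fetch: Callable[[str], str],
             poll_seconds: float = 5.0, max_polls: int = 120) -> dict:
    """Poll REQUESTSTATUS until the task leaves the submitted/running states.

    `fetch` (hypothetical hook) takes a URL and returns the response body
    as text, so this logic can be exercised with canned responses.
    """
    for _ in range(max_polls):
        status = json.loads(fetch(status_url(request_id)))["status"]
        if status["state"] not in ("submitted", "running"):
            return status  # e.g. completed, failed or notfound
        time.sleep(poll_seconds)
    raise TimeoutError(f"async request {request_id!r} still running")
```

Against a real node, `fetch` could be something like `lambda u: urllib.request.urlopen(u).read().decode()`.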

The solr.log file contains a stack trace like the following:
2019-11-18 17:31:31.369 ERROR (OverseerThreadFactory-9-thread-5-processing-n:host:port_solr) [c:col1   ] o.a.s.c.a.c.OverseerCollectionMessageHandler Error from shard: http://host:port/solr => org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://host:port/solr/admin/cores
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:408)
org.apache.solr.client.solrj.SolrServerException: Timeout occured while waiting response from server at: http://host:port/solr/admin/cores
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:408) ~[?:?]
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:754) ~[?:?]
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1290) ~[?:?]
        at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238) ~[?:?]
        at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199) ~[?:?]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_232]
        at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_232]
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181) ~[metrics-core-4.0.5.jar:4.0.5]
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:210) ~[?:?]
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_232]
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_232]
        at java.lang.Thread.run(Thread.java:748) [?:1.8.0_232]
Caused by: java.util.concurrent.TimeoutException
        at org.eclipse.jetty.client.util.InputStreamResponseListener.get(InputStreamResponseListener.java:216) ~[?:?]
        at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:399) ~[?:?]
        ... 12 more

If I remove async=bug, the backup works.

In fact, even with async the backup itself appears to succeed; it is only REQUESTSTATUS that reports it as failed.

I notice that the Solr 8.3.0 release announcement emailed to [hidden email] at 3:30am on 11/4/19 lists among the release highlights "Fix for SPLITSHARD (async) with failures in underlying sub-operations can result in data loss".

Did a fix to SPLITSHARD break BACKUP?

Has anyone succeeded in running solr/admin/collections?action=BACKUP&async=requestname under Solr 8.3?

Thanks

Re: async BACKUP under Solr8.3

Mikhail Khludnev
Hello, Craig.
There was a significant fix for async BACKUP in 8.1, if I remember
correctly.
Which version did you use it with before? How many nodes, shards, and
replicas does `bug` have?
Unfortunately this stack trace is not very informative; it just says
that one node (here, the Overseer) timed out waiting for another one.
Ideally we need logs from the Overseer node and the subordinate node
during the backup operation.
Thanks.
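To collect the logs requested above, each node's solr.log can be trimmed to a window around the failing request. This is a minimal sketch, assuming every log entry starts with a timestamp in the `2019-11-18 17:31:31.369` format seen in this thread; `lines_near` is a hypothetical helper name, not part of Solr.

```python
from datetime import datetime, timedelta
from typing import Iterable, Iterator

# Assumed Solr log timestamp prefix, e.g. "2019-11-18 17:31:31.369".
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
TS_LEN = len("2019-11-18 17:31:31.369")


def lines_near(lines: Iterable[str], center: datetime,
               window: timedelta) -> Iterator[str]:
    """Hypothetical helper: yield log lines within +/- window of center.

    Continuation lines (stack-trace frames, "Caused by:") carry no
    timestamp; they are kept or dropped together with the timestamped
    entry above them.
    """
    keep = False
    for line in lines:
        try:
            ts = datetime.strptime(line[:TS_LEN], TS_FORMAT)
        except ValueError:
            # No timestamp: follow the decision made for the parent entry.
            if keep:
                yield line
            continue
        keep = abs(ts - center) <= window
        if keep:
            yield line
```

For example, `with open("solr.log") as f: print("".join(lines_near(f, datetime(2019, 11, 18, 17, 31), timedelta(minutes=5))))`, run on both the Overseer node and the node hosting the replica.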

On Tue, Nov 19, 2019 at 2:13 AM Oakley, Craig (NIH/NLM/NCBI) [C]
<[hidden email]> wrote:

--
Sincerely yours
Mikhail Khludnev

RE: async BACKUP under Solr8.3

Oakley, Craig (NIH/NLM/NCBI) [C]
This is on a test server, and it is the simplest case: one node, one shard, one replica.

In production we currently use Solr 7.4, and async BACKUP works fine there. I could test whether I get the same symptoms on Solr 8.1 and/or 8.2.

Thanks

-----Original Message-----
From: Mikhail Khludnev <[hidden email]>
Sent: Tuesday, November 19, 2019 12:40 AM
To: solr-user <[hidden email]>
Subject: Re: async BACKUP under Solr8.3

RE: async BACKUP under Solr8.3

Oakley, Craig (NIH/NLM/NCBI) [C]
FYI, I DO succeed in running an async backup in Solr 8.1.

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <[hidden email]>
Sent: Tuesday, November 19, 2019 9:03 AM
To: [hidden email]
Subject: RE: async BACKUP under Solr8.3

RE: async BACKUP under Solr8.3

Oakley, Craig (NIH/NLM/NCBI) [C]
With some collections I have problems on Solr 8.1.1 through 8.3; with other collections, async backup works fine on Solr 8.1.1 through 8.3.

I'm investigating what might be wrong with the collections that have the problems.

Thanks

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <[hidden email]>
Sent: Tuesday, November 19, 2019 9:53 AM
To: [hidden email]
Subject: RE: async BACKUP under Solr8.3

RE: async BACKUP under Solr8.3

Oakley, Craig (NIH/NLM/NCBI) [C]
For the record, the solution was to edit solr.xml, changing

<int name="socketTimeout">${socketTimeout:0}</int>

to

<int name="socketTimeout">${socketTimeout:600000}</int>
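Because the value uses Solr's `${name:default}` substitution, the default after the colon applies only when no property of that name is set at startup. The sketch below, with hypothetical helpers `resolve` and `shard_socket_timeout`, reads a solr.xml string and reports the effective `socketTimeout`. It assumes the setting lives under `<shardHandlerFactory>` (as in the stock solr.xml) and models property lookup as a plain dict standing in for JVM system properties such as `-DsocketTimeout=...`.

```python
import re
import xml.etree.ElementTree as ET
from typing import Mapping, Optional

# Matches the ${name} / ${name:default} substitution syntax used in solr.xml.
PROP = re.compile(r"^\$\{([^:}]+)(?::([^}]*))?\}$")


def resolve(value: str, sysprops: Mapping[str, str]) -> str:
    """Hypothetical helper: a property of that name wins, else the default."""
    m = PROP.match(value.strip())
    if not m:
        return value.strip()  # a literal value, no substitution needed
    name, default = m.group(1), m.group(2)
    if name in sysprops:
        return sysprops[name]
    if default is None:
        raise KeyError(f"no property or default for {name}")
    return default


def shard_socket_timeout(solr_xml: str,
                         sysprops: Mapping[str, str]) -> Optional[int]:
    """Hypothetical helper: effective shardHandlerFactory socketTimeout (ms)."""
    root = ET.fromstring(solr_xml)
    factory = root.find("shardHandlerFactory")
    if factory is None:
        return None
    for el in factory.findall("int"):
        if el.get("name") == "socketTimeout":
            return int(resolve(el.text or "", sysprops))
    return None
```

A result of 0, as in the configuration described here, could be flagged by a deployment check before the node starts.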

-----Original Message-----
From: Oakley, Craig (NIH/NLM/NCBI) [C] <[hidden email]>
Sent: Tuesday, November 19, 2019 6:19 PM
To: [hidden email]
Subject: RE: async BACKUP under Solr8.3

Re: async BACKUP under Solr8.3

Erick Erickson
Hmmm, any idea how or why socketTimeout was set to zero? I just looked at the Git history for that file and don’t see it ever being set to 0….

> On Nov 22, 2019, at 11:19 AM, Oakley, Craig (NIH/NLM/NCBI) [C] <[hidden email]> wrote: