Apache Solr 4.10.x - Collection Reload times out


alessandro.benedetti
I have recently been facing an issue with Collection Reload in a couple of SolrCloud clusters:

1) re-index a collection
2) the collection works happily
3) trigger a collection reload
4) the reload times out (silently, with no message in any of the Solr node logs)
5) no effect on the collection (it still serves queries)

If I restart, the collection doesn't start because it finds the write.lock in the index.
Sometimes this even prevents the entire cluster from restarting (even though clusterstate.json actually shows only a few collections down), and Solr is not reachable.
Of course I can mitigate the problem by cleaning up the indexes and restarting (avoiding reloads in favor of plain restarts in the future), but this is annoying.
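
To see which cores are affected before cleaning anything up, it can help to list the leftover lock files. The following is a minimal sketch, not part of Solr: the class name WriteLockFinder and the default path /var/solr are assumptions for illustration. It only lists files named write.lock and deletes nothing. Whether a leftover file actually blocks startup depends on the configured lockType, so treat the output as a hint rather than a diagnosis.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper (not a Solr class): lists write.lock files under a directory tree.
final class WriteLockFinder {

    // Returns every regular file named "write.lock" below root, sorted by path.
    static List<Path> findWriteLocks(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                .filter(Files::isRegularFile)
                .filter(p -> p.getFileName().toString().equals("write.lock"))
                .sorted()
                .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; pass the real Solr home as the first argument.
        Path solrHome = Paths.get(args.length > 0 ? args[0] : "/var/solr");
        for (Path lock : findWriteLocks(solrHome)) {
            System.out.println("leftover lock file: " + lock);
        }
    }
}
```

Running it only while the node is stopped avoids confusing a lock that is legitimately held with a stale one.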

I index through the DIH and I use a DirectSolrSpellChecker.
Should I take a look at ZooKeeper? I tried checking the Overseer queues and ran a few other checks, but I am not sure where the best places to look in there are...

Could this be related? [1] I don't think so, but I am a bit puzzled...

[1] https://issues.apache.org/jira/browse/SOLR-6246


---------------
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io

Re: Apache Solr 4.10.x - Collection Reload times out

Erick Erickson
I doubt SOLR-6246 is related. DirectSolrSpellChecker just looks in the
index using (on a quick scan) IndexReader, which doesn't hold a lock
IIUC, so it shouldn't leave anything around. Additionally, there is no
real "build" step since it's looking at the index rather than creating
a new one as AnalyzingInfixSuggester does. The write lock in that JIRA
was for the "sidecar" index that AnalyzingInfixSuggester created.

Which doesn't help your original issue. Have you tried specifying the
"async" parameter when you issue the RELOAD command then checking the
status with REQUESTSTATUS? I'm wondering if you restart your cluster
_after_ the reload is successfully completed whether you'd have the
same problem. Or whether you'd get some more helpful information if
the request actually fails somehow.
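
As an illustration of the async flow just described, here is a minimal sketch that only builds the two Collections API URLs. It assumes a Solr version in which the RELOAD action honors the async parameter (not verified here for 4.10.x). The class and method names, the base URL, and the request id are hypothetical; sending the requests is left to whatever HTTP client you already use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not a Solr class): builds Collections API URLs as plain strings.
final class AsyncReloadUrls {

    // URL-encodes a single query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // RELOAD submitted asynchronously under a caller-chosen request id.
    static String reload(String baseUrl, String collection, String requestId) {
        return baseUrl + "/admin/collections?action=RELOAD&name=" + enc(collection)
            + "&async=" + enc(requestId);
    }

    // REQUESTSTATUS polls the outcome of the request submitted with that id.
    static String requestStatus(String baseUrl, String requestId) {
        return baseUrl + "/admin/collections?action=REQUESTSTATUS&requestid=" + enc(requestId);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        System.out.println(reload(base, "news", "55"));
        System.out.println(requestStatus(base, "55"));
    }
}
```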

Also, why issue a reload? If you're re-indexing in the background and
want to atomically switch you could use collection aliasing (obviously
you'd need more disk space/resources which may make it not a viable
option). The sequence looks like:
> alias points to C1
> create C2 (or delete all data in an existing C2)
> index to C2
> check C2
> point alias to C2

Next time of course you index to C1 and switch the alias to C1 when
you're happy with it.
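
The alias swap above could be sketched roughly as follows. This is only an illustration: the class name, collection names, and base URL are made up, and it builds the CREATEALIAS URL without sending it. Re-issuing CREATEALIAS for an existing alias name is what repoints the alias.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not a Solr class) for the two-collection alias swap.
final class AliasSwap {

    // Given the collection the alias currently points to, returns the one to re-index next.
    static String standby(String live, String c1, String c2) {
        return live.equals(c1) ? c2 : c1;
    }

    // URL-encodes a single query parameter value as UTF-8.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // CREATEALIAS points the alias at the given collection.
    static String createAliasUrl(String baseUrl, String alias, String collection) {
        return baseUrl + "/admin/collections?action=CREATEALIAS&name=" + enc(alias)
            + "&collections=" + enc(collection);
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr"; // assumed host and port
        String live = "news_c1";                     // where the alias points today
        String next = standby(live, "news_c1", "news_c2");
        System.out.println("index into and check: " + next);
        System.out.println("then call: " + createAliasUrl(base, "news", next));
    }
}
```

Queries keep going to the alias name throughout, so clients never need to know which physical collection is live.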

But even if you do the alias thing it'd still be good to see if we can
figure out what's going on because on the surface what you're
describing should be OK.

Best,
Erick

On Fri, Jul 14, 2017 at 8:11 AM, alessandro.benedetti wrote:

> [quoted text: the original message, reproduced in full above]

Re: Apache Solr 4.10.x - Collection Reload times out

alessandro.benedetti
Thanks for the prompt response, Erick.
The reason I am issuing a collection reload is that from time to time I modify the solrconfig, for example with different spellcheck settings and default request parameters.
So after uploading to ZooKeeper, I reload the collection to reflect the modification.
Aliasing is definitely a valid option, but at the moment I don't have the infrastructure set up to operate it programmatically.

Returning to my issue, I see no effect at all if I try to run the request async (it seems to ignore the parameter completely).

http://blabla:8983/solr/admin/collections?action=RELOAD&name=news&async=55

I checked the source code and the async param seems to be present in version 4.10.2, so this is really weird.
I will proceed with my investigations.

Re: Apache Solr 4.10.x - Collection Reload times out

alessandro.benedetti
Looking at the 4.10.2 source, I think I can see why the async call does not work:

    log.info("Reloading Collection : " + req.getParamString());
    String name = req.getParams().required().get("name");

    ZkNodeProps m = new ZkNodeProps(Overseer.QUEUE_OPERATION,
        OverseerCollectionProcessor.RELOADCOLLECTION, "name", name);

    handleResponse(OverseerCollectionProcessor.RELOADCOLLECTION, m, rsp);

Are we sure we are actually passing the "async" param as a ZkNodeProp?
I ask because handleResponse does:

    private void handleResponse(String operation, ZkNodeProps m,
        SolrQueryResponse rsp, long timeout)
    ...
    if(m.containsKey(ASYNC) && m.get(ASYNC) != null) {

      String asyncId = m.getStr(ASYNC);
    ...
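
To make the suspicion concrete, here is a small stand-alone model of the two excerpts above. It is a simplification under stated assumptions, not the Solr code: a plain Map stands in for ZkNodeProps, and the class name, method names, and the "operation" and "reloadcollection" strings are placeholders for the real constants. It only shows that a message built from the operation and the name can never satisfy the ASYNC check.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Simplified stand-in for the quoted handler logic; a Map replaces ZkNodeProps.
final class AsyncParamModel {

    static final String ASYNC = "async";

    // Mirrors the reload excerpt: only the operation and the name are copied
    // into the message, whatever else the request carried.
    static Map<String, Object> buildReloadMessage(Map<String, String> requestParams) {
        Map<String, Object> m = new LinkedHashMap<>();
        m.put("operation", "reloadcollection"); // placeholder key and value
        m.put("name", requestParams.get("name"));
        return m;
    }

    // Mirrors the condition quoted from handleResponse.
    static boolean takesAsyncPath(Map<String, Object> m) {
        return m.containsKey(ASYNC) && m.get(ASYNC) != null;
    }

    public static void main(String[] args) {
        Map<String, String> params = new HashMap<>();
        params.put("action", "RELOAD");
        params.put("name", "news");
        params.put("async", "55");
        Map<String, Object> m = buildReloadMessage(params);
        System.out.println("async path taken: " + takesAsyncPath(m));
    }
}
```

If this model matches the real handler, the async value never reaches handleResponse, which would explain why the parameter appears to be ignored.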

Re: Apache Solr 4.10.x - Collection Reload times out

alessandro.benedetti
Additional information:
Trying a single core reload, I identified that an entire shard is not reloading (while the other shard is).
Looking at the "not reloading" shard (2 replicas), it seems that the core reload gets stuck here:

org.apache.solr.core.SolrCores#waitAddPendingCoreOps

The problem is that the wait seems to continue indefinitely and silently.
Apart from a restart, is there any way to clean up the pending core operations?
I will continue my investigations.
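
For anyone wanting to confirm the same symptom, one way is to look for threads whose stack contains that method. The sketch below is an assumption-laden illustration, not a Solr facility: the class StuckThreadFinder is hypothetical, and Thread.getAllStackTraces() only sees the JVM it runs in, so it would have to execute inside the Solr JVM. A jstack dump of the Solr process gives the same information from outside.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper (not a Solr class): finds threads with a given frame on their stack.
final class StuckThreadFinder {

    // Returns the names of all threads whose stack contains className#methodName.
    static List<String> threadsIn(Map<Thread, StackTraceElement[]> dump,
                                  String className, String methodName) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : dump.entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)
                        && frame.getMethodName().equals(methodName)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame per thread is enough
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Inspects the current JVM only.
        List<String> stuck = threadsIn(Thread.getAllStackTraces(),
            "org.apache.solr.core.SolrCores", "waitAddPendingCoreOps");
        System.out.println("threads waiting on pending core ops: " + stuck);
    }
}
```

Taking two dumps a minute apart and comparing the thread names helps distinguish a thread that is truly stuck from one that is merely passing through.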

Re: Apache Solr 4.10.x - Collection Reload times out

Erick Erickson
1> Are you replaying the tlog? If you have a large tlog for some
reason, you may be replaying it, although a reload should do a commit
first.

2> What do the Solr logs show the node in question to be doing?

3> Sorry to mislead you: async is not a 4.10 option for the RELOAD
command, so that was bogus on my part. That support was added later.

Best,
Erick


On Thu, Jul 20, 2017 at 4:38 AM, alessandro.benedetti wrote:

> [quoted text: the previous message, reproduced in full above]