RE: 7.3 appears to leak

RE: 7.3 appears to leak

Markus Jelsma-2
Hello again,

Back to this topic: upgrading to 7.4 did not mysteriously fix the leak in our main text search collection, as I had so vigorously hoped. Again, it is SortedIntDocSet instances that leak, consistently, on every commit of our 15-minute index/commit interval.

Some facts:
* problem started after upgrading from 7.2.1 to 7.3.0;
* it occurs only in our main text search collection, all other collections are unaffected;
* despite what I said earlier, it is so far not reproducible outside production, even when mimicking production as well as we can;
* SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
* filterCache is enabled using FastLRUCache;
* filter queries are simple field:value queries on string fields, plus three time-range filter queries using [NOW/DAY TO NOW+1DAY/DAY] syntax for 'today', 'last week' and 'last month', but those are rarely used;
* reloading the core manually frees OldGen;
* custom URPs don't cause the problem; disabling them doesn't solve it;
* the collection uses custom extensions of QueryComponent and QueryElevationComponent, ExtendedDismaxQParser and MoreLikeThisQParser, a whole bunch of TokenFilters, and several DocTransformers, and because the problem only reproduces in production, I really cannot switch these back to the stock Solr/Lucene versions;
* useFilterForSortedQuery is/was not defined in our config, so it used the default (true?). SOLR-11769 could be the culprit, so I just disabled it, only on the node running 7.4.0; the rest of the collection runs 7.2.1;

The 7.4.0 node with useFilterForSortedQuery=false now seems to have run fine for the last three commits. While typing this, I may just have been lucky after so many hours/days of tedium. To confirm, I will run 7.4.0 on a second node in the cluster, but with a different value for useFilterForSortedQuery...

I am unlucky after all :( so I'll revert to 7.2.1 again (but why did it 'seem' to run fine for three commits?). But we need this fixed, and it is clear that whatever I do, I am not one damn step closer to solving it. So what next? I need the list's help to find the leak.

So please, thanks,
Markus
 
-----Original message-----

> From:Shalin Shekhar Mangar <[hidden email]>
> Sent: Friday 27th April 2018 12:11
> To: [hidden email]
> Subject: Re: 7.3 appears to leak
>
> Hi Markus,
>
> Can you give an idea of what your filter queries look like? Any custom
> plugins or things we should be aware of? Simple indexing artificial docs,
> querying and committing doesn't seem to reproduce the issue for me.
>
> On Thu, Apr 26, 2018 at 10:13 PM, Markus Jelsma <[hidden email]>
> wrote:
>
> > Hello,
> >
> > We just finished upgrading our three separate clusters from 7.2.1 to 7.3,
> > which went fine, except for our main text search collection: it appears to
> > leak memory on commit!
> >
> > After initial upgrade we saw the cluster slowly starting to run out of
> > memory within about an hour and a half. We increased heap in case 7.3 just
> > requires more of it, but the heap consumption graph is still growing on
> > each commit. Heap space cannot be reclaimed by forcing the garbage
> > collector to run; everything just piles up in the OldGen. Running with this
> > slightly larger heap, the first nodes will run out of memory in about two
> > and a half hours after cluster restart.
> >
> > The heap eating cluster is a 2shard/3replica system on separate nodes.
> > Each replica is about 50 GB in size and about 8.5 million documents. On
> > 7.2.1 it ran fine with just a 2 GB heap. With 7.3 and 2.5 GB heap, it will
> > take just a little longer for it to run out of memory.
> >
> > I inspected reports shown by the sampler of VisualVM and spotted one
> > peculiarity: the number of instances of SortedIntDocSet kept growing on
> > each commit by about the same amount as the number of cached filter
> > queries. This doesn't happen on the logs cluster, where SortedIntDocSet
> > instances are neatly collected. The number of instances also matches
> > the number of commits since startup times the cache sizes.
> >
> > Our other two clusters don't have this problem, one of them receives very
> > few commits per day, but the other receives data all the time, it logs user
> > interactions so a large amount of data is coming in all the time. I cannot
> > reproduce it locally by indexing data and committing all the time; the peak
> > usage in OldGen stays about the same. But I can reproduce it locally when
> > I introduce queries and filter queries while indexing pieces of data and
> > committing it.
> >
> > So, what is the problem? I dug in the CHANGES.txt of both Lucene and Solr,
> > but nothing really caught my attention. Does anyone here have an idea where
> > to look?
> >
> > Many thanks,
> > Markus
> >
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>

Re: 7.3 appears to leak

Yonik Seeley
> * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;

If these are actually filterCache entries being leaked, it stands to
reason that a whole searcher is being leaked somewhere.

-Yonik
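Yonik's inference can be illustrated with a toy model, assuming (as in Solr) that each searcher owns its own filterCache. The classes below (`SketchSearcher`, `SearcherLeakModel`) are hypothetical and are not Solr's implementation; they only show that a single stray reference to an old searcher keeps all of its cache entries reachable, so retained entries grow as commits times cache size, which is the pattern Markus saw in VisualVM.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a searcher that owns its own filter cache.
// Not Solr's SolrIndexSearcher; names are illustrative only.
final class SketchSearcher {
    final int generation;
    final Map<String, int[]> filterCache = new HashMap<>();

    SketchSearcher(int generation) {
        this.generation = generation;
    }
}

// Simulates commits: each commit opens a new searcher and warms its cache.
final class SearcherLeakModel {
    // Anything that keeps a strong reference to an old searcher (for example a
    // reference count that is never decremented) keeps its whole cache reachable.
    private final List<SketchSearcher> strayReferences = new ArrayList<>();
    private SketchSearcher current;

    void commit(List<String> filterQueries, boolean leakPrevious) {
        if (current != null && leakPrevious) {
            strayReferences.add(current); // old searcher is never released
        }
        int nextGeneration = (current == null) ? 0 : current.generation + 1;
        current = new SketchSearcher(nextGeneration);
        for (String fq : filterQueries) {
            current.filterCache.put(fq, new int[0]); // placeholder for a cached DocSet
        }
    }

    // Number of cache entries still reachable only through leaked searchers.
    int strandedCacheEntries() {
        int total = 0;
        for (SketchSearcher s : strayReferences) {
            total += s.filterCache.size();
        }
        return total;
    }
}
```

For example, ten commits that each leak the previous searcher, with three cached filters per searcher, strand 9 × 3 = 27 entries, while the same ten commits without the stray reference strand none.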

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Yonik,

If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component that could be the root of all problems is our copy/paste-and-enhance version of the elevator component. It is a direct copy of the 7.2 source in which only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.

There are no changes to code related to the searcher. Other components where we get a RefCounted searcher work without issues; we always decrement the reference after using it. But those components are not in use in this collection.

The source has changed a lot in 7.4, but we still use the old code. I will investigate the component thoroughly, and even revert to the vanilla 7.2 component for a brief period on one production machine. That should not be a problem if I keep our load balancer from accessing it directly, so it only serves shard queries.

I will get back to this topic tomorrow!

Many thanks,
Markus
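The "always decrement the reference after using it" discipline can be sketched as follows. This assumes a simplified stand-in holder (the hypothetical `SearcherRef` class below, not Solr's code) with the same get()/incref()/decref() shape as Solr's `org.apache.solr.util.RefCounted`; in real Solr code the reference usually comes from `SolrCore.getSearcher()`, which already increments the count, so `incref()` here stands in for that call. The point is that an early return without a `finally` block silently leaves the count raised.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified stand-in for a reference-counted holder with the same
// get()/incref()/decref() shape as Solr's RefCounted. Hypothetical; it does
// not close anything, it only counts.
final class SearcherRef<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();

    SearcherRef(T resource) {
        this.resource = resource;
    }

    SearcherRef<T> incref() {
        refcount.incrementAndGet();
        return this;
    }

    T get() {
        return resource;
    }

    void decref() {
        refcount.decrementAndGet();
    }

    int getRefcount() {
        return refcount.get();
    }
}

final class ComponentSketch {
    // Correct: decref() runs on every path, including early returns and exceptions.
    static String lookup(SearcherRef<String> searcher, boolean bailOut) {
        SearcherRef<String> ref = searcher.incref();
        try {
            if (bailOut) {
                return "";
            }
            return ref.get();
        } finally {
            ref.decref();
        }
    }

    // Buggy: the early return skips decref(), so the count stays raised.
    static String lookupLeaky(SearcherRef<String> searcher, boolean bailOut) {
        SearcherRef<String> ref = searcher.incref();
        if (bailOut) {
            return ""; // reference leaked here
        }
        String result = ref.get();
        ref.decref();
        return result;
    }
}
```

Both methods look balanced on the happy path; only the rarely taken branch of `lookupLeaky` leaves one reference behind, which is exactly the kind of call that is easy to miss in review.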

 
 
-----Original message-----

> From:Yonik Seeley <[hidden email]>
> Sent: Thursday 28th June 2018 23:30
> To: [hidden email]
> Subject: Re: 7.3 appears to leak
>
> > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
>
> If these are actually filterCache entries being leaked, it stands to
> reason that a whole searcher is being leaked somewhere.
>
> -Yonik
>

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Yonik,

I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes carry on with the full configuration.

I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.

You were right, it was leaking exactly one SolrIndexSearcher instance on each commit. But, with all our stuff gone, the leak is still there! I triple-checked it! Of course, the bastard is still not reproducible locally.

So, what is next? I have no clues left.

Many, many thanks,
Markus
 
-----Original message-----

> From:Markus Jelsma <[hidden email]>
> Sent: Thursday 28th June 2018 23:52
> To: [hidden email]
> Subject: RE: 7.3 appears to leak
>
> Hello Yonik,
>
> If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component that could be the root of all problems is our copy/paste-and-enhance version of the elevator component. It is a direct copy of the 7.2 source in which only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.
>
> There are no changes to code related to the searcher. Other components where we get a RefCounted searcher work without issues; we always decrement the reference after using it. But those components are not in use in this collection.
>
> The source has changed a lot in 7.4, but we still use the old code. I will investigate the component thoroughly, and even revert to the vanilla 7.2 component for a brief period on one production machine. That should not be a problem if I keep our load balancer from accessing it directly, so it only serves shard queries.
>
> I will get back to this topic tomorrow!
>
> Many thanks,
> Markus
>
>  
>  

Re: 7.3 appears to leak

Erick Erickson
bq. The only custom stuff left is an extension of SearchHandler that
only writes numFound to the response headers.

Well, one more to go ;). It's incredibly easy to overlook
innocent-seeming calls that increment the underlying reference count
of some objects without a matching decrement, which usually happens
through a close call. And a close isn't necessarily a close if the
underlying reference count is still > 0.

You may infer that I've been there and done that ;). Sometimes the
compiler warnings about "resource leak" can help pinpoint those too.

Best,
Erick
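To make the "a close isn't necessarily a close" point concrete, here is a minimal sketch of a reference-counted searcher handle (the hypothetical `CountedSearcher` class, not Solr's code) that only really closes when its count reaches zero. It assumes the core holds one reference of its own and drops it when a commit replaces the searcher.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical reference-counted searcher handle. decref() only closes the
// searcher when the count reaches zero, so one owner "closing" it is a no-op
// while anyone else still holds a reference.
final class CountedSearcher {
    // Starts at 1: the core's own reference to its current searcher.
    private final AtomicInteger refcount = new AtomicInteger(1);
    private volatile boolean closed;

    CountedSearcher incref() {
        refcount.incrementAndGet();
        return this;
    }

    void decref() {
        if (refcount.decrementAndGet() == 0) {
            // A real searcher would release its index reader and caches here.
            closed = true;
        }
    }

    boolean isClosed() {
        return closed;
    }

    int getRefcount() {
        return refcount.get();
    }
}
```

If every request that calls `incref()` also calls `decref()`, the commit's own `decref()` brings the count to zero and the old searcher closes. If a single request forgets, the count stays at 1 after the commit, `isClosed()` remains false, and everything the old searcher owns (including its filterCache) stays on the heap: one leaked searcher per commit.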

On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
<[hidden email]> wrote:

> Hello Yonik,
>
> I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could kind of 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes carry on with the full configuration.
>
> I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
>
> You were right, it was leaking exactly one SolrIndexSearcher instance on each commit. But, with all our stuff gone, the leak is still there! I triple-checked it! Of course, the bastard is still not reproducible locally.
>
> So, what is next? I have no clues left.
>
> Many, many thanks,
> Markus
>

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Erick,

The custom search handler doesn't interact with SolrIndexSearcher; this is really all it does:

  public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
    super.handleRequestBody(req, rsp);
   
    if (rsp.getToLog().get("hits") instanceof Integer) {
      rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
    }
    if (rsp.getToLog().get("hits") instanceof Long) {
      rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
    }
  }

I am not sure this qualifies as one more to go.

Re: compiler warnings on resources, yes! Those warnings, and tests failing due to resource leaks, have always warned me when I forgot to release something or decrement a reference. But apart from the above method, only the token filters (which I really can't disable) are left.

I am quite desperate about this problem, so although I am unwilling to disable stuff, I can do it if I must. But I see no reason, yet, to remove the search handler or the token filters; I mean, how could those leak a SolrIndexSearcher?

Let me know :)

Many thanks!
Markus

-----Original message-----

> From:Erick Erickson <[hidden email]>
> Sent: Friday 29th June 2018 18:46
> To: solr-user <[hidden email]>
> Subject: Re: 7.3 appears to leak
>
> bq. The only custom stuff left is an extension of SearchHandler that
> only writes numFound to the response headers.
>
> Well, one more to go ;). It's incredibly easy to overlook
> innocent-seeming calls that increment the underlying reference count
> of some objects without a matching decrement, which usually happens
> through a close call. And a close isn't necessarily a close if the
> underlying reference count is still > 0.
>
> You may infer that I've been there and done that ;). Sometimes the
> compiler warnings about "resource leak" can help pinpoint those too.
>
> Best,
> Erick
>

Re: 7.3 appears to leak

Erick Erickson
This is truly puzzling then, I'm clueless. It's hard to imagine this
is lurking out there and nobody else notices, but you've eliminated
the custom code. And this is also very peculiar:

* it occurs only in our main text search collection, all other
collections are unaffected;
* despite what I said earlier, it is so far not reproducible outside
production, even when mimicking production as well as we can;

Here's a tedious idea. Restart Solr with the -v option, I _think_ that
shows you each and every jar file Solr loads. Is it "somehow" possible
that your main collection is loading some jar from somewhere that's
different than you expect? 'cause silly ideas like this are all I can
come up with.

Erick
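As a complementary check to the -v idea, a custom component or a small test harness could log where a suspect class was actually loaded from. This is a sketch using only standard JDK calls (`Class.getProtectionDomain()` and `CodeSource.getLocation()`); the `ClassOrigin` helper name is made up for illustration, and classes loaded by the bootstrap class loader normally report no code source at all.

```java
import java.net.URL;
import java.security.CodeSource;

final class ClassOrigin {
    // Returns the jar or directory a class was loaded from. Classes loaded by the
    // bootstrap class loader (JDK core classes) usually report no code source.
    static String whereLoadedFrom(Class<?> type) {
        CodeSource source = type.getProtectionDomain().getCodeSource();
        if (source == null) {
            return "<bootstrap>";
        }
        URL location = source.getLocation();
        return (location == null) ? "<unknown>" : location.toString();
    }
}
```

Logging, say, `ClassOrigin.whereLoadedFrom(org.apache.solr.search.SolrIndexSearcher.class)` from a plugin on the leaking node and on a healthy node, and comparing the two paths, would show whether the main collection really uses the jars it is expected to.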

On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
<[hidden email]> wrote:

> Hello Erick,
>
> The custom search handler doesn't interact with SolrIndexSearcher; this is really all it does:
>
>   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
>     super.handleRequestBody(req, rsp);
>
>     if (rsp.getToLog().get("hits") instanceof Integer) {
>       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
>     }
>     if (rsp.getToLog().get("hits") instanceof Long) {
>       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
>     }
>   }
>
> I am not sure this qualifies as one more to go.
>
> Re: compiler warnings on resources, yes! Those warnings, and tests failing due to resource leaks, have always warned me when I forgot to release something or decrement a reference. But apart from the above method, only the token filters (which I really can't disable) are left.
>
> I am quite desperate about this problem, so although I am unwilling to disable stuff, I can do it if I must. But I see no reason, yet, to remove the search handler or the token filters; I mean, how could those leak a SolrIndexSearcher?
>
> Let me know :)
>
> Many thanks!
> Markus
>

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Erick,

Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries built from the same source on our central build server, with the same libraries thanks to provisioning. Only schema and config differ, but the <lib/> directives are the same everywhere.

Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit in 7.3.0 and every later version?

Many thanks,
Markus
 
-----Original message-----

> From:Erick Erickson <[hidden email]>
> Sent: Friday 29th June 2018 19:34
> To: solr-user <[hidden email]>
> Subject: Re: 7.3 appears to leak
>
> This is truly puzzling then, I'm clueless. It's hard to imagine this
> is lurking out there and nobody else notices, but you've eliminated
> the custom code. And this is also very peculiar:
>
> * it occurs only in our main text search collection, all other
> collections are unaffected;
> * despite what I said earlier, it is so far not reproducible outside
> production, even when mimicking production as well as we can;
>
> Here's a tedious idea. Restart Solr with the -v option, I _think_ that
> shows you each and every jar file Solr loads. Is it "somehow" possible
> that your main collection is loading some jar from somewhere that's
> different than you expect? 'cause silly ideas like this are all I can
> come up with.
>
> Erick
>

Re: 7.3 appears to leak

kydryavtsev andrey
If it is not possible to find the resource leak by code analysis and there are no better ideas, I can suggest a brute-force approach:
- Clone Solr's sources from the appropriate branch https://github.com/apache/lucene-solr/tree/branch_7_3
- Log every increment/decrement of the searcher holder's reference count in a way that captures each caller's name (use Thread.currentThread().getStackTrace() or something) https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
- Build custom artefacts and deploy them to prod
- After the memory leak has happened, analyse the logs to see which part of the functionality doesn't decrement the searcher's reference count after incrementing it. If searchers are leaked, such code should exist, I guess.

This is not something someone would like to do, but it is what it is.
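The logging step above could look roughly like the sketch below. Instead of writing every operation to a log, this hypothetical `TracingRefCounted` class (not Solr's `RefCounted`, and it never closes the resource) attributes each incref()/decref() to the calling method via `Thread.currentThread().getStackTrace()` and reports methods whose counts don't balance. Capturing stack traces is expensive, so this is only meant for a debug build, and the frame index assumes the JVM did not omit frames, which the JDK docs allow it to do. Code that legitimately increments in one method and decrements in another shows up as a pair of entries that cancel out.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical debug-only holder: attributes each incref()/decref() to the
// calling method so that unbalanced callers stand out. Not Solr's RefCounted,
// and it never closes the resource.
final class TracingRefCounted<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();
    // Net increments per calling method ("class.method"); nonzero = suspect.
    private final Map<String, AtomicInteger> netByCaller = new ConcurrentHashMap<>();

    TracingRefCounted(T resource) {
        this.resource = resource;
    }

    T get() {
        return resource;
    }

    TracingRefCounted<T> incref() {
        refcount.incrementAndGet();
        record(callerOfRefOperation(), 1);
        return this;
    }

    void decref() {
        refcount.decrementAndGet();
        record(callerOfRefOperation(), -1);
    }

    int getRefcount() {
        return refcount.get();
    }

    // Callers whose increments were not matched by decrements in the same method.
    Map<String, Integer> suspects() {
        Map<String, Integer> out = new TreeMap<>();
        netByCaller.forEach((caller, net) -> {
            if (net.get() != 0) {
                out.put(caller, net.get());
            }
        });
        return out;
    }

    private void record(String caller, int delta) {
        netByCaller.computeIfAbsent(caller, k -> new AtomicInteger()).addAndGet(delta);
    }

    // Frames: [0] Thread.getStackTrace, [1] this method, [2] incref/decref,
    // [3] their caller. Assumes the JVM did not omit any of these frames.
    private static String callerOfRefOperation() {
        StackTraceElement[] frames = Thread.currentThread().getStackTrace();
        StackTraceElement caller = frames[Math.min(3, frames.length - 1)];
        return caller.getClassName() + "." + caller.getMethodName();
    }
}

// Two example "components": one balances its reference, one forgets to.
final class TracingDemo {
    static void balanced(TracingRefCounted<String> ref) {
        ref.incref();
        try {
            ref.get();
        } finally {
            ref.decref();
        }
    }

    static void leaky(TracingRefCounted<String> ref) {
        ref.incref();
        ref.get(); // decref() forgotten
    }
}
```

After running both components once, `suspects()` contains a single entry for `TracingDemo.leaky` with a net count of 1, pointing straight at the method that never gives its reference back.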



Thank you,

Andrey Kudryavtsev


03.07.2018, 14:26, "Markus Jelsma" <[hidden email]>:

> Hello Erick,
>
> Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries built from the same source on our central build server, with the same libraries thanks to provisioning. Only schema and config differ, but the <lib/> directives are the same everywhere.
>
> Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit in 7.3.0 and every later version?
>
> Many thanks,
> Markus
>
> -----Original message-----
>>  From:Erick Erickson <[hidden email]>
>>  Sent: Friday 29th June 2018 19:34
>>  To: solr-user <[hidden email]>
>>  Subject: Re: 7.3 appears to leak
>>
>>  This is truly puzzling then, I'm clueless. It's hard to imagine this
>>  is lurking out there and nobody else notices, but you've eliminated
>>  the custom code. And this is also very peculiar:
>>
>>  * it occurs only in our main text search collection, all other
>>  collections are unaffected;
>>  * despite what i said earlier, it is so far unreproducible outside
>>  production, even when mimicking production as good as we can;
>>
>>  Here's a tedious idea. Restart Solr with the -v option, I _think_ that
>>  shows you each and every jar file Solr loads. Is it "somehow" possible
>>  that your main collection is loading some jar from somewhere that's
>>  different than you expect? 'cause silly ideas like this are all I can
>>  come up with.
>>
>>  Erick
>>
>>  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
>>  <[hidden email]> wrote:
>>  > Hello Erick,
>>  >
>>  > The custom search handler doesn't interact with SolrIndexSearcher; this is really all it does:
>>  >
>>  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
>>  >     super.handleRequestBody(req, rsp);
>>  >
>>  >     if (rsp.getToLog().get("hits") instanceof Integer) {
>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
>>  >     }
>>  >     if (rsp.getToLog().get("hits") instanceof Long) {
>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
>>  >     }
>>  >   }
>>  >
>>  > I am not sure this qualifies as one more to go.
>>  >
>>  > Re: compiler warnings on resources, yes! Those, and tests failing due to resource leaks, have always warned me when I forgot to release something or decrement a reference. But the above method (and the token filters, which I really can't disable) is all that is left.
>>  >
>>  > I am quite desperate about this problem, so although I am unwilling to disable stuff, I can do it if I must. But I see no reason, yet, to remove the search handler or the token filter stuff; I mean, how could those leak a SolrIndexSearcher?
>>  >
>>  > Let me know :)
>>  >
>>  > Many thanks!
>>  > Markus
>>  >
>>  > -----Original message-----
>>  >> From:Erick Erickson <[hidden email]>
>>  >> Sent: Friday 29th June 2018 18:46
>>  >> To: solr-user <[hidden email]>
>>  >> Subject: Re: 7.3 appears to leak
>>  >>
>>  >> bq. The only custom stuff left is an extension of SearchHandler that
>>  >> only writes numFound to the response headers.
>>  >>
>>  >> Well, one more to go ;). It's incredibly easy to overlook
>>  >> innocent-seeming calls that increment the underlying reference count
>>  >> of some objects but don't decrement them, usually through a close
>>  >> call, which isn't necessarily a real close if the underlying reference
>>  >> count is still > 0.
>>  >>
>>  >> You may infer that I've been there and done that ;). Sometimes the
>>  >> compiler warnings about "resource leak" can help pinpoint those too.
>>  >>
>>  >> Best,
>>  >> Erick
>>  >>
>>  >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
>>  >> <[hidden email]> wrote:
>>  >> > Hello Yonik,
>>  >> >
>>  >> > I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could more or less 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes carry on with the full configuration.
>>  >> >
>>  >> > I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
>>  >> >
>>  >> > You were right, it was leaking exactly one SolrIndexSearcher instance on each commit. But, with all our stuff gone, the leak is still there! I triple-checked it! Of course, the bastard is still not reproducible locally.
>>  >> >
>>  >> > So, what is next? I have no clues left.
>>  >> >
>>  >> > Many, many thanks,
>>  >> > Markus
>>  >> >
>>  >> > -----Original message-----
>>  >> >> From:Markus Jelsma <[hidden email]>
>>  >> >> Sent: Thursday 28th June 2018 23:52
>>  >> >> To: [hidden email]
>>  >> >> Subject: RE: 7.3 appears to leak
>>  >> >>
>>  >> >> Hello Yonik,
>>  >> >>
>>  >> >> If leaking a whole SolrIndexSearcher would cause this problem, then the only custom component that could be the root of all problems is our copy/paste-and-enhance version of the elevator component. It is a direct copy of the 7.2 source where only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries are changed.
>>  >> >>
>>  >> >> There are no changes to code related to the searcher. Other components where we get a RefCounted searcher are used without issues; we always decrement the reference after using it. But those components are not in use in this collection.
>>  >> >>
>>  >> >> The source has changed a lot in 7.4, but we still use the old code. I will investigate the component thoroughly, and even revert to the old vanilla 7.2 component for a brief period on one production machine. It may not be a problem if I don't let our load balancer access it directly, so it only serves shard queries.
>>  >> >>
>>  >> >> I will get back to this topic tomorrow!
>>  >> >>
>>  >> >> Many thanks,
>>  >> >> Markus
>>  >> >>
>>  >> >>
>>  >> >>
>>  >> >> -----Original message-----
>>  >> >> > From:Yonik Seeley <[hidden email]>
>>  >> >> > Sent: Thursday 28th June 2018 23:30
>>  >> >> > To: [hidden email]
>>  >> >> > Subject: Re: 7.3 appears to leak
>>  >> >> >
>>  >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
>>  >> >> >
>>  >> >> > If these are actually filterCache entries being leaked, it stands to
>>  >> >> > reason that a whole searcher is being leaked somewhere.
>>  >> >> >
>>  >> >> > -Yonik
>>  >> >> >
>>  >> >>
>>  >>
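Erick's remark above, that it is easy to overlook a call that increments a reference count without a matching decrement, is easiest to see on an exception path. The sketch below uses a hypothetical `Holder` class and hypothetical `SearchPaths` methods, not Solr code. In Solr, the corresponding idiom is to obtain a `RefCounted<SolrIndexSearcher>` from `SolrCore.getSearcher()` and call `decref()` in a `finally` block.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical minimal holder standing in for a reference-counted searcher.
final class Holder {
  final AtomicInteger refcount = new AtomicInteger(1); // 1 = the core's own reference
  boolean closed;

  Holder incref() {
    refcount.incrementAndGet();
    return this;
  }

  void decref() {
    if (refcount.decrementAndGet() == 0) {
      closed = true; // the real resource would be released here
    }
  }
}

final class SearchPaths {
  // Leaky: if work.run() throws, decref() is never reached.
  static void leaky(Holder core, Runnable work) {
    Holder h = core.incref();
    work.run();
    h.decref();
  }

  // Safe: decref() runs on every exit path, including exceptions.
  static void safe(Holder core, Runnable work) {
    Holder h = core.incref();
    try {
      work.run();
    } finally {
      h.decref();
    }
  }
}
```

After a failing query, the leaky path leaves one extra reference behind. When the core later drops its own reference, the holder therefore never closes. The safe path closes it as expected.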

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Andrey,

I didn't think of that! I will try it when I have the courage again, probably next week or so.

Many thanks,
Markus
 
 
-----Original message-----

> From:Kydryavtsev Andrey <[hidden email]>
> Sent: Wednesday 4th July 2018 14:48
> To: [hidden email]
> Subject: Re: 7.3 appears to leak
>
> If it is not possible to find a resource leak by code analysis and there are no better ideas, I can suggest a brute-force approach:
> - Clone Solr's sources from the appropriate branch https://github.com/apache/lucene-solr/tree/branch_7_3
> - Log every increment/decrement of the searcher holder's reference count in a way that captures every caller (use Thread.currentThread().getStackTrace() or something similar) https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
> - Build custom artefacts and deploy them to production
> - After the memory leak has happened, analyse the logs to see which part of the functionality doesn't decrement the searcher's count after incrementing it. If searchers are leaked, there should be such code, I guess.
>
> This is not something anyone would like to do, but it is what it is.
>
>
>
> Thank you,
>
> Andrey Kudryavtsev
>
>

Re: 7.3 appears to leak

Thomas Scheffler
Hi,

We noticed the same problems here in a rather small setup: 40,000 metadata documents with nearly as many files, which have "literal.*" fields attached. While 7.2.1 brought some Tika issues, the real problems started to appear with version 7.3.0 and are still unresolved in 7.4.0. Memory consumption is through the roof: where 512MB of heap was previously enough, 6GB is now not enough to index all files.

kind regards,

Thomas

> On 04.07.2018 at 15:03, Markus Jelsma <[hidden email]> wrote:
>
> Hello Andrey,
>
> I didn't think of that! I will try it when I have the courage again, probably next week or so.
>
> Many thanks,
> Markus

RE: 7.3 appears to leak

Markus Jelsma-2
In reply to this post by Markus Jelsma-2
Hello Thomas,

To be absolutely sure you suffer from the same problem as one of our collections, can you confirm that your Solr cores are leaking a SolrIndexSearcher instance on each commit? If not, there may be a second problem.

Also, do you run any custom plugins or apply patches to your Solr instances? Or is your Solr a 100 % official build?

Thanks,
Markus
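A toy model can illustrate the signature Markus asks Thomas to check (one extra SolrIndexSearcher per commit) together with Yonik's reasoning earlier in the thread (leaked filterCache entries imply a leaked searcher). `ModelSearcher` and `CommitModel` are hypothetical and do not reflect how Solr actually opens or warms searchers. The model only shows one thing. If each commit leaves one unmatched increment on the outgoing searcher, the number of open searchers, and of the filter caches they own, grows by one per commit instead of staying at one.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not Solr code. Each searcher owns its own filter
// cache, so a searcher that is never closed keeps its cache entries reachable.
final class ModelSearcher {
  final Map<String, int[]> filterCache = new HashMap<>();
  private int refcount = 1; // the core's own reference
  boolean closed;

  void incref() {
    refcount++;
  }

  void decref() {
    if (--refcount == 0) {
      closed = true;
      filterCache.clear(); // closing a searcher releases its caches
    }
  }
}

final class CommitModel {
  private final List<ModelSearcher> opened = new ArrayList<>();
  private ModelSearcher current;

  // One commit: open and warm a new searcher, then drop the core's reference
  // to the previous one. leakedIncrefs > 0 stands for code that took a
  // reference to the old searcher and never released it.
  void commit(int leakedIncrefs) {
    ModelSearcher next = new ModelSearcher();
    next.filterCache.put("type:doc", new int[] {1, 2, 3});
    if (current != null) {
      for (int i = 0; i < leakedIncrefs; i++) {
        current.incref();
      }
      current.decref();
    }
    opened.add(next);
    current = next;
  }

  long openSearchers() {
    return opened.stream().filter(s -> !s.closed).count();
  }

  int pinnedCacheEntries() {
    return opened.stream().filter(s -> !s.closed)
        .mapToInt(s -> s.filterCache.size()).sum();
  }
}
```

With no leaked increments, five commits leave one open searcher. With one leaked increment per commit, all five remain open along with their cache entries, which resembles the per-commit growth reported in this thread.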

 
 
-----Original message-----

> From:Thomas Scheffler <[hidden email]>
> Sent: Monday 16th July 2018 13:39
> To: [hidden email]
> Subject: Re: 7.3 appears to leak
>
> Hi,
>
> We noticed the same problems here in a rather small setup: 40,000 metadata documents with nearly as many files, which have "literal.*" fields attached. While 7.2.1 brought some Tika issues, the real problems started to appear with version 7.3.0 and are still unresolved in 7.4.0. Memory consumption is through the roof: where 512MB of heap was previously enough, 6GB is now not enough to index all files.
>
> kind regards,
>
> Thomas

Re: 7.3 appears to leak

Tomás Fernández Löbbe
I created SOLR-12743 to track this.

On Mon, Jul 16, 2018 at 12:30 PM Markus Jelsma <[hidden email]> wrote:

> Hello Thomas,
>
> To be absolutely sure you suffer from the same problem as one of our
> collections, can you confirm that your Solr cores are leaking a
> SolrIndexSearcher instance on each commit? If not, there may be a second
> problem.
>
> Also, do you run any custom plugins or apply patches to your Solr
> instances? Or is your Solr a 100% official build?
>
> Thanks,
> Markus
>
>
>
> -----Original message-----
> > From:Thomas Scheffler <[hidden email]>
> > Sent: Monday 16th July 2018 13:39
> > To: [hidden email]
> > Subject: Re: 7.3 appears to leak
> >
> > Hi,
> >
> > We noticed the same problems here in a rather small setup: 40,000 metadata documents with nearly as many files that have "literal.*" fields. While 7.2.1 brought some Tika issues, the real problems started to appear with version 7.3.0 and are still unresolved in 7.4.0. Memory consumption is through the roof. Where a 512MB heap used to be enough, now 6GB isn't enough to index all files.
> >
> > kind regards,
> >
> > Thomas
> >
> > > On 04.07.2018 at 15:03, Markus Jelsma <[hidden email]> wrote:
> > >
> > > Hello Andrey,
> > >
> > > I didn't think of that! I will try it when I have the courage again, probably next week or so.
> > >
> > > Many thanks,
> > > Markus
> > >
> > >
> > > -----Original message-----
> > >> From:Kydryavtsev Andrey <[hidden email]>
> > >> Sent: Wednesday 4th July 2018 14:48
> > >> To: [hidden email]
> > >> Subject: Re: 7.3 appears to leak
> > >>
> > >> If it is not possible to find a resource leak by code analysis and there are no better ideas, I can suggest a brute-force approach:
> > >> - Clone Solr's sources from the appropriate branch: https://github.com/apache/lucene-solr/tree/branch_7_3
> > >> - Log every increment/decrement operation on the searcher holder in a way that captures each caller (use Thread.currentThread().getStackTrace() or something similar): https://github.com/apache/lucene-solr/blob/branch_7_3/solr/core/src/java/org/apache/solr/util/RefCounted.java
> > >> - Build custom artefacts and deploy them to prod
> > >> - After the memory leak has happened, analyse the logs to see which part of the functionality doesn't decrement the searcher after the counter was incremented. If searchers are leaked, there should be such code, I guess.
> > >>
> > >> This is not something anyone would like to do, but it is what it is.
> > >>
> > >>
> > >>
> > >> Thank you,
> > >>
> > >> Andrey Kudryavtsev
> > >>
> > >>
> > >> 03.07.2018, 14:26, "Markus Jelsma" <[hidden email]>:
> > >>> Hello Erick,
> > >>>
> > >>> Even the silliest ideas may help us, but unfortunately this is not the case. All our Solr nodes run binaries built from the same source on our central build server, with the same libraries thanks to provisioning. Only the schema and config differ, and the <lib/> directive is the same everywhere.
> > >>>
> > >>> Are there any other ideas, speculations, whatever, on why only our main text collection leaks a SolrIndexSearcher instance on commit in 7.3.0 and every later version?
> > >>>
> > >>> Many thanks!
> > >>> Markus
> > >>>
> > >>> -----Original message-----
> > >>>>  From:Erick Erickson <[hidden email]>
> > >>>>  Sent: Friday 29th June 2018 19:34
> > >>>>  To: solr-user <[hidden email]>
> > >>>>  Subject: Re: 7.3 appears to leak
> > >>>>
> > >>>>  This is truly puzzling then; I'm clueless. It's hard to imagine this is lurking out there and nobody else notices, but you've eliminated the custom code. And this is also very peculiar:
> > >>>>
> > >>>>  * it occurs only in our main text search collection, all other
> > >>>>  collections are unaffected;
> > >>>>  * despite what I said earlier, it is so far unreproducible outside production, even when mimicking production as closely as we can;
> > >>>>
> > >>>>  Here's a tedious idea. Restart Solr with the -v option; I _think_ that shows you each and every jar file Solr loads. Is it "somehow" possible that your main collection is loading some jar from somewhere other than you expect? 'Cause silly ideas like this are all I can come up with.
> > >>>>
> > >>>>  Erick
> > >>>>
> > >>>>  On Fri, Jun 29, 2018 at 9:56 AM, Markus Jelsma
> > >>>>  <[hidden email]> wrote:
> > >>>>  > Hello Erick,
> > >>>>  >
> > >>>>  > The custom search handler doesn't interact with SolrIndexSearcher; this is really all it does:
> > >>>>  >
> > >>>>  >   public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp) throws Exception {
> > >>>>  >     super.handleRequestBody(req, rsp);
> > >>>>  >
> > >>>>  >     if (rsp.getToLog().get("hits") instanceof Integer) {
> > >>>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Integer)rsp.getToLog().get("hits")));
> > >>>>  >     }
> > >>>>  >     if (rsp.getToLog().get("hits") instanceof Long) {
> > >>>>  >       rsp.addHttpHeader("X-Solr-Hits", String.valueOf((Long)rsp.getToLog().get("hits")));
> > >>>>  >     }
> > >>>>  >   }
> > >>>>  >
> > >>>>  > I am not sure this qualifies as one more to go.
> > >>>>  >
> > >>>>  > Re: compiler warnings about resources, yes! Those warnings, and tests failing due to resource leaks, have always alerted me when I forgot to release something or decrement a reference. But apart from the above method, the token filters (which I really can't disable) are all that is left.
> > >>>>  >
> > >>>>  > I am quite desperate about this problem, so although I am unwilling to disable stuff, I can do it if I must. But I see no reason, yet, to remove the search handler or the token filter stuff; I mean, how could those leak a SolrIndexSearcher?
> > >>>>  >
> > >>>>  > Let me know :)
> > >>>>  >
> > >>>>  > Many thanks!
> > >>>>  > Markus
> > >>>>  >
> > >>>>  > -----Original message-----
> > >>>>  >> From:Erick Erickson <[hidden email]>
> > >>>>  >> Sent: Friday 29th June 2018 18:46
> > >>>>  >> To: solr-user <[hidden email]>
> > >>>>  >> Subject: Re: 7.3 appears to leak
> > >>>>  >>
> > >>>>  >> bq. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers.
> > >>>>  >>
> > >>>>  >> Well, one more to go ;). It's incredibly easy to overlook innocent-seeming calls that increment the underlying reference count of some objects but never decrement it. The decrement usually happens through a close call, which doesn't necessarily close anything if the underlying reference count is still > 0.
> > >>>>  >>
> > >>>>  >> You may infer that I've been there and done that ;). Sometimes the compiler warnings about "resource leak" can help pinpoint those too.
> > >>>>  >>
> > >>>>  >> Best,
> > >>>>  >> Erick
> > >>>>  >>
> > >>>>  >> On Fri, Jun 29, 2018 at 9:16 AM, Markus Jelsma
> > >>>>  >> <[hidden email]> wrote:
> > >>>>  >> > Hello Yonik,
> > >>>>  >> >
> > >>>>  >> > I took one node of the 7.2.1 cluster out of the load balancer so it would only receive shard queries. This way I could more or less 'safely' disable our custom components one by one, while keeping functionality in place by letting the other 7.2.1 nodes carry on with the full configuration.
> > >>>>  >> >
> > >>>>  >> > I am now at a point where literally all custom components are deleted or commented out in the config for the node running 7.4. The only custom stuff left is an extension of SearchHandler that only writes numFound to the response headers, and all the token filters in our schema.
> > >>>>  >> >
> > >>>>  >> > You were right: it was leaking exactly one SolrIndexSearcher instance on each commit. But with all our stuff gone, the leak is still there! I triple-checked it! Of course, the bastard is still not reproducible locally.
> > >>>>  >> >
> > >>>>  >> > So, what is next? I have no clues left.
> > >>>>  >> >
> > >>>>  >> > Many, many thanks,
> > >>>>  >> > Markus
> > >>>>  >> >
> > >>>>  >> > -----Original message-----
> > >>>>  >> >> From:Markus Jelsma <[hidden email]>
> > >>>>  >> >> Sent: Thursday 28th June 2018 23:52
> > >>>>  >> >> To: [hidden email]
> > >>>>  >> >> Subject: RE: 7.3 appears to leak
> > >>>>  >> >>
> > >>>>  >> >> Hello Yonik,
> > >>>>  >> >>
> > >>>>  >> >> If leaking a whole SolrIndexSearcher would cause this problem, then the only remaining custom component, our copy/paste-and-enhance version of the elevator component, would be the root of all problems. It is a direct copy of the 7.2 source in which only things like getAnalyzedQuery, the ElevationObj and the loop over the map entries were changed.
> > >>>>  >> >>
> > >>>>  >> >> There are no changes to code related to the searcher. Other components in which we get a RefCounted searcher are used without issues; we always decrement the reference after using it. But those components are not used in this collection.
> > >>>>  >> >>
> > >>>>  >> >> The source changed a lot in 7.4, but we still use the old code. I will investigate the component thoroughly, and even revert to the vanilla 7.2 component on one production machine for a brief period. That should not be a problem if I don't let our load balancer access it directly, so that it only serves shard queries.
> > >>>>  >> >>
> > >>>>  >> >> I will get back to this topic tomorrow!
> > >>>>  >> >>
> > >>>>  >> >> Many thanks,
> > >>>>  >> >> Markus
> > >>>>  >> >>
> > >>>>  >> >>
> > >>>>  >> >>
> > >>>>  >> >> -----Original message-----
> > >>>>  >> >> > From:Yonik Seeley <[hidden email]>
> > >>>>  >> >> > Sent: Thursday 28th June 2018 23:30
> > >>>>  >> >> > To: [hidden email]
> > >>>>  >> >> > Subject: Re: 7.3 appears to leak
> > >>>>  >> >> >
> > >>>>  >> >> > > * SortedIntDocSet instances ánd ConcurrentLRUCache$CacheEntry instances are both leaked on commit;
> > >>>>  >> >> >
> > >>>>  >> >> > If these are actually filterCache entries being leaked, it stands to
> > >>>>  >> >> > reason that a whole searcher is being leaked somewhere.
> > >>>>  >> >> >
> > >>>>  >> >> > -Yonik
> > >>>>  >> >> >
> > >>>>  >> >>
> > >>>>  >>
> > >>
> >
> >
> >
>
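Andrey's brute-force suggestion quoted in this thread (record the caller of every increment and decrement of the searcher holder, then look for acquisitions that were never released) can be sketched in plain Java. This is a hypothetical variant, not Solr's RefCounted API: instead of logging every call and diffing the logs afterwards, each incref() hands out its own Ref handle and keeps its acquisition stack trace in memory until that handle is closed, so only the unreleased acquisitions remain to be inspected. TracingRefCounted, Ref and reportLeaks() are names made up for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical debugging wrapper, NOT Solr's RefCounted: each incref() records
 * the caller's stack trace under a fresh id, and closing the returned Ref
 * removes it again. Whatever is still recorded later is a candidate leak.
 */
public class TracingRefCounted<T> {
    private final T resource;
    private final AtomicInteger refcount = new AtomicInteger();
    private final AtomicLong nextId = new AtomicLong();
    // acquisition id -> stack trace captured when the reference was taken
    private final Map<Long, StackTraceElement[]> outstanding = new ConcurrentHashMap<>();

    public TracingRefCounted(T resource) {
        this.resource = resource;
    }

    /** One acquired reference; close() plays the role of decref(). */
    public final class Ref implements AutoCloseable {
        private final long id;

        private Ref(long id) {
            this.id = id;
        }

        public T get() {
            return resource;
        }

        @Override
        public void close() {
            // remove() returns null if this Ref was already closed, so a double close is harmless
            if (outstanding.remove(id) != null) {
                refcount.decrementAndGet();
            }
        }
    }

    public Ref incref() {
        long id = nextId.incrementAndGet();
        // same idea as logging Thread.currentThread().getStackTrace(), but kept in memory
        outstanding.put(id, Thread.currentThread().getStackTrace());
        refcount.incrementAndGet();
        return new Ref(id);
    }

    public int getRefcount() {
        return refcount.get();
    }

    /** Stack traces of every reference that was taken but never released. */
    public String reportLeaks() {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<Long, StackTraceElement[]> e : outstanding.entrySet()) {
            sb.append("ref ").append(e.getKey()).append(" acquired at:\n");
            for (StackTraceElement frame : e.getValue()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        TracingRefCounted<String> searcher = new TracingRefCounted<>("searcher-1");
        try (TracingRefCounted<String>.Ref ref = searcher.incref()) {
            System.out.println("searching with " + ref.get()); // released by try-with-resources
        }
        searcher.incref(); // never closed: this acquisition shows up in the report
        System.out.println("outstanding refs: " + searcher.getRefcount()); // prints "outstanding refs: 1"
        System.out.print(searcher.reportLeaks());
    }
}
```

Solr's real RefCounted returns itself from incref() rather than a per-acquisition handle, so applying this idea to a patched build would presumably mean changing the callers; that is likely why the original suggestion logs stack traces at each call instead and analyses the logs after the leak has happened.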