RamUsageCrawler


RamUsageCrawler

Michael Sokolov
Hi, I'm using RamUsageCrawler to size some things, and I find it seems
to underestimate the size of Maps (e.g. HashMap and ConcurrentHashMap).
This is on a Java 10 runtime, with code compiled for Java 8. I looked
at the implementation, and it seems that for JRE classes, when the JRE
is >= 9, we can no longer use reflection to size them accurately.
Instead the implementation estimates the map size by treating it as an
array of keys and values plus some constant header size. But this
neglects the size of the HashMap$Node (in the case of HashMap; I
haven't looked at ConcurrentHashMap or TreeMap). In my case I have a
great many maps over a relatively small number of shared keys and
values, so the crawler seems to be wildly under-counting. I'm
comparing against sizes gleaned from heap dumps, Eclipse MAT, and OOM
events.

I wonder if we can improve on the estimates for Maps and other Collections?
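Here is a back-of-the-envelope sketch (not Lucene code) of the scale of
the discrepancy I mean. The constants assume a 64-bit JVM with
compressed oops, and the Node layout is my reading of OpenJDK's
HashMap, so treat the numbers as illustrative rather than measured:

```java
// Compares an estimate that counts only key/value references against
// one that also charges for the HashMap$Node each entry allocates.
public class MapOverheadSketch {
    static final long REF = 4;     // reference size with compressed oops (assumed)
    static final long HEADER = 16; // object header, padded (assumed)
    // HashMap$Node: header + int hash + 3 refs (key, value, next), 8-byte aligned
    static final long NODE = align(HEADER + 4 + 3 * REF);

    static long align(long bytes) { return (bytes + 7) & ~7L; }

    // Counts only the key/value reference slots, like a flat array view.
    static long shallowEstimate(int entries) { return 2L * REF * entries; }

    // Also charges the per-entry Node wrapper object.
    static long nodeAwareEstimate(int entries) {
        return entries * (2L * REF + NODE);
    }

    public static void main(String[] args) {
        System.out.println("shallow:    " + shallowEstimate(1_000)); // 8000
        System.out.println("node-aware: " + nodeAwareEstimate(1_000)); // 40000
    }
}
```

Under these assumptions the flat view misses the Node entirely, a 5x
difference before even counting the table array, which matches the
"wildly under-counting" I'm seeing.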

---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


Re: RamUsageCrawler

Michael McCandless
I think you mean RamUsageEstimator (in Lucene's test-framework)?

It's entirely possible it fails to dig into Maps correctly with newer Java
releases; maybe Dawid or Uwe would know?

Mike McCandless

http://blog.mikemccandless.com


On Tue, Dec 4, 2018 at 12:18 PM Michael Sokolov <[hidden email]> wrote:


Re: RamUsageCrawler

Dawid Weiss
> It's entirely possible it fails to dig into Maps correctly with newer Java
> releases; maybe Dawid or Uwe would know?

We removed all reflection from that class a while ago, precisely
because of the encapsulation restrictions introduced in newer Java
versions.

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/util/RamUsageEstimator.java

I think you may be thinking of RamUsageTester, which is in the test
framework and indeed accumulates only the keys and values from
iterables. These methods are for tests only and are rough; you
shouldn't rely on them for accurate memory accounting (use the
Accountable interface instead).
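A minimal sketch of that pattern: a class that tracks its own footprint
incrementally and reports it via ramBytesUsed(). The inner interface
here just mirrors the shape of org.apache.lucene.util.Accountable so
the example stands alone, and the byte constants are illustrative
assumptions, not measurements:

```java
import java.util.HashMap;
import java.util.Map;

public class AccountableSketch {
    // Mirrors the core method of org.apache.lucene.util.Accountable.
    interface Accountable {
        long ramBytesUsed();
    }

    static class CountingCache implements Accountable {
        static final long ENTRY = 40;   // assumed per-entry HashMap overhead
        static final long SHALLOW = 16; // assumed shallow size of this object

        private final Map<String, long[]> cache = new HashMap<>();
        private long cachedBytes; // maintained as entries are added

        void put(String key, long[] value) {
            if (cache.put(key, value) == null) {
                // Charge key chars, value longs, and the assumed entry overhead.
                cachedBytes += 2L * key.length() + 8L * value.length + ENTRY;
            }
        }

        @Override
        public long ramBytesUsed() {
            return SHALLOW + cachedBytes;
        }
    }

    public static void main(String[] args) {
        CountingCache c = new CountingCache();
        c.put("a", new long[] {1, 2, 3});
        System.out.println(c.ramBytesUsed()); // 82
    }
}
```

The point is that the class owning the data does the accounting as it
mutates, so no crawler ever has to guess at JDK internals.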

Dawid





Re: RamUsageCrawler

Michael Sokolov
That's what it looked like to me, too. I wonder if it would be worth
improving the estimate for some very common Collections classes? I see
this comment, e.g., in BaseIndexFileFormatTestCase:

      // we have no way to estimate the size of these things in codecs although
      // something like a Collections.newSetFromMap(new HashMap<>()) uses quite
      // some memory... So for now the test ignores the overhead of such
      // collections but can we do better?

This is in testRamBytesUsed, and there is a kind of fudge factor in
there for handling mismeasurement errors of the sort we are talking
about. Actually, the test seems to be more about validating
RamUsageTester than about validating the accounting in SegmentReader!
There are lots of other usages in tests, but I suppose they don't
require very precise handling of Collections classes (since they
pass)? Anyway, for HashMap it is certainly possible to improve the
estimate quite a bit, and pretty easily, by counting the size of the
Node used for each entry. But given the dynamic nature of these data
structures (HashMap, for example, can switch to TreeNodes depending on
the data distribution), it would be almost impossible to be 100%
accurate.
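For what it's worth, here is a rough sketch of the HashMap-specific
correction I have in mind: charge one Node per entry plus the backing
table array. The constants assume compressed oops, the capacity rule
assumes the default 0.75 load factor, and treeified buckets are
deliberately ignored, so this is an approximation by construction:

```java
public class HashMapEstimate {
    static final long REF = 4, HEADER = 16, ARRAY_HEADER = 16; // assumed

    static long align(long b) { return (b + 7) & ~7L; }

    // Extra bytes a HashMap of the given size carries beyond its keys
    // and values: one Node per entry plus the bucket table array.
    static long estimateOverhead(int size) {
        long node = align(HEADER + 4 /* hash */ + 3 * REF /* key, value, next */);
        // Smallest power-of-two capacity holding `size` entries at load factor 0.75.
        long capacity = 16;
        while (capacity * 3 / 4 < size) capacity *= 2;
        long table = align(ARRAY_HEADER + capacity * REF);
        return size * node + table;
    }

    public static void main(String[] args) {
        System.out.println(estimateOverhead(100)); // 4240
    }
}
```

Even this much already depends on OpenJDK internals staying put, of
course.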
On Thu, Dec 6, 2018 at 7:14 AM Dawid Weiss <[hidden email]> wrote:




Re: RamUsageCrawler

Dawid Weiss
I don't think it makes much sense, to be honest. Without actual
reflection you're binding the estimate to a particular JDK
implementation (assuming this or that internal layout). That's why we
decided to remove it rather than make it overly complex (and possibly
wrong). If a test depends on it, perhaps the test itself should be
fixed.

D.
On Thu, Dec 6, 2018 at 4:35 PM Michael Sokolov <[hidden email]> wrote:




Re: RamUsageCrawler

Michael Sokolov
I agree; any attempt at improvement wouldn't be general. Thanks for
the explanation.
On Thu, Dec 6, 2018 at 10:45 AM Dawid Weiss <[hidden email]> wrote:

