Understanding fieldCache SUBREADER "insanity"


Aaron Daubman
Hi all,

In reviewing a solr instance with somewhat variable performance, I
noticed that its fieldCache stats show an insanity_count of 1 with the
insanity type SUBREADER:

---snip---
insanity_count : 1
insanity#0 : SUBREADER: Found caches for descendants of ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)+tf_normalizedTotalHotttnesss
'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1965982057
'ReadOnlyDirectoryReader(segments_k _6h9(3.3):C17198463)'=>'tf_normalizedTotalHotttnesss',float,null=>[F#1965982057
'MMapIndexInput(path="/io01/p/solr/playlist/a/playlist/index/_6h9.frq")'=>'tf_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#1308116426
---snip---

How can I decipher what this means and what, if anything, I should do
to fix/improve the "insanity"?

Thanks,
     Aaron

Re: Understanding fieldCache SUBREADER "insanity"

Tomás Fernández Löbbe
Hi Aaron, there is some information about the "insanity count" here:
http://wiki.apache.org/solr/SolrCaching#The_Lucene_FieldCache

As for the SUBREADER type, the javadocs say:
"Indicates an overlap in cache usage on a given field in sub/super readers."

This probably means that you are using the same field
(tf_normalizedTotalHotttnesss) for both faceting and sorting. Sorting
uses the segment-level cache, while faceting by default uses the global
(top-level) field cache. This can be a problem because the field is then
cached twice and uses twice the memory.

One way to solve this would be to change the faceting method on that field
to 'fcs', which uses segment level cache (but may be a little bit slower).
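
To make the "twice the memory" point concrete, here is a minimal sketch in plain Java. It is a toy model and not Lucene's FieldCache: the Reader class, the field name "hotness" and the 1000-document size are all made up for illustration. The only idea it borrows is that cached arrays are keyed on the reader that was passed in, so a top-level reader and the segment it wraps each get their own copy.

```java
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.Map;

/** Toy model of a field cache keyed on reader identity. Not Lucene's FieldCache. */
final class ReaderKeyedCacheSketch {

    /** Hypothetical stand-in for an index reader; only identity and size matter. */
    static final class Reader {
        final String name;
        final int maxDoc;

        Reader(String name, int maxDoc) {
            this.name = name;
            this.maxDoc = maxDoc;
        }
    }

    // Outer key: reader identity. Inner key: field name.
    private final Map<Reader, Map<String, float[]>> cache = new IdentityHashMap<>();

    /** Returns the cached array for (reader, field), allocating it on first use. */
    float[] getFloats(Reader reader, String field) {
        return cache
            .computeIfAbsent(reader, r -> new HashMap<>())
            .computeIfAbsent(field, f -> new float[reader.maxDoc]);
    }

    /** Total number of float slots currently held by the cache. */
    long cachedSlots() {
        long total = 0;
        for (Map<String, float[]> perField : cache.values()) {
            for (float[] values : perField.values()) {
                total += values.length;
            }
        }
        return total;
    }

    /** One segment of 1000 docs, looked up per segment and again top-level. */
    static long demo() {
        Reader segment = new Reader("segment", 1000);
        Reader topLevel = new Reader("top-level", 1000); // covers the same 1000 docs
        ReaderKeyedCacheSketch cache = new ReaderKeyedCacheSketch();
        cache.getFloats(segment, "hotness");  // per-segment consumer
        cache.getFloats(segment, "hotness");  // repeat lookup reuses the same array
        cache.getFloats(topLevel, "hotness"); // top-level consumer allocates a second copy
        return cache.cachedSlots();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model the same 1000 values end up occupying 2000 slots, which is the kind of overlap the SUBREADER report is pointing at.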

Tomás



Re: Understanding fieldCache SUBREADER "insanity"

Aaron Daubman
Hi Tomás,

> This probably means that you are using the same field for faceting and for
> sorting (tf_normalizedTotalHotttnesss), sorting uses the segment level
> cache and faceting uses by default the global field cache. This can be a
> problem because the field is duplicated in cache, and then it uses twice
> the memory.
>
> One way to solve this would be to change the faceting method on that field
> to 'fcs', which uses segment level cache (but may be a little bit slower).

Thanks for explaining what the sparse wiki and javadoc mean - I had
read them but had no idea what the implications were ;-)

We are not doing any explicit faceting, and this index is also
supposed to be a read-only, already-optimized, single-segment index.
To me (admittedly not very knowledgeable about this), both of these
suggest this could be more of a problem: what am I doing to cause
this, given that I am not using faceting and should not need
segment-level anything (there should be a single segment, if I
understand optimization and read-only indices correctly)?

Any pointers on where else to look for what might be causing this (one
issue I am currently troubleshooting is too-many-pauses caused by
too-frequent GC, so preventing this double-allocation could help)?

Thanks again,
     Aaron

Re: Understanding fieldCache SUBREADER "insanity"

Yonik Seeley-4
In reply to this post by Tomás Fernández Löbbe
The other thing to realize is that it's only "insanity" if it's
unexpected or not-by-design (so the term is rather mis-named).
It's more for core developers - if you are just using Solr without
custom plugins, don't worry about it.

-Yonik
http://lucidworks.com



Re: Understanding fieldCache SUBREADER "insanity"

Tomás Fernández Löbbe
Some function queries also use the field cache. I *think* those usually use
the segment level cache, but I'm not sure.


Re: Understanding fieldCache SUBREADER "insanity"

Yonik Seeley-4
In reply to this post by Aaron Daubman
> already-optimized, single-segment index

That part is interesting... if true, then the type of "insanity" you
saw should be impossible, and either the insanity detection or
something else is broken.

-Yonik
http://lucidworks.com

Re: Understanding fieldCache SUBREADER "insanity"

Aaron Daubman
Yonik, et al.

I believe I found the section of code pushing me into 'insanity' status:
---snip---
        int[] collapseIDs = null;
        float[] hotnessValues = null;
        String[] artistIDs = null;
        try {
            collapseIDs = FieldCache.DEFAULT.getInts(searcher.getIndexReader(), COLLAPSE_KEY_NAME);
            hotnessValues = FieldCache.DEFAULT.getFloats(searcher.getIndexReader(), HOTNESS_KEY_NAME);
            artistIDs = FieldCache.DEFAULT.getStrings(searcher.getIndexReader(), ARTIST_KEY_NAME);
        } ...
---snip---

Since it seems like this code is using the 'old-style' pre-Lucene 2.9
top-level indexReaders, is there any example code you can point me to
that could show how to convert to using the leaf level segmentReaders?
If the limited information I've been able to find is correct, this
could explain some of the significant memory usage I am seeing...
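
One piece of the conversion that is easy to overlook: once values are cached per segment, a top-level docID has to be translated into (segment, segment-local docID) before it can index into that segment's array. Below is a minimal sketch of that arithmetic in plain Java. It does not call Lucene; the class and method names are hypothetical, and it assumes every segment holds at least one document. In Lucene 3.x the sub-readers themselves would come from IndexReader.getSequentialSubReaders(), with each one passed to FieldCache.DEFAULT.getFloats(subReader, field) in place of the top-level reader.

```java
import java.util.Arrays;

/** Sketch of top-level docID to per-segment lookup. Plain Java, no Lucene calls. */
final class SegmentDocIdSketch {

    /** starts[i] is the first top-level docID of segment i; the last slot is the total. */
    static int[] docStarts(int[] segmentMaxDocs) {
        int[] starts = new int[segmentMaxDocs.length + 1];
        for (int i = 0; i < segmentMaxDocs.length; i++) {
            starts[i + 1] = starts[i] + segmentMaxDocs[i];
        }
        return starts;
    }

    /** Index of the segment containing globalDoc (assumes no empty segments). */
    static int segmentOf(int globalDoc, int[] starts) {
        // Search only the real segment starts, not the trailing total.
        int pos = Arrays.binarySearch(starts, 0, starts.length - 1, globalDoc);
        return pos >= 0 ? pos : -pos - 2; // a miss maps to the segment starting just before it
    }

    /** Reads a value for a top-level docID out of per-segment arrays. */
    static float valueAt(int globalDoc, float[][] perSegmentValues, int[] starts) {
        int segment = segmentOf(globalDoc, starts);
        int localDoc = globalDoc - starts[segment];
        return perSegmentValues[segment][localDoc];
    }

    public static void main(String[] args) {
        // Two hypothetical segments holding 3 and 2 documents.
        float[][] hotness = { {0.1f, 0.2f, 0.3f}, {0.4f, 0.5f} };
        int[] starts = docStarts(new int[] {3, 2});
        System.out.println(valueAt(4, hotness, starts));
    }
}
```

With a truly single-segment index the mapping is trivial (one segment, offset 0), but keeping the translation explicit means the component does not silently break if the index ever has more than one segment.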

Thanks again,
     Aaron


Re: Understanding fieldCache SUBREADER "insanity"

Aaron Daubman
In reply to this post by Yonik Seeley-4
Hi Yonik,

I've been attempting to fix the SUBREADER insanity in our custom
component, and have made perhaps some progress (or is this worse?) -
I've gone from SUBREADER to VALUEMISMATCH insanity:
---snip---
entries_count : 12
entry#0 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',class org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=>org.apache.lucene.util.FixedBitSet#1387502754
entry#1 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_track_count',class org.apache.lucene.search.FieldCacheImpl$DocsWithFieldCache,null=>org.apache.lucene.util.Bits$MatchAllBits#233863705
entry#2 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#652215925
entry#3 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class java.lang.String,null=>[Ljava.lang.String;#1036517187
entry#4 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'thingID',class java.lang.String,null=>[Ljava.lang.String;#357017445
entry#5 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_FLOAT_PARSER=>[F#322888397
entry#6 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,org.apache.lucene.search.FieldCache.DEFAULT_FLOAT_PARSER=>org.apache.lucene.search.FieldCache$CreationPlaceholder#1229311421
entry#7 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'f_normalizedTotalHotttnesss',float,null=>[F#322888397
entry#8 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,org.apache.lucene.search.FieldCache.DEFAULT_INT_PARSER=>org.apache.lucene.search.FieldCache$CreationPlaceholder#92920526
entry#9 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,null=>[I#494669113
entry#10 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_collapse',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#494669113
entry#11 : 'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'i_track_count',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=>[I#994584654
insanity_count : 1
insanity#0 : VALUEMISMATCH: Multiple distinct value objects for MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")+s_artistID
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class org.apache.lucene.search.FieldCache$StringIndex,null=>org.apache.lucene.search.FieldCache$StringIndex#652215925
'MMapIndexInput(path="/io01/p/solr/playlist/c/playlist/index/_c2.frq")'=>'s_artistID',class java.lang.String,null=>[Ljava.lang.String;#1036517187
---snip---

Any suggestions on what causes this VALUEMISMATCH, whether it is the
"normal" case, or how to "fix" it?

For anybody else with SUBREADER insanity issues, this is the change I
made to get this far (get the first leafReader, since we are using a
merged/optimized index):
---snip---
            SolrIndexReader reader = searcher.getReader().getLeafReaders()[0];
            collapseIDs = FieldCache.DEFAULT.getInts(reader, COLLAPSE_KEY_NAME);
            hotnessValues = FieldCache.DEFAULT.getFloats(reader, HOTNESS_KEY_NAME);
            artistIDs = FieldCache.DEFAULT.getStrings(reader, ARTIST_KEY_NAME);
---snip---
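
Reading the dump above, s_artistID is cached twice for the same segment: once as a FieldCache$StringIndex and once as a String[], the latter being what getStrings returns. Where the StringIndex comes from is a guess (sorting on that field is one candidate). The sketch below is a plain-Java model of the kind of check that would flag this: group entries by reader+field and report any key holding more than one distinct value object. It is not the real sanity checker. The Entry class and the "ignored" flag are assumptions, the flag being inferred from the fact that the dump lists CreationPlaceholder and bitset entries without flagging them.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy model of a "multiple distinct value objects" check. Not Lucene's checker. */
final class ValueMismatchSketch {

    /** Hypothetical flattened cache entry: which reader, which field, which value object. */
    static final class Entry {
        final String readerKey;
        final String field;
        final String valueId;  // identity of the cached value, e.g. "[F#322888397"
        final boolean ignored; // placeholders and docs-with-field bitsets (assumption)

        Entry(String readerKey, String field, String valueId, boolean ignored) {
            this.readerKey = readerKey;
            this.field = field;
            this.valueId = valueId;
            this.ignored = ignored;
        }
    }

    /** Returns the reader+field keys that hold more than one distinct value object. */
    static Set<String> findMismatches(List<Entry> entries) {
        Map<String, Set<String>> valuesByKey = new LinkedHashMap<>();
        for (Entry e : entries) {
            if (e.ignored) {
                continue;
            }
            valuesByKey
                .computeIfAbsent(e.readerKey + "+" + e.field, k -> new HashSet<>())
                .add(e.valueId);
        }
        Set<String> mismatches = new LinkedHashSet<>();
        for (Map.Entry<String, Set<String>> kv : valuesByKey.entrySet()) {
            if (kv.getValue().size() > 1) {
                mismatches.add(kv.getKey());
            }
        }
        return mismatches;
    }

    /** A few entries abbreviated from the dump above; "_c2" stands in for the full reader key. */
    static Set<String> demo() {
        List<Entry> entries = Arrays.asList(
            new Entry("_c2", "s_artistID", "StringIndex#652215925", false),
            new Entry("_c2", "s_artistID", "[Ljava.lang.String;#1036517187", false),
            new Entry("_c2", "f_normalizedTotalHotttnesss", "[F#322888397", false),
            new Entry("_c2", "f_normalizedTotalHotttnesss", "[F#322888397", false),
            new Entry("_c2", "f_normalizedTotalHotttnesss", "CreationPlaceholder#1229311421", true),
            new Entry("_c2", "f_normalizedTotalHotttnesss", "FixedBitSet#1387502754", true));
        return findMismatches(entries);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this model only s_artistID is reported: the two float entries share one array, so they count as a single value. If the component and whatever else touches s_artistID used the same accessor, the second value object would presumably go away, but that is untested.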

Thanks,
     Aaron
