OutOfMemory error while sorting


Marcus Stratmann
Hello,

I have a new problem with OutOfMemory errors.
As I reported before, we have an index with more than 10 million
documents and 23 fields. Recently I added a new field which we will only
use for sorting purposes (by "adding" I mean building a new index). But
it turned out that every query using this field for sorting ends in an
out of memory error. Even sorting result sets containing just one
document does not work. The field is of type solr.StrField and,
strangely enough, there are some other fields of the same type in the
index which do not cause these problems (but not all of them; our
uniqueKey field has the same problems with sorting).
Now I am wondering why sorting works with some of the fields but not
with others. Could it be that this depends on the content?

Thanks,
Marcus

Re: OutOfMemory error while sorting

Chris Hostetter-3

This is a fairly typical Lucene issue (ie: not specific to Solr)...

Sorting on a field requires building a FieldCache for every document --
regardless of how many documents match your query.  This cache is reused
for all searches that sort on that field.

For things like Integers and Floats, the size of the FieldCache is one
item (int/float, etc.) per document.  For Strings, the size is one int
per document, plus the total size of every unique string value in the
field.

This is why sorting on some String fields uses more memory than on other
String fields -- it all depends on how heterogeneous the values in that
field are.  A field that contains only 4 unique values takes up a lot
less room than a field where every document has a different value.
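The sizing described above can be sketched with a rough back-of-envelope model (the 40-byte per-String overhead and 2 bytes per character are assumptions for illustration, not measured Lucene figures):

```python
def string_fieldcache_bytes(num_docs, unique_values, avg_value_len):
    """Rough estimate of FieldCache memory for sorting on a String field."""
    ords = num_docs * 4  # one int ordinal per document
    # each unique value is stored once: assume 2 bytes per char plus a
    # hypothetical 40-byte fixed per-String object overhead
    values = unique_values * (40 + 2 * avg_value_len)
    return ords + values

# 10M docs where every value is unique (~20 chars each): roughly 800 MiB
print(string_fieldcache_bytes(10_000_000, 10_000_000, 20) / 2**20)

# same 10M docs but only 4 unique values: roughly 38 MiB
print(string_fieldcache_bytes(10_000_000, 4, 20) / 2**20)
```

The per-document ordinal array is the same size in both cases; the difference comes entirely from the unique-value storage, which is why a uniqueKey field (every value distinct) is the worst case.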

In the end, there isn't much you can do about this except allocate more
memory to your JVM.  One option you do have in Solr is to tune other
parameters so that more of the memory you have already allocated to the
JVM is available for sorting (ie: making your filterCache smaller, for
example).

Off the top of my head, I don't remember if omitting norms for fields
reduces the amount of resident memory needed by the index, or just the
on-disk size, but you might want to try that also if there are fields you
know you don't need norms for (a String field you sort on is a good bet,
since you probably don't search on it, and even if you do the length is
always going to be 1)
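In schema.xml, omitting norms for a sort-only string field might look like this (the field name is a placeholder; adjust to your schema):

```xml
<field name="sort_title" type="string" indexed="true" stored="false" omitNorms="true"/>
```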


: I have a new problem with OutOfMemory errors.
: As I reported before, we have an index with more than 10 million
: documents and 23 fields. Recently I added a new field which we will only
: use for sorting purposes (by "adding" I mean building a new index). But
: it turned out that every query using this field for sorting ends in an
: out of memory error. Even sorting result sets containing just one
: document does not work. The field is of type solr.StrField and strange
: enough there are some other fields in the index of the same type which
: do not cause these problems (but not all of them; our uniqueKey-field
: has the same problems with sorting).
: Now I am wondering why sorting works with some of the fields but not
: with others. Could it be that this depends on the content?



-Hoss


Re: OutOfMemory error while sorting

Yonik Seeley
On 6/14/06, Chris Hostetter <[hidden email]> wrote:
> Off the top of my head, I don't remember if omitting norms for fields
> reduces the amount of resident memory needed by the index

It does indeed.  1 byte per document for the indexed field.
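For the index described earlier in this thread (10 million documents, 23 fields), that one byte per document per indexed field adds up quickly:

```python
docs = 10_000_000    # index size mentioned in this thread
fields = 23          # indexed fields in the schema
norm_bytes = docs * fields  # one norms byte per document per indexed field
print(norm_bytes / 2**20)   # ~219 MiB if all 23 fields keep norms
```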

-Yonik

Re: OutOfMemory error while sorting

Marcus Stratmann
In reply to this post by Chris Hostetter-3
Hi,

Chris Hostetter wrote:
> This is a fairly typical Lucene issue (ie: not specific to Solr)...
Ah, I see. I should really pay more attention to Lucene, but when
working with Solr I sometimes forget about the underlying technology.

> Sorting on a field requires building a FieldCache for every document --
> regardless of how many documents match your query.  This cache is reused
> for all searches that sort on that field.
This makes things clear to me now. I always observed that Solr is slow
after a commit or optimize. When I put a newly created or updated index
into service the server always seemed to hang up. The CPU usage went to
nearly 100 percent and no queries were answered. I found out that
"warming" the server with serial queries, not parallel ones, bypassed
this problem (not to be confused with warming the caches!). So after a
commit I sent some hundred queries from our log to the server and this
worked fine. But now I know I only need a few specific queries to do the
job.

Thanks Chris for the great support! The Solr team is doing a very good
job. With your help I finally got Solr running. Our system is live now
and I will now switch over to the "Who uses Solr" thread to give you
some feedback.

Again, thank you very much!

Marcus

Re: OutOfMemory error while sorting

Chris Hostetter-3

: nearly 100 percent and no queries were answered. I found out that
: "warming" the server with serial queries, not parallel ones, bypassed
: this problem (not to be confused with warming the caches!). So after a

Note that you can have Solr do this automatically for you in both
firstSearcher and newSearcher listeners (so you never risk having one of
your users hit the searcher before your warming queries).  Take a look at
the commented out usage of QuerySenderListener in the example
solrconfig.xml...

    <listener event="newSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr</str>
          <str name="start">0</str>
          <str name="rows">10</str>
        </lst>
        <lst>
          <str name="q">rocks</str>
          <str name="start">0</str>
          <str name="rows">10</str>
        </lst>
      </arr>
    </listener>
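Since the FieldCache discussed earlier is built the first time a field is sorted on, a firstSearcher entry that sorts on the problem field will populate it before any user query reaches the new searcher. A sketch (sort_title is a placeholder field name; note that some early Solr versions expect the sort appended to q, e.g. q=solr;sort_title asc, rather than a separate sort parameter):

```xml
    <listener event="firstSearcher" class="solr.QuerySenderListener">
      <arr name="queries">
        <lst>
          <str name="q">solr</str>
          <str name="sort">sort_title asc</str>
          <str name="rows">0</str>
        </lst>
      </arr>
    </listener>
```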


-Hoss