Out of memory on sorting

rohit-4
Hi,

We are moving to a multi-core Solr installation in which each core holds
millions of documents, and documents will be added to the index on an
hourly basis. Everything seems to run fine and I am getting the expected
results and performance, except where sorting is concerned.

I have an index of 13217121 documents. When I fetch the documents between
two dates and then sort them by ID, Solr goes out of memory. This is with
just me using the system; we may also have simultaneous users. How can I
improve this?

Rohit


Re: Out of memory on sorting

Rajinimaski
Explicit Warming of Sort Fields

If you do a lot of field-based sorting, it is advantageous to add explicit
warming queries to the "newSearcher" and "firstSearcher" event listeners in
your solrconfig that sort on those fields, so the FieldCache is populated
before any queries are executed by your users.

firstSearcher:

<lst>
  <str name="q">solr rocks</str>
  <str name="start">0</str>
  <str name="rows">10</str>
  <str name="sort">empID asc</str>
</lst>




RE: Out of memory on sorting

rohit-4
Thanks for pointing me in the right direction. I now see that in the
configuration for firstSearcher or newSearcher, the <str name="q"> needs to
be configured in advance. In my case the q is ever-changing: users can
search for anything, and the possible queries are unlimited.

How can I make this generic?

-Rohit




Re: Out of memory on sorting

Erick Erickson
The warming queries warm up the caches used in sorting, so just
including &sort=... in a warming query will warm the sort caches. The terms
searched are not important. The same is true of facets.
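
Since only the sort clause matters, the warming entries can be generated from nothing more than a list of sort fields. The following is a minimal sketch, assuming the output is pasted inside a firstSearcher or newSearcher listener in solrconfig.xml. The field names "id" and "created_at" are hypothetical placeholders, and the query text is simply the one from the earlier example.

```python
import xml.etree.ElementTree as ET


def warming_queries(sort_fields, q="solr rocks", rows=10):
    """Build an <arr name="queries"> element with one warming query per sort field."""
    arr = ET.Element("arr", {"name": "queries"})
    for field in sort_fields:
        lst = ET.SubElement(arr, "lst")
        entries = (
            ("q", q),                      # the search terms themselves do not matter
            ("start", "0"),
            ("rows", str(rows)),
            ("sort", f"{field} asc"),      # this is what populates the sort cache
        )
        for name, value in entries:
            ET.SubElement(lst, "str", {"name": name}).text = value
    return arr


# Hypothetical sort fields; substitute the fields your users actually sort on.
xml_text = ET.tostring(warming_queries(["id", "created_at"]), encoding="unicode")
print(xml_text)
```

One entry per sort field is enough, because the cache is keyed by field rather than by query.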

However, I don't understand how that relates to your OOM problems. With
warming queries in place, I'd expect the OOM to happen at startup instead,
since you'd be running the operation that exhausts memory then.

So, we need more details:
1> How is your sort field defined? String? Integer? If it's a string
     and you could change it to a numeric type, you'd use a lot
     less memory.
2> How many distinct terms? I'm guessing one per document, which
     is somewhat of an anti-pattern in Solr, although it's sometimes
     necessary.
3> How much memory are you allocating for the JVM?
4> What other fields are you sorting on, and how many unique values
     are in each? Solr Admin can help you here.

Best
Erick



RE: Out of memory on sorting

rohit-4
Hi Erick,

My OOM problem starts when I query the core with 13217121 documents. My
schema and other details are given below.

1> how is your sort field defined? String? Integer? If it's a string and you
could change it to a numeric type, you'd use a lot less memory.

We primarily use two different sort criteria: one is a date field and the
other is a string (id). I cannot change the "id" field, as it is also the
uniqueKey for my schema.

2> How many distinct terms? I'm guessing one/document actually,this is
somewhat of an anti-pattern in Solr for all it's sometimes necessary.

Since one of the fields is a timestamp and the other is a unique key,
all values are distinct. (These are tweets matching a keyword.)

3> How much memory are you allocating for the JVM?

I am starting Solr with the following command:
java -Xms1024M -Xmx2048M -jar start.jar


All our test cases for moving to Solr have passed, so this is proving to be
a big setback. Help would be greatly appreciated.

Regards,
Rohit




Re: Out of memory on sorting

Erick Erickson
See below:

On Thu, May 19, 2011 at 9:06 AM, Rohit <[hidden email]> wrote:
> Hi Erick,
>
> My OOM problem starts when I query the core with 13217121 documents. My
> schema and other details are given below,

Hmmmm, how many cores are you running and what are they doing? They all
share the same memory pool, so you may be getting some carry-over from the
other cores. One strategy would be to move this core to a dedicated machine.

>
> 1> how is your sort field defined? String? Integer? If it's a string and you
> could change it to a numeric type, you'd use a lot less memory.
>
> We primarily use two different sort criteria: one is a date field and the
> other is a string (id). I cannot change the "id" field, as it is also the
> uniqueKey for my schema.

OK, but can you use a separate field just for sorting? Populate it with
a <copyField> and sort on that rather than ID. This is only helpful if
you can make a compact representation, e.g. integer.
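
To see why a compact representation matters, here is a rough back-of-the-envelope estimate of the sort cache for one field. It is a sketch under stated assumptions, not a measurement: the per-string overhead, the bytes per character, and the 18-character id length are all guesses, and the cache layout (one int per document plus one string per unique term, versus one long per document) is a simplification.

```python
NUM_DOCS = 13_217_121          # index size reported in this thread

# Assumptions (not measured): each unique string term costs roughly
# 40 bytes of JVM object overhead plus 2 bytes per character, and the
# ids are about 18 characters long.
STRING_OVERHEAD_BYTES = 40
BYTES_PER_CHAR = 2
ID_LENGTH_CHARS = 18


def string_sort_bytes(num_docs, unique_terms):
    """Estimate the cache for sorting on a string field."""
    per_term = STRING_OVERHEAD_BYTES + BYTES_PER_CHAR * ID_LENGTH_CHARS
    ord_array = 4 * num_docs           # one int per document pointing at its term
    return ord_array + per_term * unique_terms


def long_sort_bytes(num_docs):
    """Estimate the cache for sorting on a long field."""
    return 8 * num_docs                # one 8-byte long per document


MB = 1024 * 1024
# Every id is unique, so the number of terms equals the number of documents.
string_mb = string_sort_bytes(NUM_DOCS, NUM_DOCS) / MB
long_mb = long_sort_bytes(NUM_DOCS) / MB
print(f"string id sort cache: ~{string_mb:.0f} MB")
print(f"long sort cache:      ~{long_mb:.0f} MB")
```

Under these assumptions the string version lands around 1 GB for a single field on a single core, roughly ten times the numeric version, which would be a large share of a 2 GB heap.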

>
> 2> How many distinct terms? I'm guessing one/document actually,this is
> somewhat of an anti-pattern in Solr for all it's sometimes necessary.
>
> Since one of the fields is a timestamp and the other is a unique key,
> all values are distinct. (These are tweets matching a keyword.)
>

That's not one but two fields where all values are distinct, although I don't
think the timestamp is much of a problem, assuming you're storing it as one
of the numeric types (I'd especially make sure it is one of the Trie types,
specifically "tdate", if you're going to do range queries). There are tricks for
dealing with this, but your "id" field will give you a bigger bang for the buck,
so concentrate on that first.

> 3> How much memory are you allocating for the JVM?
>
> I am starting Solr with the following command:
> java -Xms1024M -Xmx2048M -jar start.jar
>

Well, you can bump this higher if you're on a 64-bit OS. The other possibility
is to shard your index. But really, an index of 13M documents should fit on
one machine.

What does your statistics page tell you, especially about cache usage?




RE: Out of memory on sorting

pravesh
In reply to this post by rohit-4
To save memory:

1. Allocate as much memory as you can to the JVM (especially if you are using a 64-bit OS).
2. Set omitNorms="true" for your date and id fields (and, in fact, for every field where index-time boosting and length normalization aren't required). This will require a full reindex.
3. Are you sorting across all documents in the index? Try to limit the set using filter queries.
4. Avoid match-all-docs queries such as q=*:* (if you are using them).
5. If you can, do away with sorting on the ID field and sort on a field with fewer unique terms.
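
As an illustration of points 3 to 5, the request below puts the date restriction in a filter query (fq), uses a real keyword rather than q=*:*, and sorts on a field other than the id. It is only a sketch: the core name "tweets", the field name "created_at", and the host and port are hypothetical, so adjust them to your own setup.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url, keyword, start, end, sort_field, rows=10):
    """Build a Solr select URL with the date range as a filter query."""
    params = [
        ("q", keyword),                               # a real keyword, not *:*
        ("fq", f"created_at:[{start} TO {end}]"),     # date range as a filter query
        ("sort", f"{sort_field} desc"),
        ("rows", str(rows)),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical core URL and field names.
url = build_select_url(
    "http://localhost:8983/solr/tweets",
    "solr",
    "2011-05-01T00:00:00Z",
    "2011-05-19T00:00:00Z",
    "created_at",
)
print(url)

# Decode the query string again to check what Solr would receive.
params_back = parse_qs(urlsplit(url).query)
print(params_back["fq"][0])
```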


Hope this helps