Commit after how many updates?

Commit after how many updates?

Max Hütter
Hi,

I have a question regarding Solr's behaviour, in the standard
installation. When I use the start.jar with a rather complex schema and I
do about 1000 updates and then try to commit, I get this:

<result status="1">java.lang.OutOfMemoryError: Java heap space
</result>

I know I can fix it by giving the VM a larger heap size, but still I
wonder what a good number of updates would be?

What are your experiences?

--
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel            :  (+49) 0711 - 45 10 17 578
Fax            :  (+49) 0711 - 45 10 17 573
e-mail         :  [hidden email]
Sitz           :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich

Re: Commit after how many updates?

Mike Klaas
On 3/12/07, Maximilian Hütter <[hidden email]> wrote:

> Hi,
>
> I have a question regarding Solr's behaviour, in the standard
> installation. When I use the start.jar with a rather complex schema and I
> do about 1000 updates and then try to commit, I get this:
>
> <result status="1">java.lang.OutOfMemoryError: Java heap space
> </result>
>
> I know I can fix it by giving the VM a larger heap size, but still I
> wonder what a good number of updates would be?
>
> What are your experiences?

That seems awfully few docs to cause OOM--I'm using autocommit @
100,000 docs without issues (then again, I give my instances at least a
gig of heap).

What is your current heap size?

-Mike
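
For anyone who would rather commit from the client than rely on autocommit, here is a minimal Python sketch of the idea: send documents in small `<add>` batches and issue a `<commit/>` only every so many documents. The function names, the batch sizes, and the update URL (the default of the example Jetty setup) are assumptions for illustration, not something taken from this thread.

```python
import urllib.request
from xml.sax.saxutils import escape, quoteattr


def add_message(docs):
    """Build one Solr XML <add> message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append(f"<field name={quoteattr(name)}>{escape(str(value))}</field>")
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def update_messages(docs, batch_size=100, commit_every=1000):
    """Yield update messages: docs in batches, plus a <commit/> every commit_every docs."""
    batch, pending = [], 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            yield add_message(batch)
            pending += len(batch)
            batch = []
            if pending >= commit_every:
                yield "<commit/>"
                pending = 0
    if batch:  # flush the last, partially filled batch
        yield add_message(batch)
        pending += len(batch)
    if pending:  # commit whatever has not been committed yet
        yield "<commit/>"


def post(message, url="http://localhost:8983/solr/update"):
    """POST one XML message to Solr. The URL is an assumption; adjust to your setup."""
    req = urllib.request.Request(
        url,
        data=message.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")
```

Usage would be along the lines of `for msg in update_messages(my_docs): post(msg)`. Which `commit_every` value works depends on heap size and document size, as discussed in the rest of the thread.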

Re: Commit after how many updates?

Max Hütter
Mike Klaas wrote:

> That seems awfully few docs to cause OOM--I'm using autocommit @
> 100,000 docs without issues (then again, I give my instances at least a
> gig of heap).
>
> What is your current heap size?

It is the default heap size for the Sun JVM, so I guess 64MB max. The
documents are rather large, but if you manage to index 100,000 docs,
there seems to be some problem with Solr.

What would be the recommended heap size for Solr?


Re: Commit after how many updates?

Mike Klaas
On 3/14/07, Maximilian Hütter <[hidden email]> wrote:

> It is the default heap size for the Sun JVM, so I guess 64MB max. The
> documents are rather large, but if you manage to index 100,000 docs,
> there seems to be some problem with Solr.

The documents are not held in memory until a commit occurs (just some
tracking info), so I'm not sure that that is the appropriate
conclusion.  Lucene keeps a few documents in memory
(maxBufferedDocs--you could lower this setting), and if your documents
are large, this could use a higher maximum amount of memory than in my
case.  Solr does keep all uniqueIds in memory until commit.

> What would be the recommended heap size for Solr?

That is difficult to answer, since it depends on so many factors.  64
megs seems on the rather low end.  Remember that you aren't just
trying to avoid OOM errors--more memory means bigger caches and
increased query performance.

-Mike
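
To make the two memory consumers Mike mentions a bit more concrete, here is a crude back-of-the-envelope sketch in Python. It is only a rough model: the function name, the input numbers, and the assumption that buffered documents cost roughly their raw size are all hypothetical, and real usage depends on analysis, field options, and JVM object overhead.

```python
def estimate_indexing_memory_mb(max_buffered_docs, avg_doc_kb,
                                uncommitted_docs, avg_id_bytes):
    """Very rough estimate of memory held while indexing, in megabytes.

    buffered_docs_mb: documents Lucene buffers in RAM before flushing
                      (bounded by maxBufferedDocs, grows with document size).
    pending_ids_mb:   uniqueIds Solr tracks until the next commit
                      (grows with the number of uncommitted documents).
    """
    buffered_docs_mb = max_buffered_docs * avg_doc_kb / 1024
    pending_ids_mb = uncommitted_docs * avg_id_bytes / (1024 * 1024)
    return {
        "buffered_docs_mb": buffered_docs_mb,
        "pending_ids_mb": pending_ids_mb,
        "total_mb": buffered_docs_mb + pending_ids_mb,
    }


# Hypothetical numbers: 1000 buffered docs of 50 KB each, 1000 uncommitted ids
# of about 40 bytes each.
print(estimate_indexing_memory_mb(1000, 50, 1000, 40))
```

With these made-up inputs the buffered documents dominate, which is in line with Mike's suggestion to lower maxBufferedDocs when documents are large and the heap is small.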

Re: Commit after how many updates?

Chris Hostetter-3
In reply to this post by Max Hütter

: It is the default heap size for the Sun JVM, so I guess 64MB max. The
: documents are rather large, but if you manage to index 100,000 docs,
: there seems to be some problem with Solr.

i think you mean "there DOES NOT seem to be a problem with Solr",
right? ... why would Mike being able to commit only every 100,000 docs
indicate a problem with Solr?

: What would be the recommended heap size for Solr?

there isn't one ... it's entirely dependent on how big your documents are,
how many fields in your schema have norms enabled, what types of queries
you process, how big you configure the various solr caches, etc....

-Hoss


Re: Commit after how many updates?

Max Hütter
Chris Hostetter wrote:
> : It is the default heap size for the Sun JVM, so I guess 64MB max. The
> : documents are rather large, but if you manage to index 100,000 docs,
> : there seems to be some problem with Solr.
>
> i think you mean "there DOES NOT seem to be a problem with Solr",
> right? ... why would Mike being able to commit only every 100,000 docs
> indicate a problem with Solr?

You're right, what I meant was: there is a problem with my Solr setup.

> : What would be the recommended heap size for Solr?
>
> there isn't one ... it's entirely dependent on how big your documents are,
> how many fields in your schema have norms enabled, what types of queries
> you process, how big you configure the various solr caches, etc....
>
> -Hoss
>

I thought so, but hoped there would be some experiences with heap space
settings for Solr. But I guess I have to try for myself.



Re: Commit after how many updates?

Chris Hostetter-3

: I thought so, but hoped there would be some experiences with heap space
: settings for Solr. But I guess I have to try for myself.

there's lots of experience, but it's hard to translate to generic rules
... there's so many variables involved that it's hard to even recognize
what the equation is.

My advice: throw as much ram as you've got at it, slam it with realistic
load, watch your GC logs/graphs and dial it back as much as you can
without hurting things..



-Hoss
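
As a starting point for the "watch your GC logs" part, here is a small Python sketch that summarizes a GC log. It assumes the plain one-line format the Sun JVM prints with `-verbose:gc` (for example `[GC 325407K->83000K(776768K), 0.2300771 secs]`); other collectors and logging flags print different formats, so the regular expression would need adjusting. The function name and the returned keys are made up for illustration.

```python
import re

# Matches lines such as:
#   [GC 325407K->83000K(776768K), 0.2300771 secs]
#   [Full GC 267628K->83769K(776768K), 1.8479984 secs]
GC_LINE = re.compile(
    r"\[(?P<kind>Full GC|GC)\s+(?P<before>\d+)K->(?P<after>\d+)K"
    r"\((?P<total>\d+)K\),\s+(?P<secs>[\d.]+) secs\]"
)


def summarize_gc(lines):
    """Return total pause time, number of full GCs, and peak heap use after a GC."""
    pause_secs = 0.0
    full_gcs = 0
    peak_after_kb = 0
    for line in lines:
        match = GC_LINE.search(line)
        if not match:
            continue  # ignore anything that is not a GC line
        pause_secs += float(match.group("secs"))
        if match.group("kind") == "Full GC":
            full_gcs += 1
        peak_after_kb = max(peak_after_kb, int(match.group("after")))
    return {
        "pause_secs": pause_secs,
        "full_gcs": full_gcs,
        "peak_after_gc_mb": peak_after_kb / 1024,
    }
```

The heap occupancy that remains after a collection is a rough indicator of how much memory the application really needs; if it stays far below the configured maximum under realistic load, there is probably room to dial the heap back.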


Re: Commit after how many updates?

Mike Klaas
On 3/16/07, Chris Hostetter <[hidden email]> wrote:

>
> : I thought so, but hoped there would be some experiences with heap space
> : settings for Solr. But I guess I have to try for myself.
>
> there's lots of experience, but it's hard to translate to generic rules
> ... there's so many variables involved that it's hard to even recognize
> what the equation is.
>
> My advice: throw as much ram as you've got at it, slam it with realistic
> load, watch your GC logs/graphs and dial it back as much as you can
> without hurting things..

I'd temper this by suggesting that you always leave a healthy amount
for the OS disk cache as well--you definitely don't want Solr
occupying _all_ the memory on a machine.

-Mike

Re: Commit after how many updates?

Otis Gospodnetic-2
In reply to this post by Max Hütter
+1 to what Mike said.  I am running some Lucene benchmarks as we type and this is exactly what I just saw.
On a beefy box with 32GB RAM I'm searching 63GB worth of Lucene indices.  I gave the JVM 20GB (-Xmx20g) at first and saw a bit of disk IO.  Then I lowered that max heap to 10GB and the disk IO disappeared!  I increased the number of search threads from 16 to whatever 64x8 is, and still no IO.  There is no Solr in this benchmark I'm doing, but the same ideas apply.

iostat and vmstat are your friends.

Otis
 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Simpy -- http://www.simpy.com/  -  Tag  -  Search  -  Share
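
The arithmetic behind Otis's observation can be sketched in a few lines of Python. This is only an upper-bound estimate under simplifying assumptions: it treats everything outside the JVM heap as available to the OS disk cache and ignores the JVM's non-heap memory, the OS itself, and other processes. The function names are hypothetical.

```python
def page_cache_headroom_gb(total_ram_gb, jvm_heap_gb, other_gb=0.0):
    """RAM left over for the OS disk cache after the JVM heap (upper bound)."""
    return total_ram_gb - jvm_heap_gb - other_gb


def cacheable_fraction(index_gb, headroom_gb):
    """Fraction of the index that could fit in the OS disk cache, clamped to 0..1."""
    return max(0.0, min(1.0, headroom_gb / index_gb))


# Numbers from the post above: 32 GB RAM, 63 GB of indices, heap of 20 GB vs. 10 GB.
for heap_gb in (20, 10):
    headroom = page_cache_headroom_gb(32, heap_gb)
    share = cacheable_fraction(63, headroom)
    print(f"-Xmx{heap_gb}g: {headroom} GB for disk cache, about {share:.0%} of the index")
```

Halving the heap nearly doubles the memory the OS can use to cache index files, which fits the disappearance of disk IO described above.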
