solr 3.5 taking long to index

rohit-4
We recently migrated from Solr 3.1 to Solr 3.5. We have one master and one
slave configured. The master has two cores:

1) Core1 - 44555972 documents
2) Core2 - 29419244 documents

We commit every 5000 documents, but lately the commit has been taking very
long, 15 minutes or more in some cases. What could have caused this? I have
checked the logs, and the only warning I can see is:

"WARNING: Use of deprecated update request parameter update.processor
detected. Please use the new parameter update.chain instead, as support for
update.processor will be removed in a later version."

Memory details:

export JAVA_OPTS="$JAVA_OPTS -Xms6g -Xmx36g -XX:MaxPermSize=5g"

Solr config:

<useCompoundFile>false</useCompoundFile>
<mergeFactor>10</mergeFactor>
<ramBufferSizeMB>32</ramBufferSizeMB>
<!-- <maxBufferedDocs>1000</maxBufferedDocs> -->
<maxFieldLength>10000</maxFieldLength>
<writeLockTimeout>1000</writeLockTimeout>
<commitLockTimeout>10000</commitLockTimeout>

What could be causing this, given that everything was running fine a few
days ago?

Regards,
Rohit
Mobile: +91-9901768202
About Me: http://about.me/rohitg

Re: solr 3.5 taking long to index

Lance Norskog-2
It's telling you the problem. Try comparing your solrconfig.xml against the
one in 3.5/solr/example/solr/conf. You will see what has changed in the
suggested tools.
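
If the deprecated parameter is being sent on the update request itself, the rename that the warning asks for can be sketched as below. This is only an illustration in Python: the host, core name and chain name ("dedupe") are made-up values, and if the parameter is set in solrconfig.xml rather than on the URL, the same rename would apply there instead.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def use_update_chain(url):
    """Rename the deprecated update.processor request parameter to update.chain."""
    parts = urlsplit(url)
    params = [
        ("update.chain" if key == "update.processor" else key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Rebuild the URL with the renamed parameter; all other parts are untouched.
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical update URL, for illustration only.
old_url = "http://localhost:8983/solr/core1/update?update.processor=dedupe&commit=false"
new_url = use_update_chain(old_url)
```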


--
Lance Norskog

Re: solr 3.5 taking long to index

Bernd Fehling
In reply to this post by rohit-4

There were some changes in solrconfig.xml between Solr 3.1 and Solr 3.5.
Always read CHANGES.txt when switching to a new version.
Comparing both versions of solrconfig.xml from the examples is also helpful.

Are you sure you need a MaxPermSize of 5g?
Use jvisualvm to see what you really need.
The same goes for all the other JAVA_OPTS.



RE: solr 3.5 taking long to index

rohit-4
Thanks for pointing these out, but I still have one concern: why is the
virtual memory usage running at 300g+?

Regards,
Rohit

Re: solr 3.5 taking long to index

Shawn Heisey-4
On 4/12/2012 12:42 PM, Rohit wrote:
> Thanks for pointing these out, but I still have one concern, why is the
> Virtual Memory running in 300g+?

Solr 3.5 uses MMapDirectoryFactory by default to read the index.  This
does an mmap on the files that make up your index, so their entire
contents are simply accessible to the application as virtual memory
(over 300GB in your case).  The OS automatically takes care of swapping
disk pages in and out of real RAM as required.  This approach has less
overhead and tends to make better use of the OS disk cache than other
methods.  It does lead to confused questions and scary numbers in memory
usage reporting, though.
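
To make the mmap idea concrete, here is a minimal sketch in Python (not Solr or Lucene code). It maps a small temporary file, standing in for an index segment, and reads a slice from it. The file contents and the helper names are invented for the illustration; the point is that mapping a file reserves address space, which is what shows up as virtual memory, while real RAM is only used for the pages actually touched.

```python
import mmap
import os
import tempfile


def read_via_mmap(path, offset, length):
    """Return `length` bytes starting at `offset` through a read-only memory map."""
    with open(path, "rb") as f:
        # Length 0 maps the whole file. This reserves address space, not RAM.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # Slicing touches only the pages that hold the requested bytes.
            return mapped[offset:offset + length]


def demo():
    """Write a stand-in 'segment' file, map it, and read 12 bytes from it."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(b"x" * 4096 + b"segment-data")
        return read_via_mmap(path, 4096, 12)
    finally:
        os.remove(path)
```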

You have mentioned that you are giving 36GB of RAM to Solr.  How much
total RAM does the machine have?

Thanks,
Shawn

RE: solr 3.5 taking long to index

rohit-4
The machine has a total RAM of around 46GB. My biggest concern is that Solr index time is gradually increasing until the commits stop because of timeouts. Our commit rate is very high, but I am not able to find the root cause of the issue.

Regards,
Rohit

Re: solr 3.5 taking long to index

Shawn Heisey-4
On 4/12/2012 8:42 PM, Rohit wrote:
> The machine has a total ram of around 46GB. My Biggest concern is Solr index time gradually increasing and then the commit stops because of timeouts, out commit rate is very high, but I am not able to find the root cause of the issue.

For good performance, Solr relies on the OS having enough free RAM to
keep critical portions of the index in the disk cache.  Some numbers
that I have collected from your information so far are listed below.  
Please let me know if I've got any of this wrong:

46GB total RAM
36GB RAM allocated to Solr
300GB total index size

This leaves only 10GB of RAM free to cache 300GB of index, assuming that
this server is dedicated to Solr.  The critical portions of your index
are very likely considerably larger than 10GB, which causes constant
reading from the disk for queries and updates.  With a high commit rate
and a relatively low mergeFactor of 10, your index will be doing a lot
of merging during updates, and some of those merges are likely to be
quite large, further complicating the I/O situation.
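
The arithmetic behind that estimate, as a small Python sketch. It uses only the three figures listed above and assumes, as stated, that the box is dedicated to Solr and that the whole heap allocation is in use; the function name is just for illustration.

```python
def page_cache_budget(total_ram_gb, jvm_heap_gb, index_size_gb):
    """Return (RAM left for the OS disk cache in GB, fraction of the index it can hold)."""
    free_gb = total_ram_gb - jvm_heap_gb
    coverage = free_gb / index_size_gb
    return free_gb, coverage


# Figures from this thread: 46GB total, 36GB for Solr, 300GB of index.
free_gb, coverage = page_cache_budget(46, 36, 300)
print("%dGB left for the disk cache, about %.1f%% of the index" % (free_gb, coverage * 100))
```

In other words, only a few percent of the index can sit in the disk cache at any one time.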

Another thing that can lead to increasing index update times is cache
warming, also greatly affected by high I/O levels.  If you visit the
/solr/corename/admin/stats.jsp#cache URL, you can see the warmupTime for
each cache in milliseconds.

Adding more memory to the server would probably help things.  You'll
want to carefully check all the server and Solr statistics you can to
make sure that memory is the root of the problem before you actually
spend the money.  At the server level, look for things like a high iowait
CPU percentage.  For Solr, you can turn the logging level up to INFO in
the admin interface as well as turn on the infoStream in solrconfig.xml
for extensive debugging.
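
One way to measure the iowait percentage on Linux, sketched in Python under the assumption that /proc/stat is available (tools such as top or iostat report the same figure). The helper names are made up for the illustration. The aggregate "cpu" line lists cumulative ticks in the order user, nice, system, idle, iowait, irq, softirq, steal, so the share of iowait between two samples is the iowait delta divided by the total delta.

```python
import time


def parse_cpu_line(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            return [int(value) for value in fields[1:]]
    raise ValueError("no aggregate cpu line found")


def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    deltas = [b - a for a, b in zip(before, after)]
    # Only the first eight counters are summed; the guest counters that
    # follow are already included in user and nice.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds=5):
    """Read /proc/stat twice (Linux only) and report iowait over the interval."""
    with open("/proc/stat") as f:
        before = parse_cpu_line(f.read())
    time.sleep(interval_seconds)
    with open("/proc/stat") as f:
        after = parse_cpu_line(f.read())
    return iowait_percent(before, after)
```

A value that stays high while commits are running would point at disk I/O rather than CPU.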

I hope this is helpful.  If not, I can try to come up with more specific
things you can look at.

Thanks,
Shawn

RE: solr 3.5 taking long to index

rohit-4
Hi Shawn,

Thanks for the information. Let me give this a try. Since this is a live box, I will try it over the weekend and update you.

Regards,
Rohit

RE: solr 3.5 taking long to index

rohit-4
In reply to this post by Shawn Heisey-4
Hey Shawn,

Solr is working better, though we are not out of the woods yet. I freed up some memory in the system and also increased the mergeFactor to 20.

I have another question. We have had autocommit on all this while in our solrconfig.xml, but since the upgrade we have noticed that keeping autocommit on increases the commit time, and I cannot find a reason. Are they related in any way?


Regards,
Rohit

Re: solr 3.5 taking long to index

Lance Norskog-2
You're doing more commits than you need. You may want to turn off
autocommit since you are running commit yourself. Every commit causes
segment activity, so if you want to minimize that, you don't need
autocommit.

About memory sizing: you should drop the memory assigned to Solr until
it slows down, then increase it a little. All of the rest should be
used by the OS for disk caching.

With this much RAM, investigate "Large Pages" support. This is an
operating system hack to make large programs run faster on machines
with a lot of RAM.

--
Lance Norskog

Re: solr 3.5 taking long to index

Yonik Seeley-2-2
In reply to this post by rohit-4
On Thu, Apr 12, 2012 at 10:42 PM, Rohit <[hidden email]> wrote:
> The machine has a total ram of around 46GB. My Biggest concern is Solr index time gradually increasing and then the commit stops because of timeouts, out commit rate is very high, but I am not able to find the root cause of the issue.

The difference you're seeing between 3.1 and 3.5 may be due to a bug
in the former where fsync was not being called:
https://issues.apache.org/jira/browse/LUCENE-3418

> We commit every 5000 documents

If you are doing bulk indexing, wait until the end to commit.
The upcoming Solr 4 has near-real-time (soft commit) support, which makes
frequent commits (for the purposes of visibility) less expensive.
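
A rough sketch of that bulk pattern in Python: documents still go to Solr in batches, but the only commit is the single one at the end. The `post` argument is a hypothetical callable that would send the XML body to the core's update handler (for example with an HTTP POST to a URL like http://localhost:8983/solr/core1/update). It is left abstract here, and the function names and batch size are just for illustration.

```python
from xml.sax.saxutils import escape, quoteattr


def add_xml(docs):
    """Build a Solr XML <add> message from a list of {field: value} dicts."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            parts.append("<field name=%s>%s</field>" % (quoteattr(name), escape(str(value))))
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def bulk_index(docs, post, batch_size=5000):
    """Send docs in batches through `post`, then commit exactly once at the end."""
    batch = []
    batches_sent = 0
    for doc in docs:
        batch.append(doc)
        if len(batch) == batch_size:
            post(add_xml(batch))  # add only, no commit here
            batches_sent += 1
            batch = []
    if batch:
        post(add_xml(batch))  # final partial batch
        batches_sent += 1
    post("<commit/>")  # the single commit for the whole run
    return batches_sent
```

With autocommit also turned off for the duration of the bulk load, segment flushing and merging are then driven by ramBufferSizeMB and mergeFactor rather than by the commit interval.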

-Yonik
lucenerevolution.com - Lucene/Solr Open Source Search Conference.
Boston May 7-10