Solr server partial update is very slow

Sujay Bawaskar-2
Hi,

We see the log entry below after every partial update call, even though
we never invoke a commit. Our soft commit and hard commit intervals are
configured as shown below. With this configuration we reach about 800
partial updates per minute, which I think is very slow. The index size for
this particular core is 10GB.
Is there any configuration we are missing here?

Log:
2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection] o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection] realtime]

Commit configuration:
solr.autoCommit.maxTime:1800000
solr.autoSoftCommit.maxTime:900000
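
For readers unfamiliar with these settings: the maxTime values are in milliseconds. A minimal sketch of the conversion, using only the two values quoted above (the helper name is made up for illustration):

```python
# Commit intervals quoted in the post, in milliseconds.
intervals_ms = {
    "solr.autoCommit.maxTime": 1_800_000,
    "solr.autoSoftCommit.maxTime": 900_000,
}


def ms_to_minutes(ms: int) -> float:
    """Convert a millisecond interval to minutes."""
    return ms / 60_000


for name, ms in intervals_ms.items():
    print(f"{name}: {ms} ms = {ms_to_minutes(ms):g} minutes")
```

So the hard commit fires every 30 minutes and the soft commit every 15 minutes.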



--
Thanks,
Sujay P Bawaskar
M:+91-77091 53669

Re: Solr server partial update is very slow

Erick Erickson
bq: We are getting below log without invoking commit operation after
every partial update call

Not sure what you mean here. If you're issuing a commit from the
client every time you update a doc (or even a batch) that's an
anti-pattern and you're opening searchers all the time. Don't do that
;).

I'd set my autoCommit time to something reasonable like 60 seconds (or
even 15) with openSearcher=false in solrconfig.xml. Set your soft
commit to however long you can stand. I try for at least 10 seconds,
but 60 or even 300 if possible. It all depends on how soon after you
index a document it has to be available for search.
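
As a rough sketch of what those numbers look like in practice, the snippet below turns the suggested intervals into JVM system properties. It assumes solrconfig.xml reads the intervals through property substitution (for example `${solr.autoCommit.maxTime:15000}`), which the property-style names in the original post suggest but do not confirm. The helper names are hypothetical, and openSearcher=false still has to be set inside the autoCommit section of solrconfig.xml itself.

```python
def seconds_to_ms(seconds: int) -> int:
    """Solr commit intervals are expressed in milliseconds."""
    return seconds * 1000


# Values taken from the advice above: 60 s hard commit, 300 s soft commit.
suggested = {
    "solr.autoCommit.maxTime": seconds_to_ms(60),
    "solr.autoSoftCommit.maxTime": seconds_to_ms(300),
}


def as_jvm_flags(props: dict) -> list:
    """Render properties as -Dname=value flags (assumes property substitution is in use)."""
    return [f"-D{name}={value}" for name, value in props.items()]


print(" ".join(as_jvm_flags(suggested)))
```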

The settings you have are dangerous. See:

https://lucidworks.com/2013/08/23/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Best,
Erick

Re: Solr server partial update is very slow

Sujay Bawaskar-2
We are not issuing client-side commits for partial updates. We have
openSearcher=false in solrconfig.xml, and the softCommit interval is set
to 15 minutes. The Solr version is 6.4.1.

Thanks,
Sujay

Re: Solr server partial update is very slow

Sujay Bawaskar-2
Is there any reason we get the log entry below even though the client does
not issue a commit, or can we ignore it?

Log: 2017-11-10 05:13:33.730 INFO  (qtp225493257-38746) [   x:collection]  o.a.s.s.SolrIndexSearcher Opening  [Searcher@7010b1c6[collection] realtime]

Re: Solr server partial update is very slow

Sujay Bawaskar-2
Hi Erick,

Some of the partial updates are taking a very long time. The average QTime
for updates over a 15-minute interval is 14344.

2017-11-10 08:15:11.863 INFO  (qtp225493257-43961) [   x:collection] o.a.s.c.S.Request [collection]  webapp=/solr path=/update params={wt=javabin&version=2} status=0 QTime=10073904.
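
QTime is reported in milliseconds, so it helps to convert these figures. A small sketch that pulls QTime out of the request log line above and converts it (the regular expression and function name are illustrative, not part of Solr):

```python
import re

# The request log line from this message, joined onto one line.
LOG_LINE = (
    "2017-11-10 08:15:11.863 INFO  (qtp225493257-43961) [   x:collection] "
    "o.a.s.c.S.Request [collection]  webapp=/solr path=/update "
    "params={wt=javabin&version=2} status=0 QTime=10073904."
)


def parse_qtime_ms(line: str):
    """Return the QTime value in milliseconds, or None if the line has none."""
    match = re.search(r"\bQTime=(\d+)", line)
    return int(match.group(1)) if match else None


qtime_ms = parse_qtime_ms(LOG_LINE)
print(f"slowest request: {qtime_ms / 3_600_000:.2f} hours")
print(f"reported average: {14344 / 1000:.1f} seconds")
```

The single request shown took roughly 2.8 hours, against a reported average of about 14 seconds.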

Re: Solr server partial update is very slow

Shawn Heisey-2
On 11/11/2017 8:17 AM, Sujay Bawaskar wrote:

> Thanks Shawn. It's good to know that OpenSearcher is not causing any
> issue.
>
> We are fine with a 15-minute softCommit interval. We are using a
> standalone Solr instance, not SolrCloud. There are 100 cores on
> this machine, but index ingestion was running for a single core. The
> total index size is 100GB, of which this 10GB core is the largest.
> The standalone Solr machine is hosted on a dedicated instance with 4 CPU
> cores and 120 GB of memory. The Solr JVM is configured with xms=40G and
> xmx=80G. In this case partial updates are being performed by 200 Solr
> clients simultaneously.

Looks like I managed to send my previous reply direct instead of to the
list.  I'm sending this one to the list.

Why is your heap 80GB?  That's *huge*.  With 80GB of the 120GB total
used by one Java process, you've got about 40GB left to cache the index
-- assuming that this one Solr instance is the only significant program
running on the server.  40GB to cache a 100GB index might be enough for
good performance, or it might not be enough.  There are no easy formulas
for figuring that out.  A heap that size is also likely to experience some
occasional stop-the-world GC pauses that could take a VERY long time.
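
The memory arithmetic in that paragraph, written out as a quick sketch. It uses only the figures from this thread and assumes, as stated above, that Solr is the only significant process on the machine:

```python
total_ram_gb = 120    # physical memory on the server
max_heap_gb = 80      # configured maximum JVM heap
total_index_gb = 100  # combined size of all cores on the machine

# Whatever the heap does not claim is what the OS can use to cache index files.
cache_gb = total_ram_gb - max_heap_gb
cached_fraction = cache_gb / total_index_gb

print(f"{cache_gb} GB left for the OS disk cache, "
      f"enough for {cached_fraction:.0%} of the index")
```

This is only an upper bound on the cache; it says nothing about whether that share is sufficient for this workload.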

My dev Solr server (6.6.2-SNAPSHOT) has all of the indexes on it that
use several servers in production.  That's over 700GB of index data. 
This server runs with a 28GB heap, and the only reason it's *that* high
is because I had to increase the heap in order to successfully run some
data-mining grouping and facet queries.  Normally it works just fine
with about a 13GB heap.

200 simultaneous indexing requests seems excessive to me, especially
when the Solr server only has 4 CPUs.  Indexing several requests at the
same time is the best way to achieve fast indexing, but if you have too
many, it could actually get *worse* than indexing with only one
thread/process.
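
One way to cap concurrency on the client side is a small, fixed-size worker pool that sends updates in batches. The sketch below is illustrative only: `send_batch` is a hypothetical stand-in for the real HTTP update request, and the worker count of 4 and batch size of 100 are assumptions chosen for the example, not tuned recommendations.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def send_batch(batch):
    """Hypothetical stand-in for one update request; returns the number of docs sent."""
    return len(batch)


def index_with_bounded_pool(docs, workers=4, batch_size=100):
    """Send all docs using at most `workers` concurrent requests."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(send_batch, chunked(docs, batch_size)))


docs = [{"id": str(i)} for i in range(1000)]
print(index_with_bounded_pool(docs), "documents sent")
```

Starting with a handful of workers and increasing the count while measuring throughput is a simple way to find where adding more stops helping.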

Thanks,
Shawn

--------------------
For completeness, below is the full text of the thread where I replied
before:

> On Fri, Nov 10, 2017 at 8:59 PM, Shawn Heisey <[hidden email]> wrote:
>
>     On 11/9/2017 10:25 PM, Sujay Bawaskar wrote:
>     > We are getting below log without invoking commit operation after every
>     > partial update call. We have configured soft commit and commit time as
>     > below. With below configuration we are able to perform 800 partial updates
>     > per minutes which I think is very slow. Our Index size is 10GB for this
>     > particular core.
>     > Is there any configuration we are missing here?
>     >
>     > Log:
>     > 2017-11-10 05:13:33.730 INFO (qtp225493257-38746) [   x:collection]
>     > o.a.s.s.SolrIndexSearcher Opening [Searcher@7010b1c6[collection] realtime]
>
>     This is a *realtime* searcher, for the realtime get handler.  These will
>     be recreated frequently as you index.  Opening realtime searchers should
>     be extremely fast and not really affect the system much, and this
>     happens without any configuration or user action.
>
>     The realtime get handler, which is typically accessed as /get, can
>     retrieve documents that haven't been made accessible to the normal index
>     searcher.  If this feature were likely to cause performance problems, it
>     would not be turned on by default.
>
>     https://lucene.apache.org/solr/guide/6_6/realtime-get.html
>
>     Are you seeing any other frequent logs about opening searchers that
>     aren't realtime?
>
>     > Commit configuration:
>     > solr.autoCommit.maxTime:1800000
>     > solr.autoSoftCommit.maxTime:900000
>
>     I applaud your restraint here.  We frequently see users that want these
>     things to happen on intervals measured in seconds, not minutes -- often
>     as low as *one* second.  That said, I think I would actually decrease
>     the autoCommit time to 60000, and make sure openSearcher is false.  I
>     would probably decrease the autoSoftCommit time to 120000 or 300000.
>
>     Why would I recommend much shorter intervals than you have configured?
>     For autoCommit, it comes down to the mantra on the blog post that Erick
>     gave you:  "Hard commits are about durability."  Half an hour between
>     hard commits doesn't address durability concerns very well, and a hard
>     commit that does NOT open a new searcher is very quick.  For
>     autoSoftCommit, my recommendation is just because fifteen minutes is a
>     VERY long interval for that, and you really don't need to wait that
>     long.  Unless your settings are pathological and cause commits to take
>     an unreasonable amount of time, doing them once every two minutes won't
>     cause problems.
>
>     Those recommendations aren't set in stone.  If you have hard evidence
>     that you need different values, feel free, but I do think the intervals
>     should be drastically reduced.
>
>     As for why your indexing is slow ... it is very unlikely to be related
>     to the log message you quoted, or your automatic commit settings.  With
>     the information provided, I can't give you ANY recommendations -- a lot
>     more information will be required.
>
>     The entire solrconfig.xml would be useful.  You'll need to use a paste
>     website or a file sharing site, attachments are typically stripped by
>     the mailing list.  And here's some information that cannot be obtained
>     from solrconfig.xml that will be helpful:  Are you running in cloud
>     mode?  If so, are your indexes sharded?  How much total memory is in the
>     machine?  Is there one Solr instance on the machine, or multiple?  Is
>     there other significant software on the machine, like a webserver or a
>     database server?  How much heap space does each Solr instance have?
>     What is the total amount of data being handled by all Solr instances on
>     the machine?  I'm looking for both a document count and disk space.  You
>     mentioned that your index is 10GB, but that doesn't say whether that's
>     the only index on the machine.
>
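
To put the durability argument from the quoted reply in numbers: updates received since the last hard commit exist only in the transaction log and the in-memory buffers. The sketch below estimates how many updates that is at the reported rate of 800 updates per minute. It assumes a steady rate, and the function name is made up for illustration.

```python
def updates_between_hard_commits(updates_per_minute: int, auto_commit_ms: int) -> int:
    """Estimate how many updates arrive between two hard commits at a steady rate."""
    return updates_per_minute * auto_commit_ms // 60_000


current = updates_between_hard_commits(800, 1_800_000)  # 30-minute autoCommit
suggested = updates_between_hard_commits(800, 60_000)   # 60-second autoCommit

print(f"30-minute interval: up to {current} updates not yet hard-committed")
print(f"60-second interval: up to {suggested} updates not yet hard-committed")
```

With the current setting that is up to 24000 updates waiting on a hard commit, against about 800 with the suggested one.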

Re: Solr server partial update is very slow

Sujay Bawaskar-2
Hi Shawn,

During indexing with partial updates, CPU utilization peaks at 12%. The
Solr JVM heap has a 40GB minimum because we use the data import handler
with SortedMapBackedCache, which uses Java heap during a full import.
Memory utilization is also decent while partial updates are running. The
only issue is that when partial updates run at 700 updates per minute, the
QTime reaches 5 seconds. Could direct partial updates from 200 clients be
causing index merging to slow down? We open 40*200 HTTP connections to
Solr (at least 40 partial updates from each process) with SolrJ for
partial updates.
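
One option worth testing here is to send each process's 40 partial updates in a single request instead of 40 separate ones, which would cut the request count from 40*200 = 8000 to 200. The sketch below builds such a request body using Solr's JSON atomic update syntax (the `set` modifier). The document ids, the `price_i` field, and the assumption that the uniqueKey field is called `id` are all hypothetical.

```python
import json


def atomic_set(doc_id: str, field: str, value):
    """One atomic update that replaces `field` on the document with the given id."""
    return {"id": doc_id, field: {"set": value}}


def build_batch(updates) -> str:
    """Serialize many atomic updates into one JSON array for a single update request."""
    return json.dumps([atomic_set(doc_id, field, value)
                       for doc_id, field, value in updates])


# Hypothetical ids and field name, standing in for one process's 40 updates.
updates = [(f"doc-{i}", "price_i", i) for i in range(40)]
body = build_batch(updates)

print(len(json.loads(body)), "updates in one request body")
print("requests needed:", 200, "instead of", 40 * 200)
```

Whether batching helps in this case would need to be measured; it reduces per-request overhead but does not change the amount of indexing work.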

Thanks,
Sujay
