High cpu usage when adding documents to v7.7 solr cloud

High cpu usage when adding documents to v7.7 solr cloud

lancasp22
We have a Solr cloud on v7.7.0 and we see very high CPU usage when we index new documents.

The Solr cloud in question has 50 shards with 2 replicas of each, and we're using NRT. Indexing obviously takes some resources, but we see close to 100% CPU usage while indexing documents, and we haven't seen this on our other v6.3.0 Solr clouds indexing under a similar load. In the v7.7.0 cloud we're using nested child documents, but other than that the set-ups are quite similar.

For us, performance is more important than having updates reflected in real time, and we have configured commits as follows:
<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>100000</maxPendingDeletes>
  <autoCommit>
    <maxDocs>1800000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoSoftCommit>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>

I can reproduce the problem on a test server with 3 shards and no replication, using the same schema and Solr config. If I add a simple document like {Id:TEST01} through the Documents page in the Solr admin UI, I immediately see 100% CPU usage on one core of the test server, and this lasts for 300 seconds, the same as the maxTime for autoCommit. If I then change the maxTime to, say, 10 seconds, the high CPU usage lasts for just 10 seconds. I can't see anything being logged that would indicate what Solr is using the CPU for.
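
Since the spike seems to last as long as the autoCommit maxTime, it may help to read that value straight out of the config when comparing servers. The following is only a rough sketch: the function name `commit_intervals_seconds` is made up for illustration, and it assumes the maxTime values are plain integers in milliseconds (a value using property substitution such as `${...}` would need extra handling).

```python
import xml.etree.ElementTree as ET

# The <updateHandler> fragment from this post, reproduced as a string.
UPDATE_HANDLER_XML = """
<updateHandler class="solr.DirectUpdateHandler2">
  <maxPendingDeletes>100000</maxPendingDeletes>
  <autoCommit>
    <maxDocs>1800000</maxDocs>
    <maxTime>300000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <autoSoftCommit>
    <maxTime>-1</maxTime>
    <openSearcher>false</openSearcher>
  </autoSoftCommit>
  <updateLog>
    <str name="dir">${solr.data.dir:}</str>
  </updateLog>
</updateHandler>
"""


def commit_intervals_seconds(xml_text):
    """Return the time-based commit intervals in seconds.

    Hypothetical helper: a missing or non-positive maxTime is reported
    as None, meaning the time-based trigger is not active.
    """
    root = ET.fromstring(xml_text)
    intervals = {}
    for tag in ("autoCommit", "autoSoftCommit"):
        node = root.find(f"{tag}/maxTime")
        millis = int(node.text) if node is not None and node.text else -1
        intervals[tag] = millis / 1000 if millis > 0 else None
    return intervals


if __name__ == "__main__":
    print(commit_intervals_seconds(UPDATE_HANDLER_XML))
```

With the config above, autoCommit comes out as 300 seconds, which matches the length of the spike I see on the test server, and autoSoftCommit comes out as None.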

Have we made an error in our configuration, or is this behaviour expected in v7? It seems really odd that Solr uses so much CPU just to add a single document, and that the high usage lasts for the autoCommit maxTime. I'm guessing that whatever makes the single-document addition so inefficient is also affecting the performance of our live Solr cloud and contributing to the 100% CPU usage that we observe when adding new documents. Any help, advice or insight would be appreciated.

Cheers,
Peter Lancaster | Developer
________________________________
This message is confidential and may contain privileged information. You should not disclose its contents to any other person. If you are not the intended recipient, please notify the sender named above immediately. It is expressly declared that this e-mail does not constitute nor form part of a contract or unilateral obligation. Opinions, conclusions and other information in this message that do not relate to the official business of findmypast shall be understood as neither given nor endorsed by it.
________________________________

Re: High cpu usage when adding documents to v7.7 solr cloud

Oleksandr Drapushko
Hi Peter,

This bug was introduced in Solr 7.7.0, is related to Java 8, and was
fixed in Solr 7.7.2.

Here are the ways to deal with it:
1. Upgrade to Solr 7.7.2
2. Patch your Solr 7.7
3. Use Java 9+
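
As a rough way to check a set of servers against the points above, here is a minimal sketch. It encodes only what is stated in this reply (affected from Solr 7.7.0, fixed in 7.7.2, tied to Java 8) and says nothing about other release lines. The names `parse_version`, `likely_affected`, `AFFECTED_FROM` and `FIXED_IN` are made up for illustration.

```python
# Version bounds taken from this reply only; other release lines are
# not covered by this sketch.
AFFECTED_FROM = (7, 7, 0)
FIXED_IN = (7, 7, 2)


def parse_version(text):
    """Turn '7.7.0' into (7, 7, 0); pad short versions like '7.7' with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def likely_affected(solr_version, java_major):
    """Return True if the Solr/Java combination falls in the range described above."""
    version = parse_version(solr_version)
    return AFFECTED_FROM <= version < FIXED_IN and java_major == 8


if __name__ == "__main__":
    for solr, java in [("7.7.0", 8), ("7.7.2", 8), ("7.7.0", 11), ("6.3.0", 8)]:
        print(solr, "on Java", java, "->", likely_affected(solr, java))
```

A True result only suggests that one of the three options above is worth trying; the Jira issue below remains the authoritative description.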

You can read more on this here:
https://issues.apache.org/jira/browse/SOLR-13349


Regards,
Oleksandr

On Tue, Oct 15, 2019 at 8:31 PM Peter Lancaster <[hidden email]> wrote:


RE: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud

lancasp22
Hi Oleksandr,

Thanks very much for your help. Yes, that Jira issue looks like exactly our problem.

I'll give that a go tomorrow.

Cheers,
Peter.

-----Original Message-----
From: Oleksandr Drapushko [mailto:[hidden email]]
Sent: 15 October 2019 19:52
To: [hidden email]
Subject: [EXTERNAL] Re: High cpu usage when adding documents to v7.7 solr cloud
