6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Clemens Wyss DEV
I am just upgrading from 6.6 to 7.5 and am now seeing many "Connection evictor" threads, all of which are sitting in Thread.sleep() ...

As of 6.6 I am keeping the SolrClients (one per core) in a HashMap. Is this ok or should I create a new SolrClient for each request I am doing?

SolrClient creation is as follows:
new HttpSolrClient.Builder( coreUrl )
.withConnectionTimeout( connectionTimeout )
.withSocketTimeout( forUpdating ? updateSocketTimeout : querySocketTimeout )
.build();

Also:
Since I have both querying and updating requests, I'd like to use ConcurrentUpdateSolrClient for the updating requests. But ConcurrentUpdateSolrClient does not seem to have the same fluent builder API.

Thanks for any best-practice advice regarding connection handling in SolrJ.
- Clemens

Re: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Shawn Heisey-2
On 10/21/2018 10:13 AM, Clemens Wyss DEV wrote:
> Just upgrading from 6.6 to 7.5 and am now seeing many "Connection evictor"-threads which are all Thread.sleep()ing ...

What's the stacktrace on those threads?  If they're sleeping, then it's
unlikely that there's any real contribution to system load.

Are you having problems, or just seeing threads you didn't expect to see?

> As of 6.6 I am keeping the SolrClients (one per core) in a HashMap. Is this ok or should I create a new SolrClient for each request I am doing?

You should really be keeping one SolrClient per server node, and
indicating which core to access with each request.  One client object
can access every core on a node.  You do have to drop the core name from
the URL.

> as I have querying and updating requests I'd like to make use of ConcurrentUpdateSolrClient for updating requests. But ConcurrentUpdateSolrClient does not seem to have the same fluent builder API

ConcurrentUpdateSolrClient swallows exceptions -- if there's a problem
during indexing, your program will never know about it.  If you need
error handling, you'll need to use HttpSolrClient and handle multiple
indexing threads in your own code.  If it's OK with you to not have any
error handling, then ConcurrentUpdateSolrClient can work very well.
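
A minimal sketch of what "handle multiple indexing threads in your own code" could look like, using only the JDK. BatchSender is a hypothetical stand-in for whatever call actually sends a batch (for example an add on an HttpSolrClient); it is not a SolrJ type, and the class and method names are made up for illustration. The point is only that each failure comes back to the caller instead of being swallowed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Hypothetical stand-in for the call that sends one batch to Solr; not a SolrJ type. */
interface BatchSender {
    void send(List<String> batch) throws Exception;
}

final class ParallelIndexer {

    /**
     * Sends every batch on a fixed-size thread pool and returns the batches
     * whose send() threw, so the caller can log, retry, or abort.
     */
    static List<List<String>> indexAll(List<List<String>> batches, BatchSender sender, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Void>> futures = new ArrayList<>();
            for (List<String> batch : batches) {
                Callable<Void> task = () -> {
                    sender.send(batch);
                    return null;
                };
                futures.add(pool.submit(task));
            }
            List<List<String>> failed = new ArrayList<>();
            for (int i = 0; i < futures.size(); i++) {
                try {
                    futures.get(i).get(); // rethrows (wrapped) whatever send() threw
                } catch (ExecutionException e) {
                    failed.add(batches.get(i)); // the error is visible here, not lost
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("interrupted while waiting for batch " + i, e);
                }
            }
            return failed;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real code the sender would wrap a single shared client, and the returned list would drive a retry or an alert; how to react to a failed batch is an application decision that this sketch leaves open.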

I do see a builder class for it:

http://lucene.apache.org/solr/7_5_0/solr-solrj/index.html?org/apache/solr/client/solrj/impl/ConcurrentUpdateSolrClient.html

Thanks,
Shawn

AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Clemens Wyss DEV
Thx Shawn!

> If they're sleeping, then it's unlikely that there's any real contribution to system load.
I know, but

> seeing threads you didn't expect to see?
exactly this

> You should really be keeping one SolrClient per server node,
> and indicating which core to access with each request
Due to the different timeouts (querying or updating) I think there should be at least two ...

> indicating which core to access with each request
e.g.? What I do for example when querying:
SolrClient solrClient = getSolrClient( coreName );
SolrQuery solrQuery = new SolrQuery();
...
QueryResponse response = solrClient.query( solrQuery );
...
If I omit the core in the URL when creating the SolrClient, where can I then "indicate" the core?

> I do see a builder class for it
my fault

Thx again
- Clemens


Re: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Shawn Heisey-2

AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Clemens Wyss DEV
On 10/21/2018 01:06 PM, Shawn Heisey wrote:
> You do it with the request, not with the client
Is that the "commitWithinMs" parameter for the UpdateRequests? To me this parameter sounds like telling the Solr server that I need to see this data within "x ms". As we have autoCommit and autoSoftCommit
...
<autoCommit>
  <maxTime>180000</maxTime> <!-- 3 min -->
  <openSearcher>false</openSearcher>
</autoCommit>

<autoSoftCommit>
  <maxTime>10000</maxTime> <!-- 10 sec -->
</autoSoftCommit>
...
configured, I think I can/should omit this parameter?

What about when doing a normal query/search, i.e.
solrClient.query( solrQuery );
Where can I reduce the max-search-time I am willing to wait? Or shouldn't I?

Does this also mean I should NOT be setting any timeouts (neither connect nor socket) when creating a SolrClient?

> stacktrace for the threads that worry you.  Do you have that?
All the same:
Thread.sleep(long) line: not available [native method]
IdleConnectionEvictor$1.run() line: 66
Thread.run() line: 748

Again, I am (and always was) aware that these stack traces indicate no load (at least not while sleeping); nevertheless, I was surprised to see that many. If I know they come from SolrJ and do not indicate "wrong usage of SolrJ" on my side, I can live with them 😉

Re: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Shawn Heisey-2
On 10/21/2018 11:10 PM, Clemens Wyss DEV wrote:
> For the UpdateRequests it is the "commitWithinMs"-parameter? To me
> this parameter sounds like telling the solr-server I need to see this
> data within "x ms". As we have autoCommit and autoSoftCommit

The commitWithin parameter is effectively equivalent to autoSoftCommit. 
If you wanted to have different timeframes for visibility on some
updates, you could achieve that using commitWithin with a shorter
interval than what's in autoSoftCommit.

The ten seconds you have on autoSoftCommit is pretty aggressive.  If
your commits are taking 1-2 seconds or less, an interval that small
might be OK ... but Solr will be spending a LOT of resources doing
commits, which can become a performance problem.

The 3 minutes on autoCommit is quite long.  I'd probably go with 60
seconds, but a longer value isn't going to hurt anything and will result
in fewer resources being used for that operation.  The default in Solr's
example configs is 15 seconds ... which I personally feel is a little
too frequent, but it works very well for a lot of people.

> What about when doing a normal query/search, i.e.
> solrClient.query( solrQuery );
> Where can I reduce the max-search-time I am willing to wait? Or shouldn't I?

In general, this is not something you want to do.  But there is
something along those lines.  It's not guaranteed to always work,
depending on what phase of the query is taking a long time, but it is
sometimes very effective:

https://lucene.apache.org/solr/guide/7_5/common-query-parameters.html#CommonQueryParameters-ThetimeAllowedParameter

The connection and socket timeouts, on the other hand, are generally
configured at the client level, not the request level.

> Does this also mean I should NOT be setting any timeouts (neither connect nor so) when creating a SolrClient?

The connect timeout is not a bad thing to have.  I'd personally set it
to something around (or less than) five seconds.  If it takes longer
than that to establish the connection, it's probably never going to
happen.  I've seen fifteen seconds here, which is REALLY long for that
timeout.

Socket timeouts are something that either you don't want, or you want to
be quite long, like two minutes.  If you issue a query that takes 30
seconds to run, and you set the socket timeout to 15 seconds, you're
never going to see the result.  The client will disconnect before the
server has a chance to respond. Setting the socket timeout just to make
sure the client doesn't stay connected forever is a good idea, but the
timeout must be much longer than you expect a query to ever take.
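
The effect described above can be reproduced without Solr. The following is only a sketch with plain JDK sockets, scaled down to milliseconds: a local "server" waits before answering, and the client reads with a socket (read) timeout. The class and method names are hypothetical and nothing here is SolrJ API; it merely shows that a read timeout shorter than the server's response time means the client never sees the response.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketTimeoutDemo {

    /**
     * Starts a loopback server that waits serverDelayMs before sending one byte,
     * then reads from it using the given socket (read) timeout.
     * Returns true if the response arrived, false if the client timed out first.
     */
    static boolean responseArrives(int serverDelayMs, int socketTimeoutMs) {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread slowServer = new Thread(() -> {
                try (Socket peer = server.accept()) {
                    Thread.sleep(serverDelayMs);     // simulates a slow query
                    peer.getOutputStream().write(1); // the "response"
                } catch (IOException | InterruptedException ignored) {
                    // the client gave up or the demo is shutting down
                }
            });
            slowServer.setDaemon(true);
            slowServer.start();

            try (Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort())) {
                client.setSoTimeout(socketTimeoutMs);
                return client.getInputStream().read() == 1;
            } catch (SocketTimeoutException e) {
                return false; // the client disconnected before the server answered
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("timeout shorter than response time: " + responseArrives(400, 100));
        System.out.println("timeout longer than response time:  " + responseArrives(50, 3000));
    }
}
```

With these numbers the first call should report false and the second true, which is the same relationship as the 30-second query against a 15-second socket timeout.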

>> stacktrace for the threads that worry you.  Do you have that?
> All the same:
> Thread.sleep(long) line: not available [native method]
> IdleConnectionEvictor$1.run() line: 66
> Thread.run() line: 748

The IdleConnectionEvictor class was added in 5.5.3 and 6.2.0 by this issue:

https://issues.apache.org/jira/browse/SOLR-9290

Shalin worked on that issue, maybe they can shed some light on it and
indicate whether there should be many threads running that code.  I
won't discount the possibility that the thread count is excessive.  It
does seem like you wouldn't need more than one evictor thread per
client, but I didn't design it, so I can't say for sure.

Thanks,
Shawn

AW: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Clemens Wyss DEV
On 10/22/2018 6:15 AM, Shawn Heisey wrote:
> autoSoftCommit is pretty aggressive. If your commits are taking 1-2 seconds or less
well, some take minutes (re-index)!

> autoCommit is quite long.  I'd probably go with 60 seconds
Which means every 1min the "pending"/"soft" commits are effectively saved?

One additional question: with auto(soft)commits in place, do I need to explicitly commit UpdateRequests from SolrJ at all?

> added in 5.5.3 and 6.2.0 by this issue
hmmm, I have never seen these threads before, not even in 6.6

> Shalin worked on that issue, maybe they can shed some light on it and
> indicate whether there should be many threads running that code
I'd appreciate that.

Yet again, many thanks.
- Clemens

Re: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Shalin Shekhar Mangar
You can expect as many connection evictor threads as there are HTTP
client instances. This is true for both Solr 6.6 and 7.x.

I was intrigued as to why you were not seeing the same threads in both
versions. It turns out that I made a mistake in the patch I committed in
SOLR-9290 where instead of using Solr's DefaultSolrThreadFactory which
names threads with a proper prefix, I used Java's DefaultThreadFactory
which names threads like pool-123-thread-1282. So if you take a thread dump
from Solr 6.6, you should be able to find threads named like these which
are sleeping at a similar place in the stack.
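
One way to compare the two versions is to count live threads by name pattern instead of reading a full thread dump. The sketch below is an assumption-laden helper, not something from SolrJ: it has to run inside the JVM that hosts the SolrJ clients, and the class name and the "#" placeholder are arbitrary choices. It replaces digit runs in thread names so that pool-123-thread-1282 and pool-7-thread-1 fall into the same bucket.

```java
import java.util.Map;
import java.util.TreeMap;

final class ThreadNameHistogram {

    /** Replaces every run of digits with '#', e.g. pool-123-thread-1282 becomes pool-#-thread-#. */
    static String normalize(String threadName) {
        return threadName.replaceAll("\\d+", "#");
    }

    /** Counts the live threads of the current JVM per normalized thread name. */
    static Map<String, Integer> countLiveThreads() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(normalize(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints one line per name pattern: count, tab, pattern.
        countLiveThreads().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

If the explanation above holds, a 6.6 client JVM should show a large pool-#-thread-# bucket where a 7.5 one shows "Connection evictor". For a process that cannot be modified, the same grouping can be applied to the thread names in an external thread dump.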

--
Regards,
Shalin Shekhar Mangar.

Re: AW: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Shawn Heisey-2
In reply to this post by Clemens Wyss DEV
On 10/22/2018 9:44 PM, Clemens Wyss DEV wrote:
> On 10/22/2018 6:15 AM, Shawn Heisey wrote:
>> autoSoftCommit is pretty aggressive. If your commits are taking 1-2 seconds or less
> well, some take minutes (re-index)!


Are you absolutely sure that you have commits taking that much time? 
I'm not talking about indexing, just the commit. Indexing a big batch of
documents can take a while, but even on a huge index, commits shouldn't
take a super long time, unless your cache warming is excessive.


>> autoCommit is quite long.  I'd probably go with 60 seconds
> Which means every 1min the "pending"/"soft" commits are effectively saved?
>
> One additional question: having auto(soft)commits in place, do I at all need to explicitly commit UpdateRequest from SolrJ?


With openSearcher set to false, the hard commits that autoCommit does do
NOT make changes visible.  A hard commit flushes outstanding data to
disk and starts a new transaction log.  If openSearcher is left at the
default of "true" then it would also open a new searcher, making changes
visible.

Hard commits are about durability, soft commits are about visibility.

If you have autoSoftCommit or use commitWithin, you do not need to send
explicit commits.

I see that Shalin has replied with info about his work on the class
you're concerned about.

Thanks,
Shawn

AW: AW: AW: 6.6 -> 7.5 SolrJ, seeing many "Connection evictor"-Threads

Clemens Wyss DEV
In reply to this post by Shalin Shekhar Mangar
Hi Shalin,
> You can expect as many connection evictor threads
I have (for whatever reason (*)) 27 SolrClient instances instantiated, but I see ~95 "Connection evictor" threads ...

> It turns out that I made a mistake in the patch I committed in...which names threads like pool-123-thread-1282.
> So if you take a thread dump from Solr 6.6
I cannot prove it, but I do not recall seeing many pool-xxx-thread-yyyy threads in my thread dumps. In one I have at hand I see
2 "pool-x-thread-y" threads
27 "ForkJoinPool.commonPool-worker-xx" threads
So I guess it was the ForkJoinPool.commonPool-worker threads, but 27 is not >90.

Thx
- Clemens

(*) I will follow Shawn's advice in this thread ASAP
