Solr indexing performance

Solr indexing performance

Rahul Goswami
Hello,

We have a Solr 7.2.1 SolrCloud setup where the client indexes in 5
parallel threads with 5000 docs per batch. This is a test setup and all
documents are indexed on the same node. We start seeing connection timeout
issues some time into indexing. I have yet to analyze GC pauses
and other possibilities, but as a guideline I wanted to know what
indexing rate might be "too high" for Solr, so that we can consider throttling.
The documents are mostly metadata with about 25 fields, so they are not very
heavy.
It would be nice to know a baseline performance expectation to inform the
application design.

Thanks,
Rahul

Re: Solr indexing performance

Vincenzo D'Amore
Hi, are the clients reusing their SolrClient instance?

Ciao,
Vincenzo

--
mobile: 3498513251
skype: free.dev


Re: Solr indexing performance

Shawn Heisey-2
In reply to this post by Rahul Goswami
It's not really possible to give you a number here.  It depends on a lot
of things, and every install is going to be different.

On a setup that I once dealt with, where there was only a single thread
doing the indexing, indexing on each core happened at about 1000 docs
per second.  I've heard people mention rates beyond 50000 docs per
second.  I've also heard people talk about rates of indexing far lower
than what I was seeing.

When you say "connection timeout" issues ... that could mean a couple of
different things.  It could mean that the connection never gets
established because it times out while trying, or it could mean that the
connection gets established, and then times out after that.  Which are
you seeing?  Usually dealing with that involves changing timeout
settings on the client application.  Figuring out what's causing the
delays that lead to the timeouts might be harder.  GC pauses are a
primary candidate.
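
To make the two meanings of "connection timeout" concrete, here is a minimal sketch in plain Java (java.net only, not SolrJ). The class name `TimeoutProbe`, the `probe` method, and the timeout values are hypothetical and chosen only for illustration. It separates a timeout while establishing the connection from a timeout while waiting for data on an already established connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical helper, not part of Solr or SolrJ.
class TimeoutProbe {
    enum Result { OK, CONNECT_TIMEOUT, CONNECT_FAILED, READ_TIMEOUT, IO_ERROR }

    static Result probe(String host, int port, int connectTimeoutMs, int readTimeoutMs) {
        try (Socket socket = new Socket()) {
            try {
                // Case 1: the connection is never established in time.
                socket.connect(new InetSocketAddress(host, port), connectTimeoutMs);
            } catch (SocketTimeoutException e) {
                return Result.CONNECT_TIMEOUT;
            } catch (IOException e) {
                // For example, connection refused or host unreachable.
                return Result.CONNECT_FAILED;
            }
            // Case 2: the connection exists, but no data arrives in time.
            socket.setSoTimeout(readTimeoutMs);
            try (InputStream in = socket.getInputStream()) {
                // -1 means the peer closed the connection without sending data.
                return in.read() >= 0 ? Result.OK : Result.IO_ERROR;
            } catch (SocketTimeoutException e) {
                return Result.READ_TIMEOUT;
            }
        } catch (IOException e) {
            return Result.IO_ERROR;
        }
    }

    public static void main(String[] args) throws IOException {
        // A listening socket that never accepts or answers: the OS completes the
        // TCP handshake from its backlog, so connect succeeds but the read stalls.
        try (ServerSocket silent = new ServerSocket(0)) {
            System.out.println(probe("127.0.0.1", silent.getLocalPort(), 1000, 200));
        }
    }
}
```

In a real indexing client the same distinction usually shows up as two separate settings, one for connecting and one for waiting on a response, and the exception type or message tells you which of the two was exceeded.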

There are typically two bottlenecks possible when indexing.  One is that
the source system cannot supply the documents fast enough.  The other is
that the Solr server is sitting mostly idle while the indexing program
waits for an opportunity to send more documents.  The first is not
something we can help you with.  The second is dealt with by making the
indexing application multi-threaded or multi-process, or adding more
threads/processes.
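
As a rough sketch of the multi-threaded approach, the Java below splits documents into batches and hands them to a fixed pool of worker threads. Everything named here is an assumption for illustration: `ParallelIndexer`, `indexAll`, and the `sender` callback are hypothetical, and `sender` merely stands in for whatever call actually ships a batch to Solr (ideally through one shared client instance). The bounded queue with `CallerRunsPolicy` is one possible way to throttle: when the workers fall behind, the producing thread sends a batch itself instead of queueing more work.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Consumer;

// Hypothetical helper, not part of Solr or SolrJ.
class ParallelIndexer {
    // Sends all docs in batches of batchSize using the given number of threads.
    // Returns the number of batches submitted.
    static <D> int indexAll(Iterable<D> docs, int batchSize, int threads,
                            Consumer<List<D>> sender) throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                threads, threads, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(threads * 2),       // bounded backlog
                new ThreadPoolExecutor.CallerRunsPolicy());  // producer helps when full
        int batches = 0;
        List<D> batch = new ArrayList<>(batchSize);
        for (D doc : docs) {
            batch.add(doc);
            if (batch.size() == batchSize) {
                final List<D> full = batch;
                pool.execute(() -> sender.accept(full));
                batches++;
                batch = new ArrayList<>(batchSize);
            }
        }
        if (!batch.isEmpty()) {
            // Send the final, partially filled batch.
            final List<D> rest = batch;
            pool.execute(() -> sender.accept(rest));
            batches++;
        }
        pool.shutdown();
        if (!pool.awaitTermination(1, TimeUnit.HOURS)) {
            throw new IllegalStateException("indexing did not finish in time");
        }
        return batches;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Integer> docs = new ArrayList<>();
        for (int i = 0; i < 23; i++) docs.add(i);
        AtomicInteger sent = new AtomicInteger();
        // The sender here only counts documents; a real one would call the client.
        int batches = indexAll(docs, 5, 3, b -> sent.addAndGet(b.size()));
        System.out.println(batches + " batches, " + sent.get() + " docs");
    }
}
```

This sketch does no error handling or retries; a failed batch would need to be caught inside the sender and dealt with there. Thread count and batch size are the two knobs to experiment with when Solr is either idle or overloaded.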

Thanks,
Shawn

Re: Solr indexing performance

Paras Lehana
Could the ulimit settings
<https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#ulimit-settings-nix-operating-systems>
be affecting this? It may be worth reviewing them.


--
--
Regards,

*Paras Lehana* [65871]

Re: Solr indexing performance

Shawn Heisey-2
On 12/5/2019 10:42 PM, Paras Lehana wrote:
> Can ulimit
> <https://lucene.apache.org/solr/guide/8_3/taking-solr-to-production.html#ulimit-settings-nix-operating-systems>
> settings impact this? Review once.

If the OS limits prevent Solr from opening a file or starting a thread,
it is far more likely that the indexing would fail.  It's not likely
that such problems would make indexing slow.

Thanks,
Shawn