Question about threading in search

Peilin Yang
I was wondering if anyone can shed some light on an issue we're having:
we're comparing two different indexes on the same collection - one with
many segments (default settings), and one force-merged into a single
segment. It seems that search is sometimes faster with multiple
segments.

We thought it might be because Lucene parallelizes searching over the
leafSlices?

But then we came across this:

https://github.com/apache/lucene-solr/blob/master/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L570-L619


where the javadoc says: "... this method will use the searcher's
ExecutorService in order to parallelize execution of the collection on
the configured leafSlices."

But we're using the vanilla search... does the vanilla search redirect
to this anyway? Either way, we're not explicitly configuring an
ExecutorService...
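
To make the question concrete, here is a toy model in plain Java of the
behaviour we understand the javadoc to describe. It is not Lucene code:
SliceSearchSketch, countMatches and search are made-up names, and it uses
one task per "segment" as a simplification of slices. Whether the vanilla
search path behaves like the executor == null branch is exactly what we
are unsure about.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Toy model only: hypothetical names, no Lucene classes involved.
public class SliceSearchSketch {

    // Counts matches in one "segment" (here just an array of int values).
    static int countMatches(int[] segment, int target) {
        int hits = 0;
        for (int value : segment) {
            if (value == target) {
                hits++;
            }
        }
        return hits;
    }

    // With executor == null every segment is scanned in the calling thread.
    // Otherwise one task per segment is submitted and the counts are merged.
    static int search(List<int[]> segments, int target, ExecutorService executor) {
        int total = 0;
        if (executor == null) {
            for (int[] segment : segments) {
                total += countMatches(segment, target);
            }
            return total;
        }
        List<Future<Integer>> futures = new ArrayList<>();
        for (int[] segment : segments) {
            Callable<Integer> task = () -> countMatches(segment, target);
            futures.add(executor.submit(task));
        }
        try {
            for (Future<Integer> future : futures) {
                total += future.get();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while merging", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("segment task failed", e);
        }
        return total;
    }

    public static void main(String[] args) {
        List<int[]> segments =
                List.of(new int[] {1, 2, 3}, new int[] {3, 3, 4}, new int[] {5, 3});
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(search(segments, 3, null)); // sequential path
            System.out.println(search(segments, 3, pool)); // parallel path
        } finally {
            pool.shutdown();
        }
    }
}
```

Both calls return the same count; only the threading differs, which is
why a single merged segment would leave nothing to parallelize in this
model.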

Any insight on exactly what's going on?

Thanks!

Re: Question about threading in search

Toke Eskildsen-2
On Sat, 2017-09-02 at 18:33 -0700, Peilin Yang wrote:
> we're comparing two different indexes on the same collection - one
> with many segments (default settings), and one force-merged into a
> single segment. It seems that search is sometimes faster with
> multiple segments.

If you are using Lucene 7+ and if some of the fields you are requesting
as part of your search result are stored as DocValues, you might have
encountered a performance regression with the streaming API:
https://issues.apache.org/jira/browse/LUCENE-8374

One peculiar effect of this issue is that fewer, larger segments get
slower DocValues retrieval than more, smaller segments. So a force
merge to 1 segment can result in worse performance.

- Toke Eskildsen, the Royal Danish Library, Denmark


Re: Question about threading in search

Erick Erickson
Please don't optimize to 1 segment unless you can afford to do it
quite regularly, see:
https://lucidworks.com/2017/10/13/segment-merging-deleted-documents-optimize-may-bad/

(NOTE: this is changing as of 7.5, see:
https://lucidworks.com/2018/06/20/solr-and-optimizing-your-index-take-ii/).

> It seems that search is sometimes faster with multiple segments.

In addition to what Toke said, this may just be an autowarming
problem. Measurements mean little or nothing unless they're performed
on a warmed-up index, since quite a bit of data has to be read from
disk into the heap and OS memory first. You may just be seeing that.
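
As a rough illustration of the warm-up point, a timing harness along
these lines discards the first runs before measuring. This is a plain-JDK
sketch, not anything from Lucene or Solr: WarmedTiming, medianNanos and
the Runnable query are hypothetical stand-ins for a real search call, and
the iteration counts are arbitrary.

```java
import java.util.Arrays;

// Hypothetical helper: times a query only after a number of untimed runs.
public class WarmedTiming {

    // Runs the query warmupRuns times untimed, then returns the median
    // wall-clock time in nanoseconds over measuredRuns timed executions
    // (the upper middle sample when measuredRuns is even).
    static long medianNanos(Runnable query, int warmupRuns, int measuredRuns) {
        if (measuredRuns <= 0) {
            throw new IllegalArgumentException("measuredRuns must be positive");
        }
        for (int i = 0; i < warmupRuns; i++) {
            query.run();
        }
        long[] samples = new long[measuredRuns];
        for (int i = 0; i < measuredRuns; i++) {
            long start = System.nanoTime();
            query.run();
            samples[i] = System.nanoTime() - start;
        }
        Arrays.sort(samples);
        return samples[measuredRuns / 2];
    }

    public static void main(String[] args) {
        // Stand-in workload; replace with the real search call on each index.
        Runnable fakeQuery = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) {
                sum += i;
            }
            if (sum < 0) {
                throw new IllegalStateException("unexpected overflow");
            }
        };
        System.out.println("median ns: " + medianNanos(fakeQuery, 50, 200));
    }
}
```

Run the same harness against both indexes, with the same queries, before
comparing the numbers.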

Best,
Erick


On Fri, Aug 17, 2018 at 2:26 AM, Toke Eskildsen <[hidden email]> wrote:

> On Sat, 2017-09-02 at 18:33 -0700, Peilin Yang wrote:
>> we're comparing two different indexes on the same collection - one
>> with many segments (default settings), and one force-merged into a
>> single segment. It seems that search is sometimes faster with
>> multiple segments.
>
> If you are using Lucene 7+ and if some of the fields you are requesting
> as part of your search result are stored as DocValues, you might have
> encountered a performance regression with the streaming API:
> https://issues.apache.org/jira/browse/LUCENE-8374
>
> One peculiar effect of this issue is that fewer, larger segments get
> slower DocValues retrieval than more, smaller segments. So a force
> merge to 1 segment can result in worse performance.
>
> - Toke Eskildsen, the Royal Danish Library, Denmark