Suggestion or recommendation for NRT

Suggestion or recommendation for NRT

ramyogi
Hi,

We are running Solr 7.5.0 and testing a single collection that serves both
search and indexing.
The collection was created with the indexConfig below. Indexing goes through
the Kafka Connect plugin (SolrJ cloud client) with a commit every 5 minutes:
https://github.com/jcustenborder/kafka-connect-solr

The collection has 30 shards with 3 replicas each, on EC2 nodes with plenty
of RAM (90 nodes), and is almost 2.5 TB in size. We see a performance impact
on search requests while indexing is in progress. Are there any
recommendations or fine-tuning steps we should consider? Please point us to
any references that would help.

<indexConfig>
    <mergedSegmentWarmer class="org.apache.lucene.index.SimpleMergedSegmentWarmer"/>
    <maxIndexingThreads>150</maxIndexingThreads>
    <ramBufferSizeMB>8000</ramBufferSizeMB>
    <maxBufferedDocs>1000000</maxBufferedDocs>
    <mergePolicyFactory class="org.apache.solr.index.TieredMergePolicyFactory">
        <int name="maxMergeAtOnce">10</int>
        <int name="segmentsPerTier">10</int>
    </mergePolicyFactory>
    <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler"/>
    <lockType>${solr.lock.type:native}</lockType>
    <infoStream>true</infoStream>
</indexConfig>






--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Suggestion or recommendation for NRT

ramyogi
Even though the same documents are indexed over and over again through
incremental updates, the index size keeps increasing.
Am I missing any configuration that would make optimization happen internally?




Re: Suggestion or recommendation for NRT

Erick Erickson
Updated documents are marked as deleted in the
old segment and added to a new segment. When
commits happen, merges occur and only then is the
space occupied by the deleted document reclaimed.
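
As a rough illustration of that bookkeeping, here is a toy model in
Python. It is only a sketch, not how Lucene is implemented, and the
class and method names are made up for this example:

```python
class ToyIndex:
    """Toy model of segment bookkeeping; not Lucene's real data structures."""

    def __init__(self):
        # Each segment maps doc id -> True (live) or False (marked deleted).
        self.segments = []

    def add_batch(self, doc_ids):
        """Index a batch: mark older copies deleted, then write a new segment."""
        for segment in self.segments:
            for doc_id in doc_ids:
                if segment.get(doc_id):
                    segment[doc_id] = False  # the old copy still takes up space
        self.segments.append({doc_id: True for doc_id in doc_ids})

    def merge_all(self):
        """Merge every segment into one, dropping the deleted copies."""
        live = {}
        for segment in self.segments:
            for doc_id, is_live in segment.items():
                if is_live:
                    live[doc_id] = True
        self.segments = [live]

    def max_doc(self):
        """Count of all copies, live or deleted."""
        return sum(len(segment) for segment in self.segments)

    def num_docs(self):
        """Count of live documents only."""
        return sum(
            1 for segment in self.segments
            for is_live in segment.values() if is_live
        )


index = ToyIndex()
index.add_batch(["a", "b", "c"])
index.add_batch(["a", "b"])  # re-index two of the same documents
before = (index.num_docs(), index.max_doc())
index.merge_all()
after = (index.num_docs(), index.max_doc())
```

Re-indexing leaves the live count unchanged while the total count
grows; only the merge brings the two back together.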

Which segments are merged on commit depends
on a number of factors.

Unless you can prove the extra space is a problem,
you should just ignore the issue. The percentage of
deleted documents should max out at around 33%
on Solr 7.5+.
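
If you want to track that percentage, it can be derived from the
numDocs and maxDoc values Solr reports for a core. A minimal sketch
in Python; the function name and the sample numbers are hypothetical:

```python
def deleted_pct(num_docs, max_doc):
    """Percentage of documents in the index that are marked deleted."""
    if max_doc == 0:
        return 0.0
    return 100.0 * (max_doc - num_docs) / max_doc


# Hypothetical core: 2 million live documents, 3 million in total.
example = deleted_pct(num_docs=2_000_000, max_doc=3_000_000)
```

A freshly optimized index has numDocs equal to maxDoc, so the
function returns 0.0 for it.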

For background on merging, see:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html

The third animation (TieredMergePolicy) is the default.

Best,
Erick

> On Jul 1, 2020, at 3:51 PM, ramyogi <[hidden email]> wrote:
>
> Even though same document indexed over and over again due to incremental
> update. Index size is being increased.
> Do I miss any configuration to make optimization occur by internally ?

Re: Suggestion or recommendation for NRT

ramyogi
Thanks, Erick, for the details and the reference; they helped me understand
segment merging better.
When I compare search performance on an untouched, optimized collection
(segment count 1) against a collection where indexing and search run in
parallel, the response time is about 3 times higher.
For example, the first responds in 100 ms on average, but the second takes
around 400 ms.

Is that expected behaviour, a trade-off we have to accept when indexing and
searching the same collection in parallel?
Or can we still tune some parameters for better performance? If so,
please suggest some.




Re: Suggestion or recommendation for NRT

Erick Erickson
That seems high. It can be tricky to get tests right. Are you running with
some kind of test runner? Do you have, say, 3-4 thousand queries
you run? Are you running the tests after warming the searchers?
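
To make those questions concrete, here is a minimal sketch of such a
test in Python. It assumes a caller-supplied run_query function (a
hypothetical name) that sends one query to the collection and blocks
until the response arrives. The warm-up pass is discarded before
anything is timed:

```python
import statistics
import time


def measure(run_query, queries, warmup=100):
    """Time each query after discarding an initial warm-up pass.

    run_query is a caller-supplied function (hypothetical here) that
    sends one query and returns once the response has arrived.
    """
    for query in queries[:warmup]:
        run_query(query)  # warm the caches; timings deliberately ignored

    timings_ms = []
    for query in queries:
        start = time.perf_counter()
        run_query(query)
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    timings_ms.sort()
    p95_index = max(0, int(len(timings_ms) * 0.95) - 1)
    return {
        "count": len(timings_ms),
        "mean_ms": statistics.mean(timings_ms),
        "median_ms": statistics.median(timings_ms),
        "p95_ms": timings_ms[p95_index],
    }


# Stand-in for a real Solr request, so the sketch runs on its own.
calls = []
stats = measure(calls.append, [f"q{i}" for i in range(1000)], warmup=100)
```

Comparing the median and the 95th percentile between two setups is
usually more telling than comparing a single average.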

Also, if you have indexed down to one segment, _then_ tried
adding docs and measuring, you are not getting accurate results.

See: https://lucidworks.com/post/segment-merging-deleted-documents-optimize-may-bad/

Best,
Erick

> On Jul 1, 2020, at 5:55 PM, ramyogi <[hidden email]> wrote:
>
> Thanks Erick for the details and reference to understand better about merging
> segment stuff.
> When I compare  performance of uninterrupted/optimized ( segment count 1)
> collection  for search request vs (indexing + search) in parallel  going on
> collection   performance is 3 times higher,
> for example : first one is responding 100ms in average but second one around
> 400ms.
>
> is that expected behaviour like we need to tradeoff if we do Indexing and
> search in the same collection parallel.
> or we can still fine tune with some parameters for better performance then
> please suggest some.

Re: Suggestion or recommendation for NRT

ramyogi
Thanks a lot for taking the time to respond to my questions.

We have two environments, ENV A and ENV B. Both have the same RAM capacity
(r5.2xlarge) and the same number of shards and the same replica type (NRT)
for the collection.

ENV A - It holds an optimized collection (segment count 1 and
numDocs = maxDoc) that is used only for search requests. No delta updates
are being triggered.


ENV B - It holds the same collection, copied from ENV A, with continuous
delta updates in progress, so it serves both indexing and search requests.
Indexing uses the Kafka Connect plugin, which uses SolrJ with
solr.commit.within=300000 (milliseconds)


We are comparing search performance between the two environments using an
automated test that runs a set of queries.

Regarding searcher warmup, this is our query config:

    <query>

        <maxBooleanClauses>10000</maxBooleanClauses>

        <filterCache class="solr.FastLRUCache"
                     size="10120"
                     initialSize="4192"
                     autowarmCount="0"/>

       
        <cache name="perSegFilter"
               class="solr.search.LRUCache"
               size="10"
               initialSize="0"
               autowarmCount="10"
               regenerator="solr.NoOpRegenerator"/>

        <fieldValueCache class="solr.FastLRUCache"
                         size="4096"
                         autowarmCount="1024"
                         showItems="32"/>

        <enableLazyFieldLoading>true</enableLazyFieldLoading>

        <queryResultWindowSize>20</queryResultWindowSize>

        <queryResultMaxDocsCached>200</queryResultMaxDocsCached>

        <listener event="newSearcher" class="solr.QuerySenderListener">
            <arr name="queries">
                <lst>
                    <str name="q">*:*</str>
                    <str name="facet">true</str>
                </lst>
            </arr>
        </listener>
        <listener event="firstSearcher" class="solr.QuerySenderListener">
            <arr name="queries">
                <lst>
                    <str name="q">*:*</str>
                    <str name="facet">true</str>
                </lst>
            </arr>
        </listener>

        <useColdSearcher>false</useColdSearcher>

        <maxWarmingSearchers>24</maxWarmingSearchers>

    </query>


