Indexing in one collection affects the index in another collection

Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Hi,

I am using Solr 7.5.0, and I am currently facing an issue where indexing in
collection2 affects the records in collection1. Although the records are
still intact, it seems that the termVectors settings get wiped out, and the
index size of collection1 is reduced from 3.3GB to 2.1GB after I index
collection2. Also, searches in collection1, which were originally very
fast, become very slow after the indexing is done in collection2.
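
(For context: term vectors are switched on per field in schema.xml with
attributes along these lines; the field name below is only an illustration,
not necessarily the actual schema.)

<field name="searchFields_tcs" type="text_general" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>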

Has anyone faced such issues before, or does anyone have an idea of what may
have gone wrong?

Regards,
Edwin

Re: Indexing in one collection affects the index in another collection

Shawn Heisey-2
On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
> I am using Solr 7.5.0, and I am currently facing an issue where indexing
> in collection2 affects the records in collection1. Although the records
> are still intact, it seems that the termVectors settings get wiped out,
> and the index size of collection1 is reduced from 3.3GB to 2.1GB after I
> index collection2.

This should not be possible.  Indexing in one collection should have
absolutely no effect on another collection.

If logging has been left at its default settings, the solr.log file
should have enough info to show what actually happened.

> Also, searches in collection1, which were originally very fast, become
> very slow after the indexing is done in collection2.

If the two collections have data on the same server(s), I can see this
happening.  More memory is consumed when there is additional data, and
when Solr needs more memory, performance might be affected.  The
solution is generally to install more memory in the server.  If the
system is working, there should be no need to increase the heap size
when the memory size increases ... but there can be situations where the
heap is a little too small, in which case you WOULD want to increase the
heap size.
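
As an illustration only (4g is an example value, not a recommendation for
your system), the heap on a Windows install is raised in solr.in.cmd like
this:

set SOLR_JAVA_MEM=-Xms4g -Xmx4g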

Thanks,
Shawn


Re: Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Hi Shawn,

Thanks for your reply.

The log only shows a list of entries like the following, and I don't see any
other logs besides these.

2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245417
2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245430
2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1 s:shard1
r:core_node4 x:policies_shard1_replica_n2]
o.a.s.u.p.StatelessScriptUpdateProcessorFactory update-script#processAdd:
id=13245435

There is no change to the segments info, but the slowdown in the first
collection is very drastic.
Before indexing collection2, the collection1 query QTimes are in the
range of 4 to 50 ms. However, after indexing collection2, the collection1
query QTime increases to more than 1000 ms. The indexing is done in CSV
format, and the size of the index is 3GB.

Regards,
Edwin




Re: Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Hi Shawn,

> If the two collections have data on the same server(s), I can see this
> happening.  More memory is consumed when there is additional data, and
> when Solr needs more memory, performance might be affected.  The
> solution is generally to install more memory in the server.

I have found that even after we delete the index in collection2, the query
QTime for collection1 still remains slow. It does not go back to the fast
speed it had before we indexed collection2.

Regards,
Edwin



Re: Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Hi Shawn,

Unfortunately, your explanation about memory may not be valid. Please refer
to my description below of the strange behavior (it looks much more like a
BUG than anything else that is explainable):

Note that we still have 18GB of free unused memory on the server.

1. We indexed the first collection, called customers (3.7 million records
from CSV data); the index size is 2.09GB. A search in customers for any
keyword returns within 50ms (QTime) when using highlighting (unified
highlighter, postings, light term vectors).

2. Then we indexed the second collection, called policies (6 million records
from CSV data); the index size is 2.55GB. A search in policies for any
keyword returns within 50ms (QTime) when using highlighting (unified
highlighter, postings, light term vectors).

3. But now any search in customers for any keyword (not from the cache)
takes as long as 1200ms (QTime), while the policies search remains very
fast (50ms).

4. So we decided to run the force optimize command on the customers collection (
https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false).
Surprisingly, after optimization, the search on the customers collection for
any keyword became very fast again (less than 50ms). BUT strangely, the
search in the policies collection became very slow (around 1200ms) without
any changes to the policies collection.

5. Based on the above result, we decided to run the force optimize command on
the policies collection (
https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false).
More surprisingly, after optimization, the search on the policies collection
for any keyword became very fast again (less than 50ms). BUT more strangely,
the search in the customers collection again became very slow (around 1200ms)
without any changes to the customers collection.
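
For reference, the same force merge can also be issued through SolrJ; this
is only a minimal sketch, assuming the same base URL and collection names as
in the calls above:

import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class ForceMerge {
  public static void main(String[] args) throws Exception {
    // Client against the same base URL used in the optimize calls above
    try (HttpSolrClient client =
            new HttpSolrClient.Builder("https://localhost:8983/edm").build()) {
      // optimize(collection, waitFlush, waitSearcher, maxSegments) is the
      // SolrJ equivalent of .../update?optimize=true&numSegments=1&waitFlush=false
      client.optimize("policies", false, true, 1);
    }
  }
}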

What strange and unexpected behavior! If this is not a bug, how could you
explain this very strange behavior in Solr 7.5? Could it be a bug?

We would appreciate any support or help with the above situation.

Thank you.

Regards,
Edwin


Re: Indexing in one collection affects the index in another collection

Jan Høydahl / Cominvent
It would be useful if you could disclose the machine configuration, OS, memory, settings etc., as well as the Solr config, including solr.in.sh, solrconfig.xml etc., so we can see the whole picture of memory, GC, etc.
You could also specify debugQuery=true on a slow search and check the timings section for clues. What QTime are you seeing on the slow queries in solr.log?
If that does not reveal the reason, I'd connect to your Solr instance with a tool like jVisualVM or similar to inspect what takes time. Or better, hook up DataDog, SPM or some other cloud tool to get a full view of the system.
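
For example, a debug run with SolrJ could look roughly like this (the base
URL is taken from your optimize calls; the collection name and query term
are placeholders):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class SlowQueryDebug {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
            new HttpSolrClient.Builder("https://localhost:8983/edm").build()) {
      SolrQuery q = new SolrQuery("slow-term-here"); // a query that is slow for you
      q.set("debugQuery", true);                     // adds the timing section to the response
      QueryResponse rsp = client.query("customers", q);
      System.out.println("QTime=" + rsp.getQTime());
      System.out.println(rsp.getDebugMap().get("timing")); // per-component timings
    }
  }
}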

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Thanks for your reply.

Below is what you requested about our Solr setup: configuration files,
schema, and the results of the debug queries.

Looking forward to your advice and support on our problem.

1. System configurations
OS: Windows 10 Pro 64 bit
System Memory: 32GB
CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
Processor(s)
HDD: 3.0 TB SATA (2.1 TB free)

2. The solrconfig.xml of the customers and policies collections, and
solr.in.cmd, which can be downloaded from the following link:
https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing

3. The debug queries from both collections

*3.1. Debug Query From Policies (which is Slow)*

  "debug":{

    "rawquerystring":"sherry",

    "querystring":"sherry",

    "parsedquery":"searchFields_tcs:sherry",

    "parsedquery_toString":"searchFields_tcs:sherry",

    "explain":{

      "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
3097315) [SchemaSimilarity], result of:\n  14.540428 =
score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n”,..

    "QParser":"LuceneQParser",

    "timing":{

      "time":681.0,

      "prepare":{

        "time":0.0,

        "query":{

          "time":0.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":0.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":0.0}},

      "process":{

        "time":680.0,

        "query":{

          "time":19.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":651.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":8.0}},

      "loadFieldValues":{

        "time":12.0}}}}



*3.2. Debug Query From Customers (which is fast because we index it after
indexing Policies):*



  "debug":{

    "rawquerystring":"sherry",

    "querystring":"sherry",

    "parsedquery":"searchFields_tcs:sherry",

    "parsedquery_toString":"searchFields_tcs:sherry",

    "explain":{

      "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
2453665) [SchemaSimilarity], result of:\n  13.191501 =
score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
 20.22558 = avgFieldLength\n      28.0 = fieldLength\n”, ..

    "QParser":"LuceneQParser",

    "timing":{

      "time":38.0,

      "prepare":{

        "time":1.0,

        "query":{

          "time":1.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":0.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":0.0}},

      "process":{

        "time":36.0,

        "query":{

          "time":1.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":31.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":3.0}},

      "loadFieldValues":{

        "time":13.0}}}}



Best Regards,
Edwin


Re: Indexing in one collection affects the index in another collection

Jan Høydahl / Cominvent
Looks like highlighting takes most of the time on the first query (680ms). Your config seems to ask for a lot of highlighting here, like 100 snippets of max 100000 characters etc.
It sounds to me like this might be a highlighting configuration problem. Try disabling highlighting (hl=false) and see if you get your speed back.
Also, I see fl=* in your config, which is really asking for all fields. Are you sure you want that? That may also be slow. Try asking for just the fields you will be using.
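
A rough SolrJ sketch of both suggestions (the field list here is only an
example; pick the fields you actually use):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NoHighlightCheck {
  public static void main(String[] args) throws Exception {
    try (HttpSolrClient client =
            new HttpSolrClient.Builder("https://localhost:8983/edm").build()) {
      SolrQuery q = new SolrQuery("sherry");
      q.setHighlight(false);                  // hl=false: rules highlighting in or out
      q.setFields("id", "searchFields_tcs");  // explicit field list instead of fl=*
      QueryResponse rsp = client.query("customers", q);
      System.out.println("QTime=" + rsp.getQTime());
    }
  }
}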

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com


Re: Indexing in one collection affects the index in another collection

Zheng Lin Edwin Yeo
Hi Jan,

Thanks for your reply.

However, we are still getting a slow QTime of 517ms even after we set
hl=false&fl=null.

Below is the debug query:

  "debug":{
    "rawquerystring":"cherry",
    "querystring":"cherry",
    "parsedquery":"searchFields_tcs:cherry",
    "parsedquery_toString":"searchFields_tcs:cherry",
    "explain":{
      "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
5747763) [SchemaSimilarity], result of:\n  14.227914 =
score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 =
docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
      "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
4840794) [SchemaSimilarity], result of:\n  13.937909 =
score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n
9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
(docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 =
docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) /
(freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter
b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n"},
    "QParser":"LuceneQParser",
    "timing":{
      "time":517.0,
      "prepare":{
        "time":0.0,
        "query":{
          "time":0.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":0.0}},
      "process":{
        "time":516.0,
        "query":{
          "time":15.0},
        "facet":{
          "time":0.0},
        "facet_module":{
          "time":0.0},
        "mlt":{
          "time":0.0},
        "highlight":{
          "time":0.0},
        "stats":{
          "time":0.0},
        "expand":{
          "time":0.0},
        "terms":{
          "time":0.0},
        "debug":{
          "time":500.0}}}}}

Regards,
Edwin



Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Jan,

Regarding your observation that highlighting takes up most of the time in
the first query on the policies collection: the highlighting was very fast
(less than 50ms) from the time the collection was indexed until the
customers collection was indexed, after which it slowed down tremendously.

Also, the slowdown does not affect only the highlight requestHandler. It
affects other requestHandlers, such as select and clustering, as well; all
of them get QTimes of more than 500ms. This is also shown in the latest
debug query we sent earlier, in which we set hl=false and fl=null.

Any idea or explanation for this strange behavior?
Thank you for your support; we look forward to shedding some light on this
issue and resolving it.

Regards,
Edwin

On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <[hidden email]>
wrote:

> Hi Jan,
>
> Thanks for your reply.
>
> However, we are still getting a slow QTime of 517ms even after we set
> hl=false&fl=null.
>
> Below is the debug query:
>
>   "debug":{
>     "rawquerystring":"cherry",
>     "querystring":"cherry",
>     "parsedquery":"searchFields_tcs:cherry",
>     "parsedquery_toString":"searchFields_tcs:cherry",
>     "explain":{
>       "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
>       "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n",
>     "QParser":"LuceneQParser",
>     "timing":{
>       "time":517.0,
>       "prepare":{
>         "time":0.0,
>         "query":{
>           "time":0.0},
>         "facet":{
>           "time":0.0},
>         "facet_module":{
>           "time":0.0},
>         "mlt":{
>           "time":0.0},
>         "highlight":{
>           "time":0.0},
>         "stats":{
>           "time":0.0},
>         "expand":{
>           "time":0.0},
>         "terms":{
>           "time":0.0},
>         "debug":{
>           "time":0.0}},
>       "process":{
>         "time":516.0,
>         "query":{
>           "time":15.0},
>         "facet":{
>           "time":0.0},
>         "facet_module":{
>           "time":0.0},
>         "mlt":{
>           "time":0.0},
>         "highlight":{
>           "time":0.0},
>         "stats":{
>           "time":0.0},
>         "expand":{
>           "time":0.0},
>         "terms":{
>           "time":0.0},
>         "debug":{
>           "time":500.0}}}}}
>
> Regards,
> Edwin
>
>
> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <[hidden email]> wrote:
>
>> Looks like highlighting takes most of the time on the first query
>> (680ms). Your config seems to ask for a lot of highlighting here, like 100
>> snippets of max 100000 characters each.
>> Sounds to me like this might be a highlighting configuration problem. Try
>> disabling highlighting (hl=false) and see if you get your speed back.
>> Also, I see fl=* in your config, which asks for all fields. Are you sure
>> you want that? That may also be slow. Try to ask for just the fields you
>> will be using.
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>> > On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo <[hidden email]> wrote:
>> >
>> > Thanks for your reply.
>> >
>> > Below is what you requested about our Solr setup: configuration files,
>> > schema, and results of debug queries.
>> >
>> > Looking forward to your advice and support on our problem.
>> >
>> > 1. System configuration
>> > OS: Windows 10 Pro 64 bit
>> > System Memory: 32GB
>> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
>> > Processor(s)
>> > HDD: 3.0 TB (free 2.1 TB)  SATA
>> >
>> > 2. solrconfig.xml of the customers and policies collections, and
>> > solr.in.cmd, which can be downloaded from the following link:
>> > https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>> >
>> > 3. The debug queries from both collections
>> >
>> > *3.1. Debug Query From Policies (which is Slow)*
>> >
>> >   "debug":{
>> >     "rawquerystring":"sherry",
>> >     "querystring":"sherry",
>> >     "parsedquery":"searchFields_tcs:sherry",
>> >     "parsedquery_toString":"searchFields_tcs:sherry",
>> >     "explain":{
>> >       "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in 3097315) [SchemaSimilarity], result of:\n  14.540428 = score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n    8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 = docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n", ...
>> >     "QParser":"LuceneQParser",
>> >     "timing":{
>> >       "time":681.0,
>> >       "prepare":{
>> >         "time":0.0,
>> >         "query":{
>> >           "time":0.0},
>> >         "facet":{
>> >           "time":0.0},
>> >         "facet_module":{
>> >           "time":0.0},
>> >         "mlt":{
>> >           "time":0.0},
>> >         "highlight":{
>> >           "time":0.0},
>> >         "stats":{
>> >           "time":0.0},
>> >         "expand":{
>> >           "time":0.0},
>> >         "terms":{
>> >           "time":0.0},
>> >         "debug":{
>> >           "time":0.0}},
>> >       "process":{
>> >         "time":680.0,
>> >         "query":{
>> >           "time":19.0},
>> >         "facet":{
>> >           "time":0.0},
>> >         "facet_module":{
>> >           "time":0.0},
>> >         "mlt":{
>> >           "time":0.0},
>> >         "highlight":{
>> >           "time":651.0},
>> >         "stats":{
>> >           "time":0.0},
>> >         "expand":{
>> >           "time":0.0},
>> >         "terms":{
>> >           "time":0.0},
>> >         "debug":{
>> >           "time":8.0}},
>> >       "loadFieldValues":{
>> >         "time":12.0}}}}
>> >
>> > *3.2. Debug Query From Customers (which is fast because we index it after
>> > indexing Policies):*
>> >
>> >   "debug":{
>> >     "rawquerystring":"sherry",
>> >     "querystring":"sherry",
>> >     "parsedquery":"searchFields_tcs:sherry",
>> >     "parsedquery_toString":"searchFields_tcs:sherry",
>> >     "explain":{
>> >       "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in 2453665) [SchemaSimilarity], result of:\n  13.191501 = score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n    1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ...
>> >     "QParser":"LuceneQParser",
>> >     "timing":{
>> >       "time":38.0,
>> >       "prepare":{
>> >         "time":1.0,
>> >         "query":{
>> >           "time":1.0},
>> >         "facet":{
>> >           "time":0.0},
>> >         "facet_module":{
>> >           "time":0.0},
>> >         "mlt":{
>> >           "time":0.0},
>> >         "highlight":{
>> >           "time":0.0},
>> >         "stats":{
>> >           "time":0.0},
>> >         "expand":{
>> >           "time":0.0},
>> >         "terms":{
>> >           "time":0.0},
>> >         "debug":{
>> >           "time":0.0}},
>> >       "process":{
>> >         "time":36.0,
>> >         "query":{
>> >           "time":1.0},
>> >         "facet":{
>> >           "time":0.0},
>> >         "facet_module":{
>> >           "time":0.0},
>> >         "mlt":{
>> >           "time":0.0},
>> >         "highlight":{
>> >           "time":31.0},
>> >         "stats":{
>> >           "time":0.0},
>> >         "expand":{
>> >           "time":0.0},
>> >         "terms":{
>> >           "time":0.0},
>> >         "debug":{
>> >           "time":3.0}},
>> >       "loadFieldValues":{
>> >         "time":13.0}}}}
>> >
>> >
>> >
>> > Best Regards,
>> > Edwin
>> >
>> > On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <[hidden email]> wrote:
>> >
>> >> It would be useful if you could disclose the machine configuration (OS,
>> >> memory, settings, etc.) as well as the Solr config, including solr.in.sh
>> >> and solrconfig.xml, so we can see the whole picture of memory, GC, etc.
>> >> You could also specify debugQuery=true on a slow search and check the
>> >> timings section for clues. What QTime are you seeing on the slow queries
>> >> in solr.log?
>> >> If that does not reveal the reason, I'd connect to your Solr instance
>> >> with a tool like jVisualVM or similar to inspect what takes time. Or
>> >> better, hook up to DataDog, SPM or some other cloud tool to get a full
>> >> view of the system.
>> >>
>> >> --
>> >> Jan Høydahl, search solution architect
>> >> Cominvent AS - www.cominvent.com
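
One caution when reading the 517ms trace above: of the 516ms process time,
500ms is spent in the debug component itself, i.e. building the per-document
explain output, while the query component took only 15ms. So part of what
debugQuery=true measures is the debugging itself. A hedged sketch (same
assumed endpoint and collection names as earlier in this thread) that
requests only the component timings, without the explain output:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TimingWithoutExplain {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            SolrQuery q = new SolrQuery("cherry");
            q.set("hl", false);
            // debug=timing returns only the per-component timing block;
            // debugQuery=true would also compute explain info for every hit.
            q.set("debug", "timing");
            QueryResponse rsp = client.query("policies", q);
            System.out.println("QTime=" + rsp.getQTime() + "ms");
            System.out.println(rsp.getDebugMap());
        }
    }
}

If QTime drops back toward the raw query-component time with debug=timing,
part of the measured slowness was an artifact of the measurement rather than
of the search itself.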
Reply | Threaded
Open this post in threaded view
|

Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Jan and Shawn,

Please focus on the strange issue that I have described above in more
detail; a summary is as follows:

1. Index the customers data: queries through the highlight, select, and all
other handlers are very fast (less than 50ms).

2. Now index the policies data: queries on policies are very fast, BUT
queries on customers become slow.

3. If I reindex the customers data, queries on customers are again very
fast, BUT queries on policies become slow.

How can you explain this behavior?

We never experienced such strange behavior before Solr 7.

Regards,
Edwin
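
One hypothesis that would fit this see-saw pattern, offered as something to
rule out rather than a confirmed diagnosis: both indexes live on the same
disk and compete for the same OS page cache, so a large write to one
collection (bulk indexing or a forced merge) can evict the other
collection's index files from cache. The most recently written collection is
then served from memory while the other pays disk-read costs, which matches
the alternation described in steps 1-3. A small sketch to test this,
assuming the endpoint and collection names used in this thread:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

public class SeeSawCheck {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://localhost:8983/solr").build()) {
            // Use a different term on each pass so Solr's queryResultCache
            // does not hide the cost of reading the index from disk.
            for (String term : new String[] {"sherry", "cherry", "johnny"}) {
                for (String collection : new String[] {"customers", "policies"}) {
                    long qtime = client.query(collection, new SolrQuery(term)).getQTime();
                    System.out.println(collection + " '" + term + "' QTime=" + qtime + "ms");
                }
            }
        }
    }
}

If the hypothesis holds, the slow QTimes should track whichever collection
was written least recently, and an OS tool such as Windows Resource Monitor
should show heavy disk reads on the Solr data directory during the slow
queries.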

Reply | Threaded
Open this post in threaded view
|

Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Jan and Shawn,

For your info, this is another debug query.

  "debug":{

    "rawquerystring":"johnny",

    "querystring":"johnny",

    "parsedquery":"searchFields_tcs:johnny",

    "parsedquery_toString":"searchFields_tcs:johnny",

    "explain":{

      "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
75730) [SchemaSimilarity], result of:\n  12.8497505 =
score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404
= idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n
1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
b + b * fieldLength / avgFieldLength)) from:\n      4.0 =
termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
 26.66791 = avgFieldLength\n      25.0 = fieldLength\n”,

    "QParser":"LuceneQParser",

    "timing":{

      "time":350.0,

      "prepare":{

        "time":0.0,

        "query":{

          "time":0.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":0.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":0.0}},

      "process":{

        "time":348.0,

        "query":{

          "time":287.0},

        "facet":{

          "time":0.0},

        "facet_module":{

          "time":0.0},

        "mlt":{

          "time":0.0},

        "highlight":{

          "time":54.0},

        "stats":{

          "time":0.0},

        "expand":{

          "time":0.0},

        "terms":{

          "time":0.0},

        "debug":{

          "time":6.0}},

      "loadFieldValues":{

        "time":0.0}}}}


Regards,
Edwin
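
Note what changed in this trace: highlighting is down to 54ms, and the bulk
of the 348ms now sits in the query component itself (287ms), which looks
more like a cold index read than a highlighter problem. If cold reads after
indexing or optimizing turn out to be the cause, one standard mitigation is
static warming in solrconfig.xml, so that each new searcher pre-reads
representative parts of the index before serving traffic. A minimal sketch
follows; the query is a placeholder and should be replaced with terms
representative of real traffic, and note this only covers searcher-open
cold starts, not cache eviction caused later by writes to another
collection.

<!-- solrconfig.xml, inside the <query> section: run representative queries
     whenever a new searcher is opened, before it serves requests. -->
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">searchFields_tcs:sherry</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>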


On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <[hidden email]>
wrote:

> Hi Jan and Shawn,
>
> Please focus on the strange issue that I have described above in more
> details, summary is as follows:
>
> 1. Index customers data, then queries from highlight, select, and all
> handlers are very fast (less than 50ms)
>
> 2. Now index policies data, then queries on polices are very fast BUT
> queries on customers becomes slow
>
> 3. If I reindex customers data, then again queries for customers are very
> fast BUT queries on policies becomes slow.
>
> How can you explain this behavior?
>
> We have never experienced such a strange behavior before Solr 7.
>
> Regards,
> Edwin
>
> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <[hidden email]>
> wrote:
>
>> Hi Jan,
>>
>> Referring to what you have mentioned that the highlighting takes up most
>> of the time in the first query from the policies collection, the
>> highlighting was very fast (less than 50ms) from the time it was indexed,
>> till the time after customers collection gets indexed, in which it slowed
>> down tremendously.
>>
>> Also, the slow down does not just affect on the highlight requestHandler.
>> It also affects other requestHandler like select and clustering as well.
>> All of them gets the QTime of more than 500ms. This is also proven in the
>> latest debug query that we have sent earlier, in which we have set hl=false
>> and fl=null.
>>
>> Any idea or explanation on this strange behavior?
>> Thank you for your support, as we look forward to shed some lights on
>> this issue and to resolve it.
>>
>> Regards,
>> Edwin
>>
>> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <[hidden email]>
>> wrote:
>>
>>> Hi Jan,
>>>
>>> Thanks for your reply.
>>>
>>> However, we are still getting a slow QTime of 517ms even after we set
>>> hl=false&fl=null.
>>>
>>> Below is the debug query:
>>>
>>>   "debug":{
>>>     "rawquerystring":"cherry",
>>>     "querystring":"cherry",
>>>     "parsedquery":"searchFields_tcs:cherry",
>>>     "parsedquery_toString":"searchFields_tcs:cherry",
>>>     "explain":{
>>>       "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
>>>       "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n",
>>>     "QParser":"LuceneQParser",
>>>     "timing":{
>>>       "time":517.0,
>>>       "prepare":{
>>>         "time":0.0,
>>>         "query":{
>>>           "time":0.0},
>>>         "facet":{
>>>           "time":0.0},
>>>         "facet_module":{
>>>           "time":0.0},
>>>         "mlt":{
>>>           "time":0.0},
>>>         "highlight":{
>>>           "time":0.0},
>>>         "stats":{
>>>           "time":0.0},
>>>         "expand":{
>>>           "time":0.0},
>>>         "terms":{
>>>           "time":0.0},
>>>         "debug":{
>>>           "time":0.0}},
>>>       "process":{
>>>         "time":516.0,
>>>         "query":{
>>>           "time":15.0},
>>>         "facet":{
>>>           "time":0.0},
>>>         "facet_module":{
>>>           "time":0.0},
>>>         "mlt":{
>>>           "time":0.0},
>>>         "highlight":{
>>>           "time":0.0},
>>>         "stats":{
>>>           "time":0.0},
>>>         "expand":{
>>>           "time":0.0},
>>>         "terms":{
>>>           "time":0.0},
>>>         "debug":{
>>>           "time":500.0}}}}}
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <[hidden email]> wrote:
>>>
>>>> Looks like highlighting takes most of the time on the first query
>>>> (680ms). You config seems to ask for a lot of highlighting here, like 100
>>>> snippets of max 100000 characters etc.
>>>> Sounds to me that this might be a highlighting configuration problem.
>>>> Try to disable highlighting (hl=false) and see if you get back your speed.
>>>> Also, I see fl=* in your config, which is really asking for all fields.
>>>> Are you sure you want that, that may also be slow. Try to ask for just the
>>>> fields you will be using.
>>>>
>>>> --
>>>> Jan Høydahl, search solution architect
>>>> Cominvent AS - www.cominvent.com
>>>>
>>>> > 24. jan. 2019 kl. 14:59 skrev Zheng Lin Edwin Yeo <
>>>> [hidden email]>:
>>>> >
>>>> > Thanks for your reply.
>>>> >
>>>> > Below are what you have requested about our Solr setup, configurations
>>>> > files ,schema and results of debug queries:
>>>> >
>>>> > Looking forward to your advice and support on our problem.
>>>> >
>>>> > 1. System configurations
>>>> > OS: Windows 10 Pro 64 bit
>>>> > System Memory: 32GB
>>>> > CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
>>>> > Processor(s)
>>>> > HDD: 3.0 TB (free 2.1 TB)  SATA
>>>> >
>>>> > 2. solrconfig.xml of customers and policies collection, and solr.in
>>>> ,cmd
>>>> > which can be download from the following link:
>>>> >
>>>> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>>>> >
>>>> > 3. The debug queries from both collections
>>>> >
>>>> > *3.1. Debug Query From Policies ( which is Slow)*
>>>> >
>>>> >  "debug":{
>>>> >
>>>> >    "rawquerystring":"sherry",
>>>> >
>>>> >    "querystring":"sherry",
>>>> >
>>>> >    "parsedquery":"searchFields_tcs:sherry",
>>>> >
>>>> >    "parsedquery_toString":"searchFields_tcs:sherry",
>>>> >
>>>> >    "explain":{
>>>> >
>>>> >      "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
>>>> > 3097315) [SchemaSimilarity], result of:\n  14.540428 =
>>>> > score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
>>>> > 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
>>>> > (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
>>>> > docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
>>>> > (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
>>>> > 5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
>>>> > b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n”,..
>>>> >
>>>> >    "QParser":"LuceneQParser",
>>>> >
>>>> >    "timing":{
>>>> >
>>>> >      "time":681.0,
>>>> >
>>>> >      "prepare":{
>>>> >
>>>> >        "time":0.0,
>>>> >
>>>> >        "query":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "facet":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "facet_module":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "mlt":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "highlight":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "stats":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "expand":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "terms":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "debug":{
>>>> >
>>>> >          "time":0.0}},
>>>> >
>>>> >      "process":{
>>>> >
>>>> >        "time":680.0,
>>>> >
>>>> >        "query":{
>>>> >
>>>> >          "time":19.0},
>>>> >
>>>> >        "facet":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "facet_module":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "mlt":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "highlight":{
>>>> >
>>>> >          "time":651.0},
>>>> >
>>>> >        "stats":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "expand":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "terms":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "debug":{
>>>> >
>>>> >          "time":8.0}},
>>>> >
>>>> >      "loadFieldValues":{
>>>> >
>>>> >        "time":12.0}}}}
>>>> >
>>>> >
>>>> >
>>>> > *3.2. Debug Query From Customers (which is fast because we index it
>>>> after
>>>> > indexing Policies):*
>>>> >
>>>> >
>>>> >
>>>> >  "debug":{
>>>> >
>>>> >    "rawquerystring":"sherry",
>>>> >
>>>> >    "querystring":"sherry",
>>>> >
>>>> >    "parsedquery":"searchFields_tcs:sherry",
>>>> >
>>>> >    "parsedquery_toString":"searchFields_tcs:sherry",
>>>> >
>>>> >    "explain":{
>>>> >
>>>> >      "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
>>>> > 2453665) [SchemaSimilarity], result of:\n  13.191501 =
>>>> > score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604
>>>> > = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
>>>> > 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
>>>> > 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
>>>> > b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
>>>> > termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
>>>> > 20.22558 = avgFieldLength\n      28.0 = fieldLength\n”, ..
>>>> >
>>>> >    "QParser":"LuceneQParser",
>>>> >
>>>> >    "timing":{
>>>> >
>>>> >      "time":38.0,
>>>> >
>>>> >      "prepare":{
>>>> >
>>>> >        "time":1.0,
>>>> >
>>>> >        "query":{
>>>> >
>>>> >          "time":1.0},
>>>> >
>>>> >        "facet":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "facet_module":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "mlt":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "highlight":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "stats":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "expand":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "terms":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "debug":{
>>>> >
>>>> >          "time":0.0}},
>>>> >
>>>> >      "process":{
>>>> >
>>>> >        "time":36.0,
>>>> >
>>>> >        "query":{
>>>> >
>>>> >          "time":1.0},
>>>> >
>>>> >        "facet":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "facet_module":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "mlt":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "highlight":{
>>>> >
>>>> >          "time":31.0},
>>>> >
>>>> >        "stats":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "expand":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "terms":{
>>>> >
>>>> >          "time":0.0},
>>>> >
>>>> >        "debug":{
>>>> >
>>>> >          "time":3.0}},
>>>> >
>>>> >      "loadFieldValues":{
>>>> >
>>>> >        "time":13.0}}}}
>>>> >
>>>> >
>>>> >
>>>> > Best Regards,
>>>> > Edwin
>>>> >
>>>> > On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <[hidden email]>
>>>> wrote:
>>>> >
>>>> >> It would be useful if you can disclose the machine configuration, OS,
>>>> >> memory, settings etc, as well as solr config including solr.in <
>>>> >> http://solr.in/>.sh, solrconfig.xml etc, so we can see the whole
>>>> picture
>>>> >> of memory, GC, etc.
>>>> >> You could also specify debugQuery=true on a slow search and check the
>>>> >> timings section for clues. What QTime are you seeing on the slow
>>>> queries in
>>>> >> solr.log?
>>>> >> If that does not reveal the reason, I'd connect to your solr
>>>> instance with
>>>> >> a tool like jVisualVM or similar, to inspect what takes time. Or
>>>> better,
>>>> >> hook up to DataDog, SPM or some other cloud tool to get a full view
>>>> of the
>>>> >> system.
>>>> >>
>>>> >> --
>>>> >> Jan Høydahl, search solution architect
>>>> >> Cominvent AS - www.cominvent.com
>>>> >>
>>>> >>> 24. jan. 2019 kl. 13:42 skrev Zheng Lin Edwin Yeo <
>>>> [hidden email]
>>>> >>> :
>>>> >>>
>>>> >>> Hi Shawn,
>>>> >>>
>>>> >>> Unfortunately your reply of memory may not be valid. Please refer
>>>> to my
>>>> >>> explanation below of the strange behaviors (is it much more like a
>>>> BUG
>>>> >> than
>>>> >>> anything else that is explainable):
>>>> >>>
>>>> >>> Note that we still have 18GB of free unused memory on the server.
>>>> >>>
>>>> >>> 1. We indexed the first collection called customers (3.7 millioin
>>>> records
>>>> >>> from CSV data), index size is 2.09GB. The search in customers for
>>>> any
>>>> >>> keyword is returned within 50ms (QTime) for using highlight (unified
>>>> >>> highlighter, posting, light term vectors)
>>>> >>>
>>>> >>> 2. Then we indexed the second collection called policies (6 million
>>>> >> records
>>>> >>> from CSV data), index size is 2.55GB. The search in policies for any
>>>> >>> keyword is returned within 50ms (QTime) for using highlight (unified
>>>> >>> highlighter, posting, light term vectors)
>>>> >>>
>>>> >>> 3. But now any search in customers for any keywords (not from cache)
>>>> >> takes
>>>> >>> as high as 1200ms (QTime). But still policies search remains very
>>>> fast
>>>> >>> (50ms).
>>>> >>>
>>>> >>> 4. So we decided to run the force optimize command on customers
>>>> >> collection (
>>>> >>>
>>>> >>
>>>> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
>>>> >> ),
>>>> >>> surprisingly after optimization the search on customers collection
>>>> for
>>>> >> any
>>>> >>> keywords become very fast again (less than 50ms). BUT strangely, the
>>>> >> search
>>>> >>> in policies collection become very slow (around 1200ms) without any
>>>> >> changes
>>>> >>> to the policies collection.
>>>> >>>
>>>> >>> 5. Based on above result, we decided to run the force optimize
>>>> command on
>>>> >>> policies collection (
>>>> >>>
>>>> >>
>>>> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
>>>> >> ).
>>>> >>> More surprisingly, after optimization the search on policies
>>>> collection
>>>> >> for
>>>> >>> any keywords become very fast again (less than 50ms). BUT more
>>>> strangely,
>>>> >>> the search in customers collection again become very slow (around
>>>> 1200ms)
>>>> >>> without any changes to the customers collection.
>>>> >>>
>>>> >>> What a strange and unexpected behavior! If this is not a bug, how
>>>> could
>>>> >> you
>>>> >>> explain the above very strange behavior in Solr 7.5. Could it be a
>>>> bug?
>>>> >>>
>>>> >>> We would appreciate any support or help on our above situation.
>>>> >>>
>>>> >>> Thank you.
>>>> >>>
>>>> >>> Regards,
>>>> >>> Edwin
>>>> >>>
>>>> >>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <
>>>> [hidden email]>
>>>> >>> wrote:
>>>> >>>
>>>> >>>> Hi Shawn,
>>>> >>>>
>>>> >>>>> If the two collections have data on the same server(s), I can see
>>>> this
>>>> >>>>> happening.  More memory is consumed when there is additional
>>>> data, and
>>>> >>>>> when Solr needs more memory, performance might be affected.  The
>>>> >>>>> solution is generally to install more memory in the server.
>>>> >>>>
>>>> >>>> I have found that even after we delete the index in collection2,
>>>> the
>>>> >> query
>>>> >>>> QTime for collection1 still remains slow. It does not goes back to
>>>> its
>>>> >>>> previous fast speed before we index collection2.
>>>> >>>>
>>>> >>>> Regards,
>>>> >>>> Edwin
>>>> >>>>
>>>> >>>>
>>>> >>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <
>>>> [hidden email]
>>>> >>>
>>>> >>>> wrote:
>>>> >>>>
>>>> >>>>> Hi Shawn,
>>>> >>>>>
>>>> >>>>> Thanks for your reply.
>>>> >>>>>
>>>> >>>>> The log only shows a list  the following and I don't see any
>>>> other logs
>>>> >>>>> besides these.
>>>> >>>>>
>>>> >>>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>>>> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>> >> update-script#processAdd:
>>>> >>>>> id=13245417
>>>> >>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>>> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>> >> update-script#processAdd:
>>>> >>>>> id=13245430
>>>> >>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>>> >>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>> >>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>> >> update-script#processAdd:
>>>> >>>>> id=13245435
>>>> >>>>>
>>>> >>>>> There is no change to the segments info. but the slowdown in the
>>>> first
>>>> >>>>> collection is very drastic.
>>>> >>>>> Before the indexing of collection2, the collection1 query QTime
>>>> are in
>>>> >>>>> the range of 4 to 50 ms. However, after indexing collection2, the
>>>> >>>>> collection1 query QTime increases to more than 1000 ms. The index
>>>> are
>>>> >> done
>>>> >>>>> in CSV format, and the size of the index is 3GB.
>>>> >>>>>
>>>> >>>>> Regards,
>>>> >>>>> Edwin
>>>> >>>>>
>>>> >>>>>
>>>> >>>>>
>>>> >>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <[hidden email]>
>>>> >> wrote:
>>>> >>>>>
>>>> >>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>>> >>>>>>> I am using Solr 7.5.0, and currently I am facing an issue of
>>>> when I
>>>> >> am
>>>> >>>>>>> indexing in collection2, the indexing affects the records in
>>>> >>>>>> collection1.
>>>> >>>>>>> Although the records are still intact, it seems that the
>>>> settings of
>>>> >>>>>> the
>>>> >>>>>>> termVecotrs get wipe out, and the index size of collection1
>>>> reduced
>>>> >>>>>> from
>>>> >>>>>>> 3.3GB to 2.1GB after I do the indexing in collection2.
>>>> >>>>>>
>>>> >>>>>> This should not be possible.  Indexing in one collection should
>>>> have
>>>> >>>>>> absolutely no effect on another collection.
>>>> >>>>>>
>>>> >>>>>> If logging has been left at its default settings, the solr.log
>>>> file
>>>> >>>>>> should have enough info to show what actually happened.
>>>> >>>>>>
>>>> >>>>>>> Also, the search in
>>>> >>>>>>> collection1, which was originall very fast, becomes very slow
>>>> after
>>>> >> the
>>>> >>>>>>> indexing is done is collection2.
>>>> >>>>>>
>>>> >>>>>> If the two collections have data on the same server(s), I can
>>>> see this
>>>> >>>>>> happening.  More memory is consumed when there is additional
>>>> data, and
>>>> >>>>>> when Solr needs more memory, performance might be affected.  The
>>>> >>>>>> solution is generally to install more memory in the server.  If
>>>> the
>>>> >>>>>> system is working, there should be no need to increase the heap
>>>> size
>>>> >>>>>> when the memory size increases ... but there can be situations
>>>> where
>>>> >> the
>>>> >>>>>> heap is a little bit too small, where you WOULD want to increase
>>>> the
>>>> >>>>>> heap size.
>>>> >>>>>>
>>>> >>>>>> Thanks,
>>>> >>>>>> Shawn
>>>> >>>>>>
>>>> >>>>>>
>>>> >>
>>>> >>
>>>>
>>>>
Reply | Threaded
Open this post in threaded view
|

Re: Indexing in one collection affect index in another collection

Jörn Franke
In reply to this post by Zheng Lin Edwin Yeo
Have you done a correct sizing with respect to memory / CPU?

Also check the data model: if you query a lot of stored fields, they may contain a lot of data.

You may also split those two collections onto different nodes, for example as sketched below.
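
A minimal sketch of such a split (the node names below are hypothetical, not taken from this thread): the standard Collections API parameter createNodeSet pins each collection to a specific node at creation time, e.g.

https://localhost:8983/edm/admin/collections?action=CREATE&name=customers&numShards=1&replicationFactor=1&createNodeSet=node1:8983_solr
https://localhost:8983/edm/admin/collections?action=CREATE&name=policies&numShards=1&replicationFactor=1&createNodeSet=node2:8983_solr

That way the two indexes no longer compete for the same node's heap and OS disk cache.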


Re: Indexing in one collection affect index in another collection

Jan Høydahl / Cominvent
In reply to this post by Zheng Lin Edwin Yeo
Which Java version? 32 or 64 bit? Do you start Solr with the default 512 MB heap size?
Is any other software running on the machine?
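
For reference, both can be checked with standard commands (values here are illustrative, not taken from this thread): "java -version" reports whether the JVM is a 32- or 64-bit build, and the heap can be set explicitly at startup with the -m flag instead of relying on the default:

bin\solr.cmd start -m 6g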

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> On 25 Jan 2019 at 13:05, Zheng Lin Edwin Yeo <[hidden email]> wrote:
>
> Hi Jan and Shawn,
>
> For your info, this is another debug query.
>
>  "debug":{
>
>    "rawquerystring":"johnny",
>
>    "querystring":"johnny",
>
>    "parsedquery":"searchFields_tcs:johnny",
>
>    "parsedquery_toString":"searchFields_tcs:johnny",
>
>    "explain":{
>
>      "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
> 75730) [SchemaSimilarity], result of:\n  12.8497505 =
> score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404
> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> 0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n
> 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> b + b * fieldLength / avgFieldLength)) from:\n      4.0 =
> termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
> 26.66791 = avgFieldLength\n      25.0 = fieldLength\n”,
>
>    "QParser":"LuceneQParser",
>
>    "timing":{
>
>      "time":350.0,
>
>      "prepare":{
>
>        "time":0.0,
>
>        "query":{
>
>          "time":0.0},
>
>        "facet":{
>
>          "time":0.0},
>
>        "facet_module":{
>
>          "time":0.0},
>
>        "mlt":{
>
>          "time":0.0},
>
>        "highlight":{
>
>          "time":0.0},
>
>        "stats":{
>
>          "time":0.0},
>
>        "expand":{
>
>          "time":0.0},
>
>        "terms":{
>
>          "time":0.0},
>
>        "debug":{
>
>          "time":0.0}},
>
>      "process":{
>
>        "time":348.0,
>
>        "query":{
>
>          "time":287.0},
>
>        "facet":{
>
>          "time":0.0},
>
>        "facet_module":{
>
>          "time":0.0},
>
>        "mlt":{
>
>          "time":0.0},
>
>        "highlight":{
>
>          "time":54.0},
>
>        "stats":{
>
>          "time":0.0},
>
>        "expand":{
>
>          "time":0.0},
>
>        "terms":{
>
>          "time":0.0},
>
>        "debug":{
>
>          "time":6.0}},
>
>      "loadFieldValues":{
>
>        "time":0.0}}}}
>
>
> Regards,
> Edwin
>
>
> On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <[hidden email]>
> wrote:
>
>> Hi Jan and Shawn,
>>
>> Please focus on the strange issue that I have described above in more
>> detail; a summary is as follows:
>>
>> 1. Index customers data, then queries from highlight, select, and all
>> handlers are very fast (less than 50ms)
>>
>> 2. Now index policies data, then queries on policies are very fast BUT
>> queries on customers becomes slow
>>
>> 3. If I reindex customers data, then again queries for customers are very
>> fast BUT queries on policies becomes slow.
>>
>> How can you explain this behavior?
>>
>> We have never experienced such strange behavior before Solr 7.
>>
>> Regards,
>> Edwin
>>
>> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <[hidden email]>
>> wrote:
>>
>>> Hi Jan,
>>>
>>> Referring to what you mentioned about the highlighting taking up most
>>> of the time in the first query on the policies collection: the
>>> highlighting was very fast (less than 50ms) from the time it was indexed
>>> until the customers collection was indexed, at which point it slowed
>>> down tremendously.
>>>
>>> Also, the slowdown does not just affect the highlight requestHandler.
>>> It also affects other requestHandlers like select and clustering as well.
>>> All of them get a QTime of more than 500ms. This is also shown in the
>>> latest debug query that we sent earlier, in which we set hl=false
>>> and fl=null.
>>>
>>> Any idea or explanation for this strange behavior?
>>> Thank you for your support; we look forward to shedding some light on
>>> this issue and resolving it.
>>>
>>> Regards,
>>> Edwin
>>>
>>> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <[hidden email]>
>>> wrote:
>>>
>>>> Hi Jan,
>>>>
>>>> Thanks for your reply.
>>>>
>>>> However, we are still getting a slow QTime of 517ms even after we set
>>>> hl=false&fl=null.
>>>>
>>>> Below is the debug query:
>>>>
>>>>  "debug":{
>>>>    "rawquerystring":"cherry",
>>>>    "querystring":"cherry",
>>>>    "parsedquery":"searchFields_tcs:cherry",
>>>>    "parsedquery_toString":"searchFields_tcs:cherry",
>>>>    "explain":{
>>>>      "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in 5747763) [SchemaSimilarity], result of:\n  14.227914 = score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      25.0 = fieldLength\n",
>>>>      "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in 4840794) [SchemaSimilarity], result of:\n  13.937909 = score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5)) from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n      27.0 = fieldLength\n",
>>>>    "QParser":"LuceneQParser",
>>>>    "timing":{
>>>>      "time":517.0,
>>>>      "prepare":{
>>>>        "time":0.0,
>>>>        "query":{
>>>>          "time":0.0},
>>>>        "facet":{
>>>>          "time":0.0},
>>>>        "facet_module":{
>>>>          "time":0.0},
>>>>        "mlt":{
>>>>          "time":0.0},
>>>>        "highlight":{
>>>>          "time":0.0},
>>>>        "stats":{
>>>>          "time":0.0},
>>>>        "expand":{
>>>>          "time":0.0},
>>>>        "terms":{
>>>>          "time":0.0},
>>>>        "debug":{
>>>>          "time":0.0}},
>>>>      "process":{
>>>>        "time":516.0,
>>>>        "query":{
>>>>          "time":15.0},
>>>>        "facet":{
>>>>          "time":0.0},
>>>>        "facet_module":{
>>>>          "time":0.0},
>>>>        "mlt":{
>>>>          "time":0.0},
>>>>        "highlight":{
>>>>          "time":0.0},
>>>>        "stats":{
>>>>          "time":0.0},
>>>>        "expand":{
>>>>          "time":0.0},
>>>>        "terms":{
>>>>          "time":0.0},
>>>>        "debug":{
>>>>          "time":500.0}}}}}
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>>
>>>> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <[hidden email]> wrote:
>>>>
>>>>> Looks like highlighting takes most of the time on the first query
>>>>> (680ms). Your config seems to ask for a lot of highlighting here, like 100
>>>>> snippets of max 100000 characters etc.
>>>>> Sounds to me like this might be a highlighting configuration problem.
>>>>> Try disabling highlighting (hl=false) and see if you get back your speed.
>>>>> Also, I see fl=* in your config, which is really asking for all fields.
>>>>> Are you sure you want that? That may also be slow. Try to ask for just the
>>>>> fields you will be using.
>>>>>
>>>>> --
>>>>> Jan Høydahl, search solution architect
>>>>> Cominvent AS - www.cominvent.com
>>>>>
>>>>>> On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo <
>>>>> [hidden email]> wrote:
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> Below is what you requested about our Solr setup: configuration
>>>>>> files, schema, and results of debug queries.
>>>>>>
>>>>>> Looking forward to your advice and support on our problem.
>>>>>>
>>>>>> 1. System configurations
>>>>>> OS: Windows 10 Pro 64 bit
>>>>>> System Memory: 32GB
>>>>>> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
>>>>>> Processor(s)
>>>>>> HDD: 3.0 TB (free 2.1 TB)  SATA
>>>>>>
>>>>>> 2. solrconfig.xml of the customers and policies collections, and
>>>>>> solr.in.cmd, which can be downloaded from the following link:
>>>>>>
>>>>> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>>>>>>
>>>>>> 3. The debug queries from both collections
>>>>>>
>>>>>> *3.1. Debug Query From Policies (which is Slow)*
>>>>>>
>>>>>> "debug":{
>>>>>>
>>>>>>   "rawquerystring":"sherry",
>>>>>>
>>>>>>   "querystring":"sherry",
>>>>>>
>>>>>>   "parsedquery":"searchFields_tcs:sherry",
>>>>>>
>>>>>>   "parsedquery_toString":"searchFields_tcs:sherry",
>>>>>>
>>>>>>   "explain":{
>>>>>>
>>>>>>     "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
>>>>>> 3097315) [SchemaSimilarity], result of:\n  14.540428 =
>>>>>> score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
>>>>>> 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
>>>>>> (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
>>>>>> docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
>>>>>> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
>>>>>> 5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
>>>>>> b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n",..
>>>>>>
>>>>>>   "QParser":"LuceneQParser",
>>>>>>
>>>>>>   "timing":{
>>>>>>
>>>>>>     "time":681.0,
>>>>>>
>>>>>>     "prepare":{
>>>>>>
>>>>>>       "time":0.0,
>>>>>>
>>>>>>       "query":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "facet":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "facet_module":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "mlt":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "highlight":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "stats":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "expand":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "terms":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "debug":{
>>>>>>
>>>>>>         "time":0.0}},
>>>>>>
>>>>>>     "process":{
>>>>>>
>>>>>>       "time":680.0,
>>>>>>
>>>>>>       "query":{
>>>>>>
>>>>>>         "time":19.0},
>>>>>>
>>>>>>       "facet":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "facet_module":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "mlt":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "highlight":{
>>>>>>
>>>>>>         "time":651.0},
>>>>>>
>>>>>>       "stats":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "expand":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "terms":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "debug":{
>>>>>>
>>>>>>         "time":8.0}},
>>>>>>
>>>>>>     "loadFieldValues":{
>>>>>>
>>>>>>       "time":12.0}}}}
>>>>>>
>>>>>>
>>>>>>
>>>>>> *3.2. Debug Query From Customers (which is fast because we index it
>>>>> after
>>>>>> indexing Policies):*
>>>>>>
>>>>>>
>>>>>>
>>>>>> "debug":{
>>>>>>
>>>>>>   "rawquerystring":"sherry",
>>>>>>
>>>>>>   "querystring":"sherry",
>>>>>>
>>>>>>   "parsedquery":"searchFields_tcs:sherry",
>>>>>>
>>>>>>   "parsedquery_toString":"searchFields_tcs:sherry",
>>>>>>
>>>>>>   "explain":{
>>>>>>
>>>>>>     "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
>>>>>> 2453665) [SchemaSimilarity], result of:\n  13.191501 =
>>>>>> score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n    9.08604
>>>>>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
>>>>>> 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
>>>>>> 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
>>>>>> b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
>>>>>> termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
>>>>>> 20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ..
>>>>>>
>>>>>>   "QParser":"LuceneQParser",
>>>>>>
>>>>>>   "timing":{
>>>>>>
>>>>>>     "time":38.0,
>>>>>>
>>>>>>     "prepare":{
>>>>>>
>>>>>>       "time":1.0,
>>>>>>
>>>>>>       "query":{
>>>>>>
>>>>>>         "time":1.0},
>>>>>>
>>>>>>       "facet":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "facet_module":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "mlt":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "highlight":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "stats":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "expand":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "terms":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "debug":{
>>>>>>
>>>>>>         "time":0.0}},
>>>>>>
>>>>>>     "process":{
>>>>>>
>>>>>>       "time":36.0,
>>>>>>
>>>>>>       "query":{
>>>>>>
>>>>>>         "time":1.0},
>>>>>>
>>>>>>       "facet":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "facet_module":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "mlt":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "highlight":{
>>>>>>
>>>>>>         "time":31.0},
>>>>>>
>>>>>>       "stats":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "expand":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "terms":{
>>>>>>
>>>>>>         "time":0.0},
>>>>>>
>>>>>>       "debug":{
>>>>>>
>>>>>>         "time":3.0}},
>>>>>>
>>>>>>     "loadFieldValues":{
>>>>>>
>>>>>>       "time":13.0}}}}
>>>>>>
>>>>>>
>>>>>>
>>>>>> Best Regards,
>>>>>> Edwin
>>>>>>
>>>>>> On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <[hidden email]>
>>>>> wrote:
>>>>>>
>>>>>>> It would be useful if you could disclose the machine configuration, OS,
>>>>>>> memory, settings etc., as well as the Solr config including solr.in.sh,
>>>>>>> solrconfig.xml etc., so we can see the whole picture
>>>>>>> of memory, GC, etc.
>>>>>>> You could also specify debugQuery=true on a slow search and check the
>>>>>>> timings section for clues. What QTime are you seeing on the slow
>>>>> queries in
>>>>>>> solr.log?
>>>>>>> If that does not reveal the reason, I'd connect to your solr
>>>>> instance with
>>>>>>> a tool like jVisualVM or similar, to inspect what takes time. Or
>>>>> better,
>>>>>>> hook up to DataDog, SPM or some other cloud tool to get a full view
>>>>> of the
>>>>>>> system.
>>>>>>>
>>>>>>> --
>>>>>>> Jan Høydahl, search solution architect
>>>>>>> Cominvent AS - www.cominvent.com
>>>>>>>
>>>>>>>> On 24 Jan 2019 at 13:42, Zheng Lin Edwin Yeo <
>>>>> [hidden email]> wrote:
>>>>>>>>
>>>>>>>> Hi Shawn,
>>>>>>>>
>>>>>>>> Unfortunately your explanation about memory may not be valid. Please
>>>>>>>> refer to my description below of the strange behaviors (it looks much
>>>>>>>> more like a BUG than anything else that is explainable):
>>>>>>>>
>>>>>>>> Note that we still have 18GB of free unused memory on the server.
>>>>>>>>
>>>>>>>> 1. We indexed the first collection called customers (3.7 million
>>>>> records
>>>>>>>> from CSV data), index size is 2.09GB. The search in customers for
>>>>> any
>>>>>>>> keyword is returned within 50ms (QTime) for using highlight (unified
>>>>>>>> highlighter, posting, light term vectors)
>>>>>>>>
>>>>>>>> 2. Then we indexed the second collection called policies (6 million
>>>>>>> records
>>>>>>>> from CSV data), index size is 2.55GB. The search in policies for any
>>>>>>>> keyword is returned within 50ms (QTime) for using highlight (unified
>>>>>>>> highlighter, posting, light term vectors)
>>>>>>>>
>>>>>>>> 3. But now any search in customers for any keywords (not from cache)
>>>>>>> takes
>>>>>>>> as high as 1200ms (QTime). But still policies search remains very
>>>>> fast
>>>>>>>> (50ms).
>>>>>>>>
>>>>>>>> 4. So we decided to run the force optimize command on customers
>>>>>>> collection (
>>>>>>>>
>>>>>>>
>>>>> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
>>>>>>> ),
>>>>>>>> surprisingly after optimization the search on customers collection
>>>>> for
>>>>>>> any
>>>>>>>> keywords become very fast again (less than 50ms). BUT strangely, the
>>>>>>> search
>>>>>>>> in policies collection become very slow (around 1200ms) without any
>>>>>>> changes
>>>>>>>> to the policies collection.
>>>>>>>>
>>>>>>>> 5. Based on above result, we decided to run the force optimize
>>>>> command on
>>>>>>>> policies collection (
>>>>>>>>
>>>>>>>
>>>>> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
>>>>>>> ).
>>>>>>>> More surprisingly, after optimization the search on policies
>>>>> collection
>>>>>>> for
>>>>>>>> any keywords become very fast again (less than 50ms). BUT more
>>>>> strangely,
>>>>>>>> the search in customers collection again become very slow (around
>>>>> 1200ms)
>>>>>>>> without any changes to the customers collection.
>>>>>>>>
>>>>>>>> What strange and unexpected behavior! If this is not a bug, how would
>>>>>>>> you explain the above behavior in Solr 7.5? Could it be a bug?
>>>>>>>>
>>>>>>>> We would appreciate any support or help on our above situation.
>>>>>>>>
>>>>>>>> Thank you.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Edwin
>>>>>>>>
>>>>>>>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <
>>>>> [hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Shawn,
>>>>>>>>>
>>>>>>>>>> If the two collections have data on the same server(s), I can see
>>>>> this
>>>>>>>>>> happening.  More memory is consumed when there is additional
>>>>> data, and
>>>>>>>>>> when Solr needs more memory, performance might be affected.  The
>>>>>>>>>> solution is generally to install more memory in the server.
>>>>>>>>>
>>>>>>>>> I have found that even after we delete the index in collection2,
>>>>> the
>>>>>>> query
>>>>>>>>> QTime for collection1 still remains slow. It does not go back to its
>>>>>>>>> previous fast speed from before we indexed collection2.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Edwin
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <
>>>>> [hidden email]
>>>>>>>>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Shawn,
>>>>>>>>>>
>>>>>>>>>> Thanks for your reply.
>>>>>>>>>>
>>>>>>>>>> The log only shows a list like the following, and I don't see any
>>>>>>>>>> other logs besides these.
>>>>>>>>>>
>>>>>>>>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330) [c:collectioin1
>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>> update-script#processAdd:
>>>>>>>>>> id=13245417
>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>> update-script#processAdd:
>>>>>>>>>> id=13245430
>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330) [c:collectioin1
>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>> update-script#processAdd:
>>>>>>>>>> id=13245435
>>>>>>>>>>
>>>>>>>>>> There is no change to the segments info, but the slowdown in the first
>>>>>>>>>> collection is very drastic.
>>>>>>>>>> Before the indexing of collection2, the collection1 query QTimes are in
>>>>>>>>>> the range of 4 to 50 ms. However, after indexing collection2, the
>>>>>>>>>> collection1 query QTime increases to more than 1000 ms. The indexing is
>>>>>>>>>> done from CSV data, and the size of the index is 3GB.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Edwin
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <[hidden email]>
>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>>>>>>>>>>> I am using Solr 7.5.0, and currently I am facing an issue of
>>>>> when I
>>>>>>> am
>>>>>>>>>>>> indexing in collection2, the indexing affects the records in
>>>>>>>>>>> collection1.
>>>>>>>>>>>> Although the records are still intact, it seems that the
>>>>> settings of
>>>>>>>>>>> the
>>>>>>>>>>>> termVectors get wiped out, and the index size of collection1 reduced
>>>>>>>>>>>> from 3.3GB to 2.1GB after I do the indexing in collection2.
>>>>>>>>>>>
>>>>>>>>>>> This should not be possible.  Indexing in one collection should
>>>>> have
>>>>>>>>>>> absolutely no effect on another collection.
>>>>>>>>>>>
>>>>>>>>>>> If logging has been left at its default settings, the solr.log
>>>>> file
>>>>>>>>>>> should have enough info to show what actually happened.
>>>>>>>>>>>
>>>>>>>>>>>> Also, the search in
>>>>>>>>>>>> collection1, which was originally very fast, becomes very slow after
>>>>>>>>>>>> the indexing is done in collection2.
>>>>>>>>>>>
>>>>>>>>>>> If the two collections have data on the same server(s), I can
>>>>> see this
>>>>>>>>>>> happening.  More memory is consumed when there is additional
>>>>> data, and
>>>>>>>>>>> when Solr needs more memory, performance might be affected.  The
>>>>>>>>>>> solution is generally to install more memory in the server.  If
>>>>> the
>>>>>>>>>>> system is working, there should be no need to increase the heap
>>>>> size
>>>>>>>>>>> when the memory size increases ... but there can be situations
>>>>> where
>>>>>>> the
>>>>>>>>>>> heap is a little bit too small, where you WOULD want to increase
>>>>> the
>>>>>>>>>>> heap size.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Shawn
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>
>>>>>
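
For context, the timing sections quoted above come from standard debug requests of roughly this form (illustrative; the collection name and query term are ones used earlier in the thread):

https://localhost:8983/edm/policies/select?q=sherry&debugQuery=true&hl=false&rows=10

debugQuery=true produces the timing breakdown shown, and hl=false takes the highlighter out of the measurement, as Jan suggested.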


Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Jan,

We are using 64 bit Java, version 1.8.0_191.
We started Solr with 6 GB heap size.

Besides Solr, we have ZooKeeper, IIS, Google Chrome and Notepad++ running
on the machine. There is still 22 GB of memory left on the server, out of
the 32 GB available on the machine.

Regards,
Edwin
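
As a side note, the JVM settings a running Solr actually got can be confirmed from the standard system info endpoint (the /edm context path below matches the URLs used earlier in this thread):

https://localhost:8983/edm/admin/info/system

The jvm section of the response reports the heap in effect and the full command-line arguments.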


Re: Indexing in one collection affect index in another collection

Jan Høydahl / Cominvent
How do you start Solr? The solr.in.cmd you sent does not contain the memory settings. What other parameters do you start Solr with?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
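
If the heap is meant to be set in solr.in.cmd, the standard variable looks like this (illustrative values, not taken from the file Edwin shared):

set SOLR_JAVA_MEM=-Xms6g -Xmx6g

Otherwise the heap has to come from the command line, for example via the -m flag mentioned above.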

> 25. jan. 2019 kl. 15:28 skrev Zheng Lin Edwin Yeo <[hidden email]>:
>
> Hi Jan,
>
> We are using 64 bit Java, version 1.8.0_191.
> We started Solr with 6 GB heap size.
>
> Besides Solr, we have ZooKeeper, IIS, Google Chrome and NotePad++  running
> on the machine. There is still 22 GB of memory left on the server, out of
> the 32 GB available on the machine.
>
> Regards,
> Edwin
>
> On Fri, 25 Jan 2019 at 21:04, Jan Høydahl <[hidden email]> wrote:
>
>> Which java version? 32 or 64 bit? You start Solr with default 512Mb heap
>> size?
>> Other software running on the machine?
>>
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com
>>
>>> 25. jan. 2019 kl. 13:05 skrev Zheng Lin Edwin Yeo <[hidden email]
>>> :
>>>
>>> Hi Jan and Shawn,
>>>
>>> For your info, this is another debug query.
>>>
>>> "debug":{
>>>
>>>   "rawquerystring":"johnny",
>>>
>>>   "querystring":"johnny",
>>>
>>>   "parsedquery":"searchFields_tcs:johnny",
>>>
>>>   "parsedquery_toString":"searchFields_tcs:johnny",
>>>
>>>   "explain":{
>>>
>>>     "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
>>> 75730) [SchemaSimilarity], result of:\n  12.8497505 =
>>> score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404
>>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
>>> 0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n
>>> 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
>>> b + b * fieldLength / avgFieldLength)) from:\n      4.0 =
>>> termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
>>> 26.66791 = avgFieldLength\n      25.0 = fieldLength\n”,
>>>
>>>   "QParser":"LuceneQParser",
>>>
>>>   "timing":{
>>>
>>>     "time":350.0,
>>>
>>>     "prepare":{
>>>
>>>       "time":0.0,
>>>
>>>       "query":{
>>>
>>>         "time":0.0},
>>>
>>>       "facet":{
>>>
>>>         "time":0.0},
>>>
>>>       "facet_module":{
>>>
>>>         "time":0.0},
>>>
>>>       "mlt":{
>>>
>>>         "time":0.0},
>>>
>>>       "highlight":{
>>>
>>>         "time":0.0},
>>>
>>>       "stats":{
>>>
>>>         "time":0.0},
>>>
>>>       "expand":{
>>>
>>>         "time":0.0},
>>>
>>>       "terms":{
>>>
>>>         "time":0.0},
>>>
>>>       "debug":{
>>>
>>>         "time":0.0}},
>>>
>>>     "process":{
>>>
>>>       "time":348.0,
>>>
>>>       "query":{
>>>
>>>         "time":287.0},
>>>
>>>       "facet":{
>>>
>>>         "time":0.0},
>>>
>>>       "facet_module":{
>>>
>>>         "time":0.0},
>>>
>>>       "mlt":{
>>>
>>>         "time":0.0},
>>>
>>>       "highlight":{
>>>
>>>         "time":54.0},
>>>
>>>       "stats":{
>>>
>>>         "time":0.0},
>>>
>>>       "expand":{
>>>
>>>         "time":0.0},
>>>
>>>       "terms":{
>>>
>>>         "time":0.0},
>>>
>>>       "debug":{
>>>
>>>         "time":6.0}},
>>>
>>>     "loadFieldValues":{
>>>
>>>       "time":0.0}}}}
>>>
>>>
>>> Regards,
>>> Edwin
>>>
>>>
>>> On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <[hidden email]>
>>> wrote:
>>>
>>>> Hi Jan and Shawn,
>>>>
>>>> Please focus on the strange issue that I have described above in more
>>>> details, summary is as follows:
>>>>
>>>> 1. Index customers data, then queries from highlight, select, and all
>>>> handlers are very fast (less than 50ms)
>>>>
>>>> 2. Now index policies data, then queries on polices are very fast BUT
>>>> queries on customers becomes slow
>>>>
>>>> 3. If I reindex customers data, then again queries for customers are
>> very
>>>> fast BUT queries on policies becomes slow.
>>>>
>>>> How can you explain this behavior?
>>>>
>>>> We have never experienced such a strange behavior before Solr 7.
>>>>
>>>> Regards,
>>>> Edwin
>>>>
>>>> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <[hidden email]
>>>
>>>> wrote:
>>>>
>>>>> Hi Jan,
>>>>>
>>>>> Referring to what you have mentioned that the highlighting takes up
>> most
>>>>> of the time in the first query from the policies collection, the
>>>>> highlighting was very fast (less than 50ms) from the time it was
>> indexed,
>>>>> till the time after customers collection gets indexed, in which it
>> slowed
>>>>> down tremendously.
>>>>>
>>>>> Also, the slow down does not just affect on the highlight
>> requestHandler.
>>>>> It also affects other requestHandler like select and clustering as
>> well.
>>>>> All of them gets the QTime of more than 500ms. This is also proven in
>> the
>>>>> latest debug query that we have sent earlier, in which we have set
>> hl=false
>>>>> and fl=null.
>>>>>
>>>>> Any idea or explanation on this strange behavior?
>>>>> Thank you for your support, as we look forward to shed some lights on
>>>>> this issue and to resolve it.
>>>>>
>>>>> Regards,
>>>>> Edwin
>>>>>
>>>>> On Thu, 24 Jan 2019 at 23:35, Zheng Lin Edwin Yeo <
>> [hidden email]>
>>>>> wrote:
>>>>>
>>>>>> Hi Jan,
>>>>>>
>>>>>> Thanks for your reply.
>>>>>>
>>>>>> However, we are still getting a slow QTime of 517ms even after we set
>>>>>> hl=false&fl=null.
>>>>>>
>>>>>> Below is the debug query:
>>>>>>
>>>>>> "debug":{
>>>>>>   "rawquerystring":"cherry",
>>>>>>   "querystring":"cherry",
>>>>>>   "parsedquery":"searchFields_tcs:cherry",
>>>>>>   "parsedquery_toString":"searchFields_tcs:cherry",
>>>>>>   "explain":{
>>>>>>     "46226513":"\n14.227914 = weight(searchFields_tcs:cherry in
>> 5747763) [SchemaSimilarity], result of:\n  14.227914 =
>> score(doc=5747763,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 =
>> idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
>> from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4798305 =
>> tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b *
>> fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 =
>> parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n
>>    25.0 = fieldLength\n",
>>>>>>     "54088731":"\n13.937909 = weight(searchFields_tcs:cherry in
>> 4840794) [SchemaSimilarity], result of:\n  13.937909 =
>> score(doc=4840794,freq=3.0 = termFreq=3.0\n), product of:\n    9.614556 =
>> idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq + 0.5))
>> from:\n      400.0 = docFreq\n      6000000.0 = docCount\n    1.4496675 =
>> tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 - b + b *
>> fieldLength / avgFieldLength)) from:\n      3.0 = termFreq=3.0\n      1.2 =
>> parameter k1\n      0.75 = parameter b\n      19.397041 = avgFieldLength\n
>>    27.0 = fieldLength\n",
>>>>>>   "QParser":"LuceneQParser",
>>>>>>   "timing":{
>>>>>>     "time":517.0,
>>>>>>     "prepare":{
>>>>>>       "time":0.0,
>>>>>>       "query":{
>>>>>>         "time":0.0},
>>>>>>       "facet":{
>>>>>>         "time":0.0},
>>>>>>       "facet_module":{
>>>>>>         "time":0.0},
>>>>>>       "mlt":{
>>>>>>         "time":0.0},
>>>>>>       "highlight":{
>>>>>>         "time":0.0},
>>>>>>       "stats":{
>>>>>>         "time":0.0},
>>>>>>       "expand":{
>>>>>>         "time":0.0},
>>>>>>       "terms":{
>>>>>>         "time":0.0},
>>>>>>       "debug":{
>>>>>>         "time":0.0}},
>>>>>>     "process":{
>>>>>>       "time":516.0,
>>>>>>       "query":{
>>>>>>         "time":15.0},
>>>>>>       "facet":{
>>>>>>         "time":0.0},
>>>>>>       "facet_module":{
>>>>>>         "time":0.0},
>>>>>>       "mlt":{
>>>>>>         "time":0.0},
>>>>>>       "highlight":{
>>>>>>         "time":0.0},
>>>>>>       "stats":{
>>>>>>         "time":0.0},
>>>>>>       "expand":{
>>>>>>         "time":0.0},
>>>>>>       "terms":{
>>>>>>         "time":0.0},
>>>>>>       "debug":{
>>>>>>         "time":500.0}}}}}
>>>>>>
>>>>>> Regards,
>>>>>> Edwin
>>>>>>
>>>>>>
>>>>>> On Thu, 24 Jan 2019 at 22:43, Jan Høydahl <[hidden email]>
>> wrote:
>>>>>>
>>>>>>> Looks like highlighting takes most of the time on the first query
>>>>>>> (680ms). Your config seems to ask for a lot of highlighting here,
>>>>>>> like 100 snippets of max 100000 characters etc.
>>>>>>> Sounds to me like this might be a highlighting configuration problem.
>>>>>>> Try to disable highlighting (hl=false) and see if you get back your
>>>>>>> speed.
>>>>>>> Also, I see fl=* in your config, which is really asking for all
>>>>>>> fields. Are you sure you want that? That may also be slow. Try to
>>>>>>> ask for just the fields you will be using.
>>>>>>>
>>>>>>> --
>>>>>>> Jan Høydahl, search solution architect
>>>>>>> Cominvent AS - www.cominvent.com
>>>>>>>
>>>>>>>> On 24 Jan 2019 at 14:59, Zheng Lin Edwin Yeo <[hidden email]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>> Thanks for your reply.
>>>>>>>>
>>>>>>>> Below is what you have requested about our Solr setup: the
>>>>>>>> configuration files, schema and results of the debug queries.
>>>>>>>>
>>>>>>>> Looking forward to your advice and support on our problem.
>>>>>>>>
>>>>>>>> 1. System configurations
>>>>>>>> OS: Windows 10 Pro 64 bit
>>>>>>>> System Memory: 32GB
>>>>>>>> CPU: Intel(R) Core(TM) i7-4790K CPU @ 4.00GHz, 4 Core(s), 8 Logical
>>>>>>>> Processor(s)
>>>>>>>> HDD: 3.0 TB (free 2.1 TB)  SATA
>>>>>>>>
>>>>>>>> 2. solrconfig.xml of the customers and policies collections, and
>>>>>>>> solr.in.cmd, which can be downloaded from the following link:
>>>>>>>>
>>>>>>>
>> https://drive.google.com/file/d/1AATjonQsEC5B0ldz27Xvx5A55Dp5ul8K/view?usp=sharing
>>>>>>>>
>>>>>>>> 3. The debug queries from both collections
>>>>>>>>
>>>>>>>> *3.1. Debug Query From Policies (which is Slow)*
>>>>>>>>
>>>>>>>> "debug":{
>>>>>>>>
>>>>>>>>  "rawquerystring":"sherry",
>>>>>>>>
>>>>>>>>  "querystring":"sherry",
>>>>>>>>
>>>>>>>>  "parsedquery":"searchFields_tcs:sherry",
>>>>>>>>
>>>>>>>>  "parsedquery_toString":"searchFields_tcs:sherry",
>>>>>>>>
>>>>>>>>  "explain":{
>>>>>>>>
>>>>>>>>    "31702988":"\n14.540428 = weight(searchFields_tcs:sherry in
>>>>>>>> 3097315) [SchemaSimilarity], result of:\n  14.540428 =
>>>>>>>> score(doc=3097315,freq=5.0 = termFreq=5.0\n), product of:\n
>>>>>>>> 8.907154 = idf, computed as log(1 + (docCount - docFreq + 0.5) /
>>>>>>>> (docFreq + 0.5)) from:\n      812.0 = docFreq\n      6000000.0 =
>>>>>>>> docCount\n    1.6324438 = tfNorm, computed as (freq * (k1 + 1)) /
>>>>>>>> (freq + k1 * (1 - b + b * fieldLength / avgFieldLength)) from:\n
>>>>>>>> 5.0 = termFreq=5.0\n      1.2 = parameter k1\n      0.75 = parameter
>>>>>>>> b\n      19.397041 = avgFieldLength\n      31.0 = fieldLength\n",..
>>>>>>>>
>>>>>>>>  "QParser":"LuceneQParser",
>>>>>>>>
>>>>>>>>  "timing":{
>>>>>>>>
>>>>>>>>    "time":681.0,
>>>>>>>>
>>>>>>>>    "prepare":{
>>>>>>>>
>>>>>>>>      "time":0.0,
>>>>>>>>
>>>>>>>>      "query":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "facet":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "facet_module":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "mlt":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "highlight":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "stats":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "expand":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "terms":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "debug":{
>>>>>>>>
>>>>>>>>        "time":0.0}},
>>>>>>>>
>>>>>>>>    "process":{
>>>>>>>>
>>>>>>>>      "time":680.0,
>>>>>>>>
>>>>>>>>      "query":{
>>>>>>>>
>>>>>>>>        "time":19.0},
>>>>>>>>
>>>>>>>>      "facet":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "facet_module":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "mlt":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "highlight":{
>>>>>>>>
>>>>>>>>        "time":651.0},
>>>>>>>>
>>>>>>>>      "stats":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "expand":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "terms":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "debug":{
>>>>>>>>
>>>>>>>>        "time":8.0}},
>>>>>>>>
>>>>>>>>    "loadFieldValues":{
>>>>>>>>
>>>>>>>>      "time":12.0}}}}
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> *3.2. Debug Query From Customers (which is fast because we index it
>>>>>>> after
>>>>>>>> indexing Policies):*
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> "debug":{
>>>>>>>>
>>>>>>>>  "rawquerystring":"sherry",
>>>>>>>>
>>>>>>>>  "querystring":"sherry",
>>>>>>>>
>>>>>>>>  "parsedquery":"searchFields_tcs:sherry",
>>>>>>>>
>>>>>>>>  "parsedquery_toString":"searchFields_tcs:sherry",
>>>>>>>>
>>>>>>>>  "explain":{
>>>>>>>>
>>>>>>>>    "S7900271B":"\n13.191501 = weight(searchFields_tcs:sherry in
>>>>>>>> 2453665) [SchemaSimilarity], result of:\n  13.191501 =
>>>>>>>> score(doc=2453665,freq=3.0 = termFreq=3.0\n), product of:\n
>> 9.08604
>>>>>>>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
>>>>>>>> 0.5)) from:\n      428.0 = docFreq\n      3784142.0 = docCount\n
>>>>>>>> 1.4518428 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1
>> -
>>>>>>>> b + b * fieldLength / avgFieldLength)) from:\n      3.0 =
>>>>>>>> termFreq=3.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
>>>>>>>> 20.22558 = avgFieldLength\n      28.0 = fieldLength\n", ..
>>>>>>>>
>>>>>>>>  "QParser":"LuceneQParser",
>>>>>>>>
>>>>>>>>  "timing":{
>>>>>>>>
>>>>>>>>    "time":38.0,
>>>>>>>>
>>>>>>>>    "prepare":{
>>>>>>>>
>>>>>>>>      "time":1.0,
>>>>>>>>
>>>>>>>>      "query":{
>>>>>>>>
>>>>>>>>        "time":1.0},
>>>>>>>>
>>>>>>>>      "facet":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "facet_module":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "mlt":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "highlight":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "stats":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "expand":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "terms":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "debug":{
>>>>>>>>
>>>>>>>>        "time":0.0}},
>>>>>>>>
>>>>>>>>    "process":{
>>>>>>>>
>>>>>>>>      "time":36.0,
>>>>>>>>
>>>>>>>>      "query":{
>>>>>>>>
>>>>>>>>        "time":1.0},
>>>>>>>>
>>>>>>>>      "facet":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "facet_module":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "mlt":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "highlight":{
>>>>>>>>
>>>>>>>>        "time":31.0},
>>>>>>>>
>>>>>>>>      "stats":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "expand":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "terms":{
>>>>>>>>
>>>>>>>>        "time":0.0},
>>>>>>>>
>>>>>>>>      "debug":{
>>>>>>>>
>>>>>>>>        "time":3.0}},
>>>>>>>>
>>>>>>>>    "loadFieldValues":{
>>>>>>>>
>>>>>>>>      "time":13.0}}}}
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> Best Regards,
>>>>>>>> Edwin
>>>>>>>>
>>>>>>>> On Thu, 24 Jan 2019 at 20:57, Jan Høydahl <[hidden email]>
>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> It would be useful if you can disclose the machine configuration,
>>>>>>>>> OS, memory, settings etc, as well as solr config including
>>>>>>>>> solr.in.sh, solrconfig.xml etc, so we can see the whole picture
>>>>>>>>> of memory, GC, etc.
>>>>>>>>> You could also specify debugQuery=true on a slow search and check
>> the
>>>>>>>>> timings section for clues. What QTime are you seeing on the slow
>>>>>>> queries in
>>>>>>>>> solr.log?
>>>>>>>>> If that does not reveal the reason, I'd connect to your solr
>>>>>>> instance with
>>>>>>>>> a tool like jVisualVM or similar, to inspect what takes time. Or
>>>>>>> better,
>>>>>>>>> hook up to DataDog, SPM or some other cloud tool to get a full view
>>>>>>> of the
>>>>>>>>> system.
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Jan Høydahl, search solution architect
>>>>>>>>> Cominvent AS - www.cominvent.com
>>>>>>>>>
>>>>>>>>>> On 24 Jan 2019 at 13:42, Zheng Lin Edwin Yeo <[hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>> Hi Shawn,
>>>>>>>>>>
>>>>>>>>>> Unfortunately your explanation about memory may not be valid.
>>>>>>>>>> Please refer to my description of the strange behavior below (it
>>>>>>>>>> looks much more like a BUG than anything else that is explainable):
>>>>>>>>>>
>>>>>>>>>> Note that we still have 18GB of free unused memory on the server.
>>>>>>>>>>
>>>>>>>>>> 1. We indexed the first collection called customers (3.7 million
>>>>>>>>>> records from CSV data); the index size is 2.09GB. A search in
>>>>>>>>>> customers for any keyword returns within 50ms (QTime) when using
>>>>>>>>>> highlighting (unified highlighter, postings, light term vectors).
>>>>>>>>>>
>>>>>>>>>> 2. Then we indexed the second collection called policies (6
>>>>>>>>>> million records from CSV data); the index size is 2.55GB. A search
>>>>>>>>>> in policies for any keyword returns within 50ms (QTime) when using
>>>>>>>>>> highlighting (unified highlighter, postings, light term vectors).
>>>>>>>>>>
>>>>>>>>>> 3. But now any search in customers for any keyword (not from
>>>>>>>>>> cache) takes as long as 1200ms (QTime), while the policies search
>>>>>>>>>> remains very fast (50ms).
>>>>>>>>>>
>>>>>>>>>> 4. So we decided to run the force optimize command on the
>>>>>>>>>> customers collection (
>>>>>>>>>> https://localhost:8983/edm/customers/update?optimize=true&numSegments=1&waitFlush=false
>>>>>>>>>> ). Surprisingly, after optimization the search on the customers
>>>>>>>>>> collection for any keyword becomes very fast again (less than
>>>>>>>>>> 50ms). BUT strangely, the search in the policies collection
>>>>>>>>>> becomes very slow (around 1200ms) without any changes to the
>>>>>>>>>> policies collection.
>>>>>>>>>>
>>>>>>>>>> 5. Based on the above result, we decided to run the force optimize
>>>>>>>>>> command on the policies collection (
>>>>>>>>>> https://localhost:8983/edm/policies/update?optimize=true&numSegments=1&waitFlush=false
>>>>>>>>>> ). More surprisingly, after optimization the search on the policies
>>>>>>>>>> collection for any keyword becomes very fast again (less than
>>>>>>>>>> 50ms). BUT more strangely, the search in the customers collection
>>>>>>>>>> again becomes very slow (around 1200ms) without any changes to the
>>>>>>>>>> customers collection.
>>>>>>>>>>
>>>>>>>>>> What a strange and unexpected behavior! If this is not a bug, how
>>>>>>>>>> could you explain the above behavior in Solr 7.5? Could it be a
>>>>>>>>>> bug?
>>>>>>>>>>
>>>>>>>>>> We would appreciate any support or help on our above situation.
>>>>>>>>>>
>>>>>>>>>> Thank you.
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Edwin
>>>>>>>>>>
>>>>>>>>>> On Thu, 24 Jan 2019 at 16:14, Zheng Lin Edwin Yeo <
>>>>>>> [hidden email]>
>>>>>>>>>> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Shawn,
>>>>>>>>>>>
>>>>>>>>>>>> If the two collections have data on the same server(s), I can
>> see
>>>>>>> this
>>>>>>>>>>>> happening.  More memory is consumed when there is additional
>>>>>>> data, and
>>>>>>>>>>>> when Solr needs more memory, performance might be affected.  The
>>>>>>>>>>>> solution is generally to install more memory in the server.
>>>>>>>>>>>
>>>>>>>>>>> I have found that even after we delete the index in collection2,
>>>>>>>>>>> the query QTime for collection1 still remains slow. It does not
>>>>>>>>>>> go back to the fast speed it had before we indexed collection2.
>>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Edwin
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Thu, 24 Jan 2019 at 11:13, Zheng Lin Edwin Yeo <
>>>>>>> [hidden email]
>>>>>>>>>>
>>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi Shawn,
>>>>>>>>>>>>
>>>>>>>>>>>> Thanks for your reply.
>>>>>>>>>>>>
>>>>>>>>>>>> The log only shows a list like the following, and I don't see
>>>>>>>>>>>> any other logs besides these.
>>>>>>>>>>>>
>>>>>>>>>>>> 2019-01-24 02:47:57.925 INFO  (qtp2131952342-1330)
>> [c:collectioin1
>>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>>>> update-script#processAdd:
>>>>>>>>>>>> id=13245417
>>>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330)
>> [c:collectioin1
>>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>>>> update-script#processAdd:
>>>>>>>>>>>> id=13245430
>>>>>>>>>>>> 2019-01-24 02:47:57.957 INFO  (qtp2131952342-1330)
>> [c:collectioin1
>>>>>>>>>>>> s:shard1 r:core_node4 x:collection1_shard1_replica_n2]
>>>>>>>>>>>> o.a.s.u.p.StatelessScriptUpdateProcessorFactory
>>>>>>>>> update-script#processAdd:
>>>>>>>>>>>> id=13245435
>>>>>>>>>>>>
>>>>>>>>>>>> There is no change to the segments info, but the slowdown in the
>>>>>>>>>>>> first collection is very drastic.
>>>>>>>>>>>> Before the indexing of collection2, the collection1 query QTime
>>>>>>>>>>>> is in the range of 4 to 50 ms. However, after indexing
>>>>>>>>>>>> collection2, the collection1 query QTime increases to more than
>>>>>>>>>>>> 1000 ms. The indexing is done from CSV data, and the size of the
>>>>>>>>>>>> index is 3GB.
>>>>>>>>>>>>
>>>>>>>>>>>> Regards,
>>>>>>>>>>>> Edwin
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> On Thu, 24 Jan 2019 at 01:09, Shawn Heisey <[hidden email]
>>>
>>>>>>>>> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> On 1/23/2019 10:01 AM, Zheng Lin Edwin Yeo wrote:
>>>>>>>>>>>>>> I am using Solr 7.5.0, and currently I am facing an issue of
>>>>>>> when I
>>>>>>>>> am
>>>>>>>>>>>>>> indexing in collection2, the indexing affects the records in
>>>>>>>>>>>>> collection1.
>>>>>>>>>>>>>> Although the records are still intact, it seems that the
>>>>>>>>>>>>>> settings of the termVectors get wiped out, and the index size
>>>>>>>>>>>>>> of collection1 reduced from 3.3GB to 2.1GB after I do the
>>>>>>>>>>>>>> indexing in collection2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> This should not be possible.  Indexing in one collection should
>>>>>>> have
>>>>>>>>>>>>> absolutely no effect on another collection.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If logging has been left at its default settings, the solr.log
>>>>>>> file
>>>>>>>>>>>>> should have enough info to show what actually happened.
>>>>>>>>>>>>>
>>>>>>>>>>>>>> Also, the search in
>>>>>>>>>>>>>> collection1, which was originally very fast, becomes very slow
>>>>>>>>>>>>>> after the indexing is done in collection2.
>>>>>>>>>>>>>
>>>>>>>>>>>>> If the two collections have data on the same server(s), I can
>>>>>>> see this
>>>>>>>>>>>>> happening.  More memory is consumed when there is additional
>>>>>>> data, and
>>>>>>>>>>>>> when Solr needs more memory, performance might be affected.
>> The
>>>>>>>>>>>>> solution is generally to install more memory in the server.  If
>>>>>>> the
>>>>>>>>>>>>> system is working, there should be no need to increase the heap
>>>>>>> size
>>>>>>>>>>>>> when the memory size increases ... but there can be situations
>>>>>>> where
>>>>>>>>> the
>>>>>>>>>>>>> heap is a little bit too small, where you WOULD want to
>> increase
>>>>>>> the
>>>>>>>>>>>>> heap size.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Shawn
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>>
>>
>>

Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Jan,

Below is the command that we used to start Solr:

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z
"localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause


We also have a replica; in this development setup, we run it on the same PC
to simulate a second node.
Below is the command to start the replica, which uses port 8984.

cd solr-7.5.0
bin\solr.cmd start -cloud -p 8984 -s solrMain\node2 -m 6g -z
"localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
pause
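
For reference, the same heap setting could also live in solr.in.cmd instead
of being passed with -m on every start (a minimal sketch, assuming the stock
Windows include file; this line is not taken from the files shared earlier):

REM solr.in.cmd -- hypothetical equivalent of the "-m 6g" command-line flag
set SOLR_JAVA_MEM=-Xms6g -Xmx6g

With that in place, bin\solr.cmd start would pick up the 6GB heap without
the -m flag.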

Regards,
Edwin

On Fri, 25 Jan 2019 at 22:35, Jan Høydahl <[hidden email]> wrote:

> How do you start Solr, cause the solr.in.cmd you sent does not contain the
> memory settings. What other parameters do you start Solr with?
>
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
>
> > On 25 Jan 2019 at 15:28, Zheng Lin Edwin Yeo <[hidden email]>
> > wrote:
> >
> > Hi Jan,
> >
> > We are using 64 bit Java, version 1.8.0_191.
> > We started Solr with 6 GB heap size.
> >
> > Besides Solr, we have ZooKeeper, IIS, Google Chrome and NotePad++
> running
> > on the machine. There is still 22 GB of memory left on the server, out of
> > the 32 GB available on the machine.
> >
> > Regards,
> > Edwin
> >
> > On Fri, 25 Jan 2019 at 21:04, Jan Høydahl <[hidden email]> wrote:
> >
> >> Which java version? 32 or 64 bit? You start Solr with default 512Mb heap
> >> size?
> >> Other software running on the machine?
> >>
> >> --
> >> Jan Høydahl, search solution architect
> >> Cominvent AS - www.cominvent.com
> >>
> >>> On 25 Jan 2019 at 13:05, Zheng Lin Edwin Yeo <[hidden email]>
> >>> wrote:
> >>>
> >>> Hi Jan and Shawn,
> >>>
> >>> For your info, this is another debug query.
> >>>
> >>> "debug":{
> >>>
> >>>   "rawquerystring":"johnny",
> >>>
> >>>   "querystring":"johnny",
> >>>
> >>>   "parsedquery":"searchFields_tcs:johnny",
> >>>
> >>>   "parsedquery_toString":"searchFields_tcs:johnny",
> >>>
> >>>   "explain":{
> >>>
> >>>     "192280":"\n12.8497505 = weight(searchFields_tcs:johnny in
> >>> 75730) [SchemaSimilarity], result of:\n  12.8497505 =
> >>> score(doc=75730,freq=4.0 = termFreq=4.0\n), product of:\n    7.5108404
> >>> = idf, computed as log(1 + (docCount - docFreq + 0.5) / (docFreq +
> >>> 0.5)) from:\n      473.0 = docFreq\n      865438.0 = docCount\n
> >>> 1.7108272 = tfNorm, computed as (freq * (k1 + 1)) / (freq + k1 * (1 -
> >>> b + b * fieldLength / avgFieldLength)) from:\n      4.0 =
> >>> termFreq=4.0\n      1.2 = parameter k1\n      0.75 = parameter b\n
> >>> 26.66791 = avgFieldLength\n      25.0 = fieldLength\n",
> >>>
> >>>   "QParser":"LuceneQParser",
> >>>
> >>>   "timing":{
> >>>
> >>>     "time":350.0,
> >>>
> >>>     "prepare":{
> >>>
> >>>       "time":0.0,
> >>>
> >>>       "query":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet_module":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "mlt":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "highlight":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "stats":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "expand":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "terms":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "debug":{
> >>>
> >>>         "time":0.0}},
> >>>
> >>>     "process":{
> >>>
> >>>       "time":348.0,
> >>>
> >>>       "query":{
> >>>
> >>>         "time":287.0},
> >>>
> >>>       "facet":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "facet_module":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "mlt":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "highlight":{
> >>>
> >>>         "time":54.0},
> >>>
> >>>       "stats":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "expand":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "terms":{
> >>>
> >>>         "time":0.0},
> >>>
> >>>       "debug":{
> >>>
> >>>         "time":6.0}},
> >>>
> >>>     "loadFieldValues":{
> >>>
> >>>       "time":0.0}}}}
> >>>
> >>>
> >>> Regards,
> >>> Edwin
> >>>
> >>>
> >>> On Fri, 25 Jan 2019 at 19:52, Zheng Lin Edwin Yeo <
> [hidden email]>
> >>> wrote:
> >>>
> >>>> Hi Jan and Shawn,
> >>>>
> >>>> Please focus on the strange issue that I have described above in more
> >>>> detail; a summary is as follows:
> >>>>
> >>>> 1. Index customers data, then queries from highlight, select, and all
> >>>> handlers are very fast (less than 50ms)
> >>>>
> >>>> 2. Now index policies data, then queries on policies are very fast BUT
> >>>> queries on customers become slow
> >>>>
> >>>> 3. If I reindex customers data, then again queries for customers are
> >>>> very fast BUT queries on policies become slow.
> >>>>
> >>>> How can you explain this behavior?
> >>>>
> >>>> We have never experienced such strange behavior before Solr 7.
> >>>>
> >>>> Regards,
> >>>> Edwin
> >>>>
> >>>> On Fri, 25 Jan 2019 at 17:06, Zheng Lin Edwin Yeo <
> [hidden email]
> >>>
> >>>> wrote:
> >>>>
> >>>>> Hi Jan,
> >>>>>
> >>>>> Referring to what you mentioned about highlighting taking up most of
> >>>>> the time in the first query from the policies collection: the
> >>>>> highlighting was very fast (less than 50ms) from the time policies
> >>>>> was indexed until the customers collection was indexed, after which
> >>>>> it slowed down tremendously.
> >>>>>
> >>>>> Also, the slowdown does not just affect the highlight requestHandler.

Re: Indexing in one collection affect index in another collection

Shawn Heisey-2
On 1/25/2019 7:47 AM, Zheng Lin Edwin Yeo wrote:
> Below is the command that we used to start Solr:
>
> cd solr-7.5.0
> bin\solr.cmd start -cloud -p 8983 -s solrMain\node1 -m 6g -z
> "localhost:2181,localhost:2182,localhost:2183" -Dsolr.ltr.enabled=true
> pause

Can you gather the screenshot mentioned here and send us a link to a
file sharing site so we can view the screenshot?

https://wiki.apache.org/solr/SolrPerformanceProblems#Process_listing_on_Windows
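
If a screenshot is not convenient, a rough text equivalent can be captured
from a command prompt (a sketch only; the wiki page above describes the
Resource Monitor approach in detail):

REM hypothetical quick check: list the Java processes and their memory usage
tasklist /FI "IMAGENAME eq java.exe" /V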

Thanks,
Shawn

Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
Hi Shawn,

As requested, below is the link to the screenshot of the resource monitor
of our system.
https://drive.google.com/file/d/1_-Tqhk9YYp9w8injHU4ZPSvdFJOx8A5s/view?usp=sharing

Regards,
Edwin


Re: Indexing in one collection affect index in another collection

Zheng Lin Edwin Yeo
In reply to this post by Jörn Franke
Hi Jörn,

I have set the heap size to 6GB, and the system has 32GB of RAM.

The data is indexed from CSV files, so each field holds database-style
values. Only the searchFields field may hold more data, as it combines the
important fields of the collection, but even that is not as large as
something like email contents.

The index size is currently only 3GB, which shouldn't be big enough to
warrant splitting the collections across different nodes, should it?
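
As a quick sanity check on the sizing question, the heap and system memory
that Solr actually sees can be read from its system info endpoint (a sketch;
the host, port and the edm context path are assumptions based on the
commands shared earlier in this thread):

REM hypothetical check; -k skips certificate validation for the local HTTPS
curl -k "https://localhost:8983/edm/admin/info/system?wt=json"

The jvm.memory and system sections of the response show the configured heap
and the free physical memory.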

Regards,
Edwin

On Fri, 25 Jan 2019 at 20:49, Jörn Franke <[hidden email]> wrote:

> Have you done a correct sizing wrt memory / CPU?
>
> Check also the data model if you have a lot of queried stored fields that
> may contain a lot of data.
>
> You may also split those two collections on different nodes.
>