Add dynamic field to existing index slow

9 messages

Add dynamic field to existing index slow

derrick cui-2
I have 400k documents. Indexing is pretty fast, taking only 10 minutes, but adding a dynamic field to all documents according to query results is very slow, taking about 1.5 hours.
Does anyone know what the reason could be?
Thanks



Sent from Yahoo Mail for iPhone

Re: Add dynamic field to existing index slow

Alexandre Rafalovitch
Indexing new documents just adds additional segments.

Adding a new field to an existing document means:
1) Reading the existing document (this may not always be possible, depending on the field configuration)
2) Marking the existing document as deleted
3) Creating a new document with the reconstructed fields plus the new one
4) Possibly triggering a merge if a lot of documents have been updated

Perhaps the above is a contributing factor. But I also feel that maybe there is some detail in your question I did not fully understand.
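To make the per-document cost concrete, here is a rough sketch of an atomic "set" update against Solr's JSON update API; the id, field name, and value below are made-up examples:

```python
import json

def atomic_set_update(doc_id, field, value):
    # The "set" modifier asks Solr to replace (or add) the field value,
    # which internally triggers the read/delete/re-add cycle described above.
    return {"id": doc_id, field: {"set": value}}

# A batch of update commands is just a JSON list of such documents:
payload = json.dumps([atomic_set_update("doc-1", "score_f", 0.87)])
```

The payload would be POSTed with Content-Type application/json to the collection's /update endpoint (e.g. http://localhost:8983/solr/yourcollection/update, name assumed).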

Regards,
   Alex.


Re: Add dynamic field to existing index slow

derrick cui-2
Thanks Alex,
My usage is:
1. I execute a query and get the result, returning the id only
2. I add a value to a dynamic field
3. I save to Solr with a batch size of 1000

I have defined 50 queries and run them in parallel. I have also disabled hard commits and soft-commit every 1000 docs.

I am wondering whether any configuration can speed it up.
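A minimal sketch of that update loop, assuming Solr's JSON update API (the URL, collection name, and field are placeholders, not from the original post):

```python
import itertools
import json
import urllib.request

SOLR_UPDATE = "http://localhost:8983/solr/mycollection/update"  # assumed URL

def batches(ids, size=1000):
    # Group the ids returned by one query into lists of `size`
    # (the batch size of 1000 mentioned above).
    it = iter(ids)
    while True:
        chunk = list(itertools.islice(it, size))
        if not chunk:
            return
        yield chunk

def tag_results(ids, field, value):
    # One atomic "set" update per id, sent batch by batch; no explicit
    # commit per batch, leaving commits to Solr's autoCommit settings.
    for chunk in batches(ids):
        docs = [{"id": i, field: {"set": value}} for i in chunk]
        req = urllib.request.Request(
            SOLR_UPDATE, data=json.dumps(docs).encode(),
            headers={"Content-Type": "application/json"})
        urllib.request.urlopen(req)
```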





Re: Add dynamic field to existing index slow

Alexandre Rafalovitch
The only thing I can think of is to check whether you can do in-place
rather than atomic updates:
https://lucene.apache.org/solr/guide/8_1/updating-parts-of-documents.html#in-place-updates
But the conditions are quite restrictive: the field must be non-indexed
(indexed="false"), non-stored (stored="false"), single-valued
(multiValued="false"), and have numeric docValues (docValues="true").
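For example, a dynamic field declaration satisfying all four conditions might look like this in the schema (the field name and type here are made up for illustration):

```xml
<!-- Hypothetical dynamic field eligible for in-place updates: -->
<dynamicField name="*_ipf" type="pfloat"
              indexed="false" stored="false"
              multiValued="false" docValues="true"/>
```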

The other option may be to use an external file field and not update
Solr documents at all:
https://lucene.apache.org/solr/guide/8_1/working-with-external-files-and-processes.html

Regards,
   Alex.


Re: Add dynamic field to existing index slow

Erick Erickson
Well, the first thing I’d do is see what’s taking the time: querying or updating? It should be easy enough to comment out whatever it is that sends docs to Solr.

If it’s querying, it sounds like you’re paging through your entire data set and may be hitting the “deep paging” problem. Use cursorMark in that case.
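A cursorMark loop looks roughly like this (a sketch against the select API; the URL is assumed, and the fetch function is injectable so the paging logic can be exercised without a live Solr):

```python
import json
import urllib.parse
import urllib.request

SOLR_SELECT = "http://localhost:8983/solr/mycollection/select"  # assumed URL

def iter_ids(query="*:*", rows=1000, fetch=None):
    # cursorMark requires a deterministic sort that includes the uniqueKey.
    if fetch is None:
        def fetch(params):
            url = SOLR_SELECT + "?" + urllib.parse.urlencode(params)
            with urllib.request.urlopen(url) as resp:
                return json.load(resp)
    cursor = "*"
    while True:
        data = fetch({"q": query, "fl": "id", "rows": rows,
                      "sort": "id asc", "cursorMark": cursor})
        for doc in data["response"]["docs"]:
            yield doc["id"]
        next_cursor = data["nextCursorMark"]
        if next_cursor == cursor:  # cursor stopped advancing: all pages read
            return
        cursor = next_cursor
```

Unlike start/rows paging, each page costs the same no matter how deep you are in the result set.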

Best,
Erick


Re: Add dynamic field to existing index slow

derrick cui-2
Good point Erick, I will try it today, but I have already used cursorMark in my query for deep pagination.
Also, I noticed that my CPU usage is pretty high: with 8 cores, usage is over 700%. I am not sure whether it would help to use an SSD.



Re: Add dynamic field to existing index slow

Shawn Heisey-2
That depends on whether the load is caused by iowait or by actual CPU usage.

If it's caused by iowait, then SSD would help, but additional memory
would help more.  Retrieving data from the OS disk cache (which exists
in main memory) is faster than SSD.

If it is actual CPU load, then it will take some additional poking
around to figure out which part of your activities causes the load, as
Erick mentioned.

It's normally a little bit easier to learn these things from Unix-like
operating systems than from Windows.  What OS are you running Solr on?

Thanks,
Shawn

Re: Add dynamic field to existing index slow

derrick cui-2
I have tested the query separately; executing the query is actually pretty fast, taking only a few minutes to go through all results, including converting the Solr documents to Java objects. So I believe the slowness is on the persistence end. BTW, I am using a Linux system.



Re: Add dynamic field to existing index slow

Erick Erickson
OK, then let’s see the indexing code. Make sure you don’t:
1> commit after every batch
2> ever optimize; never, never, never optimize.

BTW, you do not want to turn off commits entirely; there are some internal data structures that grow between commits. So I might do something like specifying commitWithin on my adds, set to something like 5 minutes.
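With the JSON update API, commitWithin is just a request parameter in milliseconds; a sketch (the base URL below is a placeholder):

```python
import json
import urllib.parse
import urllib.request

def update_url(base, commit_within_minutes=5):
    # commitWithin is given in milliseconds; Solr commits on its own
    # within that window instead of once per client batch.
    ms = commit_within_minutes * 60 * 1000
    return base + "/update?" + urllib.parse.urlencode({"commitWithin": ms})

def send_batch(base, docs):
    req = urllib.request.Request(
        update_url(base), data=json.dumps(docs).encode(),
        headers={"Content-Type": "application/json"})
    return urllib.request.urlopen(req)
```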

Best,
Erick
