Help with slow retrieving data


Wendy2
Hi Solr users: I use Solr 7.3.1 with 150,000 documents, about 6GB in total.
When I try to retrieve 20,000 ids (4-letter codes, indexed and stored), it
takes 17s to retrieve 1.14M of data. I tried increasing RAM and cache, but
that only helped to a degree (from 25s to 17s). Any ideas/suggestions on
where I should look? Thanks!

========================================
wget -O output.txt 'http://localhost:8983/solr/s_entry/search?fl=pdb_id,score&q=human&start=0&rows=20000'
1.14M  66.7KB/s    in 17s



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html

Re: Help with slow retrieving data

Shawn Heisey-2
On 3/24/2019 7:16 AM, Wendy2 wrote:
> Hi Solr users:I use Solr 7.3.1 and 150,000 documents and about 6GB in total.
> When I try to retrieve 20000 ids (4 letter code, indexed and stored), it
> took 17s to retrieve 1.14M size data. I tried to increase RAM and cache, but

Can you get the screenshot described here, share it with a file sharing
site, and provide the link?

https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue

Thanks,
Shawn


Re: Help with slow retrieving data

Wendy2
Hi Shawn,

Thank you very much for your response! Here is a screenshot. Is the CPU an
issue?

<http://lucene.472066.n3.nabble.com/file/t493740/Screen_Shot_2019-03-24_at_2.png>




Re: Help with slow retrieving data

Shawn Heisey-2
On 3/24/2019 12:11 PM, Wendy2 wrote:
> Thank you very much for your response! Here is a screen shot.  Is the CPU an
> issue?

You said that your index is 6GB, but the process listing is saying that
you have more than 30GB of index data being managed by Solr.  There's a
discrepancy somewhere.

This listing appears to be sorted by CPU, not by resident memory as the
instructions I pointed you at indicated.  I can't be sure whether or not
something important is missing from the listing.  For now I am going to
assume that I can see everything important.

What happens if you restart Solr or reload your core and then run the
same query with rows=0?  Is that fast or slow?  If it is slow, then it is
not the data retrieval that is slow, but the query itself.
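That comparison can be run from the command line; a sketch, reusing the collection name and query from the earlier message (assumes Solr is running locally on the default port):

```shell
# Query only, no document retrieval: rows=0 measures pure search time.
curl 'http://localhost:8983/solr/s_entry/select?q=human&fl=pdb_id,score&rows=0'

# Same query retrieving 20000 rows: the extra wall-clock time over the
# rows=0 run is the cost of fetching/decompressing stored fields.
curl -o output.txt 'http://localhost:8983/solr/s_entry/select?q=human&fl=pdb_id,score&rows=20000'
```

Compare the QTime in the response header (query cost only) against the total wall-clock time (query plus retrieval and transfer).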

Retrieving a large number of rows normally involves decompressing stored
fields.  This will exercise the CPU.

It looks like you have a 4GB heap for Solr.  With over 30GB of index,
it's entirely possible that 4GB of heap is not enough ... or it might be
plenty.  It's not super easy to figure out exactly how much heap you
need.  Usually it requires experimentation.

You are sharing this machine between Solr and MongoDB.  Depending on how
much data is in the Mongo database, you might need to add more memory or
split your services onto different machines.

Thanks,
Shawn

Re: Help with slow retrieving data

Wendy2
Hi Shawn,

Thanks for your response. I have several Solr cores on the same Solr
instance. The particular core with the slow retrieval response has 6GB of
data; sorry for the confusion. I restarted Solr and ran the same query with
rows=0 vs. rows=10000, and QTime for both was OK, so I guess it is the
retrieval that is slow? I also tried returning different numbers of rows:
the more rows, the longer the retrieval time. The machine has 64GB of RAM,
and I tried 32GB for the Solr heap, but performance didn't improve much.
Any suggestions? Thank you very much!

=============================
Return 0 rows: 232  --.-KB/s  in 0s

{
  "responseHeader":{
    "status":0,
    "QTime":96,
    "params":{
      "q":"human",
      "fl":"pdb_id,score",
      "start":"0",
      "rows":"0"}},
  "response":{"numFound":67428,"start":0,"maxScore":246.08528,"docs":[]
  }}

Return 10000 rows: 584.46K  65.4KB/s  in 8.9s

{
  "responseHeader":{
    "status":0,
    "QTime":39,
    "params":{
      "q":"human",
      "fl":"pdb_id,score",
      "start":"0",
      "rows":"10000"}},
  "response":{"numFound":67428,"start":0,"maxScore":246.08528,"docs":[
      {

<http://lucene.472066.n3.nabble.com/file/t493740/ScreenShot2019-03-24at6.png>




Re: Help with slow retrieving data

Erick Erickson
If the fields you’re returning have to be pulled from the stored="true" parts of the index, then each value returned requires:
1> a disk read
2> decompressing a block of 16K minimum

which is what Shawn was getting at.

Try this:
1> ensure docValues=true for the field. You’ll have to re-index all your docs.
2> if that doesn’t make much of a difference, try adding useDocValuesAsStored for the field.
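In the schema, those two settings are field attributes. A sketch only: the field name pdb_id comes from this thread, while the string type and the other attribute values are assumptions about Wendy's schema:

```xml
<!-- docValues="true" lets Solr read pdb_id from a column-oriented,
     memory-mapped structure instead of decompressing stored-field blocks;
     useDocValuesAsStored="true" lets fl return the value from docValues. -->
<field name="pdb_id" type="string" indexed="true" stored="true"
       docValues="true" useDocValuesAsStored="true"/>
```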

You’ll have to make sure your index is warmed up to get good measurements.

And be aware that if you’re displaying this in a browser, the browser itself may be taking a long time to render the results. To eliminate that, try sending the query with curl or similar.

Finally, Solr was not designed to be efficient at returning large numbers of rows. You may well want to use streaming for that: https://lucene.apache.org/solr/guide/7_2/streaming-expressions.html
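A bulk export along those lines might look like the following sketch. It assumes pdb_id has docValues, since the /export handler requires docValues on every fl and sort field (and it cannot sort by score):

```shell
# /export streams every matching doc from docValues, avoiding deep paging
# and stored-field decompression entirely.
curl 'http://localhost:8983/solr/s_entry/export?q=human&fl=pdb_id&sort=pdb_id+asc'

# The same export driven through a streaming expression on /stream:
curl --data-urlencode 'expr=search(s_entry, q="human", fl="pdb_id", sort="pdb_id asc", qt="/export")' \
     'http://localhost:8983/solr/s_entry/stream'
```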

Best,
Erick

> On Mar 24, 2019, at 3:30 PM, Wendy2 <[hidden email]> wrote:
>
> Hi Shawn, thanks for your response. I have several Solr cores on the same
> Solr instance. [...]


Re: Help with slow retrieving data

Wendy2
Hi Erick,

Thank you very much for your response! I tried your suggestion:

"1> ensure docValues=true for the field. You’ll have to re-index all your
docs."

With that approach, performance got better, improving by about 3 seconds.

Then I tested on a new cloud server with a local SSD for one core on Solr,
and the performance was great.
With 50,000 rows to retrieve, the response time was 0.2s, which is better
than our acceptance criteria :-)
So happy.  Thank you!

=================testing====================
wget -O output.txt 'http://localhost:8983/solr/s_entry/select?fl=pdb_id,score&q=human&start=0&rows=50000'
--2019-03-25 10:23:21--  http://localhost:8983/solr/s_entry/select?fl=pdb_id,score&q=human&start=0&rows=50000
Resolving localhost (localhost)... ::1, 127.0.0.1
Connecting to localhost (localhost)|::1|:8983... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [application/json]
Saving to: 'output.txt'

output.txt    [ <=> ]   2.90M  16.1MB/s    in 0.2s






Re: Help with slow retrieving data

Erick Erickson
Glad it’s working out for you. There are a couple of things here that bear a bit more investigation.

Using SSDs shouldn’t materially affect the response if:

1> the searcher is warmed. Before trying your query, execute a few queries like "q=some search that hits a lot of docs&sort=myfield asc"

2> Your Solr instance isn't swapping.
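A warm-up along those lines might be sketched as follows (the query and sort field here are illustrative; any docValues field will do):

```shell
# Throwaway queries that match many docs and sort on the field, pulling
# the docValues structures into the OS page cache before measuring.
curl 'http://localhost:8983/solr/s_entry/select?q=*:*&rows=0&sort=pdb_id+asc'
curl 'http://localhost:8983/solr/s_entry/select?q=human&rows=0&sort=pdb_id+desc'
```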

What’s not making sense is that once docValues are read into memory, there is _no_ disk access necessary, assuming the DV structure for the field has not been swapped out.

Things that may be getting in the way:

- you are asking for _any_ fields to be returned that are not docValues

- you are not getting the docValues (useDocValuesAsStored=true)

- your Solr instance is swapping. DocValues data is kept in the OS memory space, see: http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

- you haven’t warmed up your searchers to read these values off disk before you measure.

Your results are in line with expectations, but that can’t account for the difference between your old system and the new one. Perhaps when you re-indexed, some fields got docValues that didn’t have them before?

FWIW,
Erick

> On Mar 25, 2019, at 10:44 AM, Wendy2 <[hidden email]> wrote:
>
> Hi Erick,
>
> Thank you very much for your response! I tried your suggestion [...]


Re: Help with slow retrieving data

Wendy2
Hi Erick,

Thank you for your response!  

On the old system, I changed to docValues=true and got better performance.
But the searcher was not warmed before I measured. Also, the local disk was
too small, so I used an attached volume, which turned out to be a big cause
of the slow retrieval.

On the new system, I didn't use docValues=true, but I used an SSD, so
retrieval was much, much faster.

In both cases, QTime was good.

I will keep tuning the performance for sorting, facets, etc.

Thanks and all the best!


