Solr RPS is painfully low

Alex Benjamen
Hello,
 
I'm running Solr with a 3 GB index held entirely in RAM on a dual-core
AMD machine, and I'm only getting about 1.3 RPS on cold queries (for the most part,
there is little chance that any two queries will be identical).
 
Is this normal? The index contains about 20 million documents and the machine has 16 GB of RAM.
When I run the load test, the CPU hits 100%.
 
Here's a typical query:
gender:f AND ( friends:y )  AND  country:us AND age:(18 || 19 || 20 || 21) AND photos:y
 
On average the result set is a few hundred thousand documents. Is there any way to optimize such a
query, or some other way to get a better RPS? 1.3 RPS is way too low for an index that fits completely in RAM.
 
After some thought on how to reduce the number of documents in the index (which is the single
biggest factor in CPU usage), I decided to split the users up by country, which gives me a somewhat
uneven distribution of users. Even after doing this, I'm getting only about 8 RPS across 5 Solr instances
running on different ports (with all of the indexes in RAM). Each index contains between 4 and 8 million docs.
I guess I could go with a quad core and get 16 RPS, but the question that comes to mind is whether
this is an acceptable RPS for the size of the index. (The total physical index size is around 1.3 GB, all of
which is on a ramdisk in memory.) I'm positive that I can get a better RPS by "splitting" the index further
into smaller document sets, but that is undesirable because it limits functionality.

Note: due to the nature of the searches I'm doing, it's very unlikely that I will be able to achieve
more than a 20% cache hit ratio in the queryResultCache.

Thanks,
-Alex
 
 
Re: Solr RPS is painfully low

Walter Underwood, Netflix
How many rows are you requesting? Are you sorting? --wunder

On 1/2/08 4:09 PM, "Alex Benjamen" wrote:

> [full quote of the original message snipped]

Re: Solr RPS is painfully low

sfox-2
In reply to this post by Alex Benjamen
Have you tried breaking these queries up into a set of filter queries?

fq=gender:f&fq=( friends:y )&fq=country:us&fq=age:(18 || 19 || 20 || 21)&fq=photos:y

(modulo correct syntax)

This should get you the same result set, but each fq is cached separately as a
bitset, and future queries that share a restriction (e.g. gender:f)
will reuse the cached bitset rather than having to run that part of the
query again.

Don't know if this applies to your situation, but it might help a lot.
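
For reference, here is a minimal Python sketch of how such a request could be assembled with proper URL encoding. The host, port and handler path are assumptions (they are not given anywhere in this thread), build_filtered_query is a made-up helper name, and using *:* as the main query assumes a Solr version whose query parser accepts it. Since filter queries do not contribute to the relevance score, moving every clause into fq also changes scoring, which may or may not matter here.

```python
from urllib.parse import urlencode

# Assumed endpoint: host, port and handler path are placeholders, not from the thread.
SOLR_SELECT = "http://localhost:8983/solr/select"


def build_filtered_query(filters, rows=20):
    """Return a select URL that sends every clause as its own fq parameter."""
    # Match-all main query (assumes the parser accepts *:*); the fq clauses restrict the set.
    params = [("q", "*:*")]
    params += [("fq", clause) for clause in filters]
    # No sort parameter is sent, so the default ordering applies.
    params.append(("rows", str(rows)))
    return SOLR_SELECT + "?" + urlencode(params)


# The clauses from the example query, one filter each.
filters = [
    "gender:f",
    "friends:y",
    "country:us",
    "age:(18 || 19 || 20 || 21)",
    "photos:y",
]

url = build_filtered_query(filters)
print(url)
```

Keeping one clause per fq is the point: each clause becomes its own cache entry, so a later query for a different country can still reuse the gender:f and photos:y entries.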

Sean

On Jan 2, 2008, at 6:09 PM, Alex Benjamen wrote:

> [full quote of the original message snipped]

RE: Solr RPS is painfully low

Alex Benjamen
In reply to this post by Alex Benjamen
Walter:
 
>How many rows are you requesting? Are you sorting? --wunder
 
I'm only requesting 20 rows, and I'm not explicitly sorting by any field. Does Solr
apply a sort by default, and if so, how do I disable it?

Thanks,
Alex

 
Re: Solr RPS is painfully low

hossman
In reply to this post by sfox-2

: fq=gender:f&fq=( friends:y )&fq= country:us&fq= age:(18 || 19 || 20 || 21)&fq=photos:y

that would be my suggestion based on what i'm guessing your typical use
cases are ... but it's really hard to infer patterns from only a single
example URL.

the queryResultCache isn't nearly as interesting in cases like this as the
filterCache is ... your filterCache doesn't even need to be very big to
give you huge wins for the type of use cases i'm guessing you have.
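
To put rough numbers on that, here is a toy Python sketch that only counts cache lookups. The field values are invented for illustration (only gender, friends, photos, country and age appear in this thread, and the real value sets are unknown), and the two sets below merely stand in for the queryResultCache and the filterCache. It says nothing about the cost of intersecting the cached bitsets, which still has to happen on every request.

```python
from itertools import product

# Hypothetical field values, loosely modelled on the example query in this thread.
FIELDS = {
    "gender": ["f", "m"],
    "friends": ["y", "n"],
    "photos": ["y", "n"],
    "country": ["us", "uk", "ca", "de", "fr"],
    "age": ["18-21", "22-25", "26-30", "31-40"],
}

query_cache = set()   # stand-in for a cache keyed on the complete query
filter_cache = set()  # stand-in for a cache keyed on each individual clause
query_hits = 0
filter_hits = 0
filter_lookups = 0

names = list(FIELDS)
# Issue every possible combination exactly once, i.e. no query is ever repeated.
for combo in product(*(FIELDS[name] for name in names)):
    if combo in query_cache:
        query_hits += 1
    query_cache.add(combo)

    for name, value in zip(names, combo):
        clause = f"{name}:{value}"
        filter_lookups += 1
        if clause in filter_cache:
            filter_hits += 1
        filter_cache.add(clause)

print("whole-query entries:", len(query_cache), "hits:", query_hits)
print("filter entries:", len(filter_cache), "hits:", filter_hits, "of", filter_lookups)
```

With these made-up values, 160 distinct queries never hit the whole-query cache, while 15 cached filters serve 785 of the 800 clause lookups. That is the sense in which a small filterCache can pay off even when almost no full query repeats.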



-Hoss

RE: Solr RPS is painfully low

hossman
In reply to this post by Alex Benjamen

: I'm only requesting 20 rows, and I'm not specifically sorting by any field. Does solr
: automatically induce sort by default, and if so, how do I disable it?

default sorting is by score, which is cheap ... walter's question was
mainly to verify that you are not explicitly sorting, since that is
expensive (we have to make guesses as to what might be causing you
problems in the absence of seeing your configs or full URLs)

-Hoss