Solr on a multiprocessor machine

18 messages

Solr on a multiprocessor machine

harish.agarwal
Hi All,

I'm very new to Solr, and also fairly new to Java, servlet containers, etc.  I'm trying to set up Solr on a single machine with a distributed index.  My current implementation uses Tomcat as a servlet container serving multiple instances of Solr.  Each instance of Solr is a shard of my index.  I have two shards because I'm running this setup on a dual-processor machine.

I'm benchmarking this setup and would expect that, given the two processors, search times for one shard would be roughly the same as search times across both.  However, I'm finding that search times are about double when searching across both shards, which seems to indicate that both processors are not being used efficiently.

Does anyone have any advice on what I might be doing wrong?  Spreading things across different machines is not an option.  Both indexes are being run off the same disk, but this was also the case with my prior distributed search solution (using Sphinx), in which I did see the expected performance boost.

Thanks,
-Harish

Re: Solr on a multiprocessor machine

Yonik Seeley
Distributed search requires more work (more than one pass).  If you
weren't CPU bound to begin with, it's definitely going to make things
worse by splitting up the index on the same box.

-Yonik

On Thu, Jan 8, 2009 at 3:53 PM, smock wrote:

> --
> View this message in context: http://www.nabble.com/Solr-on-a-multiprocessor-machine-tp21360747p21360747.html
> Sent from the Solr - User mailing list archive at Nabble.com.

Re: Solr on a multiprocessor machine

harish.agarwal
Hi Yonik,

Thanks for the reply - could you please give me some more details on what you mean?  I was able to get a performance boost by distributing Sphinx on the same box, with multiple processors.  Each instance of Sphinx ran on a different processor, and given that there was a performance boost, it seems like search was CPU bound - at least in that case.  I realize that Solr is quite different from Sphinx, so I may not see the same boost, but I'm not sure I understand why.  As is, I may have to stick with Sphinx, but I'd really like to move things over to Solr.

Thanks again,
-Harish


Re: Solr on a multiprocessor machine

Yonik Seeley
On Thu, Jan 8, 2009 at 4:51 PM, smock wrote:
> Thanks for the reply - could you please give me some more details on what
> you mean?

If there isn't enough memory to cache the index in RAM, then your
bottleneck could be from retrieving stored fields from disk.
Distributed search will make this much worse because you have 2 JVMs
eating up memory instead of one, further lowering the cache hit ratio
of the OS disk cache.
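A back-of-the-envelope sketch of this point, using made-up numbers (8 GB of RAM, a 2 GB heap per JVM, 1 GB reserved for the OS, a 5 GB index): every extra JVM heap comes out of the memory the OS could otherwise use to cache index files, while splitting the index into shards does not shrink its total size on disk. The function names and figures are hypothetical.

```python
# Hypothetical sizing sketch: all figures are illustrative assumptions.


def page_cache_left_gb(total_ram_gb: float, heap_gb_per_jvm: float, jvms: int,
                       os_overhead_gb: float = 1.0) -> float:
    """RAM (GB) left for the OS disk cache after every JVM heap is reserved."""
    return max(0.0, total_ram_gb - heap_gb_per_jvm * jvms - os_overhead_gb)


def index_fits_in_cache(index_gb: float, cache_gb: float) -> bool:
    """True if the whole index (all shards together) could stay in the page cache."""
    return index_gb <= cache_gb


if __name__ == "__main__":
    ram_gb, heap_gb, index_gb = 8.0, 2.0, 5.0  # made-up numbers
    for jvms in (1, 2):
        cache = page_cache_left_gb(ram_gb, heap_gb, jvms)
        print(f"{jvms} JVM(s): {cache:.1f} GB page cache, "
              f"index fits: {index_fits_in_cache(index_gb, cache)}")
```

With these assumed numbers the page cache drops from 5 GB to 3 GB when a second JVM is added, so the 5 GB index no longer fits and more stored-field reads go to disk.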

With a 2 CPU machine, a single Solr index is advisable, especially for web
traffic where there will be plenty of requests to keep both CPUs busy.

-Yonik

Re: Solr on a multiprocessor machine

harish.agarwal
Assuming I have enough RAM, then, should I be able to get a performance boost with my current setup?  Basically, the question I am trying to answer is: will the Tomcat+Solr setup I have above utilize multiple processors, or do I need to do something else (like having a different Tomcat instance for each Solr shard)?

Also - and this question comes purely out of my own ignorance of how the Tomcat/Solr relationship works - right now I'm starting Tomcat with a maximum memory size specified.  I'm also setting the cache parameters in solrconfig.xml for each Solr instance to half of what I would use for a full-size index.  Shouldn't the JVMs for both instances use roughly the same total amount of memory as one JVM for the full-size index?

While I'm testing things out on a 2-processor machine, I'll eventually be using an 8-processor machine with plenty of RAM to cache the index.  I'm not super worried about requests/sec. right now - I'd rather each individual search be faster, which is why I'm interested in distributing the index across my 8 processors.

Thanks very much!
-Harish


Re: Solr on a multiprocessor machine

Walter Underwood, Netflix
Solr will use multiple processors. Most of your speed will come from
cached responses. Use a single instance, test with real query logs,
and tune the cache sizes by looking at the cache hit statistics in
the statistics page of the Solr admin UI.
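As a sketch of reading those numbers, assume you note a cache's lookups, hits and evictions counters from the statistics page before and after replaying a real query log (the values below are invented, and `CacheStats` and the helper functions are hypothetical). Taking the difference isolates the replay from earlier traffic. A low hit ratio combined with many evictions is one common hint that a cache may be too small for the workload, but that is a rule of thumb, not a Solr-defined threshold.

```python
from dataclasses import dataclass


@dataclass
class CacheStats:
    """Counters copied by hand from the admin statistics page (values assumed)."""
    lookups: int
    hits: int
    evictions: int


def interval_hit_ratio(before: CacheStats, after: CacheStats) -> float:
    """Hit ratio for only the lookups made between the two snapshots."""
    lookups = after.lookups - before.lookups
    hits = after.hits - before.hits
    return hits / lookups if lookups else 0.0


def interval_evictions(before: CacheStats, after: CacheStats) -> int:
    """Entries evicted between the two snapshots."""
    return after.evictions - before.evictions


if __name__ == "__main__":
    # Invented snapshots taken before and after replaying a query log.
    before = CacheStats(lookups=1000, hits=400, evictions=0)
    after = CacheStats(lookups=11000, hits=6400, evictions=2500)
    print(interval_hit_ratio(before, after))
    print(interval_evictions(before, after))
```

For these invented snapshots the replay window shows a 0.6 hit ratio with 2500 evictions; rerunning the same log after changing a cache size in solrconfig.xml gives a like-for-like comparison.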

wunder

(In reply to smock, 1/8/09 3:37 PM)

Re: Solr on a multiprocessor machine

Mike Klaas
In reply to this post by harish.agarwal
As Yonik mentioned, it depends greatly on the size of the index/RAM  
ratio.  I don't see any reason why, in theory, two Solrs in a single  
Tomcat could not both work on a single query in parallel, but I've  
never tried it.  I _have_ had success sharding Solr on a single machine using
a webapp container per Solr instance (in my case, Jetty).

Note that if these instances are sharing a single disk, and your RAM  
is low, then they will be competing over the slowest resource on your  
machine and the query could be IO bound, in which case sharding is  
useless.

-Mike

Re: Solr on a multiprocessor machine

harish.agarwal
Mike,

I should have more than enough RAM to fit the index in, so I don't think my searches will be IO bound.

One question - just to make sure I understand - did you use one Jetty instance per shard?  In my case, I'm using one Tomcat instance to run multiple Solr webapps.  I'm not sure whether this makes a difference in terms of processor usage, as I don't understand the internal workings of Tomcat serving up Solr (in other words, whether Tomcat will be able to run the different Solr instances on different processors, or if it's all bound to the processor Tomcat is using).

Thanks for your help!
-Harish

Re: Solr on a multiprocessor machine

Yonik Seeley
On Thu, Jan 8, 2009 at 9:25 PM, smock wrote:
> I should have more than enough RAM to fit the index in, I don't think my
> searches will be IO bound.

There is still overhead to distributed search - if the actual CPU
bound search/faceting stuff isn't your bottleneck, or if the index is
too small, the overhead won't be a net win.  Distributed search was
not really designed to utilize multiple processors (we should probably
do that in a single Solr server if needed), it was designed to go
across multiple boxes.

-Yonik


Re: Solr on a multiprocessor machine

harish.agarwal
Yonik,

I don't mean to be argumentative, just trying to understand: what is the difference between distributed search across processors and distributed search across boxes (again, assuming that my searches are truly CPU bound)?  My only basis for comparison is Sphinx, which I was able to run in parallel across multiple processors just as I would across boxes.  With Sphinx there was overhead as well in farming out the searches and then combining the results, but as the bulk of the time (for the kind of searches and the kind of index I'm running) was spent processing, it was a net win (I saw roughly a factor of n speedup, where n was the number of processors/shards).

Thanks again, for all your help, this has been really useful so far.
-Harish

Re: Solr on a multiprocessor machine

Yonik Seeley
On Thu, Jan 8, 2009 at 10:03 PM, smock wrote:
> I don't mean to be argumentative - just trying to understand, what is the
> difference between distributed search across processors, and distributed
> search across boxes (again, assuming that my searches are truly CPU bound)?

Even if your searches are CPU bound, there is CPU and IO overhead in
distributed search.

time_for_whole_index
  vs
time_for_half_index + distributed_search_overhead
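To make this comparison concrete, here is a toy cost model. The 100 ms of work, 10 ms per-shard overhead and 5 ms merge cost are invented for illustration, the function names are hypothetical, and the model assumes the shards run perfectly in parallel on separate CPUs. Under those assumptions, two shards lower idle-box latency (65 ms vs 100 ms) but raise the CPU spent per query (125 ms vs 100 ms), so the throughput ceiling on two CPUs drops from 20 to 16 queries/sec.

```python
from typing import Tuple

# Hypothetical cost model for one query on one box; all times are CPU milliseconds.


def unsharded(work_ms: float) -> Tuple[float, float]:
    """Return (idle-box latency, total CPU per query) for a single index."""
    return work_ms, work_ms


def sharded(work_ms: float, shards: int, per_shard_overhead_ms: float,
            merge_ms: float) -> Tuple[float, float]:
    """Same pair for a sharded index, assuming shards run fully in parallel."""
    per_shard = work_ms / shards + per_shard_overhead_ms
    latency = per_shard + merge_ms             # wall-clock time when CPUs are free
    total_cpu = shards * per_shard + merge_ms  # work done in aggregate
    return latency, total_cpu


def max_qps(cpus: int, total_cpu_ms: float) -> float:
    """Upper bound on queries/sec once every CPU is saturated."""
    return cpus * 1000.0 / total_cpu_ms


if __name__ == "__main__":
    lat1, cpu1 = unsharded(100.0)
    lat2, cpu2 = sharded(100.0, shards=2, per_shard_overhead_ms=10.0, merge_ms=5.0)
    print(f"unsharded: latency {lat1:.0f} ms, max {max_qps(2, cpu1):.0f} qps")
    print(f"2 shards:  latency {lat2:.0f} ms, max {max_qps(2, cpu2):.0f} qps")
```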

Distributed search is optimized for the case when the index is so big
that one *must* distribute it across multiple shards.  It works in
multiple phases, first only collecting and merging the document ids,
and then requesting stored fields for the top documents in another
phase.  It's also optimized for total throughput of the whole system.
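Here is a minimal in-memory sketch of the two phases described above. It assumes each hypothetical shard is just a dict from a document id (unique across shards) to (score, stored fields) for documents that already matched the query. Phase 1 moves only ids and scores. Stored fields are fetched in phase 2, and only from the shard that owns each winning document. This illustrates the idea and is not Solr's actual code.

```python
import heapq
from typing import Dict, List, Tuple

# Hypothetical shard: doc_id -> (score, stored fields) for documents that matched.
# Doc ids are assumed to be unique across shards.
Shard = Dict[str, Tuple[float, Dict[str, str]]]


def phase1_top_ids(shard: Shard, k: int) -> List[Tuple[float, str]]:
    """Phase 1: a shard returns only (score, doc_id) for its own top k."""
    return heapq.nlargest(k, ((score, doc_id) for doc_id, (score, _) in shard.items()))


def phase2_fetch(shard: Shard, doc_ids: List[str]) -> Dict[str, Dict[str, str]]:
    """Phase 2: a shard returns stored fields for the requested ids only."""
    return {doc_id: shard[doc_id][1] for doc_id in doc_ids}


def distributed_search(shards: List[Shard], k: int) -> List[Tuple[str, float, Dict[str, str]]]:
    # Phase 1: gather ids + scores from every shard and merge into a global top k.
    candidates = [
        (score, doc_id, shard_no)
        for shard_no, shard in enumerate(shards)
        for score, doc_id in phase1_top_ids(shard, k)
    ]
    winners = heapq.nlargest(k, candidates)

    # Phase 2: ask each owning shard for the stored fields of its winners.
    ids_by_shard: Dict[int, List[str]] = {}
    for _, doc_id, shard_no in winners:
        ids_by_shard.setdefault(shard_no, []).append(doc_id)
    fields: Dict[str, Dict[str, str]] = {}
    for shard_no, doc_ids in ids_by_shard.items():
        fields.update(phase2_fetch(shards[shard_no], doc_ids))

    return [(doc_id, score, fields[doc_id]) for score, doc_id, _ in winners]


if __name__ == "__main__":
    shard_a = {"a1": (0.9, {"title": "A1"}), "a2": (0.3, {"title": "A2"})}
    shard_b = {"b1": (0.8, {"title": "B1"}), "b2": (0.7, {"title": "B2"})}
    for row in distributed_search([shard_a, shard_b], k=2):
        print(row)
```

The split keeps phase 1 cheap when stored fields are large. The cost is a second round trip to the shards, which is part of the overhead in the comparison above.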

If one were optimizing for response time with smaller documents and
single requests, then merging results in a single shot would yield
better results.

If you load test a distributed vs non-distributed system on a single
box, the distributed will normally lose.  This is because to find the
top 10 documents in general, one must retrieve the top 10 documents
from each shard - more work is done.  Single request latency *can* be
shorter under the right circumstances, but under load it will always
lose since more work is done in aggregate.
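This small illustration shows why each shard has to return a full top k (here k = 3): nothing prevents all of the best documents from living in one shard. The scores are made up and `merged_top_k` is a hypothetical helper. Asking each shard for fewer than k results can silently drop a document from the merged list. Asking every shard for k retrieves shards × k candidates in order to return only k.

```python
import heapq
from typing import List


def merged_top_k(shard_scores: List[List[float]], k: int, per_shard: int) -> List[float]:
    """Take the top `per_shard` scores from each shard, then keep the global top k."""
    candidates: List[float] = []
    for scores in shard_scores:
        candidates.extend(heapq.nlargest(per_shard, scores))
    return heapq.nlargest(k, candidates)


if __name__ == "__main__":
    shards = [[0.9, 0.8, 0.7], [0.2, 0.1]]  # all of the best hits sit in shard 0
    exact = heapq.nlargest(3, [s for shard in shards for s in shard])
    print(merged_top_k(shards, k=3, per_shard=2))  # [0.9, 0.8, 0.2] (0.7 is lost)
    print(merged_top_k(shards, k=3, per_shard=3) == exact)  # True
```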

-Yonik

Re: Solr on a multiprocessor machine

harish.agarwal
Hi Yonik,
I see - I didn't realize that there was a second phase to retrieve stored values.  Sphinx also queries the top n documents and combines the results - unless the algorithm is very different, I wouldn't expect that to add a lot of overhead, as Sphinx shows a very definite performance boost when distributing.

It seems to me that all of the points you're making below apply just as well to distributing across multiple boxes - where does the issue of doing the distribution on a single box come into play?  Anecdotally, everything you're saying completely meshes with my load testing of Solr (single full index is performing better than the distributed index).  I may have to stick with Sphinx, though, if I can't boost the performance of Solr on a single box.

-Harish


Re: Solr on a multiprocessor machine

Yonik Seeley
Maybe we should back up a bit and look at your requirements: both
query latency and throughput.
If the index is small enough, distributed search is definitely not the
first step to take to address performance issues - there are many
other things to look into first.

Start by looking at what queries are slowest, and we may be able to
help speed them up through some optimizations.
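One way to start finding the slowest queries is sketched below. It assumes the request log has lines shaped like `... params={q=...} hits=N status=0 QTime=M`, similar to the request lines Solr writes to its log, but check this against your own logs. The hypothetical `slowest_queries` helper extracts QTime (milliseconds) and the params, then sorts by QTime.

```python
import re
from typing import Iterable, List, Tuple

# Assumed line shape (verify against your own logs):
#   INFO: [] webapp=/solr path=/select params={q=foo&rows=10} hits=42 status=0 QTime=15
# Simplification: params that themselves contain "}" will be cut short.
REQUEST_RE = re.compile(r"params=\{(?P<params>[^}]*)\}.*?\bQTime=(?P<qtime>\d+)")


def slowest_queries(lines: Iterable[str], n: int = 10) -> List[Tuple[int, str]]:
    """Return up to n (QTime in ms, params) pairs, slowest first."""
    timed = []
    for line in lines:
        match = REQUEST_RE.search(line)
        if match:  # skip lines that are not request log entries
            timed.append((int(match.group("qtime")), match.group("params")))
    timed.sort(reverse=True)
    return timed[:n]


if __name__ == "__main__":
    sample = [
        "INFO: [] webapp=/solr path=/select params={q=ipod} hits=12 status=0 QTime=8",
        "INFO: [] webapp=/solr path=/select params={q=*:*&facet=true} hits=9000 status=0 QTime=340",
        "INFO: some unrelated log line",
    ]
    for qtime, params in slowest_queries(sample):
        print(f"{qtime:>6} ms  {params}")
```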

-Yonik

Re: Solr on a multiprocessor machine

harish.agarwal
Hi Yonik,

In some ways I have a 'small index' (~8 million documents at the moment).  However, I have a lot of attributes (currently about 30, but I expect that number to keep growing) and am interested in faceting across all of them for every search (on a completely unrelated note, if you know whether setting facet.field to 'all' is an option, please let me know how to do it) - this is where performance started to suffer when I was using Sphinx.  Search times increased quite a bit, in proportion to the number of hits returned by a search (because the number of hits directly affects the facet computation time).  I found with Sphinx that distributing my index was a big win for these faceted searches, because every node had to deal with fewer facets per index.

In addition, while I'm okay with depending on intermediate caching (documentCaches, filterCaches, etc.) to help speed up searches, I would like every first search to be as fast as possible.  My index sees a lot of unique queries and I don't want to depend on a query cache to speed things up.

-Harish


Re: Solr on a multiprocessor machine

Yonik Seeley
Are you on Solr 1.3 or a recent nightly build?  The development
version of 1.4 has a number of scalability enhancements.

-Yonik

(In reply to smock, Fri, Jan 9, 2009 at 12:18 AM)

Re: Solr on a multiprocessor machine

harish.agarwal
I'm using 1.3 - are the nightly builds stable enough to use in production?

Re: Solr on a multiprocessor machine

Erik Hatcher

On Jan 9, 2009, at 12:28 AM, smock wrote:
> I'm using 1.3 - are the nightly builds stable enough to use in  
> production?

Testing is always recommended, and no official guarantees are made of
course, but trunk is vastly superior to 1.3 in faceting performance.  
I'd use trunk (in fact I am) in production.

        Erik

Re: Solr on a multiprocessor machine

Yonik Seeley
In reply to this post by harish.agarwal
On Fri, Jan 9, 2009 at 12:18 AM, smock wrote:
> In some ways I have a 'small index'  (~8 million documents at the moment).
> However, I have a lot of attributes (currently about 30, but I'm expecting
> that number to keep growing) and am interested in faceting across all of
> them for every search

OK, this is where you will become CPU bound (faceting on 30 fields).
But if you will have any search traffic at all, you are better off
going with non-distributed search on a single box over distributed on
a single box.

Distributed search needs to do more work than non-distributed for
faceting also (in the form of over-requesting and facet refinement
requests).  If you are interested in why this extra work needs to be
done, search for "refinement" in
https://issues.apache.org/jira/browse/SOLR-303
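This simplified, in-memory sketch of over-requesting and refinement for a single facet field shows where the extra work comes from. It is not Solr's implementation: the shard data, `distributed_facet` and its `overrequest` parameter are hypothetical. Each shard reports only its own top terms. If some shard did not report a candidate term, that term's merged count is incomplete, so that shard gets a follow-up ("refinement") request for the exact count. Over-requesting extra terms per shard up front makes refinement requests less likely. Neither step makes the result exact, though: a term that no shard ranked highly enough can still be missed.

```python
from collections import Counter
from typing import Dict, List, Tuple


def shard_top(counts: Dict[str, int], n: int) -> Dict[str, int]:
    """What one shard reports: only its own n highest-count terms."""
    return dict(Counter(counts).most_common(n))


def distributed_facet(shards: List[Dict[str, int]], limit: int,
                      overrequest: int = 0) -> Tuple[List[Tuple[str, int]], int]:
    """Return (top `limit` terms with refined counts, number of refinement requests)."""
    # Phase 1: each shard reports its top (limit + overrequest) terms.
    partial = [shard_top(s, limit + overrequest) for s in shards]
    merged: Counter = Counter()
    for reported in partial:
        merged.update(reported)
    candidates = [term for term, _ in merged.most_common(limit)]

    # Phase 2 (refinement): a shard that did not report a candidate term is
    # asked for that term's exact count, so the candidate's total is complete.
    refined = Counter({term: merged[term] for term in candidates})
    requests = 0
    for shard, reported in zip(shards, partial):
        for term in candidates:
            if term not in reported:
                requests += 1
                refined[term] += shard.get(term, 0)
    return refined.most_common(limit), requests


if __name__ == "__main__":
    shard0 = {"red": 10, "blue": 9, "green": 8}
    shard1 = {"green": 10, "yellow": 9, "red": 1}
    print(distributed_facet([shard0, shard1], limit=2))
    print(distributed_facet([shard0, shard1], limit=2, overrequest=1))
```

With these made-up counts, phase 1 alone would report red=10 and green=10. Refinement corrects them to green=18 and red=11 using two extra requests. Over-requesting one extra term per shard avoids those requests in this case.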

-Yonik