Real-time replication

John Reuning-2
Apologies if this has been covered.  I searched the archives and didn't
see a thread on this topic.

Has anyone experimented with a near-real-time replication scheme similar
to RDBMS replication?  Using rsync to copy the Lucene index files to
slaves is very efficient, but what if you want index changes to
propagate in a few seconds instead of a few minutes?

Is it feasible to have a Solr manager take update requests and forward
them to the slaves as it receives them?  (I guess they're not really
slaves in this case.)  The manager might issue commits every 10-30
seconds to reduce the write load.  Write overhead would still exist on
all read servers, but at least the read requests would be spread across
the pool.
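
A minimal sketch of the manager idea above, assuming each server accepts
Solr XML update messages (<add>...</add> and <commit/>) over HTTP POST.
The class name UpdateFanout, the helper post_xml, and the example URLs
are hypothetical, and there is no retry or error handling, so a failed
server would fall out of sync with the rest of the pool.

```python
import time
import urllib.request
from typing import Callable, Iterable


def post_xml(url: str, body: str) -> None:
    """POST one XML update message to a single update URL."""
    req = urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        resp.read()


class UpdateFanout:
    """Forward every update to all servers; commit at most once per interval."""

    def __init__(
        self,
        server_urls: Iterable[str],
        commit_interval_s: float = 15.0,
        send: Callable[[str, str], None] = post_xml,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.server_urls = list(server_urls)
        self.commit_interval_s = commit_interval_s
        self.send = send
        self.clock = clock
        self.last_commit = clock()
        self.dirty = False  # True when updates were sent since the last commit

    def update(self, xml_body: str) -> None:
        """Send one update message to every server, then commit if due."""
        for url in self.server_urls:
            self.send(url, xml_body)
        self.dirty = True
        self.maybe_commit()

    def maybe_commit(self) -> bool:
        """Commit on all servers if there are pending updates and the interval has passed."""
        due = self.clock() - self.last_commit >= self.commit_interval_s
        if not (self.dirty and due):
            return False
        for url in self.server_urls:
            self.send(url, "<commit/>")
        self.last_commit = self.clock()
        self.dirty = False
        return True
```

Usage would look something like
UpdateFanout(["http://search1:8983/solr/update", "http://search2:8983/solr/update"]),
with maybe_commit() also called from a timer so that the last updates
before a quiet period still get committed.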

Thanks,

-John R.
Re: Real-time replication

Matthew Runo
The only problem I can see is that you may end up committing more
often than Solr can open and prewarm new searchers. This happens at
the peak of the day on our servers, leaving us with 5-10 searchers
waiting for prewarming to finish, only to be closed as soon as they're
registered because another searcher is already waiting behind them.
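
A rough way to estimate the pile-up described above. This is a toy
model, not Solr code: it assumes every commit starts a new warmup, that
warmups run concurrently, and that each takes the same fixed time. The
function name and parameters are made up for illustration.

```python
def peak_overlapping_warmups(
    commit_interval_s: float, warm_time_s: float, n_commits: int = 100
) -> int:
    """Return the largest number of searchers warming at the same moment.

    Commit i starts a warmup at time i * commit_interval_s, and that
    warmup finishes warm_time_s later.
    """
    starts = [i * commit_interval_s for i in range(n_commits)]
    peak = 0
    for i, t in enumerate(starts):
        # Count warmups (including the one starting now) not yet finished at t.
        still_warming = sum(1 for s in starts[: i + 1] if s + warm_time_s > t)
        peak = max(peak, still_warming)
    return peak
```

Under these assumptions, a commit every 10 seconds with a 60-second
warmup gives a peak of 6 overlapping searchers, while a commit every 30
seconds with a 20-second warmup never has more than 1.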

That said, I need to tune my cache. A lot.

+--------------------------------------------------------+
  | Matthew Runo
  | Zappos Development
  | [hidden email]
  | 702-943-7833
+--------------------------------------------------------+


On Oct 4, 2007, at 9:07 AM, John Reuning wrote:

> Apologies if this has been covered.  I searched the archives and  
> didn't see a thread on this topic.
>
> Has anyone experimented with a near-real-time replication scheme
> similar to RDBMS replication?  Using rsync to copy the Lucene index
> files to slaves is very efficient, but what if you want index
> changes to propagate in a few seconds instead of a few minutes?
>
> Is it feasible to have a Solr manager take update requests and
> forward them to the slaves as it receives them?  (I guess they're
> not really slaves in this case.)  The manager might issue commits
> every 10-30 seconds to reduce the write load.  Write overhead would
> still exist on all read servers, but at least the read requests
> would be spread across the pool.
>
> Thanks,
>
> -John R.
>

Re: Real-time replication

Walter Underwood, Netflix
We don't use Solr replication. Each server is independent and
does its own indexing. This has several advantages:

* all installations are identical
* no single point of failure
* no inter-server version or config dependencies
* we can run a different version or config on one server for testing

The drawbacks are:

* 4X the DB accesses to get content
* each server has a CPU spike during indexing (we stagger that)
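
The thread does not say how the indexing runs are staggered. One simple
possibility, sketched here under that assumption, is to spread the start
times evenly across the indexing period. The function name and the
one-hour period are hypothetical; four servers is a guess based on the
"4X" figure above.

```python
from typing import List


def staggered_offsets(n_servers: int, period_s: int) -> List[int]:
    """Return a start offset in seconds for each server, evenly spaced over the period."""
    if n_servers <= 0:
        raise ValueError("n_servers must be positive")
    return [(i * period_s) // n_servers for i in range(n_servers)]


# Four servers re-indexing once an hour start 15 minutes apart.
offsets = staggered_offsets(4, 3600)  # [0, 900, 1800, 2700]
```

With evenly spaced offsets, the CPU spikes only overlap if a single
indexing run takes longer than period_s / n_servers.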

When we finally move to Solr 1.2 (or 1.3 if we wait long enough),
we can install it on one server and watch the performance. No need
to worry about different versions of Lucene.

Matthew: If your Searchers are only open for a short while, don't pre-warm.
Pre-warming is an optimization, not a necessity.

wunder


On 10/4/07 9:32 AM, "Matthew Runo" <[hidden email]> wrote:

> The only problem I can see is that you may end up committing more
> often than Solr can open and prewarm new searchers. This happens at
> the peak of the day on our servers, leaving us with 5-10 searchers
> waiting for prewarming to finish, only to be closed as soon as
> they're registered because another searcher is already waiting
> behind them.
>
> That said, I need to tune my cache. A lot.