Indexing vs Search node

Fernando Otero
Hi guys,
    I read in several blog posts that it's never a good idea to index and
search on the same node. I wonder how that separation can be achieved in
SolrCloud, or whether it happens automatically.




--

Fernando Otero

Sr Engineering Manager, Panamera

Buenos Aires - Argentina

Mobile: +54 911 67697108

Email:  [hidden email]

Re: Indexing vs Search node

Shawn Heisey-2
On 11/9/2018 12:13 PM, Fernando Otero wrote:
>      I read in several blog posts that it's never a good idea to index and
> search on the same node. I wonder how that can be achieved in Solr Cloud or
> if it happens automatically.

I would disagree with that blanket assertion.

Indexing does put extra load on a server that can interfere with query
performance.  Whether that will be a real problem pretty much depends on
exactly how much indexing you're doing, and what kind of query load you
need to handle.  For extreme scaling, it can be a good idea to separate
indexing and searching.

With a master/slave architecture, any version of Solr can separate
indexing and querying.

Before 7.x, it wasn't possible to separate indexing and querying with
SolrCloud.  With previous major versions, ALL replicas do the same
indexing.  With 7.x, that's still the default behavior, but 7.x has new
replica types that make it possible for indexing to only take place on
shard leaders. The latest version of Solr 7.x has a way to prefer
certain replica types, which is how the separation can be achieved.

Thanks,
Shawn


Re: Indexing vs Search node

Erick Erickson
Fernando:

I'd phrase it more strongly than Shawn. Prior to 7.0
all replicas both indexed and searched (they were NRT replicas),
so there wasn't any choice but to index and search on
every replica.

If you have very high indexing throughput, you _might_
want to use TLOG and/or PULL replicas.

But TANSTAAFL (There Ain't No Such Thing As A Free Lunch).
TLOG/PULL replicas copy index segments around, which
may be up to 5GB each (default TieredMergePolicy cap on individual
segment sizes), whereas NRT replicas just get the raw document.

So in the TLOG/PULL situations, you'll get bursts of network traffic
but each replica has less CPU load because all the replicas but one
for each shard do not have to index the doc.

In the NRT case, the raw documents are forwarded so the
network is less bursty, but all of the replicas spend CPU
cycles indexing.
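
The CPU side of this trade-off can be put in a few lines. This is a simplified model of the two layouts discussed here, not Solr code: the function name is hypothetical, and declining to model shards that mix NRT and TLOG replicas is an assumption of the sketch.

```python
def indexing_replicas(nrt: int = 0, tlog: int = 0, pull: int = 0) -> int:
    """Return how many replicas of one shard index each incoming document.

    Simplified model: NRT replicas all index the raw document, while in a
    TLOG/PULL shard only the leader indexes and the others copy segments.
    """
    if nrt and tlog:
        # Assumption of this sketch: mixed NRT/TLOG shards are not modeled.
        raise ValueError("shards mixing NRT and TLOG replicas are not modeled")
    if nrt:
        return nrt  # every NRT replica spends CPU indexing; PULL replicas do not
    if tlog:
        return 1    # only the shard leader (a TLOG replica) indexes
    raise ValueError("a shard needs an NRT or TLOG replica to act as leader")


print(indexing_replicas(nrt=6))           # 6
print(indexing_replicas(tlog=2, pull=4))  # 1
```

The model says nothing about the network cost, which moves the other way: the replicas that skip indexing have to fetch segments instead.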

So I wouldn't worry about it unless you're running into performance
problems; _then_ I'd investigate TLOG/PULL replicas.

Best,
Erick

Re: Indexing vs Search node

David Hastings
I personally like standalone Solr for this reason: I can tune the indexing
"master" to do nothing but take in documents, and that way the slaves
don't battle for resources in the process.


Re: Indexing vs Search node

Shawn Heisey-2
On 11/9/2018 1:58 PM, David Hastings wrote:
> I personally like standalone solr for this reason, i can tune the indexing
> "master" for doing nothing but taking in documents and that way the slaves
> dont battle for resources in the process.

SolrCloud can be set up in a pretty similar way if you're running 7.5.
You set things up so each collection has two TLOG replicas and the rest
of them are PULL.

SolrCloud doesn't have master and slave in the same way as the old
architecture.  There are no single points of failure if the hardware is
set up correctly.  But because PULL replicas cannot become leader, they
are a lot like slaves.  Solr 7.5 and later can configure a preference
for different replica types at query time.  So with the setup described
above, you tell it to prefer PULL replicas.  If all the PULL replicas
were to die, then SolrCloud would use whatever is left.
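
A minimal sketch of what that query-time preference could look like from a client, assuming the `shards.preference` request parameter with the value `replica.type:PULL`. The host, port, and the collection name `products` are placeholders that do not come from this thread, and the snippet only builds the URL rather than sending the request.

```python
from urllib.parse import urlencode


def build_select_url(base_url: str, collection: str, query: str) -> str:
    """Build a /select URL that asks SolrCloud to prefer PULL replicas."""
    params = {
        "q": query,
        # A preference, not a filter: other replica types can still serve
        # the request when no PULL replica is available.
        "shards.preference": "replica.type:PULL",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Placeholder host and collection name, for illustration only.
url = build_select_url("http://localhost:8983/solr", "products", "*:*")
print(url)
```

The parameter travels with each request, so a client could also leave it off for queries where it does not matter which replica answers.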

Let's say that you set up a collection so it has two TLOG replicas and
four PULL replicas.  You could have the TLOG replicas live on a pair of
servers with SSD drives and less memory than the other four servers that
have PULL replicas, which could be running standard hard drives.
Queries love memory, indexing loves fast disks.  The preference for
PULL replicas would keep queries running only on the four machines
with more memory.

The reason that you want two TLOG replicas instead of one is so that if
the current leader dies, there is another TLOG replica available to
become leader.
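
A layout like this could be requested through the Collections API roughly as follows. This is a sketch under assumptions: `tlogReplicas` and `pullReplicas` are the replica-count parameters of the CREATE action, while the host and the collection name `products` are placeholders. Choosing a config set and pinning replica types to particular servers are left out, and the snippet only builds the URL.

```python
from urllib.parse import urlencode


def build_create_url(base_url: str, name: str, num_shards: int,
                     tlog_replicas: int, pull_replicas: int) -> str:
    """Build a Collections API CREATE URL for a TLOG + PULL layout."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,
        "tlogReplicas": tlog_replicas,  # leader-eligible replicas
        "pullReplicas": pull_replicas,  # search-only replicas, never leader
    }
    return f"{base_url}/admin/collections?{urlencode(params)}"


# Two TLOG and four PULL replicas, as in the example above.
print(build_create_url("http://localhost:8983/solr", "products", 1, 2, 4))
```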

Thanks,
Shawn


Re: Indexing vs Search node

Fernando Otero
Thanks everyone, this gave me great arguments for migrating to Solr 7 :D
