reader/searcher refresh after replication (commit)

eks dev
Hi all,
I am a bit confused about the IndexSearcher refresh lifecycle...
In a master/slave setup, I override the postCommit listener on the slave
(Solr trunk version) to read some user information stored in
userCommitData on the master.

----------
@Override
public final void postCommit() {
  // This returns "stale" information that was present before
  // replication finished
  RefCounted<SolrIndexSearcher> refC = core.getNewestSearcher(true);
  try {
    Map<String, String> userData =
        refC.get().getIndexReader().getIndexCommit().getUserData();
  } finally {
    refC.decref(); // release the reference taken by getNewestSearcher
  }
}
------------
I expected core.getNewestSearcher(true) to return a refreshed
SolrIndexSearcher, but it didn't.

When is this information going to be refreshed to the state of the
replicated index? (To repeat: this is inside the postCommit listener.)

What is the way to get the information from the last commit point?

Maybe like this?
core.getDeletionPolicy().getLatestCommit().getUserData();

Or do I need to explicitly open a new searcher (doesn't Solr do this
behind the scenes)?
core.openNewSearcher(false, false)

Not critical, reopening a new searcher works, but I would like to
understand these lifecycles, i.e. when Solr loads the latest commit point...

Thanks, eks

Re: reader/searcher refresh after replication (commit)

Mark Miller-3
Post commit calls are made before a new searcher is opened.

Might be easier to try to hook in with a new searcher listener?

- Mark Miller
lucidimagination.com
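A toy model of the ordering Mark describes may help here. It is only a sketch: `ToyCore` and its fields are hypothetical stand-ins, not Solr classes, and the `counter` key is an assumed name for whatever is stored in the commit user data. The point is that a postCommit callback that asks the currently registered searcher for user data sees the previous commit, while a newSearcher callback sees the new one.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model only: none of these types are Solr classes. */
public class CommitOrderingSketch {

    /** Hypothetical stand-in for a core with one registered searcher. */
    static class ToyCore {
        private Map<String, String> searcherUserData = Collections.emptyMap();
        private Map<String, String> latestCommitUserData = Collections.emptyMap();
        final List<String> log = new ArrayList<>();

        /** Simulates a commit arriving, for example through replication. */
        void commit(Map<String, String> userData) {
            latestCommitUserData = new HashMap<>(userData);
            // Step 1: postCommit fires while the old searcher is still registered.
            log.add("postCommit sees " + searcherUserData.get("counter"));
            // Step 2: the new searcher is opened and registered.
            searcherUserData = latestCommitUserData;
            // Step 3: newSearcher fires with the fresh view.
            log.add("newSearcher sees " + searcherUserData.get("counter"));
        }
    }

    /** Runs two commits and returns the callback log. */
    static List<String> run() {
        ToyCore core = new ToyCore();
        core.commit(Map.of("counter", "100"));
        core.commit(Map.of("counter", "250"));
        return core.log;
    }

    public static void main(String[] args) {
        run().forEach(System.out::println);
    }
}
```

In the second commit of this model, postCommit still reports 100 although the commit being processed carries 250, which matches the "stale" reading described in the original post.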

Re: reader/searcher refresh after replication (commit)

eks dev
Thanks Mark,
Hmm, I would like to have this information ASAP, not wait until the
first search gets executed (that depends on the user). Is Solr going to
create a new searcher as part of the "replication transaction"?

Just to make it clear why I need it...
I have a simple one-master, many-slaves config where the master does
"batch" updates in big chunks (things the user can wait longer to see on
the search side), but the slaves work in soft commit mode internally,
where I permit them to run slightly ahead of the master. In order to know
where the "incremental update" should start, I read it from UserData.

Ideally, after replication has finished successfully but before the
commit ends, I would like to read in these counters to let the
"incremental update" run from the right point...

I need to prevent updates to the "replicated index" before I read this
information (otherwise duplicates can appear). Are there any
"IndexWriter" listeners around?


Thanks again,
eks.
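One way to picture the requirement above (no incremental updates until the counter has been read) is a small gate in front of the update path. This is a sketch under assumptions, not a Solr API: `IncrementalUpdateGate`, its method names, and the `lastSeq` user-data key are all hypothetical, and how the two replication callbacks would be wired into Solr is left open.

```java
import java.util.Map;

/** Hypothetical helper, not part of Solr. */
public class IncrementalUpdateGate {
    private boolean open = false;
    private long lastReplicated = -1;

    /** Close the gate when a replication cycle begins. */
    public synchronized void replicationStarted() {
        open = false;
    }

    /** Read the counter from the commit user data, then reopen the gate. */
    public synchronized void replicationFinished(Map<String, String> userData) {
        // "lastSeq" is an assumed key name for the counter written by the master.
        lastReplicated = Long.parseLong(userData.getOrDefault("lastSeq", "-1"));
        open = true;
    }

    /** Accept an update only if the gate is open and the master has not already indexed it. */
    public synchronized boolean accept(long seq) {
        return open && seq > lastReplicated;
    }
}
```

With the counter at 10, `accept(10)` is rejected as a duplicate and `accept(11)` passes; anything offered while replication is in flight is rejected until `replicationFinished` has run.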




Re: reader/searcher refresh after replication (commit)

eks dev
And drinks are on me for those who decoupled the implicit commit from
close... this was a tricky trap.


Re: reader/searcher refresh after replication (commit)

Em
In reply to this post by eks dev
Eks,

that sounds strange!

Am I getting you right?
You have a master which indexes batch updates from time to time.
Furthermore you have some slaves, pulling data from that master to keep
them up to date with the newest batch updates.
Additionally, your slaves index their own content in soft-commit mode
that needs to be available as soon as possible.
As a consequence, the slaves are not in sync with the master.

I am not 100% certain, but chances are good that Solr's
replication mechanism only changes those segments that are not in sync
with the master.

What would you expect a BeforeCommitListener to do for you, if one
existed?

Kind regards,
Em


Re: reader/searcher refresh after replication (commit)

Savia Beson
Yes, I consciously let my slaves run ahead of the master in order to
reduce update latency, but every now and then they sync up with the
master, which is doing the heavy lifting.

The price you pay is that the slaves do not see the same documents as
the master, but this is the case anyhow with replication. In my setup a
slave may go ahead of the master with updates; this delta gets zeroed
after replication and the game starts again.

What you have to take into account with this is a very small time window
where you may "go back in time" on the slaves (not seeing documents that
were already there), but we are talking about seconds and a couple out
of 200 million documents (only those documents that were soft-committed
on the slave during replication, i.e. between the commit on the master
and postCommit on the slave).

Why do you think something is strange here?

> What would you expect a BeforeCommitListener to do for you, if one
> existed?
Why should I be expecting something?

I just need to read the userCommitData as soon as replication is done,
and I am looking for a proper/easy way to do it (a postCommit listener
is what I use now).

What makes me slightly nervous are those lifecycle questions, e.g.
when I issue an update command before and after the postCommit event,
which index gets updated: the one just replicated or the one that was
there just before replication?

There are definitely ways to optimize this, for example forcing the
replication handler to copy only delta files if the index gets updated
on both slave and master (there is already a TODO somewhere on the Solr
replication wiki, I think). Right now the ReplicationHandler copies the
complete index if this gets detected...

I am all ears if there are better proposals for low-latency
updates in a multi-server setup...
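The "copy only delta files" idea mentioned above can be sketched as a comparison of two file listings. This is an illustration of the idea only and makes no claim about how Solr's ReplicationHandler decides what to fetch; `DeltaFilesSketch`, `filesToFetch`, and the use of a per-file checksum as the comparison key are assumptions.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration, not Solr's replication logic. */
public class DeltaFilesSketch {

    /**
     * Returns the names of master files that the slave either lacks or
     * holds with a different checksum. Both maps go from file name to checksum.
     */
    static Set<String> filesToFetch(Map<String, Long> master, Map<String, Long> slave) {
        Set<String> delta = new TreeSet<>();
        for (Map.Entry<String, Long> e : master.entrySet()) {
            Long local = slave.get(e.getKey());
            // Same name but different content counts as changed: a slave that
            // indexes on its own can produce files with colliding names.
            if (local == null || !local.equals(e.getValue())) {
                delta.add(e.getKey());
            }
        }
        return delta;
    }
}
```

Comparing by checksum instead of by name alone matters in this setup, because a slave that has been writing its own segments may hold a file with the same name as one on the master but with different content.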



Re: reader/searcher refresh after replication (commit)

Em
Sounds much clearer to me than before. :)

Off the top of my head I have two ideas:
First: Let replication run asynchronously.
If shard1 is pulling the new index from the master and therefore very
recent documents aren't available anymore, shard2 will find them in the
meantime. As soon as shard1 is up to date (including the most recent
documents), shard2 can pull its update from the master.
However, being out of sync between two shards that should serve the same
data has its own problems, I think.

Second:
You can have another SolrCore for the most recent documents. This one
could be based on a RAMDirectory for reduced latency (or even use
NRT features, if available in your Solr version).
Your master-slave setup becomes easier, since you do not have to
worry about out-of-sync scenarios anymore.
The challenge here is to handle duplicate documents (i.e. newer versions
in the RAMDirectory) and proper relevancy, since the shards are
unbalanced by design.

Kind regards,
Em
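The duplicate handling in the second idea could look roughly like the following: merge hits from the main core and the "recent documents" core, keeping the newer version of each id. This is a sketch under assumptions: `Doc`, its `version` field, and `merge` are hypothetical, and the relevancy problem mentioned above is not addressed at all.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration, not a Solr component. */
public class RecentCoreMergeSketch {

    /** Assumed document shape: an id plus a monotonically increasing version. */
    static class Doc {
        final String id;
        final long version;

        Doc(String id, long version) {
            this.id = id;
            this.version = version;
        }
    }

    /** Merges both result lists, keeping one document per id (the newest version). */
    static List<Doc> merge(List<Doc> mainCore, List<Doc> recentCore) {
        Map<String, Doc> byId = new LinkedHashMap<>();
        for (Doc d : mainCore) {
            byId.merge(d.id, d, RecentCoreMergeSketch::newer);
        }
        for (Doc d : recentCore) {
            byId.merge(d.id, d, RecentCoreMergeSketch::newer);
        }
        return new ArrayList<>(byId.values());
    }

    /** Picks the document with the higher version; ties keep the first one seen. */
    private static Doc newer(Doc a, Doc b) {
        return b.version > a.version ? b : a;
    }
}
```

Documents keep the position of their first appearance, so a real implementation would still need to re-sort by score after merging.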



Re: reader/searcher refresh after replication (commit)

Erick Erickson
In reply to this post by Savia Beson
You'll *really like* the SolrCloud stuff going into trunk when it's baked
for a while....

Best
Erick


Re: reader/searcher refresh after replication (commit)

Em
Erick,

> You'll *really like* the SolrCloud stuff going into trunk when it's baked
> for a while....
How stable is SolrCloud at the moment?
I can not wait to try it out.

Kind regards,
Em


Am 22.02.2012 14:45, schrieb Erick Erickson:

> You'll *really like* the SolrCloud stuff going into trunk when it's baked
> for a while....
>
> Best
> Erick
>
> On Wed, Feb 22, 2012 at 3:25 AM, eks dev <[hidden email]> wrote:
>> Yes, I consciously let my slaves run away from the master in order to
>> reduce update latency, but every now and then they sync up with master
>> that is doing heavy lifting.
>>
>> The price you pay is that slaves do not see the same documents as the
>> master, but this is the case anyhow with replication, in my setup
>> slave may go ahead of master with updates, this delta gets zeroed
>> after replication and the game starts again.
>>
>> What you have to take into account with this is very small time window
>> where you may "go back in time" on slaves (not seeing documents that
>> were already there), but we are talking about seconds and a couple out
>> of 200Mio documents (only those documents that were softComited on
>> slave during replication, since commit ond master and postCommit on
>> slave).
>>
>> Why do you think something is strange here?
>>
>>> What are you expecting a BeforeCommitListener could do for you, if one
>>> would exist?
>> Why should I be expecting something?
>>
>> I just need to read userCommit Data as soon as replication is done,
>> and I am looking for proper/easy way to do it.  (postCommitListener is
>> what I use now).
>>
>> What makes me slightly nervous are those life cycle questions, e.g.
>> when I issue update command before and after postCommit event, which
>> index gets updated, the one just replicated or the one that was there
>> just before replication.
>>
>> There are definitely ways to optimize this, for example to force
>> replication handler to copy only delta files if index gets updated on
>> slave and master  (there is already todo somewhere on solr replication
>> Wiki I think). Now replicationHandler copies complete index if this
>> gets detected ...
>>
>> I am all ears if there are better proposals to have low latency
>> updates in multi server setup...
>>

Re: reader/searcher refresh after replication (commit)

Erick Erickson
It's certainly stable enough to start experimenting with, and I know
that it's under pretty active development now. I've seen a lot
of back-and-forth between Mark Miller and Jamie Johnson,
Jamie trying things and Mark responding.

It's part of the trunk, so be prepared for occasional re-indexing
being required. This isn't related to SolrCloud, just the
fact that it's only available on trunk.

And I'm certain that the more eyes look at it, the better it'll be,
so I'd say "go for it". I tried out the example here:
http://wiki.apache.org/solr/SolrCloud
and it went quite well, but I didn't stress it much yet (that's next).

Personally, I'd put it through some pretty heavy testing before
deploying to production at this point, just because of all the
new features on trunk. But having people work with it is the best
way to move the effort forward.

So feel free!
Erick

On Wed, Feb 22, 2012 at 9:07 AM, Em <[hidden email]> wrote:

> Erick,
>
>> You'll *really like* the SolrCloud stuff going into trunk when it's baked
>> for a while....
> How stable is SolrCloud at the moment?
> I can not wait to try it out.
>
> Kind regards,
> Em
>
>
> Am 22.02.2012 14:45, schrieb Erick Erickson:
>> You'll *really like* the SolrCloud stuff going into trunk when it's baked
>> for a while....
>>
>> Best
>> Erick