commit

jason rutherglen-2
In using Solr I've found the need to have a reload command in addition to a commit.  The reason for this is sometimes updates are made but are not available via the server.  The commit makes a snapshot which on a large index is a potentially expensive operation.  Is there a way to do reload today?

Re: commit

Chris Hostetter-3

: In using Solr I've found the need to have a reload command in addition
: to a commit.  The reason for this is sometimes updates are made but are
: not available via the server.  The commit makes a snapshot which on a
: large index is a potentially expensive operation.  Is there a way to do
: reload today?

I'm a little confused, here's a bunch of thoughts on your email in no
particular order...


Making snapshots should be really really really fast and easy -- it's just
creating hardlinks to files, so it shouldn't take very long ... are you
sure it's really an issue?
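Here is a minimal Java sketch of why hard-link snapshots are cheap. It assumes a filesystem that supports hard links. Each file in the index directory gets a second directory entry in the snapshot directory, and no data is copied. The class and method names are hypothetical, and this is not the snapshooter script itself.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical illustration of hard-link snapshotting; not the snapshooter script.
// Requires a filesystem with hard-link support (createLink throws otherwise).
final class HardLinkSnapshot {
    /** Creates snapshotDir and hard-links every regular file from indexDir into it. */
    static void snapshot(Path indexDir, Path snapshotDir) throws IOException {
        Files.createDirectories(snapshotDir);
        try (DirectoryStream<Path> files = Files.newDirectoryStream(indexDir)) {
            for (Path file : files) {
                if (Files.isRegularFile(file)) {
                    // New directory entry pointing at the same data: no bytes are copied.
                    Files.createLink(snapshotDir.resolve(file.getFileName()), file);
                }
            }
        }
    }
}
```

The cost scales with the number of files, not their size, which is why a large index does not by itself make snapshots slow.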


Generally speaking, a "commit" operation is really just an instruction
to:
  1) close/reopen the current writer/reader used for adds/deletes
  2) open a new reader for searches
  3) close the older reader for searches once all currently processing
     requests finish.

...the concept of "reloading" the index really requires that all three of
those things happen -- so in my mind that's what a commit is.
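The three commit steps can be sketched as a reference-counted searcher swap. Everything here (RefCountedSearcher, CommitSketch) is a hypothetical illustration, not Solr or Lucene code. Step 1 is elided. The point is that the old searcher closes when the last in-flight request releases it, not at the moment of the commit.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a searcher; not a Solr or Lucene class.
final class RefCountedSearcher {
    final int generation;
    private final AtomicInteger refs = new AtomicInteger(1); // 1 = the reference held by "current"
    volatile boolean closed;

    RefCountedSearcher(int generation) { this.generation = generation; }

    void incRef() { refs.incrementAndGet(); }

    void decRef() {
        if (refs.decrementAndGet() == 0) {
            closed = true; // last user is gone: a real searcher would release its files here
        }
    }
}

final class CommitSketch {
    private RefCountedSearcher current = new RefCountedSearcher(0);

    /** A request borrows the current searcher and must pass it to release() when done. */
    synchronized RefCountedSearcher acquire() {
        current.incRef();
        return current;
    }

    void release(RefCountedSearcher searcher) { searcher.decRef(); }

    synchronized void commit() {
        // 1) flush and reopen the writer used for adds/deletes (elided in this sketch)
        RefCountedSearcher old = current;
        current = new RefCountedSearcher(old.generation + 1); // 2) open a new searcher
        old.decRef(); // 3) drop the "current" reference; old closes once in-flight requests finish
    }
}
```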

The Solr app won't create snapshots automatically -- it will only do that
if you have a call to the snapshooter script registered as part of a
listener -- it sounds like you have a postCommit event listener which does
this.  You might want to turn that off, or change it to a postOptimize
event listener so that snapshots are only made when you optimize -- or
you could not have any listeners, and just run snapshooter yourself
whenever you want a snapshot.
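A hypothetical sketch of the postCommit vs. postOptimize choice: a listener is registered for one event, and only that event triggers it. In Solr the real registration lives in solrconfig.xml. The enum and registry below are illustrative stand-ins.

```java
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Map;

// Hypothetical event model; these names are not Solr classes.
enum UpdateEvent { POST_COMMIT, POST_OPTIMIZE }

final class ListenerRegistry {
    private final Map<UpdateEvent, List<Runnable>> listeners = new EnumMap<>(UpdateEvent.class);

    void register(UpdateEvent event, Runnable listener) {
        listeners.computeIfAbsent(event, e -> new ArrayList<>()).add(listener);
    }

    /** Runs only the listeners registered for this event. */
    void fire(UpdateEvent event) {
        for (Runnable r : listeners.getOrDefault(event, List.of())) {
            r.run();
        }
    }
}
```

With a snapshot listener registered only on POST_OPTIMIZE, ordinary commits no longer produce snapshots.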

(Bill/Yonik: sanity check me here: there's no reason snapshooter can't be
run manually, right?)

When registering a listener, there is also a "wait" option that controls
whether the operation will block until the listener is done ...  I don't
know if there's any particular reason why the example for snapshooter has
wait=true, but I think you can change that to false if you think
snapshooting is taking too long (again: Bill/Yonik, am I wrong?)
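Roughly, the "wait" option decides whether the caller blocks on the listener. Here is a sketch under that assumption, using a background executor. ListenerRunner is a hypothetical name, and the sketch does not reproduce how Solr implements the option.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch of a "wait" flag on a listener; not Solr's implementation.
final class ListenerRunner implements AutoCloseable {
    private final ExecutorService background = Executors.newSingleThreadExecutor();

    /** Submits the listener; with wait=true the caller blocks until it has finished. */
    Future<?> run(Runnable listener, boolean wait) throws Exception {
        Future<?> f = background.submit(listener);
        if (wait) {
            f.get(); // e.g. the commit does not return until the snapshot is done
        }
        return f;
    }

    @Override
    public void close() { background.shutdown(); }
}
```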

Another thing that might take a while when doing a commit -- separate from
snapshooting -- is the (auto)warming of the various caches that happens
when opening the new reader for searching.  If you are doing lots of
commits at a rapid rate because you really want the newly added docs to
appear right away, you may want to turn off any newSearcher listeners you have,
and change the autowarm count on your caches to 0.
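Autowarming, in outline: when a new searcher opens, the most recently used keys from the old cache are recomputed against the new index, and an autowarm count of 0 skips that work entirely. The LinkedHashMap-based cache below is a hypothetical sketch of the idea, not Solr's cache implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical LRU cache with autowarming; illustrates the idea only.
final class WarmableLruCache<K, V> {
    private final LinkedHashMap<K, V> map;

    WarmableLruCache(int capacity) {
        // accessOrder=true makes iteration run from least to most recently used
        this.map = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity;
            }
        };
    }

    V get(K key) { return map.get(key); }
    void put(K key, V value) { map.put(key, value); }
    int size() { return map.size(); }

    /** Recomputes the autowarmCount most recently used keys of old; 0 means no warming. */
    void warmFrom(WarmableLruCache<K, V> old, int autowarmCount, Function<K, V> recompute) {
        List<K> keys = new ArrayList<>(old.map.keySet());
        int from = Math.max(0, keys.size() - autowarmCount);
        for (K key : keys.subList(from, keys.size())) {
            put(key, recompute.apply(key)); // the expensive part: re-running work against the new index
        }
    }
}
```

Every warmed key costs roughly one query against the new index, so frequent commits multiply that cost.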


-Hoss

Re: commit

jason rutherglen-2
Is there a way to decouple the snapshot creation from the index reloading currently?  If not, I was going to build it in.  We have a 700 meg index, so creating a snapshot basically copies that, and after several snapshots it takes up a lot of storage.  Sometimes I just want to see a change show up on the master; sometimes I want to create a snapshot for the slave servers.  This was very confusing when I first started using Solr.

Thanks,

Jason

----- Original Message ----
From: Chris Hostetter <[hidden email]>
To: [hidden email]; jason rutherglen <[hidden email]>
Sent: Friday, April 21, 2006 5:28:15 PM
Subject: Re: commit



Re: commit

Yonik Seeley
Hi Jason,

On 4/21/06, jason rutherglen <[hidden email]> wrote:
> Is there a way to decouple the snapshot creation from the index reloading currently?  If not I was going to build it in.  We have a 700 meg index, so creating a snapshot basically copies that, and after several snapshots takes up a lot of storage.

It may not be taking up as much space as you think (it really depends).
Only the index segments that were changed take up new space... the
other index segments are all hard linked across all of the snapshots
to the same file.
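Yonik's point about hard links can be made concrete. Disk usage is the size of the distinct files across the live index and all snapshots, not the sum of the directory sizes. This sketch assumes Lucene's segment files are write-once, so the same name means the same file. The class name is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical space model for hard-linked snapshots.
final class SnapshotSpace {
    /**
     * Each directory maps file name -> size in MB. Assumes a given name refers to
     * the same (hard-linked) file everywhere, so it is counted only once.
     */
    static long totalMb(List<Map<String, Long>> directories) {
        Map<String, Long> distinct = new HashMap<>();
        for (Map<String, Long> dir : directories) {
            distinct.putAll(dir);
        }
        long total = 0;
        for (long size : distinct.values()) {
            total += size;
        }
        return total;
    }
}
```

With made-up sizes: snapshot 1 holds a 690 MB segment plus a 10 MB one, snapshot 2 holds the 690 MB segment plus a 12 MB one, and the live index holds those two plus a 5 MB segment. The distinct files total 717 MB, versus 2109 MB if each directory were a full copy.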

>  Sometimes I just want to see a change show up on the master, sometimes I want to create a snapshot for the slave servers.  This was very confusing when I first started using Solr.

I've had the idea of a "quiet" commit in the back of my mind for a
while... for when someone just wants to checkpoint their indexing
work, but not force a new index reader to be opened, or have snapshots
taken.  This can also be useful if you are rebuilding an index from
scratch and you don't want a snapshot of an incomplete index being
replicated out to the slaves.

Your idea is another variant, where you want some things done, but not others.

Here are all the variants (a snapshot is taken by a commit listener):
a) don't call listeners, don't open new index reader
b) don't call listeners, open new reader
c) call listeners, don't open new reader
d) call listeners, open new reader   // the current behavior

I think (a) could be very useful, (b) could be useful to locally
sanity check an index, and (c) is probably not useful.
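The four variants boil down to two independent flags. This hypothetical sketch records which steps each variant would perform. CommitOptions, Committer, and the QUIET/LOCAL names are illustrative, not a Solr API, and the order of the steps is illustrative too.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of commit variants (a), (b), (d); not a Solr API.
final class CommitOptions {
    final boolean callListeners; // e.g. a postCommit snapshooter
    final boolean openNewReader; // make newly added docs visible to searches

    CommitOptions(boolean callListeners, boolean openNewReader) {
        this.callListeners = callListeners;
        this.openNewReader = openNewReader;
    }

    static final CommitOptions QUIET = new CommitOptions(false, false);  // (a) checkpoint only
    static final CommitOptions LOCAL = new CommitOptions(false, true);   // (b) local sanity check
    static final CommitOptions DEFAULT = new CommitOptions(true, true);  // (d) current behavior
    // (c) listeners without a new reader is left out as probably not useful.
}

final class Committer {
    /** Lists the steps a commit with these options would perform. */
    static List<String> commit(CommitOptions opts) {
        List<String> steps = new ArrayList<>();
        steps.add("flush index writer"); // the durable checkpoint happens in every variant
        if (opts.openNewReader) steps.add("open new searcher");
        if (opts.callListeners) steps.add("run commit listeners");
        return steps;
    }
}
```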

Could you open a JIRA bug to track this?

-Yonik

Re: commit

Bill Au
The snapshooter can be run manually.  So one can disable the postCommit
listener and run snapshooter manually to take snapshots at specific times
rather than after every commit.

Bill

On 4/23/06, Yonik Seeley <[hidden email]> wrote:


Re: commit

jason rutherglen-2
Yes, I rewrote the shell scripts in Java for more control and made them callable via a web service.  So there is the normal commit call and a snapshooter call.  I will check these in soon.  Then, when a snapshot is completed, each slave machine's snappuller is called.  The cron is implemented using a hack of Resin's CronResource.  Not sure if there is a more generic way to do this using non-Resin code.
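On the question of a non-Resin way to schedule this: the JDK's ScheduledExecutorService can drive the same snapshot-then-pull cycle. This sketch assumes hypothetical Snapshotter and SlaveNotifier hooks that wrap the web-service calls described above. Neither is an existing Solr interface.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical hooks standing in for the master's snapshooter call and each slave's snappuller call.
interface Snapshotter { String takeSnapshot() throws Exception; }
interface SlaveNotifier { void pull(String snapshotName) throws Exception; }

final class SnapshotScheduler implements AutoCloseable {
    private final ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
    private final Snapshotter snapshotter;
    private final List<SlaveNotifier> slaves;

    SnapshotScheduler(Snapshotter snapshotter, List<SlaveNotifier> slaves) {
        this.snapshotter = snapshotter;
        this.slaves = slaves;
    }

    /** One cycle: take the snapshot first, and only then ask each slave to pull it. */
    void runOnce() throws Exception {
        String name = snapshotter.takeSnapshot();
        for (SlaveNotifier slave : slaves) {
            slave.pull(name);
        }
    }

    /** Repeats runOnce with a fixed delay between cycles. */
    ScheduledFuture<?> start(long period, TimeUnit unit) {
        return timer.scheduleWithFixedDelay(() -> {
            try {
                runOnce();
            } catch (Exception e) {
                // An uncaught exception would cancel all later runs, so log and continue.
                System.err.println("snapshot cycle failed: " + e);
            }
        }, period, period, unit);
    }

    @Override
    public void close() { timer.shutdownNow(); }
}
```

A fixed delay, rather than a fixed rate, keeps a slow snapshot from causing overlapping cycles to pile up.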

----- Original Message ----
From: Bill Au <[hidden email]>
To: [hidden email]
Sent: Monday, April 24, 2006 7:56:51 AM
Subject: Re: commit
