How to assure a permanent index.

How to assure a permanent index.

Thierry Collogne
Hello,

Currently we are using Lucene for our intranet search, but we are
thinking of replacing it with Solr. While indexing, we have built in a
system that assures us the Lucene index is never unavailable for more
than a few seconds.

I was wondering if Solr has something like that. I mean if I do the
following.

Re: How to assure a permanent index.

Thierry Collogne
Sorry, I sent that by accident. This is the next part of the mail.

I mean if I do the following.

     -  delete all documents from the index
     -  add all documents
     -  do a commit.

Will this result in a temporary empty index, or will I always have results?
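For concreteness, here is a minimal sketch of that sequence as Solr XML
update messages posted over HTTP, in Python; the URL, field names, and
helper function are illustrative assumptions, not something from this
thread:

    # Minimal sketch, assuming a local Solr instance at the default URL.
    from urllib.request import Request, urlopen

    SOLR_UPDATE = "http://localhost:8983/solr/update"  # assumed URL

    def post_xml(xml):
        """POST one Solr XML update message and return the raw response."""
        req = Request(SOLR_UPDATE, data=xml.encode("utf-8"),
                      headers={"Content-Type": "text/xml"})
        return urlopen(req).read()

    # 1. Delete all documents -- invisible to searchers until the commit.
    post_xml("<delete><query>*:*</query></delete>")

    # 2. Re-add all documents; searchers still see the old index.
    post_xml('<add><doc><field name="id">doc1</field>'
             '<field name="title">Example</field></doc></add>')

    # 3. Only now does the new state of the index become visible.
    post_xml("<commit/>")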

Re: How to assure a permanent index.

Bertrand Delacretaz
On 3/21/07, Thierry Collogne <[hidden email]> wrote:

> ...I mean if I do the following.
>
>      -  delete all documents from the index
>      -  add all documents
>      -  do a commit.
>
> Will this result in a temporary empty index, or will I always have results?...

Changes to the index are invisible to the search components until a
<commit/> is sent to Solr, so you should be fine (although personally
I'd feel safer replacing documents in smaller batches).

You could also use the "index switching" mechanism used when
replicating Solr indexes (see
http://wiki.apache.org/solr/CollectionDistribution) to prepare the
index in another Solr instance and activate it instantly when needed.

-Bertrand

Re: How to assure a permanent index.

Maarten.De.Vilder
In reply to this post by Thierry Collogne
the documents are only deleted when you do a commit ... so you should
never have an empty index (or at least not for more than a couple of
seconds)

note that you don't have to delete all documents ... you can just upload
new documents with the same UniqueID and Solr will delete the old
documents automatically ... this way you are guaranteed not to have an
empty index
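A minimal sketch of that overwrite-by-key behaviour, reusing the assumed
post_xml helper from the first sketch and assuming the schema's uniqueKey
field is named "id":

    # Re-adding a document with the same unique id replaces the old one
    # at commit time, so "page-42" never disappears from search results.
    post_xml('<add><doc><field name="id">page-42</field>'
             '<field name="title">Old title</field></doc></add>')
    post_xml("<commit/>")

    post_xml('<add><doc><field name="id">page-42</field>'
             '<field name="title">New title</field></doc></add>')
    post_xml("<commit/>")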

grts,m

Re: How to assure a permanent index.

Thierry Collogne
In reply to this post by Bertrand Delacretaz
Thank you for the quick response. I will take a look at it.


Re: How to assure a permanent index.

Walter Underwood, Netflix
In reply to this post by Maarten.De.Vilder
On 3/21/07 1:33 AM, "[hidden email]"
<[hidden email]> wrote:

> note that you don't have to delete all documents ... you can just upload
> new documents with the same UniqueID and Solr will delete the old
> documents automatically ... this way you are guaranteed not to have an
> empty index

That works if you keep track of all documents that have disappeared
since the last index run. Otherwise, you end up with orphans in
the search index: documents that exist in search but not in the
real world, also known as "serving 404s in results".
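One way to keep track is a client-side diff between the index and the
source of truth. A hypothetical sketch, reusing the assumed post_xml
helper from the first sketch; the id field, select URL, and the
current_source_ids stand-in are all assumptions:

    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    def indexed_ids():
        """Fetch every id in the index (fine for modest index sizes)."""
        params = urlencode({"q": "*:*", "fl": "id",
                            "rows": 1000000, "wt": "json"})
        url = "http://localhost:8983/solr/select?" + params  # assumed URL
        docs = json.load(urlopen(url))["response"]["docs"]
        return {d["id"] for d in docs}

    def current_source_ids():
        """Stand-in: however your intranet enumerates real documents."""
        return {"page-1", "page-2"}  # hypothetical

    # Delete the orphans, then commit once.
    for doc_id in indexed_ids() - current_source_ids():
        post_xml("<delete><id>%s</id></delete>" % doc_id)
    post_xml("<commit/>")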

wunder
--
Walter Underwood
Search Guru, Netflix



Re: How to assure a permanent index.

Chris Hostetter-3
: > new documents with the same UniqueID and Solr will delete the old
: > documents automatically ... this way you are guaranteed not to have an
: > empty index
:
: That works if you keep track of all documents that have disappeared
: since the last index run. Otherwise, you end up with orphans in

a solution I use to deal with this in some cases is to have a timestamp
field recording when the doc was indexed, and after each "batch" update
run, search for all docs with a timestamp prior to the start of the run ...
if the percentage of docs is really high, throw an error and abort, but if
it's in an acceptable range, then delete them all (using delete by query),
as in the sketch below

the percentage-of-old-docs sanity check isn't strictly necessary,
especially if your current approach is delete-all-first, then re-add -- this
approach is never any riskier than that one, even without the sanity test.
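A hypothetical sketch of that sanity-checked cleanup; the indexed_at
field name, the 20% threshold, and the URLs are illustrative assumptions,
and post_xml is the assumed helper from the first sketch:

    import json
    from urllib.request import urlopen
    from urllib.parse import urlencode

    run_start = "2007-03-21T09:00:00Z"  # recorded before the batch run began

    def count(query):
        """Number of documents matching a query, without fetching them."""
        params = urlencode({"q": query, "rows": 0, "wt": "json"})
        url = "http://localhost:8983/solr/select?" + params  # assumed URL
        return json.load(urlopen(url))["response"]["numFound"]

    stale_q = "indexed_at:[* TO %s]" % run_start
    stale, total = count(stale_q), count("*:*")

    # Sanity check: abort if suspiciously many docs look stale.
    if total and float(stale) / total > 0.20:  # threshold is made up
        raise RuntimeError("too many stale docs (%d of %d)" % (stale, total))

    post_xml("<delete><query>%s</query></delete>" % stale_q)
    post_xml("<commit/>")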




-Hoss


Re: How to assure a permanent index.

Maarten.De.Vilder
In reply to this post by Walter Underwood, Netflix
well, yes indeed :)
but I do think it is easier to set up synchronisation for deleted
documents as well; clearing the whole index is kind of overkill

when you do this:
* delete all documents
* submit all documents
* commit
you should also keep in mind that Solr will do an autocommit after a
certain number of documents ... so if the process takes a couple of
minutes/hours, you might end up with an empty index and no results for
the users!

cheers,
m

Re: How to assure a permanent index.

Mike Klaas
On 3/22/07, [hidden email] <[hidden email]> wrote:

> well, yes indeed :)
> but I do think it is easier to set up synchronisation for deleted
> documents as well; clearing the whole index is kind of overkill
>
> when you do this:
> * delete all documents
> * submit all documents
> * commit
> you should also keep in mind that Solr will do an autocommit after a
> certain number of documents ...

Solr should only do so if you explicitly configured it as such.

Regardless, if you are rebuilding the index from scratch, delete-all-
then-rebuild is a dangerous and less efficient method compared to
creating a new index and pointing Solr to the new index once it is
completed.

-Mike

Re: How to assure a permanent index.

Thierry Collogne
And how would you do that? Create a new index and point Solr to the new
index?


Re: How to assure a permanent index.

Mike Klaas
On 3/22/07, Thierry Collogne <[hidden email]> wrote:
> And how would you do that? Create a new index and point Solr to the new
> index?

I don't think that is possible without restarting Solr.

You could have two Solr webapps and alternate between the two,
pointing your app at one and building on the other, then switching.

Another possibility is to build the index on a master and use
snappuller to install it on the slave (I'll admit that I've never used
replication and so don't know how it handles the deletion of all
segments).
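For illustration, client-side switching between the two webapps might
look like the sketch below; both instance URLs and the state file are
assumptions, not part of Solr itself:

    INSTANCES = ["http://localhost:8983/solr",   # assumed instance A
                 "http://localhost:8984/solr"]   # assumed instance B
    STATE_FILE = "/var/run/solr-live"            # records the live instance

    def live_instance():
        """Index (0 or 1) of the instance currently serving searches."""
        try:
            with open(STATE_FILE) as f:
                return int(f.read().strip())
        except (IOError, ValueError):
            return 0

    def query_url():
        """Where the application should send searches."""
        return INSTANCES[live_instance()] + "/select"

    def build_url():
        """Where the indexer should rebuild (the non-live instance)."""
        return INSTANCES[1 - live_instance()] + "/update"

    def switch():
        """After the rebuild commits, flip searches to the fresh index."""
        with open(STATE_FILE, "w") as f:
            f.write(str(1 - live_instance()))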

-Mike

Re: How to assure a permanent index.

Thierry Collogne
Ok. Thanks.


Re: How to assure a permanent index.

Chris Hostetter-3
In reply to this post by Mike Klaas

: > And how would you do that? Create a new index and point Solr to the new
: > index?
:
: I don't think that is possible without restarting Solr.

: Another possibility is to build the index on a master and use
: snappuller to install it on the slave (I'll admit that I've never used

that's pretty much the same thing, just referred to in different ways. I
think the CollectionBuilding wiki is just talking about how you can build
a new index with an incompatibly different schema.xml on a separate Solr
port and then manually put it into place on your primary query port with a
quick bounce -- allowing very short downtime.

if you're replacing your index but the schema is still the same, it's
really just a snappulling situation.

: replication and so don't know how it handles the deletion of all
: segments).

it works fine ... from a replication standpoint, doing a full rebuild like
this where you delete everything and then re-add it is no worse than
doing an optimize ... all of the files the slave used to have go away, and
you push out all new files.


-Hoss


Re: How to assure a permanent index.

Thierry Collogne
Where can I find some information about snappulling?


Re: How to assure a permanent index.

Chris Hostetter-3

: Where can I find some information about snappulling?

http://wiki.apache.org/solr/CollectionDistribution

-Hoss