How do I create a solr core with the data from an existing one?

Steve Dupree
*Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
of the core, and then swapping it in for the main core. I tried following
these steps:

   1. Create prep core:
   http://localhost:8983/solr/admin/cores?action=CREATE&name=prep&instanceDir=main
   2. Perform index update, then commit/optimize on prep core.
   3. Swap main and prep core:
   http://localhost:8983/solr/admin/cores?action=SWAP&core=main&other=prep
   4. Unload prep core:
   http://localhost:8983/solr/admin/cores?action=UNLOAD&core=prep
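The three CoreAdmin requests in the steps above can be generated with a small script. This is only a sketch: the host, port, and the core names `main` and `prep` are taken from the URLs in this post, while the helper names `core_admin_url` and `swap_workflow_urls` are made up for illustration. It only builds the URLs; the index update in step 2 still has to happen between the CREATE and SWAP requests.

```python
from urllib.parse import urlencode

# Assumption: the CoreAdmin handler is at the address used in the post.
CORE_ADMIN = "http://localhost:8983/solr/admin/cores"


def core_admin_url(action, **params):
    """Build a CoreAdmin URL such as ...?action=SWAP&core=main&other=prep."""
    query = {"action": action}
    query.update(params)  # keyword order is preserved in the query string
    return CORE_ADMIN + "?" + urlencode(query)


def swap_workflow_urls(live="main", prep="prep", instance_dir="main"):
    """Return the CREATE, SWAP and UNLOAD requests from the steps, in order."""
    return [
        core_admin_url("CREATE", name=prep, instanceDir=instance_dir),
        # Step 2 (update, commit, optimize on the prep core) goes here.
        core_admin_url("SWAP", core=live, other=prep),
        core_admin_url("UNLOAD", core=prep),
    ]


if __name__ == "__main__":
    for url in swap_workflow_urls():
        print(url)
```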

The problem I am having is, the core created in step 1 doesn't have any data
in it. If I am going to do a full index of everything and the kitchen sink,
that would be fine, but if I just want to update a (large) subset of the
documents - that's obviously not going to work.

(I could merge the cores, but part of what I'm trying to do is get rid of
any deleted documents without trying to make a list of them.)

Is there some flag to the CREATE action that I'm missing? The Solr Wiki page
for CoreAdmin <http://wiki.apache.org/solr/CoreAdmin> is a little sparse on
details.

Is this approach wrong? I found at least one message on this list that
stated that performing updates in a separate core on the same machine won't
help, given that they're both using the same CPU. Is that true?
thanks in advance
~stannius

Re: How do I create a solr core with the data from an existing one?

Gijs Kunze
Hi,

I'm not sure if it's the best option but you could use replication to
copy the index (http://wiki.apache.org/solr/SolrReplication). As long as
your core is configured as a master, you can use the fetchindex command to
do a one-time replication from the new core (see the HTTP API section in
the wiki page).
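As a rough sketch of what that one-time copy might look like: the code below builds a `fetchindex` request that asks the `prep` core to pull the index from the `main` core. It assumes both cores have the replication handler registered at `/replication`, that `main` is configured as a master, and that the host and core names match the ones used earlier in this thread. The function names are hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: same host and port as in the original post.
SOLR = "http://localhost:8983/solr"


def fetchindex_url(target_core, source_core, base=SOLR):
    """URL that tells target_core to fetch the index from source_core once."""
    params = {
        "command": "fetchindex",
        # masterUrl points at the replication handler of the core to copy from.
        "masterUrl": f"{base}/{source_core}/replication",
    }
    return f"{base}/{target_core}/replication?" + urlencode(params)


def fetch_index(target_core, source_core, base=SOLR):
    """Send the request; needs a running Solr, so it is not called on import."""
    with urlopen(fetchindex_url(target_core, source_core, base)) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    print(fetchindex_url("prep", "main"))
```

The fetch runs asynchronously on the Solr side, so the response to this request does not by itself mean the copy has finished.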

Regards,

gwk


On 3/24/2010 5:31 PM, Steve Dupree wrote:

> *Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
> of the core, and then swapping it in for the main core. I tried following
> these steps:
>     ...

Re: How do I create a solr core with the data from an existing one?

Chris Hostetter-3
In reply to this post by Steve Dupree

: *Solr 1.4 Enterprise Search Server* recommends doing large updates on a copy
: of the core, and then swapping it in for the main core. I tried following
        ...
: The problem I am having is, the core created in step 1 doesn't have any data
: in it. If I am going to do a full index of everything and the kitchen sink,
: that would be fine, but if I just want to update a (large) subset of the
: documents - that's obviously not going to work.

That's really the point of that recommendation -- it's a way to completely
rebuild without any downtime (the old core keeps serving requests until
the new one is completely ready).

If you are just updating some of the docs (even if it's a large "some"),
you should just update the existing core.

If you really want to "clone" the data in a core, then replication is
really the only way to do that currently.  Replicating to a "query
machine" instead of having clients query the "master" you are updating
directly is usually a good idea for lots of reasons -- but in this case
you could always temporarily disable replication, make your large batch
changes to the master, and then re-enable replication so the query boxes
only see the changes when they are all done.
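A minimal sketch of that disable / update / re-enable sequence, assuming the master core is reachable at the URL below and exposes the replication handler's `disablereplication` and `enablereplication` commands at `/replication`. The master URL and the helper names are assumptions for illustration, not something stated in this thread.

```python
from contextlib import contextmanager
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: URL of the master core that receives the batch updates.
MASTER = "http://localhost:8983/solr/main"


def replication_url(command, core_url=MASTER):
    """Build a replication-handler URL for the given command."""
    return f"{core_url}/replication?" + urlencode({"command": command})


def send(url):
    """Issue the HTTP request; requires a running Solr instance."""
    with urlopen(url) as resp:
        return resp.read()


@contextmanager
def replication_paused(core_url=MASTER, sender=send):
    """Disable replication, run the block, then re-enable replication."""
    sender(replication_url("disablereplication", core_url))
    yield
    # Only reached if the block finished without raising, so a failed
    # batch leaves replication off and the query boxes never see it.
    sender(replication_url("enablereplication", core_url))
```

Usage would be along the lines of `with replication_paused(): run_batch_update()`, where `run_batch_update` is a placeholder for whatever posts the documents and commits on the master.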


-Hoss