Copying part of index directory

Roopesh P Raj
Hi,

I am new to Solr and Lucene. In my project I want to copy part of the
index directory, selected by a query (the copy need not include the
whole index). I came across a backup script in the solr/bin folder,
but it seems to copy the whole index directory.

My question is: what is the procedure for copying part of the index?
Is it something like this: run a query, get all the fields, and build
a new index from those results? Or is there another way?

My project is in Python and I am using this client:
http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py
I want to do the copy operation from Python code. The constructor of
the client above, solr.py, takes host, solrBase, persistent=True, and
postHeaders. Can we specify the location of the index dir from Python
(through the constructor or similar)?

Thanks
Roopesh






------------------
DigitalGlue, India




Re: Copying part of index directory

Mike Klaas

On 15-Jun-07, at 4:25 AM, Roopesh P Raj wrote:

> Hi,
>
> I am new to Solr and Lucene. In my project I want to copy part of the
> index directory, selected by a query (the copy need not include the
> whole index). I came across a backup script in the solr/bin folder,
> but it seems to copy the whole index directory.
>
> My question is: what is the procedure for copying part of the index?
> Is it something like this: run a query, get all the fields, and build
> a new index from those results? Or is there another way?

You can't easily copy parts of the physical file.  I suggest
performing the query, fetching some unique key (application-specific),
and re-indexing those documents from your original source.  It can be
done through Solr too, but you have to be careful to store all the
relevant fields to begin with.
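
A minimal sketch of the first half of that suggestion (run the query and
keep only the unique key), not a definitive implementation. It assumes the
unique key field is named "id", that the select handler is reachable at
<solrBase>/select, and that JSON output (wt=json) is enabled. All function
names here are hypothetical; the sketch talks to Solr over plain HTTP with
the Python standard library rather than through solr.py.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_select_url(solr_base, query, key_field="id", start=0, rows=100):
    """Return a /select URL that asks only for the unique key field."""
    params = urlencode({
        "q": query,
        "fl": key_field,   # fetch just the key, not every stored field
        "start": start,
        "rows": rows,
        "wt": "json",
    })
    return solr_base.rstrip("/") + "/select?" + params


def extract_keys(response_text, key_field="id"):
    """Pull the unique keys out of a JSON select response."""
    body = json.loads(response_text)
    return [doc[key_field] for doc in body["response"]["docs"]]


def fetch_matching_keys(solr_base, query, key_field="id", page_size=100):
    """Page through the result set and collect every matching key."""
    keys = []
    start = 0
    while True:
        url = build_select_url(solr_base, query, key_field, start, page_size)
        with urlopen(url) as resp:
            page = extract_keys(resp.read().decode("utf-8"), key_field)
        if not page:
            return keys
        keys.extend(page)
        start += len(page)
```

The returned keys can then be used to look the documents up in the original
data source and index them again elsewhere.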

> My project is in Python and I am using this client:
> http://svn.apache.org/repos/asf/lucene/solr/trunk/client/python/solr.py
> I want to do the copy operation from Python code. The constructor of
> the client above, solr.py, takes host, solrBase, persistent=True, and
> postHeaders. Can we specify the location of the index dir from Python
> (through the constructor or similar)?

No, the index dir is determined by the solrconfig.xml of the Solr
instance.  The Python client can only be used to connect to an
already-running instance.
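
To illustrate where that setting lives: solrconfig.xml carries a <dataDir>
element, and the index is kept under that directory on the server. The
sketch below only reads the value from a config file's text; the sample
config and its path are made up for the example, and the helper name is
hypothetical. Real configs may use property substitution in this element,
which the sketch does not resolve.

```python
import xml.etree.ElementTree as ET

# Hypothetical, stripped-down solrconfig.xml used only for illustration.
SAMPLE_SOLRCONFIG = """<config>
  <dataDir>/var/solr/data</dataDir>
</config>"""


def configured_data_dir(solrconfig_xml):
    """Return the <dataDir> text from solrconfig.xml, or None if absent."""
    root = ET.fromstring(solrconfig_xml)
    value = root.findtext("dataDir")
    return value.strip() if value else None


print(configured_data_dir(SAMPLE_SOLRCONFIG))
```

Nothing in an HTTP client can change this value; it is read by the Solr
instance itself when it starts.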

-Mike

Re: Copying part of index directory

Roopesh P Raj
Mike Klaas wrote:

>
> On 15-Jun-07, at 4:25 AM, Roopesh P Raj wrote:
>
>> Hi,
>>
>> I am new to Solr and Lucene. In my project I want to copy part of the
>> index directory, selected by a query (the copy need not include the
>> whole index). I came across a backup script in the solr/bin folder,
>> but it seems to copy the whole index directory.
>>
>> My question is: what is the procedure for copying part of the index?
>> Is it something like this: run a query, get all the fields, and build
>> a new index from those results? Or is there another way?
>
> You can't easily copy parts of the physical file.  I suggest
> performing the query, fetching some unique key (application-specific),
> and re-indexing those documents from your original source.  It can be
> done through Solr too, but you have to be careful to store all the
> relevant fields to begin with.

Thanks for the reply. I have one more question: where should I
re-index (that is, where does the new index directory live)? Should I
run another instance of Solr for this? Is this the preferred approach?

Roopesh

------------------
DigitalGlue, India




Re: Copying part of index directory

Mike Klaas
On 17-Jun-07, at 3:03 AM, Roopesh P Raj wrote:

> Thanks for the reply. I have one more question: where should I
> re-index (that is, where does the new index directory live)? Should I
> run another instance of Solr for this? Is this the preferred approach?

There is no preferred approach; this is dictated entirely by your
requirements.  Since you want to create a new subindex, you'll have
to set up another Solr instance somewhere: another machine, another
webapp, etc.
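
Once a second instance is running, filling it could look roughly like the
sketch below. It assumes the second instance accepts XML update messages at
<solrBase>/update and that the documents (from the original source, or from
stored fields) are already available as Python dicts. The function names and
any URL passed in are hypothetical, and the sketch uses only the Python
standard library rather than solr.py.

```python
from urllib.request import Request, urlopen
from xml.sax.saxutils import escape, quoteattr


def build_add_xml(docs):
    """Serialize dicts (field -> value or list of values) as an <add> message."""
    parts = ["<add>"]
    for doc in docs:
        parts.append("<doc>")
        for name, value in doc.items():
            # A multi-valued field becomes one <field> element per value.
            values = value if isinstance(value, (list, tuple)) else [value]
            for v in values:
                parts.append(
                    "<field name=%s>%s</field>" % (quoteattr(name), escape(str(v)))
                )
        parts.append("</doc>")
    parts.append("</add>")
    return "".join(parts)


def post_update(solr_base, xml_body):
    """POST one XML update message to the target instance."""
    req = Request(
        solr_base.rstrip("/") + "/update",
        data=xml_body.encode("utf-8"),
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )
    with urlopen(req) as resp:
        return resp.read().decode("utf-8")


def copy_docs(target_base, docs):
    """Add the documents to the second instance, then commit them."""
    post_update(target_base, build_add_xml(docs))
    post_update(target_base, "<commit/>")
```

The target instance needs a schema that defines the same fields as the
documents being sent, otherwise the add will be rejected.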

-Mike