Recreating SOLR index after a schema change - without having to re-post the data

Vannia Rajan
Hi,

  We are using solr-server for a large data set, and we need to change the
Solr schema.xml (a datatype change from integer to sint for a few fields). It
turns out that the two datatypes (integer and sint) are incompatible, so
we need to re-index Solr.

My question is:
   Is there any way I can just re-create the index files for the
existing data/documents in Solr, without having to re-post the documents?

   I searched through many forums, and everything seems to say that I have to
re-post ALL documents to Solr for re-indexing. Please suggest a better
alternative for making this schema change. (I have a very large Solr index,
around 10GB, and it will be tough to query the whole data set, store it
somewhere as XML files, and then repost it.)

--
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Tim Sell
That really is the only way. It would be far easier if you were
importing from another source.
Are you using solr as a data store?

It is not possible via solr to change existing documents in a solr
index. It would be a nice feature though.

~Tim.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Erik Hatcher
In reply to this post by Vannia Rajan
You'll have to reindex your documents from scratch.  Such is the  
nature of changing the schema of an index.  It's always a great idea  
(in fact, I'd say mandatory) to have a full reindex process handy.

        Erik


Re: Recreating SOLR index after a schema change - without having to re-post the data

Vannia Rajan
In reply to this post by Tim Sell
On Fri, Jul 31, 2009 at 3:17 PM, Tim Sell <[hidden email]> wrote:

> Are you using solr as a data store?
>

No, the data comes from somewhere else; Solr is just used for indexing and
returning query results.

>
> It is not possible via solr to change existing documents in a solr
> index. It would be a nice feature though.
>

Yes, it would be a nice feature. Is there a particular URL where I can
submit new feature requests for Solr? (The feature is: separating the "index"
from the "data" in Solr, and if the "index" is not available, re-creating it
from the "data" automatically.)

--
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Vannia Rajan
In reply to this post by Erik Hatcher
On Fri, Jul 31, 2009 at 3:22 PM, Erik Hatcher <[hidden email]>wrote:

> You'll have to reindex your documents from scratch.  Such is the nature of
> changing the schema of an index.  It's always a great idea (in fact, I'd say
> mandatory) to have a full reindex process handy.
>
>
Thank you for your response. Yes, I need to keep a setup handy to query and
repost to Solr, until this new feature is included in Solr.

--
Thanks,
Vanniarajan

Re: Recreating SOLR index after a schema change - without having to re-post the data

Erik Hatcher

On Jul 31, 2009, at 7:01 AM, Vannia Rajan wrote:

> Thank you for your response. Yes, I need to keep a setup handy to query and
> repost to Solr, until this new feature is included in Solr.

It's only tractable to do this if the original field values are  
stored, which is quite prohibitive in many cases.  So I don't think  
this is a feature that you'll see in Solr any time soon.

        Erik
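
For readers whose fields do happen to be stored, here is a minimal sketch of the "query" half of query-and-repost. It pages through the index with the standard `q`, `start`, `rows` and `wt=json` select parameters. The base URL and all function names are hypothetical, and anything that is indexed but not stored will simply be missing from the output, which is the limitation described above. Paging with a large `start` offset also gets slow on a big index, so treat this as an illustration rather than a tuned exporter.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def select_url(base_url, start, rows):
    """Build a match-all /select URL for one page of results."""
    params = {"q": "*:*", "start": start, "rows": rows, "wt": "json"}
    return base_url.rstrip("/") + "/select?" + urlencode(params)


def fetch_page(url):
    """Fetch one page over HTTP and return its list of documents."""
    with urlopen(url) as response:
        return json.load(response)["response"]["docs"]


def export_stored_docs(base_url, rows=1000, fetch=fetch_page):
    """Yield the stored fields of every document, one page at a time.

    `fetch` is injectable so the paging logic can be tried without a server.
    """
    start = 0
    while True:
        docs = fetch(select_url(base_url, start, rows))
        if not docs:
            break  # an empty page means we have walked past the last document
        for doc in docs:
            yield doc
        start += len(docs)
```

Something like `for doc in export_stored_docs("http://localhost:8983/solr"): ...` would then stream the documents out (the URL is an assumption about a local install).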

Re: Recreating SOLR index after a schema change - without having to re-post the data

estauthamer
CONTENTS DELETED
The author has deleted this message.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Shalin Shekhar Mangar
In reply to this post by Erik Hatcher
On Fri, Jul 31, 2009 at 6:29 PM, Erik Hatcher <[hidden email]> wrote:

> It's only tractable to do this if the original field values are stored,
> which is quite prohibitive in many cases.  So I don't think this is a
> feature that you'll see in Solr any time soon.
>
>

Yes, it would be more expensive. However, for those wishing for such a
feature, there are two relevant issues:

https://issues.apache.org/jira/browse/SOLR-828
https://issues.apache.org/jira/browse/SOLR-139

--
Regards,
Shalin Shekhar Mangar.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Chantal Ackermann
In reply to this post by estauthamer
Hi Edwin,

What prevents you from storing the data (possibly formatted in the Solr XML
input format) yourself on some disk?

Cheers,
Chantal
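
As a rough illustration of this suggestion, the sketch below renders documents in the Solr XML input format (`<add><doc><field name="...">`) and saves them to a file. The function names and the dict-per-document layout are assumptions made for the example, not anything from this thread.

```python
import xml.etree.ElementTree as ET


def docs_to_solr_xml(docs):
    """Render dicts as one Solr <add> message.

    A list or tuple value becomes several <field> elements with the same name.
    """
    add = ET.Element("add")
    for doc in docs:
        doc_el = ET.SubElement(add, "doc")
        for name, value in doc.items():
            values = value if isinstance(value, (list, tuple)) else [value]
            for item in values:
                field = ET.SubElement(doc_el, "field", name=name)
                field.text = str(item)  # ElementTree escapes &, < and > on output
    return ET.tostring(add, encoding="unicode")


def save_batch(docs, path):
    """Write one batch of documents to disk as a Solr XML file."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(docs_to_solr_xml(docs))
```

Files written this way can be re-posted unchanged after a change to field properties; if fields are added or removed, the files themselves have to be regenerated from the source.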

Edwin Stauthamer wrote:

> That is a shame. I have much experience with Autonomy IDOL and the
> possibility of quickly reindexing the content without making a call to the
> original source is great. Just Export, update the config, and import
> (=reindex) to see if, for instance the performance is better or just to
> transport the information to an other server.
>
> This can only be done of course when there are no fields added etc.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Erik Hatcher
In reply to this post by estauthamer
There certainly could be some intermediate storage of documents prior  
to indexing, but as far as the Lucene index goes it is inherently a  
one-way process.  Solr could facilitate this pretty easily... with an  
update processor that wrote the documents coming in to some other  
storage (one option: simple Solr XML files on the filesystem).  So  
hope is not lost.

        Erik
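
The update processor described here would live inside Solr itself. As a client-side stand-in for the same idea, and explicitly not an update processor, the sketch below keeps a copy of every XML update message on disk before forwarding it, and can later replay the whole archive followed by a `<commit/>`. The class name, file naming scheme, and update URL are all hypothetical.

```python
import os
from urllib.request import Request, urlopen


def post_xml(update_url, payload):
    """POST one XML update message to a Solr update handler."""
    request = Request(update_url, data=payload.encode("utf-8"),
                      headers={"Content-Type": "text/xml; charset=utf-8"})
    with urlopen(request) as response:
        response.read()


class ArchivingIndexer:
    """Keep a copy of every update message so the index can be rebuilt later."""

    def __init__(self, archive_dir, send):
        self.archive_dir = archive_dir
        self.send = send  # callable taking one XML string
        os.makedirs(archive_dir, exist_ok=True)
        self.count = len(os.listdir(archive_dir))

    def index(self, payload):
        """Archive the message first, then forward it to Solr."""
        path = os.path.join(self.archive_dir, "batch-%08d.xml" % self.count)
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(payload)
        self.count += 1
        self.send(payload)

    def replay(self):
        """Re-send every archived message in order, then commit."""
        for name in sorted(os.listdir(self.archive_dir)):
            with open(os.path.join(self.archive_dir, name), encoding="utf-8") as handle:
                self.send(handle.read())
        self.send("<commit/>")
```

A possible wiring, assuming a local install: `ArchivingIndexer("archive", lambda p: post_xml("http://localhost:8983/solr/update", p))`. After a schema change you would clear the index and call `replay()`.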



Re: Recreating SOLR index after a schema change - without having to re-post the data

estauthamer
CONTENTS DELETED
The author has deleted this message.

Re: Recreating SOLR index after a schema change - without having to re-post the data

Bill Au
The CSVLoader is very fast, but it doesn't support document or field boosting
at index time.  If you don't need that, you can also write your Solr input
data to file(s) to be loaded by the CSVLoader, and just reload whenever you
change the schema.  You will need to regenerate the data if you add or remove
fields, but you can simply reload from the existing input file(s) if you are
only changing the properties of a field.

Bill
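
A small sketch of generating such an input file, assuming single-valued fields and a header row that names the Solr fields. The function name and field list are made up for the example, and loading the file (for instance through the CSV update handler, commonly mapped at `/update/csv`) is left out because it depends on your solrconfig.xml.

```python
import csv
import io


def write_csv(docs, fieldnames, stream):
    """Write documents as CSV with a header row of field names.

    Keys not listed in `fieldnames` are ignored; missing keys become empty cells.
    """
    writer = csv.DictWriter(stream, fieldnames=fieldnames,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for doc in docs:
        writer.writerow(doc)


# Example: render two documents into an in-memory buffer.
buffer = io.StringIO()
write_csv([{"id": "1", "title": "Solr, in action"}, {"id": "2"}],
          ["id", "title"], buffer)
```

When writing to a real file, open it with `newline=""` as the `csv` module documentation recommends. Changing a field's type then only requires reloading the same file, while adding or removing a field means regenerating it with a new `fieldnames` list.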

On Fri, Jul 31, 2009 at 9:41 AM, Edwin Stauthamer <[hidden email]> wrote:

> Simple but effective ;-)