Merging Solr Indexes

Merging Solr Indexes

vivek sar
Hi,

  As part of speeding up the indexing process, I'm thinking of spawning
multiple threads that will write to different temporary SolrCores.
Once indexing is done, I want to merge all the indexes in the
temporary cores into a master core. For example, if I want one SolrCore per
day, then on every index cycle I'll spawn 4 threads that each index into
a temporary index, and once they are done I want to merge all of these
into the day core. My questions:

1) I want to use the same schema and solrconfig.xml for all cores
without duplicating them - how do I do that?
2) How do I merge the temporary Solr cores into one master core
programmatically? I've read the wiki on "MergingSolrIndexes", but I
want to do it programmatically (like in Lucene -
writer.addIndexes(..)) once the temporary indices are done.
3) Can I remove the temporary indices once the merge process is done?
4) Is this the right strategy to speed up indexing?

Thanks,
-vivek
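
On question 1, one possible approach (not confirmed anywhere in this thread) is to create each temporary core through the CoreAdmin handler with the same `instanceDir` but its own `dataDir`, so that every core reads one shared conf/ directory. The sketch below only builds the request URLs. The host, core names, and paths are made-up placeholders, and whether a shared `instanceDir` is supported should be checked against the Solr version in use (newer releases also offer configsets for this purpose).

```python
from urllib.parse import urlencode

# Hypothetical CoreAdmin endpoint; adjust host, port, and path to your deployment.
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def create_core_url(name, shared_instance_dir, data_dir, admin_url=ADMIN_URL):
    """Build a CoreAdmin CREATE URL for a core that reuses a shared conf/ directory."""
    params = {
        "action": "CREATE",
        "name": name,
        # Same for every core, so schema.xml and solrconfig.xml exist only once.
        "instanceDir": shared_instance_dir,
        # Different for every core, so each one writes its own index.
        "dataDir": data_dir,
    }
    return admin_url + "?" + urlencode(params)


def temp_core_urls(count, shared_instance_dir, data_root):
    """Build CREATE URLs for `count` temporary cores named tmp0, tmp1, ..."""
    return [
        create_core_url(f"tmp{i}", shared_instance_dir, f"{data_root}/tmp{i}")
        for i in range(count)
    ]
```

Calling `temp_core_urls(4, "shared", "/data")` would give one URL per indexing thread; the names `tmp0` to `tmp3` are purely illustrative.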

Re: Merging Solr Indexes

Otis Gospodnetic-2

Let me start with 4)
Have you tried simply using multiple threads to send your docs to a single Solr instance/core?  You should get about the same performance as with the approach you describe below, but without the headache of managing multiple cores and index merging (which is not yet possible to do programmatically).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
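
A minimal sketch of the approach suggested above, with several client threads sending batches to one core. It is an illustration under assumptions, not code from this thread: the update URL and core name are placeholders, and `post_batch` assumes an update handler that accepts a JSON array of documents, which depends on the Solr version. The sender is passed in as a parameter so it can be swapped for another client.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor

# Hypothetical update URL for a single core; adjust host and core name.
UPDATE_URL = "http://localhost:8983/solr/daycore/update"


def post_batch(batch, url=UPDATE_URL):
    """POST one batch of documents as a JSON array (assumes a JSON-capable update handler)."""
    request = urllib.request.Request(
        url,
        data=json.dumps(batch).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def chunks(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def index_in_parallel(docs, send=post_batch, threads=4, batch_size=500):
    """Send batches to one core from several threads; return the number of batches sent."""
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # pool.map raises here if any batch failed, so errors are not silently lost.
        results = list(pool.map(send, chunks(docs, batch_size)))
    return len(results)
```

The sketch does not issue a commit; that would still be needed once all batches are in before the documents become searchable.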



----- Original Message ----

> From: vivek sar <[hidden email]>
> To: [hidden email]
> Sent: Tuesday, March 31, 2009 1:59:01 PM
> Subject: Merging Solr Indexes

Re: Merging Solr Indexes

vivek sar
Thanks Otis. Can you write to the same core (same index) from multiple
threads at the same time? I thought each writer would lock the index
so that others cannot write at the same time. I'll try it though.

Another reason for putting indexes in separate cores was to limit the
index size. Our index can grow up to 50 GB a day, so I was hoping
that writing to smaller indexes in separate cores would be faster, and if
needed I could merge them at a later point (like end of day). I want to
keep daily cores. Isn't this a good idea? How else can I limit the
index size (besides multiple instances or separate boxes)?

Thanks,
-vivek
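
The "daily cores" idea above can be sketched as a small routing step on the client: each document is assigned to a core named after the day of its timestamp. This is only an illustration; the `day_` prefix, the `timestamp` field name, and the use of UTC epoch seconds are assumptions, not details from this thread.

```python
from collections import defaultdict
from datetime import datetime, timezone


def core_for(timestamp, prefix="day_"):
    """Map a UTC epoch timestamp to a hypothetical per-day core name."""
    day = datetime.fromtimestamp(timestamp, tz=timezone.utc).date()
    return prefix + day.isoformat()


def group_by_day_core(docs, ts_field="timestamp"):
    """Group documents by the daily core they would be sent to."""
    groups = defaultdict(list)
    for doc in docs:
        groups[core_for(doc[ts_field])].append(doc)
    return dict(groups)
```

With this kind of routing, capping index size becomes a matter of how many daily cores are kept online rather than of merging temporary cores.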


On Tue, Mar 31, 2009 at 8:28 PM, Otis Gospodnetic
<[hidden email]> wrote:


Re: Merging Solr Indexes

Otis Gospodnetic-2

Hi,

Yes, you can write to the same index from multiple threads.  You still need to keep track of the index size manually, whether you create 1 or N indices/cores.  I'd go with a single index first.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
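
One simple way to "keep track of the index size manually", as mentioned above, is to sum the file sizes under the core's index directory and compare the total against a chosen limit. The sketch below assumes the caller knows where the index directory lives; the function names and the limit are illustrative only.

```python
import os


def index_size_bytes(index_dir):
    """Sum the sizes of all files under an index directory."""
    total = 0
    for root, _dirs, files in os.walk(index_dir):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                # A file can disappear mid-walk, for example when segments are merged.
                pass
    return total


def over_limit(index_dir, limit_bytes):
    """Return True when the index directory is larger than `limit_bytes`."""
    return index_size_bytes(index_dir) > limit_bytes
```

A check such as `over_limit(path, 50 * 1024**3)` could then decide when to start a new core, using the 50 GB figure from earlier in the thread.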



----- Original Message ----

> From: vivek sar <[hidden email]>
> To: [hidden email]
> Sent: Wednesday, April 1, 2009 4:26:04 AM
> Subject: Re: Merging Solr Indexes

Re: Merging Solr Indexes

Ning Li-3
There is a JIRA issue on supporting index merging:
https://issues.apache.org/jira/browse/SOLR-1051.
But I agree with Otis that you should go with a single index first.

Cheers,
Ning
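
For readers arriving later: as far as I know, later Solr releases document a CoreAdmin merge action (`mergeindexes`, taking a target `core` and one or more `indexDir` values) and an `UNLOAD` action with a `deleteIndex` flag, which together cover questions 2 and 3 from the original post. The sketch below only builds the request URLs under that assumption; the host, core names, and paths are placeholders, and the parameters should be verified against the documentation for the Solr version in use.

```python
from urllib.parse import urlencode

# Hypothetical CoreAdmin endpoint; adjust host, port, and path to your deployment.
ADMIN_URL = "http://localhost:8983/solr/admin/cores"


def merge_url(target_core, source_index_dirs, admin_url=ADMIN_URL):
    """Build a URL asking CoreAdmin to merge several index directories into one core."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    # indexDir is repeated once per source index.
    params += [("indexDir", path) for path in source_index_dirs]
    return admin_url + "?" + urlencode(params)


def unload_url(core, delete_index=False, admin_url=ADMIN_URL):
    """Build a URL that unloads a temporary core, optionally deleting its index."""
    params = [("action", "UNLOAD"), ("core", core)]
    if delete_index:
        params.append(("deleteIndex", "true"))
    return admin_url + "?" + urlencode(params)
```

The source indexes should be committed and no longer written to before merging, and the target core needs a commit afterwards for the merged documents to become visible. None of this changes the advice above to try a single index first.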


On Wed, Apr 1, 2009 at 12:06 PM, Otis Gospodnetic
<[hidden email]> wrote:
