Import data from standalone solr into a solrcloud collection

Sushant Vengurlekar
I created a SolrCloud collection with 2 shards and a replication factor of
2. How can I load data into this collection that is currently stored
in a core on a standalone Solr? I used the conf from this core on the
standalone Solr to create the collection on SolrCloud.

Thank you

Re: Import data from standalone solr into a solrcloud collection

Erick Erickson
Probably the easiest way would be to recreate your collection with 1
shard. Then copy the index from your standalone setup.

After verifying your setup, use the Collections API SPLITSHARD command.
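
As a rough sketch of the split step only: the snippet below builds the Collections API request URL for SPLITSHARD. The host, port, and collection name are assumptions for illustration, and `shard1` is the name Solr normally gives the only shard of a one-shard collection. It prints the URL rather than calling Solr, so nothing is changed until you issue the request yourself.

```python
from urllib.parse import urlencode

# Assumed values for illustration; replace with your own host and collection.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"


def split_shard_url(collection, shard):
    """Return the Collections API URL that asks Solr to split one shard."""
    query = urlencode({
        "action": "SPLITSHARD",
        "collection": collection,
        "shard": shard,
    })
    return f"{SOLR_BASE}/admin/collections?{query}"


# A one-shard collection normally names its only shard "shard1".
split_url = split_shard_url(COLLECTION, "shard1")
print(split_url)
```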

Best,
Erick

On Tue, Jun 19, 2018 at 10:50 AM, Sushant Vengurlekar
<[hidden email]> wrote:
> I created a solr cloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection which I have currently stored
> in a core on a standalone solr. I used the conf from this core on
> standalone solr to create the collection on the solrcloud
>
> Thank you

Re: Import data from standalone solr into a solrcloud collection

Sushant Vengurlekar
Thank you, Erick.

In the create collection command I still need to set the replication factor,
correct?

On Tue, Jun 19, 2018 at 11:14 AM Erick Erickson <[hidden email]>
wrote:

> Probably the easiest way would be to recreate your collection with 1
> shard. Then copy the index from your standalone setup.
>
> After verifying your setup, use the Collections SPLITSHARD command.
>
> Best,
> Erick

Re: Import data from standalone solr into a solrcloud collection

Aroop Ganguly
Hi Sushant

replicationFactor defaults to 1 and is not mandatory.
numShards is mandatory; in this case you'd set it to 1.
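
For illustration, a CREATE request along those lines might be built like this. The host, port, and collection name are placeholders, not values from this thread, and the snippet only prints the URL.

```python
from urllib.parse import urlencode

# Placeholder host and collection name, for illustration only.
params = {
    "action": "CREATE",
    "name": "mycollection",
    "numShards": 1,  # set explicitly, as described above
    # replicationFactor is left out, so it falls back to its default of 1
}
create_url = "http://localhost:8983/solr/admin/collections?" + urlencode(params)
print(create_url)
```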

Aroop

> On Jun 19, 2018, at 12:29 PM, Sushant Vengurlekar <[hidden email]> wrote:
>
> Thank you Eric.
>
> In the create collection command I need to set the replication factor
> though correct?


Re: Import data from standalone solr into a solrcloud collection

Sushant Vengurlekar
Thank you, Aroop.

After I import the data into the collection from the standalone Solr core, I
want to split it into 2 shards across the 2 nodes that I have. So will I have
to set replicationFactor=2 and numShards=2?

On Tue, Jun 19, 2018 at 12:46 PM Aroop Ganguly <[hidden email]>
wrote:

> Hi Sushant
>
> replicationFactor defaults to 1 and is not mandatory.
> numShards is mandatory, where you’d equate it to 1.
>
> Aroop

Re: Import data from standalone solr into a solrcloud collection

Aroop Ganguly
I see.
By definition of splitting, the new shards will have the same number of replicas as the original shard.
You could use replicationFactor>=2 to ensure that both of your Solr nodes are used.
You could also use the maxShardsPerNode parameter, alone or in conjunction with replicationFactor, to achieve your target state.



> On Jun 19, 2018, at 12:51 PM, Sushant Vengurlekar <[hidden email]> wrote:
>
> Thank you Aroop
>
> After I import the data into the collection from the standalone solr core I
> want to split it into 2 shards across 2 nodes that I have. So I will have
> to set replicationfactor of 2 & numShards =2 ?


Re: Import data from standalone solr into a solrcloud collection

Erick Erickson
Personally I'd start with a 1-shard, 1-replica collection (i.e. leader-only).

From there split the shard.

Once all that has been done satisfactorily, just use the Collections
API ADDREPLICA command to build out your collection to whatever degree
of redundancy you need.
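
A minimal sketch of that last step, assuming a collection called `mycollection` on a local node (both hypothetical) and the sub-shard names `shard1_0` and `shard1_1` that a split of `shard1` normally produces. It prints one ADDREPLICA URL per sub-shard and does not contact Solr.

```python
from urllib.parse import urlencode

# Hypothetical host and collection name; adjust for your cluster.
SOLR_BASE = "http://localhost:8983/solr"
COLLECTION = "mycollection"

# Splitting "shard1" normally yields these two sub-shards.
SUB_SHARDS = ["shard1_0", "shard1_1"]

add_replica_urls = [
    f"{SOLR_BASE}/admin/collections?"
    + urlencode({"action": "ADDREPLICA", "collection": COLLECTION, "shard": shard})
    for shard in SUB_SHARDS
]

for url in add_replica_urls:
    print(url)
```

Without further parameters Solr picks the node for each new replica itself; the ADDREPLICA `node` parameter can be added to place a replica on a specific node.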

Best,
Erick

On Tue, Jun 19, 2018 at 1:04 PM, Aroop Ganguly <[hidden email]> wrote:

> I see.
> By definition of splitting, the new shards will have the same number of replicas as the original shard.
> You could use the replicationFactor>=2 to ensure that both of your solr nodes are used.
> You could also use the maxShardsPerNode parameter alone or in conjunction with the replicationFactor property to achieve your target state.

Re: Import data from standalone solr into a solrcloud collection

Shawn Heisey-2
In reply to this post by Sushant Vengurlekar
On 6/19/2018 11:50 AM, Sushant Vengurlekar wrote:
> I created a solr cloud collection with 2 shards and a replication factor of
> 2. How can I load data into this collection which I have currently stored
> in a core on a standalone solr. I used the conf from this core on
> standalone solr to create the collection on the solrcloud

Erick's suggestion of creating a collection with one shard and one
replica, then splitting the shard and adding replicas, is one solution.
If properly executed, it can work very well.

Another possibility is to create the collection with the number of
shards and replicas that you want right up front and then use the
DataImportHandler (DIH) to import documents from the standalone Solr. One of
the sources you can use with DIH is another Solr install.

https://lucene.apache.org/solr/guide/6_6/uploading-structured-data-store-data-with-the-data-import-handler.html#solrentityprocessor

If you're using a new enough version of SolrCloud (6.4 or later), you
should definitely be using cursorMark in the DIH config and a sort
parameter that includes a sort on the uniqueKey field.
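
To make that concrete, here is a small sketch that generates a DIH data-config with a SolrEntityProcessor entity using cursorMark and a sort on the uniqueKey. The source URL, the entity name, the `rows` value, and the assumption that the uniqueKey field is called `id` are all illustrative, so check them against your own schema and the reference guide page linked above.

```python
import xml.etree.ElementTree as ET

# Illustrative values only: the source core URL and the uniqueKey field
# name ("id") are assumptions and must match your own setup.
SOURCE_URL = "http://standalone-host:8983/solr/mycore"
UNIQUE_KEY = "id"

config = ET.Element("dataConfig")
document = ET.SubElement(config, "document")
ET.SubElement(
    document,
    "entity",
    {
        "name": "sourceSolr",
        "processor": "SolrEntityProcessor",
        "url": SOURCE_URL,
        "query": "*:*",
        "rows": "500",
        "cursorMark": "true",          # page through results with a cursor
        "sort": f"{UNIQUE_KEY} asc",   # cursor paging needs a sort on the uniqueKey
    },
)

data_config_xml = ET.tostring(config, encoding="unicode")
print(data_config_xml)
```

The printed XML would go into the data-config file referenced by the DIH request handler on the SolrCloud side.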

Thanks,
Shawn