How to routing document for send to particular shard range

Ketan Thanki
Hi,

My requirement is now somewhat different: I need to set a routing key on each document so that its hash falls within a particular shard's range and the document is sent to that shard.

I have a SolrCloud configuration with 4 shards and 4 replicas, with the shard ranges below.
shard1: 80000000-bfffffff
shard2: c0000000-ffffffff
shard3: 0-3fffffff
shard4: 40000000-7fffffff

For example, the organization is my routing key, and each organization works on the projects shown below.
Org1: project1, project2
Org2: project3
Org3: project4
Org4: project5

As shown above, I want to index Org1 to shard1, Org2 to shard2, Org3 to shard3, and Org4 to shard4, so that each organization's documents are sent to a particular shard.
How can I manage compositeId routing to do this?

Regards,
Ketan.
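
For reference, the shard ranges listed above are hexadecimal bounds on a signed 32-bit hash. The following is a minimal sketch of how a hash value maps to one of those ranges, assuming the hash of the routing key has already been computed elsewhere (Solr's compositeId router uses MurmurHash3 for that step). The function names are illustrative and are not Solr APIs.

```python
# Shard ranges as posted above (hex bounds, inclusive).
SHARD_RANGES = {
    "shard1": ("80000000", "bfffffff"),
    "shard2": ("c0000000", "ffffffff"),
    "shard3": ("0", "3fffffff"),
    "shard4": ("40000000", "7fffffff"),
}


def to_signed32(value: int) -> int:
    """Interpret the low 32 bits of a value as a signed 32-bit integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def shard_for_hash(hash32: int) -> str:
    """Return the shard whose inclusive range contains the given 32-bit hash."""
    h = to_signed32(hash32)
    for shard, (lo_hex, hi_hex) in SHARD_RANGES.items():
        lo = to_signed32(int(lo_hex, 16))
        hi = to_signed32(int(hi_hex, 16))
        if lo <= h <= hi:
            return shard
    raise ValueError(f"no shard range covers hash {hash32:#x}")
```

Because the hash is derived from the routing key rather than chosen by the client, this lookup only shows where a key lands; it does not by itself let you pick the shard.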

Re: How to routing document for send to particular shard range

Amrit Sarkar
Ketan,

If you already know how the index should be laid out, isn't it better to use
the "implicit" router and write the routing logic on your own end?

If the document belongs to "Org1", send it with the extra parameter
"_route_=shard1", and likewise for the other organizations.

Snippet from the official documentation:
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting

> If you created the collection and defined the "implicit" router at the time
> of creation, you can additionally define a router.field parameter to use a
> field from each document to identify a shard where the document belongs. If
> the field specified is missing in the document, however, the document will
> be rejected. You could also use the _route_ parameter to name a specific
> shard.
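
As a rough illustration of the `_route_` approach, the sketch below builds an update request URL that names the target shard. The base URL, the collection name, and the organization-to-shard mapping are assumptions for illustration. The collection must have been created with the implicit router for a shard name to be honored this way.

```python
from urllib.parse import urlencode

# Hypothetical mapping from organization to shard name, following the question.
ORG_TO_SHARD = {
    "Org1": "shard1",
    "Org2": "shard2",
    "Org3": "shard3",
    "Org4": "shard4",
}


def update_url(base_url: str, collection: str, org: str) -> str:
    """Build an update URL that routes the request to the organization's shard."""
    shard = ORG_TO_SHARD[org]  # raises KeyError for an unmapped organization
    params = urlencode({"_route_": shard, "commit": "true"})
    return f"{base_url}/{collection}/update?{params}"
```

The documents themselves would then be posted to this URL in the usual way.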



Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2


RE: How to routing document for send to particular shard range

Ketan Thanki
Thanks, Amrit, for suggesting this approach.

I now have some understanding of it, and I need to implement implicit routing so that documents go to specific shards. I tried making changes in core.properties, but that did not work. Can you please tell me which configuration changes are needed? Do I need to create an extra field in the document to route on?

I have the following collections, created manually:
1: Workset with 4 shards and 4 replicas
2: Model with 4 shards and 4 replicas


For example, core.properties for one shard:
Workset collection:
name=workset
shard=shard1
collection=workset

Model collection:
name=model
shard=shard1
collection=model


Can you please tell me the configuration changes needed for implicit routing?

Please advise.

Regards,



Re: How to routing document for send to particular shard range

Erick Erickson
You cannot just make configuration changes. Whether a collection uses
implicit or compositeId routing is defined when you _create_ the
collection and cannot be changed later.

You need to create a new collection and specify router.name=implicit
when you create it. Then you can route documents as you desire.
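
A minimal sketch of such a create call, built as a Collections API URL. The host, collection name, shard names, and replication factor are assumptions for illustration; `action=CREATE`, `router.name`, `router.field`, `shards`, `replicationFactor`, and `maxShardsPerNode` are parameters of the Solr 6.x Collections API.

```python
from urllib.parse import urlencode


def create_collection_url(base_url, name, shards, replication_factor=1, router_field=None):
    """Build a Collections API CREATE URL for an implicitly routed collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "router.name": "implicit",
        "shards": ",".join(shards),  # the implicit router needs explicit shard names
        "replicationFactor": replication_factor,
        # Lets every replica fit on a single node; adjust for a real cluster.
        "maxShardsPerNode": len(shards) * replication_factor,
    }
    if router_field is not None:
        params["router.field"] = router_field
    return f"{base_url}/admin/collections?{urlencode(params)}"
```

The comma in the shard list is percent-encoded by `urlencode`, which is harmless because the server decodes it again.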

I would caution against this though. If you use implicit routing _you_
have to ensure balancing. For instance, you could have 10,000,000
documents for "Org1" and 15 for "Org2", resulting in hugely unbalanced
shards.
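
To make the imbalance concrete, here is a small sketch that totals documents per shard under a fixed organization-to-shard mapping. The counts for Org1 and Org2 are the ones from the example above; the mapping itself is an assumption.

```python
from collections import Counter


def docs_per_shard(org_counts, org_to_shard):
    """Sum document counts per shard for a fixed organization-to-shard mapping."""
    totals = Counter()
    for org, count in org_counts.items():
        totals[org_to_shard[org]] += count
    return dict(totals)


totals = docs_per_shard(
    {"Org1": 10_000_000, "Org2": 15},
    {"Org1": "shard1", "Org2": "shard2"},
)
print(totals)  # {'shard1': 10000000, 'shard2': 15}
```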

Implicit routing is particularly useful for time-series indexing,
where you, say, index a day's worth of documents to each shard. It may
be appropriate in your case, but so far you haven't told us _why_ you
think routing docs to particular shards is desirable.

Best,
Erick


RE: How to routing document for send to particular shard range

Ketan Thanki
Hi Erick,

My requirement is to index the documents of a particular organization to a specific shard. I have also made the changes in core.properties shown below.

Model collection:
name=model
shard=shard1
collection=model
router.name=implicit
router.field=core
shards=shard1,shard2

Workset collection:
name=workset
shard=shard1
collection=workset
router.name=implicit
router.field=core
shards=shard1,shard2

Here I have also created a new field, 'core', whose value is the shard to which the document should be sent, and on retrieval I use the '_route_' parameter to name the particular shard. However, my clusterstate.json still shows "router":{"name":"compositeId"}. Does this mean my settings have not taken effect, or is that the default?

Please advise.

Regards,
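
One way to check which router a collection actually uses is to read it from the cluster state JSON. This is a minimal sketch, assuming the JSON has already been fetched (for example from ZooKeeper) and has the collection name as its top-level key. The sample document is abbreviated and hypothetical apart from the `router` entry quoted above.

```python
import json


def router_name(state_json: str, collection: str) -> str:
    """Return the router name recorded for a collection in cluster state JSON."""
    state = json.loads(state_json)
    return state[collection]["router"]["name"]


sample = '{"workset": {"router": {"name": "compositeId"}, "shards": {}}}'
print(router_name(sample, "workset"))  # compositeId
```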


Re: How to routing document for send to particular shard range

Amrit Sarkar
Ketan,

> here I have also created new field 'core' which value is any shard where I
> need to send documents and on retrieval use '_route_'  parameter with
> mentioning the particular shard. But issue facing still my
> clusterstate.json showing the "router":{"name":"compositeId"} is it means
> my settings not impacted? or its default.

I am only answering this question, since Erick has already covered the rest
in his comment above. You need to RECREATE the collection, passing
"router.field" in the "create collection" API parameters, because
"router.field" is a collection-specific property maintained in ZooKeeper
(state.json / clusterstate.json).

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create

I highly recommend not altering core.properties manually when working with
SolrCloud; rely on the SolrCloud APIs to make the necessary changes instead.
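
Once a collection has been recreated with `router.field`, each document has to carry that field. A minimal sketch of preparing documents on the client side, assuming the field is named `core` as in the question; the `org` field and the mapping are hypothetical.

```python
ROUTER_FIELD = "core"  # assumed to match the router.field given at creation time
ORG_TO_SHARD = {"Org1": "shard1", "Org2": "shard2"}  # hypothetical mapping


def with_route(doc: dict) -> dict:
    """Return a copy of the document with the routing field set from its org."""
    org = doc.get("org")
    if org not in ORG_TO_SHARD:
        # Fail early on the client; per the docs, Solr rejects a document
        # that is missing the routing field.
        raise ValueError(f"no shard mapping for organization {org!r}")
    return {**doc, ROUTER_FIELD: ORG_TO_SHARD[org]}
```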


RE: How to routing document for send to particular shard range

Ketan Thanki
Thanks, Amrit.

I understand it now. Can you please tell me whether I can achieve this using compositeId routing, given the requirement I described earlier?

We need to send a particular client's data to a particular shard.

Regards,
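
For comparison, here is a minimal sketch of how compositeId routing is normally used: the routing key is placed before a `!` in the document ID, and the same prefix is passed as `_route_` at query time. Documents sharing a prefix land on the same shard, but that shard is determined by the hash, not chosen by the client. The key and ID values are illustrative.

```python
from urllib.parse import urlencode


def composite_id(route_key: str, doc_id: str) -> str:
    """Build a compositeId-style document ID: '<route_key>!<doc_id>'."""
    if "!" in route_key:
        raise ValueError("route key must not contain '!'")
    return f"{route_key}!{doc_id}"


def routed_query_params(route_key: str, query: str = "*:*") -> str:
    """Build query parameters restricted to the shard holding the route key."""
    return urlencode({"q": query, "_route_": f"{route_key}!"})
```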



RE: How to routing document for send to particular shard range

Ketan Thanki
Thanks, Amrit.

My requirement is to achieve the best performance using Solr's document routing feature. To that end, we need to index a particular client's data into a particular shard; if that is manageable, we expect to get the performance improvement we need.

Please advise.


Regards,


-----Original Message-----
From: Amrit Sarkar [mailto:[hidden email]]
Sent: Friday, November 10, 2017 5:34 PM
To: [hidden email]
Subject: Re: How to routing document for send to particular shard range

Ketan,

here I have also created new field 'core' which value is any shard where I
> need to send documents and on retrieval use '_route_'  parameter with
> mentioning the particular shard. But issue facing still my
> clusterstate.json showing the "router":{"name":"compositeId"} is it
> means my settings not impacted? or its default.


Only answering this query, as Erick has already mentioned in the above comment. You need to RECREATE the collection passinfg the "route.field" in the "create collection" api parameters as "route.field" is collection-specific property maintained at zookeeper (state.json / clusterstate.json).

https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create

I highly recommend not to alter core.properties manually when dealing with SolrCloud and instead relying on SolrCloud APIs to make necessary change.

Amrit Sarkar
Search Engineer
Lucidworks, Inc.
415-589-9269
www.lucidworks.com
Twitter http://twitter.com/lucidworks
LinkedIn: https://www.linkedin.com/in/sarkaramrit2
Medium: https://medium.com/@sarkaramrit2

On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki <[hidden email]> wrote:

> Hi Erik,
>
> My requirement to index the documents of particular organization to
> specific shard. Also I have made changes in core.properties as menions
> below.
>
> Model Collection:
> name=model
> shard=shard1
> collection=model
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> Workset Collection:
> name=workset
> shard=shard1
> collection=workset
> router.name=implicit
> router.field=core
> shards=shard1,shard2
>
> here I have also created new field 'core' which value is any shard
> where I need to send documents and on retrieval use '_route_'
> parameter with mentioning the particular shard. But issue facing still
> my clusterstate.json showing the "router":{"name":"compositeId"} is it
> means my settings not impacted? or its default.
>
> Please do needful.
>
> Regards,

Re: How to routing document for send to particular shard range

Amrit Sarkar
Surely someone else can chime in;

but when you say: "so regarding to it we need to index the particular
> client data into particular shard so if its  manageable than we will
> improve the performance as we need"


You can, and probably should, create separate collections for different
clients' data, so that you can tune performance for each as needed. Several
configuration settings drive indexing and querying behaviour, and putting
everything into a single collection limits that flexibility. Also, if you
need to add a new client in the future, you don't need to rethink sharding:
add a new collection and tweak its configuration as needed.

If you still need to use compositeId routing to achieve your use case, I am
honestly not sure how to do that, since the shards and their hash ranges are
fixed when the collection is created. You cannot add more shards afterwards;
you can only split a shard, which divides its index and hence its hash
range. I strongly recommend that you reconsider your SolrCloud design for
your use case.
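
To make the point about hash ranges concrete, here is a minimal sketch of how a 32-bit hash maps onto the four shard ranges posted at the start of this thread, and how splitting a shard halves its range. It assumes the hash value is already known (Solr computes it internally from the routing key), and the names SHARD_RANGES, shard_for_hash and split_range are hypothetical helpers for illustration, not Solr APIs.

```python
# Hash ranges as posted earlier in this thread (hex, unsigned 32-bit).
SHARD_RANGES = {
    "shard1": (0x80000000, 0xBFFFFFFF),
    "shard2": (0xC0000000, 0xFFFFFFFF),
    "shard3": (0x00000000, 0x3FFFFFFF),
    "shard4": (0x40000000, 0x7FFFFFFF),
}


def shard_for_hash(h):
    """Return the shard whose range contains the given hash."""
    h &= 0xFFFFFFFF  # treat negative (signed) hashes as unsigned 32-bit
    for name, (lo, hi) in SHARD_RANGES.items():
        if lo <= h <= hi:
            return name
    raise ValueError("hash not covered by any shard range")


def split_range(lo, hi):
    """Divide one hash range into two contiguous halves."""
    mid = lo + (hi - lo) // 2
    return (lo, mid), (mid + 1, hi)


# The shard is a consequence of the hash; the client does not pick it.
print(shard_for_hash(0x90000000))
print([tuple(hex(v) for v in r) for r in split_range(0x80000000, 0xBFFFFFFF)])
```

With compositeId the routing key only influences the hash, so "org1 goes to shard1" cannot be requested directly; that kind of explicit placement is what the implicit router provides.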

Amrit Sarkar

On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki <[hidden email]> wrote:

>
> Thanks Amrit,
>
> My requirement to achieve best performance while using document routing
> facility in solr so regarding to it we need to index the particular client
> data into particular shard so if its  manageable than we will improve the
> performance as we need.
>
> Please do needful.
>
>
> Regards,
>

RE: How to routing document for send to particular shard range

Ketan Thanki
Thanks Amrit,

We have a huge amount of data, which is why we are thinking of indexing each client's data into a particular shard. It looks difficult, but we need to achieve good performance with document routing for this volume of data.

With a configuration of 4 shards and 4 replicas, is it better to distribute one project's data across multiple shards or to keep it in one shard? Which is feasible with document routing? We need the best performance for both insertion and retrieval of documents, and there will be multiple client projects, each with a huge amount of data.

I also took readings with 4 shards and 4 replicas. Without routing, the data is distributed among all 4 shards; with routing, it goes to 1 shard, because I used 1 bit of the shard key, as in projectId/1!DocumentId. My readings are below.

1: Inserting data into 4 shards without document routing, time taken (in milliseconds) = 325108
Inserting data into 1 shard with document routing, time taken (in milliseconds) = 251835

2: Retrieving data from 4 shards without document routing, time taken (in milliseconds) = 234242
Retrieving data from 1 shard with document routing, time taken (in milliseconds) = 94562

Going by these readings, I get better performance locally when the data is in 1 shard. In production there will be a huge amount of data, so should it be distributed across 2 shards or kept in 1 shard? Which is better for performance?

Regards,
Ketan


Re: How to routing document for send to particular shard range

Erick Erickson
These numbers aren't very useful. Inserting how much data? Querying
how much data? What kinds of queries? Are you indexing in batches or
one document at a time? Are you using SolrJ and CloudSolrClient?

94 seconds to do _what_? Execute 1,000 queries? Fetch all the
documents from the shard? Execute one query?

What is "a huge amount of data"? I've seen 300M documents fit on one
shard. I've seen people claim 1M documents is "huge".

Details matter. You might review:

https://wiki.apache.org/solr/UsingMailingLists

1: inserting data in 4 shard without document routing time taken (in millisecond) = 325108
Inserting data in 1 shard with document routing time taken (in millisecond) = 251835

2: retrieving data from 4 shard without document routing time taken (in millisecond) = 234242
And retrieving data from 1 shard with document routing time taken (in millisecond) = 94562

Best,
Erick

On Tue, Nov 14, 2017 at 6:50 AM, Ketan Thanki <[hidden email]> wrote:

> Thanks Amrit ,
>
> Actually we have huge amount of data so that's why thinking to index data into particular shard accept it's looks difficult but need to achieve the performance using document routing for huge data.
>
> With configuration of  4 shard and 4 replica  is it better to distribute the one project data in multiple shard or in one shard which one is  feasible using document routing because needs the best performance while insertion & retrieval of document. And there would be the multiple projects of client which has huge amount of data.
>
> I also taken the reading with 4 shard and 4 replica where without routing data are distribute among all 4 shard  and with routing its distributes in 1 shard because of used 1 bit of shard key like projectId/1!DocumentId.my reading looks as below.
> 1:inserting data in 4 shard without document routing  time taken ( in millisecond)  =325108
> Inserting data in 1 shard with document routing time time taken ( in millisecond)  =251835
>
> 2: retrieving data from 4 shard without document routing time taken( in millisecond)  = 234242
> And retrieving data from 1 shard with document routing time taken ( in millisecond)= 94562
>
> As per above reading getting  performance in local  while data in 1 shard but in production there will be huge data so is it need to distribute in 2 shard or in 1 shard which one is feasible for achieve better performance.
>
>
> Regards,
> Ketan
>
> -----Original Message-----
> From: Amrit Sarkar [mailto:[hidden email]]
> Sent: Monday, November 13, 2017 8:52 PM
> To: [hidden email]
> Subject: Re: How to routing document for send to particular shard range
>
> Surely someone else can chim in;
>
> but when you say: "so regarding to it we need to index the particular
>> client data into particular shard so if its  manageable than we will
>> improve the performance as we need"
>
>
> You can / should create different collections for different client data, so that you can for surely improve performance as per need. There are multiple configurations which drives indexing and querying capabilities and incorporating everything in single collection will hinder that flexibility.
> Also if you need to add new client in future, you don't need to think about sharding again, add new collection and tweak its configuration as per need.
>
> Still if you need to use compositeKey to acheive your use-case, I am not sure how to do that honestly. Since shards are predefined when collection will be created. You cannot add more shards and such. You can only split a shard, which will divide the index and hence the hash range. I will strongly recommend you to reconsider your SolrCloud design technique for your use-case.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki <[hidden email]> wrote:
>
>>
>> Thanks Amrit,
>>
>> My requirement to achieve best performance while using document
>> routing facility in solr so regarding to it we need to index the
>> particular client data into particular shard so if its  manageable
>> than we will improve the performance as we need.
>>
>> Please do needful.
>>
>>
>> Regards,
>>
>>
>> -----Original Message-----
>> From: Amrit Sarkar [mailto:[hidden email]]
>> Sent: Friday, November 10, 2017 5:34 PM
>> To: [hidden email]
>> Subject: Re: How to routing document for send to particular shard
>> range
>>
>> Ketan,
>>
>> here I have also created new field 'core' which value is any shard
>> where I
>> > need to send documents and on retrieval use '_route_'  parameter
>> > with mentioning the particular shard. But issue facing still my
>> > clusterstate.json showing the "router":{"name":"compositeId"} is it
>> > means my settings not impacted? or its default.
>>
>>
>> Only answering this query, as Erick has already mentioned in the above
>> comment. You need to RECREATE the collection passinfg the
>> "route.field" in the "create collection" api parameters as
>> "route.field" is collection-specific property maintained at zookeeper
>> (state.json / clusterstate.json).
>>
>> https://lucene.apache.org/solr/guide/6_6/collections-
>> api.html#CollectionsAPI-create
>>
>> I highly recommend not to alter core.properties manually when dealing
>> with SolrCloud and instead relying on SolrCloud APIs to make necessary change.
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki <[hidden email]> wrote:
>>
>> > Hi Erik,
>> >
>> > My requirement to index the documents of particular organization to
>> > specific shard. Also I have made changes in core.properties as
>> > menions below.
>> >
>> > Model Collection:
>> > name=model
>> > shard=shard1
>> > collection=model
>> > router.name=implicit
>> > router.field=core
>> > shards=shard1,shard2
>> >
>> > Workset Collection:
>> > name=workset
>> > shard=shard1
>> > collection=workset
>> > router.name=implicit
>> > router.field=core
>> > shards=shard1,shard2
>> >
>> > here I have also created new field 'core' which value is any shard
>> > where I need to send documents and on retrieval use '_route_'
>> > parameter with mentioning the particular shard. But issue facing
>> > still my clusterstate.json showing the
>> > "router":{"name":"compositeId"} is it means my settings not impacted? or its default.
>> >
>> > Please do needful.
>> >
>> > Regards,
>> >
>> > -----Original Message-----
>> > From: Erick Erickson [mailto:[hidden email]]
>> > Sent: Friday, November 10, 2017 12:06 PM
>> > To: solr-user
>> > Subject: Re: How to routing document for send to particular shard
>> > range
>> >
>> > You cannot just make configuration changes, whether you use implicit
>> > or compositeId is defined when you _create_ the collection and
>> > cannot be changed later.
>> >
>> > You need to create a new collection and specify router.name=implicit
>> > when you create it. Then you can route documents as you desire.
>> >
>> > I would caution against this though. If you use implicit routing
>> > _you_ have to insure balancing. For instance, you could have
>> > 10,000,000 documents for "Org1" and 15 for "Org2", resulting in
>> > hugely unbalanced
>> shards.
>> >
>> > Implicit routing is particularly useful for time-series indexing,
>> > where you, say, index a day's worth of documents to each shard. It
>> > may be appropriate in your case, but so far you haven't told us
>> > _why_ you think routing docs to particular shards is desirable.
>> >
>> > Best,
>> > Erick
>> >
>> > On Thu, Nov 9, 2017 at 10:27 PM, Ketan Thanki <[hidden email]> wrote:
>> > > Thanks Amrit, for suggesting this approach.
>> > >
>> > > I now understand it better, and I need to implement implicit
>> > > routing so that documents go to specific shards. I tried making
>> > > changes in core.properties, but that did not work. Can you please
>> > > tell me which configuration changes are needed? Do I need to add
>> > > an extra field to each document for routing?
>> > >
>> > > I have the following collections, created manually:
>> > > 1: Workset with 4 shards and 4 replicas
>> > > 2: Model with 4 shards and 4 replicas
>> > >
>> > >
>> > > For example, core.properties for one shard:
>> > > Workset collection:
>> > > name=workset
>> > > shard=shard1
>> > > collection=workset
>> > >
>> > > Model collection:
>> > > name=model
>> > > shard=shard1
>> > > collection=model
>> > >
>> > >
>> > > Can you please tell me the configuration changes needed for
>> > > implicit routing?
>> > >
>> > > Please advise.
>> > >
>> > > Regards,
>> > >
>> > >
>> > > -----Original Message-----
>> > > From: Amrit Sarkar [mailto:[hidden email]]
>> > > Sent: Wednesday, November 08, 2017 12:36 PM
>> > > To: [hidden email]
>> > > Subject: Re: How to routing document for send to particular shard
>> > > range
>> > >
>> > > Ketan,
>> > >
>> > > If you already know your indexing architecture, isn't it better to
>> > > use the "implicit" router and write the routing logic on your own end?
>> > >
>> > > If the document belongs to "Org1", send it with the extra parameter
>> > > "_route_=shard1", and likewise for the others.
>> > >
>> > > Snippet from the official documentation:
>> > > https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting
>> > >
>> > >> If you created the collection and defined the "implicit" router at
>> > >> the time of creation, you can additionally define a router.field
>> > >> parameter to use a field from each document to identify a shard
>> > >> where the document belongs. If the field specified is missing in
>> > >> the document, however, the document will be rejected. You could
>> > >> also use the _route_ parameter to name a specific shard.
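
The two options in that snippet can be sketched on the client side as follows. This assumes an implicit-router collection whose router.field is "core" (the field name used later in this thread); the "org" field, the host, and the collection name are hypothetical. The code only prepares documents and a URL; it sends nothing.

```python
import json
from urllib.parse import urlencode

# Organization-to-shard mapping taken from the original question.
ORG_TO_SHARD = {
    "Org1": "shard1",
    "Org2": "shard2",
    "Org3": "shard3",
    "Org4": "shard4",
}


def tag_with_shard(doc, org_field="org", route_field="core"):
    """Option 1: return a copy of doc whose routing field names the shard."""
    tagged = dict(doc)
    tagged[route_field] = ORG_TO_SHARD[doc[org_field]]
    return tagged


def update_url(base_url, collection, shard):
    """Option 2: build an update URL that names the shard via _route_."""
    query = urlencode({"_route_": shard})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


docs = [{"id": "d1", "org": "Org1"}, {"id": "d2", "org": "Org3"}]
tagged = [tag_with_shard(d) for d in docs]
print(json.dumps(tagged))
print(update_url("http://localhost:8983/solr", "model_implicit", "shard1"))
```

With option 1 every document must carry the routing field, since a document without it is rejected; with option 2 all documents in one request go to the named shard, so batches would need to be grouped by organization first.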
>> > >
>> > >
>> > >
>> > > Amrit Sarkar
>> > > Search Engineer
>> > > Lucidworks, Inc.
>> > > 415-589-9269
>> > > www.lucidworks.com
>> > > Twitter http://twitter.com/lucidworks
>> > > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> > > Medium: https://medium.com/@sarkaramrit2
>> > >
>> > > On Wed, Nov 8, 2017 at 11:15 AM, Ketan Thanki <[hidden email]> wrote:
>> > >
>> > >> Hi,
>> > >>
>> > >> My requirement is now quite different: I need to set the routing
>> > >> key hash for a document so that it is sent to a particular shard
>> > >> according to that shard's hash range.
>> > >>
>> > >> I have a SolrCloud configuration with 4 shards and 4 replicas,
>> > >> with the shard ranges below.
>> > >> shard1: 80000000-bfffffff
>> > >> shard2: c0000000-ffffffff
>> > >> shard3: 0-3fffffff
>> > >> shard4: 40000000-7fffffff
>> > >>
>> > >> For example, the list below shows which projects belong to each
>> > >> organization; the organization is my routing key.
>> > >> Org1 = works for project1, project2
>> > >> Org2 = works for project3
>> > >> Org3 = works for project4
>> > >> Org4 = project5
>> > >>
>> > >> As described above, I want to index Org1 to shard1, Org2 to
>> > >> shard2, Org3 to shard3, and Org4 to shard4, sending each document
>> > >> directly to its particular shard.
>> > >> How can I manage compositeId routing to do this?
>> > >>
>> > >> Regards,
>> > >> Ketan.
>> > >>
>> > >>
>> >
>>

RE: How to routing document for send to particular shard range

Ketan Thanki
Thanks Erick,

I am restating my question because some required details were missing from my earlier mail. My test case, using CloudSolrClient, is below.
I used a routing key of the form projectId/2!documentId.

1: Indexing documents into the Solr index
Document count: 919551
Batch size for insert: 5000 documents per thread
Total indexing threads: 184 (thread pool size 5)
With routing: 251835 milliseconds to index the documents (data indexed into 1 shard)
Without routing: 325108 milliseconds to index the documents (data indexed across 4 shards)

2: Searching/retrieving documents from the Solr index
Document count: 919551
Batch size for retrieval: 10000 (each query searches for 10000 document IDs)
Total search threads: 93 (thread pool size 3)
Total queries: 93
With routing: 94562 milliseconds to retrieve the documents (data in 1 shard)
Without routing: 234242 milliseconds to retrieve the documents (data across 4 shards)

The retrieval queries also used the fq and fl parameters.
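
To make the timings above easier to compare, they can be converted into throughput and speedup figures. This is plain arithmetic on the numbers reported in this message, not a new measurement; the variable and function names are only illustrative.

```python
DOC_COUNT = 919_551  # documents indexed and retrieved in the test


def docs_per_second(doc_count, elapsed_ms):
    """Average throughput for a run that took elapsed_ms milliseconds."""
    return doc_count / (elapsed_ms / 1000.0)


index_routed_ms, index_unrouted_ms = 251_835, 325_108
query_routed_ms, query_unrouted_ms = 94_562, 234_242

# Speedup = time without routing divided by time with routing.
index_speedup = index_unrouted_ms / index_routed_ms
query_speedup = query_unrouted_ms / query_routed_ms

print(f"indexing, routed:   {docs_per_second(DOC_COUNT, index_routed_ms):.0f} docs/s")
print(f"indexing, unrouted: {docs_per_second(DOC_COUNT, index_unrouted_ms):.0f} docs/s")
print(f"indexing speedup:   {index_speedup:.2f}x")
print(f"retrieval speedup:  {query_speedup:.2f}x")
```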
 
The numbers above are for one model, which belongs to one project (a project consists of many models).
In this test the data sat in a single shard, and routing gave an improvement.
In production, however, I may have 800M documents (possibly more) in each of my 4 shards. Given the results above, which of the following would be better?
- distribute a project's data across two shards
- keep a project's data in a single shard

Is there a document limit per shard? And if my document count grows beyond the number mentioned above, will that cause any issues?

Please advise.

Regards,
Ketan
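
A rough sketch of why the number after the slash in a key such as projectId/2!documentId controls how many shards a project can land in. With compositeId routing, the top N bits of the 32-bit hash come from the prefix and the remaining bits come from the document ID. The shard ranges are the ones given at the start of this thread; the prefix hash value used below is made up (Solr derives the real one with MurmurHash3, which this sketch does not compute).

```python
# Shard hash ranges from the original question, as unsigned 32-bit values.
SHARD_RANGES = {
    "shard1": (0x80000000, 0xBFFFFFFF),
    "shard2": (0xC0000000, 0xFFFFFFFF),
    "shard3": (0x00000000, 0x3FFFFFFF),
    "shard4": (0x40000000, 0x7FFFFFFF),
}


def reachable_shards(prefix_hash, bits):
    """Shards that IDs of the form prefix/bits!docId can hash into.

    The top `bits` bits are fixed by the prefix hash; the low bits come
    from the document ID hash and can take any value.
    """
    if not 0 <= bits <= 32:
        raise ValueError("bits must be between 0 and 32")
    high_mask = (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF
    low = prefix_hash & high_mask            # smallest possible composite hash
    high = low | (~high_mask & 0xFFFFFFFF)   # largest possible composite hash
    return sorted(
        name
        for name, (start, end) in SHARD_RANGES.items()
        if start <= high and low <= end
    )


example_prefix_hash = 0x9A3C5F10  # hypothetical hash of a projectId
for bits in (0, 1, 2):
    print(bits, reachable_shards(example_prefix_hash, bits))
```

Under these assumptions, 2 prefix bits pin a project to one of the four shards and 1 prefix bit spreads it over two, which corresponds to the two production options asked about above.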

-----Original Message-----
From: Erick Erickson [mailto:[hidden email]]
Sent: Tuesday, November 14, 2017 9:33 PM
To: solr-user
Subject: Re: How to routing document for send to particular shard range

These numbers aren't very useful. Inserting how much data? Querying how much data? What kinds of queries? Are you indexing in batches or one document at a time? Are you using SolrJ and CloudSolrClient?

94 seconds to do _what_? Execute 1,000 queries? Fetch all the documents from the shard? Execute one query?

What is "a huge amount of data"? I've seen 300M documents fit on one shard. I've seen people claim 1M documents is "huge".

Details matter. You might review:

https://wiki.apache.org/solr/UsingMailingLists

1: Inserting data into 4 shards without document routing, time taken (in milliseconds) = 325108
Inserting data into 1 shard with document routing, time taken (in milliseconds) = 251835

2: Retrieving data from 4 shards without document routing, time taken (in milliseconds) = 234242
Retrieving data from 1 shard with document routing, time taken (in milliseconds) = 94562

Best,
Erick

On Tue, Nov 14, 2017 at 6:50 AM, Ketan Thanki <[hidden email]> wrote:

> Thanks Amrit,
>
> We have a huge amount of data, which is why we are thinking of indexing data into a particular shard. It looks difficult, but we need to achieve good performance with document routing for this volume of data.
>
> With a configuration of 4 shards and 4 replicas, is it better to distribute one project's data across multiple shards or to keep it in one shard? Which is feasible using document routing? We need the best performance for both insertion and retrieval of documents, and there will be multiple client projects, each with a huge amount of data.
>
> I also took readings with 4 shards and 4 replicas. Without routing, the data is distributed among all 4 shards; with routing, it goes to 1 shard because I used 1 bit of the shard key, as in projectId/1!DocumentId. My readings are below.
> 1: Inserting data into 4 shards without document routing, time taken (in milliseconds) = 325108
> Inserting data into 1 shard with document routing, time taken (in milliseconds) = 251835
>
> 2: Retrieving data from 4 shards without document routing, time taken (in milliseconds) = 234242
> Retrieving data from 1 shard with document routing, time taken (in milliseconds) = 94562
>
> According to these readings, I get better performance locally with the data in 1 shard. In production, however, there will be a huge amount of data, so should a project's data be distributed across 2 shards or kept in 1 shard? Which is feasible for better performance?
>
>
> Regards,
> Ketan
>
> -----Original Message-----
> From: Amrit Sarkar [mailto:[hidden email]]
> Sent: Monday, November 13, 2017 8:52 PM
> To: [hidden email]
> Subject: Re: How to routing document for send to particular shard
> range
>
> Surely someone else can chime in;
>
> but when you say:
>
>> so regarding to it we need to index the particular client data into
>> particular shard so if it's manageable then we will improve the
>> performance as we need
>
> You can / should create different collections for different clients' data, so that you can improve performance as needed for each. Multiple configuration settings drive indexing and querying capabilities, and incorporating everything in a single collection will hinder that flexibility.
> Also, if you need to add a new client in the future, you don't need to think about sharding again: add a new collection and tweak its configuration as needed.
>
> If you still need to use a composite key (compositeId routing) to achieve your use case, I am honestly not sure how to do that, since the shards are predefined when the collection is created. You cannot add more shards; you can only split a shard, which divides the index and hence the hash range. I strongly recommend that you reconsider your SolrCloud design for this use case.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> Medium: https://medium.com/@sarkaramrit2
>
> On Mon, Nov 13, 2017 at 7:31 PM, Ketan Thanki <[hidden email]> wrote:
>
>> Thanks Amrit,
>>
>> My requirement is to achieve the best performance using the document
>> routing facility in Solr. For that, we need to index a particular
>> client's data into a particular shard; if that is manageable, then we
>> can improve performance as needed.
>>
>> Please advise.
>>
>>
>> Regards,
>>
>>
>> -----Original Message-----
>> From: Amrit Sarkar [mailto:[hidden email]]
>> Sent: Friday, November 10, 2017 5:34 PM
>> To: [hidden email]
>> Subject: Re: How to routing document for send to particular shard
>> range
>>
>> Ketan,
>>
>>> here I have also created new field 'core' which value is any shard
>>> where I need to send documents and on retrieval use '_route_'
>>> parameter with mentioning the particular shard. But issue facing
>>> still my clusterstate.json showing the
>>> "router":{"name":"compositeId"} is it means my settings not
>>> impacted? or its default.
>>
>>
>> Only answering this query, as Erick has already mentioned in the
>> above comment. You need to RECREATE the collection, passing
>> "router.field" in the "create collection" API parameters, as
>> "router.field" is a collection-specific property maintained in
>> ZooKeeper (state.json / clusterstate.json).
>>
>> https://lucene.apache.org/solr/guide/6_6/collections-api.html#CollectionsAPI-create
>>
>> I highly recommend not altering core.properties manually when dealing
>> with SolrCloud, and instead relying on the SolrCloud APIs to make the necessary changes.
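
One way to confirm which router a collection actually uses is to read the router entry from the collection's state.json rather than core.properties. The sketch below parses a trimmed, hand-written sample; the "router":{"name":"compositeId"} entry matches what was reported in this thread, while the rest of the layout (including the "field" key for an implicit collection) is an assumption about how the state is stored.

```python
import json


def router_settings(state_json, collection):
    """Return the router section recorded for a collection in state.json."""
    state = json.loads(state_json)
    return state[collection].get("router", {})


# Trimmed, hypothetical sample of an existing compositeId collection.
composite_sample = """
{"model": {"router": {"name": "compositeId"},
           "shards": {"shard1": {"range": "80000000-bfffffff",
                                 "state": "active"}}}}
"""

# Assumed shape after recreating the collection with the implicit router.
implicit_sample = """
{"model_implicit": {"router": {"name": "implicit", "field": "core"},
                    "shards": {"shard1": {"range": null,
                                          "state": "active"}}}}
"""

router = router_settings(composite_sample, "model")
print(router["name"])
print(router_settings(implicit_sample, "model_implicit"))
```

If the router name still reads compositeId after editing core.properties, that is expected: the value was fixed when the collection was created.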
>>
>> Amrit Sarkar
>> Search Engineer
>> Lucidworks, Inc.
>> 415-589-9269
>> www.lucidworks.com
>> Twitter http://twitter.com/lucidworks
>> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>> Medium: https://medium.com/@sarkaramrit2
>>
>> On Fri, Nov 10, 2017 at 5:23 PM, Ketan Thanki <[hidden email]> wrote:
>>
>> > Hi Erick,
>> >
>> > My requirement is to index the documents of a particular organization
>> > into a specific shard. I have also made changes in core.properties, as
>> > mentioned below.
>> >
>> > Model Collection:
>> > name=model
>> > shard=shard1
>> > collection=model
>> > router.name=implicit
>> > router.field=core
>> > shards=shard1,shard2
>> >
>> > Workset Collection:
>> > name=workset
>> > shard=shard1
>> > collection=workset
>> > router.name=implicit
>> > router.field=core
>> > shards=shard1,shard2
>> >
>> > Here I have also created a new field, 'core', whose value is the shard
>> > to which the document should be sent, and on retrieval I use the
>> > '_route_' parameter to name the particular shard. The issue is that
>> > my clusterstate.json still shows "router":{"name":"compositeId"}.
>> > Does this mean my settings were not applied, or is that the default?
>> >
>> > Please advise.
>> >
>> > Regards,
>> >

RE: How to routing document for send to particular shard range

Ketan Thanki
In reply to this post by Erick Erickson
Hi,

Can someone please advise on the use case described in my message below?

Regards,
Ketan

-----Original Message-----
From: Ketan Thanki
Sent: Wednesday, November 15, 2017 3:42 PM
To: '[hidden email]'
Subject: RE: How to routing document for send to particular shard range

Thanks Erick,

I am restating my question because some required details were missing from my earlier mail. My test case, using CloudSolrClient, is below.
I used a routing key of the form projectId/2!documentId.

1: Indexing documents into the Solr index
Document count: 919551
Batch size for insert: 5000 documents per thread
Total indexing threads: 184 (thread pool size 5)
With routing: 251835 milliseconds to index the documents (data indexed into 1 shard)
Without routing: 325108 milliseconds to index the documents (data indexed across 4 shards)

2: Searching/retrieving documents from the Solr index
Document count: 919551
Batch size for retrieval: 10000 (each query searches for 10000 document IDs)
Total search threads: 93 (thread pool size 3)
Total queries: 93
With routing: 94562 milliseconds to retrieve the documents (data in 1 shard)
Without routing: 234242 milliseconds to retrieve the documents (data across 4 shards)

The retrieval queries also used the fq and fl parameters.

The numbers above are for one model, which belongs to one project (a project consists of many models).
In this test the data sat in a single shard, and routing gave an improvement.
In production, however, I may have 800M documents (possibly more) in each of my 4 shards. Given the results above, which of the following would be better?
- distribute a project's data across two shards
- keep a project's data in a single shard

Is there a document limit per shard? And if my document count grows beyond the number mentioned above, will that cause any issues?

Please advise.

Regards,
Ketan