shardkey

Joshi, Shital
Hi,

We are using Solr 4.3.0 SolrCloud (5 shards, 10 replicas). I have a couple of questions about the shard key.

        1. Looking at the admin GUI, how do I know which field is being used as the shard key?
        2. What is the default shard key?
        3. How do I override the default shard key?

Thanks.

Re: shardkey

bbarani
- A shard key can be prepended to the unique document id:
  shard_key!unique_id
- Documents with the same shard_key will reside on the same shard.

I suppose you can implement custom hashing by using the "_shard_" field. I am not sure about this, but I came across this approach some time back.

At query time, you can specify the "shard.keys" parameter.
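
As a rough illustration of the query-time side, here is a minimal sketch that builds a select URL restricted by shard.keys. The helper name build_routed_query, the host, the collection name and the key customer_a are all assumptions for the example, and the trailing "!" follows the Solr 4.x routing documentation as far as I recall, so check it against your version. The query string is not URL-encoded here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a select URL that limits the search to the
# shard(s) owning the given shard key. Host, port and collection name are
# placeholders, not values confirmed in this thread.
build_routed_query() {
  local base_url="$1" query="$2" shard_key="$3"
  # The trailing "!" marks the value as a shard key prefix (assumption
  # taken from the Solr 4.x routing docs; verify for your release).
  printf '%s/select?q=%s&shard.keys=%s!\n' "$base_url" "$query" "$shard_key"
}
```

It could then be used as: curl "$(build_routed_query http://localhost:8983/solr/collection1 xyz customer_a)"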

Re: shardkey

rishi
In reply to this post by Joshi, Shital
From my understanding:
In SolrCloud, the CompositeIdRouter extends HashBasedRouter.
The compositeId router is the default if numShards>1 at collection creation.
The compositeId router generates a hash from the uniqueKey defined in your schema.xml to route each document to a dedicated shard.

You can use select?q=xyz&shard.keys=uniquekey to make your search hit only the shard that holds your shard key.

 

 Thanks,

Rishi.
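
To make the hash routing idea concrete, here is a toy sketch. It is deliberately not Solr's algorithm: the compositeId router uses MurmurHash3 and per-shard hash ranges, whereas this example swaps in cksum (a CRC) modulo the shard count. The function names and the shardN naming are assumptions; the only point is that ids sharing the part before "!" always land on the same shard.

```shell
#!/usr/bin/env bash
# Toy hash routing, NOT Solr's implementation. cksum modulo the shard count
# stands in for the real hash, just to show that equal keys map to the
# same shard every time.
shard_for_key() {
  local key="$1" num_shards="$2" crc
  # cksum prints "<crc> <byte count>" for stdin; keep only the first field.
  crc="$(printf '%s' "$key" | cksum | cut -d' ' -f1)"
  echo "shard$(( crc % num_shards + 1 ))"
}

# Route on the part before the first "!" so that ids sharing a shard key
# stay together. An id without "!" is hashed as a whole.
shard_for_id() {
  local sep='!'
  shard_for_key "${1%%"$sep"*}" "$2"
}
```

With 5 shards, shard_for_id '20130611!test_14' 5 and shard_for_id '20130611!test_15' 5 return the same shard name, because both ids share the 20130611 prefix.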

 

-----Original Message-----
From: Joshi, Shital <[hidden email]>
To: '[hidden email]' <[hidden email]>
Sent: Wed, Jun 12, 2013 10:01 am
Subject: shardkey



RE: shardkey

James Thomas
This page has some good information on custom document routing:
http://docs.lucidworks.com/display/solr/Shards+and+Indexing+Data+in+SolrCloud



-----Original Message-----
From: Rishi Easwaran [mailto:[hidden email]]
Sent: Wednesday, June 12, 2013 1:40 PM
To: [hidden email]
Subject: Re: shardkey


Re: shardkey

Joel Bernstein
Also, you might want to check this blog post, which just went up today.

http://searchhub.org/2013/06/13/solr-cloud-document-routing/


On Wed, Jun 12, 2013 at 2:18 PM, James Thomas <[hidden email]> wrote:

--
Joel Bernstein
Professional Services LucidWorks

RE: shardkey

Joshi, Shital
Thanks for the links. They were very useful.

Is there a way to use the implicit router with the numShards parameter? We have 5 shards, and the business day (Monday-Friday) is our shardkey. We want to be able to say Monday -> shard1, Tuesday -> shard2, and so on.




-----Original Message-----
From: Joel Bernstein [mailto:[hidden email]]
Sent: Thursday, June 13, 2013 2:38 PM
To: [hidden email]
Subject: Re: shardkey


Re: shardkey

Shalin Shekhar Mangar
No, there is no way to do that right now. I think you'd be better off using
custom sharding because you can't really force two different shardKeys to go
to two different shards. We can only guarantee that docs with the same
shardKey will go to the same shard.


On Mon, Jun 17, 2013 at 9:47 PM, Joshi, Shital <[hidden email]> wrote:



--
Regards,
Shalin Shekhar Mangar.

RE: shardkey

Joshi, Shital
Thanks for answering my questions on shardkey. We experimented with the implicit router and it works as you said.

We're experimenting with the composite id router. This document http://searchhub.org/2013/06/13/solr-cloud-document-routing/ says:
"A shard key can be pre-pended to the unique document id to create a composite id. The composite id is formed with the following syntax: shard_key!document_id"

Say this is our CSV data and q_idn_s is unique id per document.

|q_idn_s |busdate |put_date            
|test_14  |20130611|02/06/2013  16:06:24
|test_15  |20130611|02/06/2013  16:06:24

Now we want to use busdate!q_idn_s as composite key. So we changed our data like this:

|q_idn_s |busdate |put_date            
|20130611!test_14  |20130611|02/06/2013  16:06:24
|20130611!test_15  |20130611|02/06/2013  16:06:24

And uploaded this data like this:

echo "$data" | curl --proxy ""  "http://$HOST:8983/solr/collection1/update/csv?commit=true&separator=|&escape=\&trim=true&header=false&skipLines=2&overwrite=true&fieldnames=$fieldnames&"  --data-binary @-  -H 'Content-type:text/plain; charset=utf-8'

But now Solr stores the composite id in the document id (q_idn_s) column. We were under the impression that Solr wouldn't store shard_key! in the document id column: it would use it only to calculate the hash key and would store just the document id. Is that not the case?



-----Original Message-----
From: Shalin Shekhar Mangar [mailto:[hidden email]]
Sent: Monday, June 17, 2013 3:48 PM
To: [hidden email]
Subject: Re: shardkey


Re: shardkey

Yonik Seeley-4
On Fri, Jun 21, 2013 at 6:08 PM, Joshi, Shital <[hidden email]> wrote:
> But now Solr stores composite id in the document id

Correct, it's the document id itself that contains everything needed
for the compositeId router to determine the hash.

> It would only use it to calculate hash key but while storing

compositeId routing is for cases where it makes sense to make the routing
key part of the unique id, so that an id is all the information needed to
find the document in the cluster.  For example, customer_id!document_name.
From your example of 20130611!test_14, it looks like you're doing time-based
sharding, and one would normally not use the compositeId router for
that.

-Yonik
http://lucidworks.com
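
Since the stored id keeps the shard_key! prefix, a client that needs the two parts separately has to split the id itself. A minimal sketch, assuming ids of the form shard_key!document_id as in the example above; the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Split a composite id of the form shard_key!document_id on the client side.
# Solr keeps the full string as the unique id; these hypothetical helpers
# only recover the two parts for display or lookups.
shard_key_of() {
  local sep='!'
  # Drop everything from the first "!" onward.
  printf '%s\n' "${1%%"$sep"*}"
}

doc_id_of() {
  local sep='!'
  # Drop everything up to and including the first "!".
  printf '%s\n' "${1#*"$sep"}"
}

shard_key_of '20130611!test_14'   # prints 20130611
doc_id_of '20130611!test_14'      # prints test_14
```

An id without a "!" comes back unchanged from both helpers.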

RE: shardkey

Joshi, Shital
Thanks so much for answering!

"it looks like you're doing time based sharding, and one would normally not use the compositeId router for that."

What would be the recommended router or alternative if we wanted to do time-based sharding? We are using the business date to build the composite key (it's a String, without a timestamp), and we're expecting about 3 million documents per business date.


-----Original Message-----
From: [hidden email] [mailto:[hidden email]] On Behalf Of Yonik Seeley
Sent: Friday, June 21, 2013 8:50 PM
To: [hidden email]
Subject: Re: shardkey


RE: shardkey

Joshi, Shital
In reply to this post by Yonik Seeley-4
Hi,

We finally decided on using custom sharding (implicit document routing) for our project. We will have ~3 million documents per shardkey. We're maintaining a shardkey -> shardid mapping in a database table. While adding documents we always specify the _shard_ parameter in the update URL, but while querying we don't specify the shards parameter, because we want to search across all shards.

While experimenting, we found that right after a hard commit (commit=true in the update URL), the query sometimes (about 40% of the time) didn't return documents from all shards, while at other times (about 60% of the time) it did. When we queried again after a few hours, the query always returned documents from all shards. Is that expected behavior? Is there a parameter to enforce querying across all shards? This is a very important point for us in moving forward with SolrCloud.

We're experimenting with adding a new shard and directing all new documents to it. Hopefully that will work.

Many Thanks!
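
For readers following along, the shardkey -> shardid lookup plus the _shard_ update parameter might look roughly like the sketch below. The case statement is only a stand-in for the database table, the Wednesday to Friday assignments are extrapolated from the Monday -> shard1, Tuesday -> shard2 pattern mentioned earlier, and the function names and base URL are assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for the database lookup: map a business day to a
# shard id. Only Monday and Tuesday come from the thread; the rest are
# extrapolated for the example.
shard_for_day() {
  case "$1" in
    Monday)    echo shard1 ;;
    Tuesday)   echo shard2 ;;
    Wednesday) echo shard3 ;;
    Thursday)  echo shard4 ;;
    Friday)    echo shard5 ;;
    *)         echo "unknown business day: $1" >&2; return 1 ;;
  esac
}

# Build a CSV update URL that targets one shard through the _shard_
# parameter. The base URL is a placeholder.
update_url_for_day() {
  local base_url="$1" day="$2" shard
  shard="$(shard_for_day "$day")" || return 1
  printf '%s/update/csv?separator=|&_shard_=%s\n' "$base_url" "$shard"
}
```

Failing loudly on an unmapped day keeps documents from being sent to an unintended shard.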


Re: shardkey

Mark Miller-3
You might be seeing https://issues.apache.org/jira/browse/SOLR-4923 ?

Is the commit=true part of the request that adds documents? If so, it might be SOLR-4923, and you should try sending the commit in a separate request after adding the docs.

- Mark
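
A minimal sketch of that workaround: send the documents without commit=true, then commit in a separate request. The function names, base URL, field names and shard id are placeholders, and the URL layout simply mirrors the update commands used elsewhere in this thread.

```shell
#!/usr/bin/env bash
# Sketch of the suggested workaround: add documents without commit=true,
# then issue the commit as its own request. All names are placeholders.
add_url() {
  # $1 = base URL, $2 = comma-separated field names, $3 = target shard
  printf '%s/update/csv?separator=|&fieldnames=%s&_shard_=%s\n' "$1" "$2" "$3"
}

commit_url() {
  printf '%s/update?commit=true\n' "$1"
}

index_then_commit() {
  local base_url="$1" fieldnames="$2" shard="$3"
  # Step 1: add the documents read from stdin, with no commit in this request.
  curl --silent --fail "$(add_url "$base_url" "$fieldnames" "$shard")" \
       --data-binary @- -H 'Content-type:text/plain; charset=utf-8' || return 1
  # Step 2: commit in a separate request once the add has succeeded.
  curl --silent --fail "$(commit_url "$base_url")"
}
```

Usage would be along the lines of: echo "$data" | index_then_commit "http://$HOST:8983/solr/collection1" "$fieldnames" shard1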

On Jun 27, 2013, at 4:42 PM, "Joshi, Shital" <[hidden email]> wrote:


RE: shardkey

Joshi, Shital
Thanks Mark.

We use commit=true as part of the request to add documents. Something like this:

echo "$data"| curl --proxy "" --silent "http://HOST:9983/solr/collection1/update/csv?commit=true&separator=|&fieldnames=$fieldnames&_shard_=shard1"  --data-binary @-  -H 'Content-type:text/plain; charset=utf-8'

You're suggesting that after this update we should always execute curl --proxy "" --silent "http://HOST:8983/solr/core3/update?commit=true". Is that correct?
And it doesn't matter whether HOST is a leader or a replica?



-----Original Message-----
From: Mark Miller [mailto:[hidden email]]
Sent: Thursday, June 27, 2013 5:35 PM
To: [hidden email]
Subject: Re: shardkey


Re: shardkey

Mark Miller-3
Yeah, that is what I would try until 4.4 comes out, and it should not matter whether it is a replica or a leader.

- Mark

On Jun 28, 2013, at 3:13 PM, "Joshi, Shital" <[hidden email]> wrote:


RE: shardkey

Joshi, Shital
Thanks!

-----Original Message-----
From: Mark Miller [mailto:[hidden email]]
Sent: Friday, June 28, 2013 5:06 PM
To: [hidden email]
Subject: Re: shardkey
