Distributed Searching + unique Ids

Distributed Searching + unique Ids

Eric Khoury

Hey guys, the spec mentions the following:

    The unique key field must be unique across all shards. If docs with
    duplicate unique keys are encountered, Solr will make an attempt to
    return valid results, but the behavior may be non-deterministic.

I'm actually looking to duplicate certain objects across shards, and I'm hoping to have the duplicates removed when querying over all shards. If these duplicates have the same ids, will that work? Will this cause chaos with paging? I imagine it might affect faceting as well.

Thanks,
Eric.
Reply | Threaded
Open this post in threaded view
|  
Report Content as Inappropriate

Re: Distributed Searching + unique Ids

Erick Erickson
Don't do this. Many bits of sharding assume that a uniqueKey
exists on one and only one shard. Document counts may be
off. Faceting may be off.  Etc.
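
To see why counts and facets drift, here is a toy Python model of a coordinator merging two shards' results. It is a simplification for illustration, not Solr's actual merge code: it assumes hit counts and facet counts are summed across shards while duplicate ids are dropped only from the returned page, which is one way the "non-deterministic" behavior quoted in the original post can show up. The shard contents and ids are made up.

```python
from collections import Counter
from typing import List, Tuple

Doc = Tuple[str, str]  # (uniqueKey, category)

# Hypothetical shard contents; "common-1" is deliberately indexed on BOTH shards.
SHARD_A: List[Doc] = [("a-1", "electronics"), ("a-2", "books"), ("common-1", "electronics")]
SHARD_B: List[Doc] = [("b-1", "electronics"), ("common-1", "electronics")]


def merge(shards: List[List[Doc]], rows: int):
    """Toy coordinator (NOT Solr's real merge code).

    Hit counts and facet counts are summed per shard, while duplicate ids
    are dropped only from the page of documents that is returned.
    """
    num_found = sum(len(docs) for docs in shards)  # summed, never deduped
    facets = Counter()
    for docs in shards:
        facets.update(cat for _, cat in docs)  # summed, never deduped
    page, seen = [], set()
    for docs in shards:
        for doc_id, _ in docs:
            if doc_id not in seen:  # dedupe on the uniqueKey
                seen.add(doc_id)
                page.append(doc_id)
    return num_found, facets, page[:rows]


num_found, facets, page = merge([SHARD_A, SHARD_B], rows=10)
print(num_found, len(page))   # 5 4: numFound promises a doc you can never page to
print(facets["electronics"])  # 4, although only 3 distinct electronics docs exist
```

In this model the hit count says 5 but only 4 distinct documents can ever be paged through, and the facet count for `electronics` is inflated by the duplicate, which matches the kinds of symptoms Erick warns about.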

Why do you want to duplicate records across shards? What
benefit is this providing?

This feels like an XY problem...

Best
Erick

On Fri, Aug 10, 2012 at 1:10 PM, Eric Khoury <[hidden email]> wrote:

RE: Distributed Searching + unique Ids

Eric Khoury

Hey Erick, thanks. I was hoping to shard on a very logical boundary for my data, where most queries would only care about data on a single shard and some queries would go to all shards, but that would only work if certain common objects were duplicated across shards. Can you think of another way to get this done, other than grouping the common objects onto yet another shard?

Thanks again,
Eric.

> Date: Tue, 14 Aug 2012 08:15:44 -0600
> Subject: Re: Distributed Searching + unique Ids
> From: [hidden email]
> To: [hidden email]

RE: Distributed Searching + unique Ids

Buttler, David
In reply to this post by Erick Erickson
I just downloaded the Solr 4 beta and was running through the tutorial. It seemed to me that I was getting duplicate counts in my facet fields when I had two shards and four cores running. For example, http://localhost:8983/solr/collection1/browse reports 21 entries in the facet cat:electronics, but if I click on that facet, there are only 14 results, and it still reports 21 entries for cat:electronics.

Is this a known bug?

Re: Distributed Searching + unique Ids

Erick Erickson
In reply to this post by Eric Khoury
I still don't see the need to have duplicate documents here.
Simply have your indexing process put the data that should be
grouped on a shard on that shard. Let the rest of the objects
be randomly distributed amongst the shards...
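
A minimal sketch of that indexing-side placement, assuming a hypothetical `group` value that marks documents which belong together and a fixed list of shard endpoints (both made up for illustration). Grouped documents go to their group's shard; everything else is spread by its id. Hashing the id is one reasonable stand-in for "randomly distributed": it still places each document on exactly one shard, and re-indexing the same id lands on the same shard. CRC32 is used instead of Python's built-in `hash()` because string hashing is randomized per process.

```python
import zlib
from typing import List, Optional

# Hypothetical shard endpoints; replace with your own.
SHARDS: List[str] = ["localhost:8983/solr", "localhost:7574/solr"]


def stable_bucket(key: str, n: int) -> int:
    """Map a string to a bucket in [0, n) that is stable across processes."""
    return zlib.crc32(key.encode("utf-8")) % n


def pick_shard(doc_id: str, group: Optional[str], shards: List[str] = SHARDS) -> str:
    """Grouped docs stay together on one shard; ungrouped docs are spread by id."""
    key = group if group is not None else doc_id
    return shards[stable_bucket(key, len(shards))]
```

For example, `pick_shard("d1", "acme")` and `pick_shard("d2", "acme")` return the same shard, so queries about the "acme" group only ever need that one shard, and no uniqueKey is indexed twice.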

Now, your front end has to know that some queries only need
to go to one shard and just send them there (non-distributed).
The rest of the queries go to the sharded handler.
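
The front-end routing described above might look roughly like this. It assumes Solr's classic distributed search, where a request becomes distributed by adding a `shards` parameter listing `host:port/path` entries; the host names, the `/select` path, and the idea that the caller already knows the target shard are illustrative assumptions.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical shard endpoints (host:port/path, no scheme, as the shards param expects).
SHARDS: List[str] = ["localhost:8983/solr", "localhost:7574/solr"]


def build_query_url(q: str, target_shard: Optional[str] = None) -> str:
    """Send shard-local queries to one shard; fan everything else out."""
    if target_shard is not None:
        # Non-distributed: query the single shard that holds the data.
        return f"http://{target_shard}/select?" + urlencode({"q": q})
    # Distributed: any shard can coordinate; the shards param lists all of them.
    params = {"q": q, "shards": ",".join(SHARDS)}
    return f"http://{SHARDS[0]}/select?" + urlencode(params)
```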

Or you're over-thinking the problem and should just do normal
sharding and not worry about it. Let's say you have partitioned
your data amongst the shards as you indicate. Just send _every_
request to all the shards. The shards that don't have any data
that you're interested in would presumably involve very little to zero
processing, assuming something like &fq=shardId gets
tacked on to the query. The complexity probably isn't worth the
minuscule savings; this smells like premature optimization...
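
For the simpler alternative, every request is distributed and the partition is selected with a filter query. This sketch assumes each document was indexed with a hypothetical `shardId` field holding its partition name; the endpoints are placeholders. Using `fq` rather than folding the restriction into `q` keeps it out of scoring and lets Solr cache it separately in the filterCache. A partition value containing query-syntax characters would need escaping, which this sketch does not do.

```python
from typing import List, Optional
from urllib.parse import urlencode

# Hypothetical shard endpoints and partition field name.
SHARDS: List[str] = ["localhost:8983/solr", "localhost:7574/solr"]
PARTITION_FIELD = "shardId"


def build_fanout_url(q: str, partition: Optional[str] = None) -> str:
    """Always query every shard; optionally restrict to one partition via fq."""
    params = [("q", q), ("shards", ",".join(SHARDS))]
    if partition is not None:
        # fq narrows the result set without affecting relevance scores.
        params.append(("fq", f"{PARTITION_FIELD}:{partition}"))
    return f"http://{SHARDS[0]}/select?" + urlencode(params)
```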

Best
Erick

On Tue, Aug 14, 2012 at 9:20 AM, Eric Khoury <[hidden email]> wrote:

Re: Distributed Searching + unique Ids

Erick Erickson
In reply to this post by Buttler, David
This shouldn't be happening, but it may well
be pilot error.

Could you show us the queries that get submitted?
In particular, add &debug=query and paste the results.
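
One way to collect what Erick asks for is to build the facet request and the drill-down request side by side, both with `debug=query` so the parsed query is echoed back. The collection URL comes from David's message; sending the requests to `/select` instead of `/browse`, using `q=*:*`, and asking for `wt=json` are assumptions made so the raw responses are easy to compare.

```python
from typing import Tuple
from urllib.parse import urlencode

# Collection URL from the thread; /select (instead of /browse) is an assumption
# so that the raw response, including the debug section, is returned.
BASE = "http://localhost:8983/solr/collection1/select"


def facet_check_urls(field: str, value: str) -> Tuple[str, str]:
    """Build a facet request and the matching drill-down request, both with debug=query."""
    common = [("q", "*:*"), ("wt", "json"), ("debug", "query"), ("rows", "0")]
    facet_url = BASE + "?" + urlencode(common + [("facet", "true"), ("facet.field", field)])
    drill_url = BASE + "?" + urlencode(common + [("fq", f"{field}:{value}")])
    return facet_url, drill_url
```

Comparing the facet count for `electronics` in the first response with `numFound` in the second shows whether the discrepancy is reproducible, and the debug section shows how each request was actually parsed.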

Best
Erick

On Tue, Aug 14, 2012 at 3:55 PM, Buttler, David <[hidden email]> wrote:

Re: Distributed Searching + unique Ids

Erick Erickson
In reply to this post by Buttler, David
Oh, and David Buttler:

Please start a new thread when asking unrelated questions; in other
words, don't hijack threads. See:
http://people.apache.org/~hossman/#threadhijack

Best
Erick