Re:the number of docs in each group depends on rows


Re:the number of docs in each group depends on rows

Diego Ceccarelli (BLOOMBERG/ LONDON)
Hello,

I'm not 100% sure, but I think that if you have multiple shards, the number of docs matched in each group is *not* guaranteed to be exact. Increasing rows increases the amount of partial information that each shard sends to the federator, which makes the number more precise.
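One way to observe this is to send the same grouped query with several rows values and compare the per-group counts that come back. Below is a minimal sketch that only builds the request URLs. The host, the collection name (mycollection) and the grouping field (category) are placeholders I made up; group, group.field and rows are the standard request parameters.

```python
from urllib.parse import urlencode


def grouped_query_url(base_url, collection, group_field, rows, q="*:*"):
    """Build a select URL for a grouped query.

    With group=true, `rows` is the number of groups returned.
    """
    params = {
        "q": q,
        "group": "true",
        "group.field": group_field,
        "rows": rows,
        "wt": "json",
    }
    return f"{base_url}/{collection}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical host, collection and field names, for illustration only.
    for rows in (1, 5, 10):
        print(grouped_query_url("http://localhost:8983/solr", "mycollection", "category", rows))
```

If the counts for the same group change as rows grows, you are probably seeing the effect described above.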

For exact counts you might need a single shard, or you need to make sure that all the documents in the same group are on the same shard by using document routing via composite keys [1].

Thinking about it, it should be possible to fix grouping to compute the exact numbers on request...

cheers,
Diego


[1] https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#shards-and-indexing-data-in-solrcloud


From: [hidden email] At: 05/04/18 07:53:41 To: [hidden email]
Subject: the number of docs in each group depends on rows

Hi,
We use SolrCloud 7.1.0 (3 nodes, 3 shards with 2 replicas). When we run a
grouped query, we find that the number of docs in each group depends on the
rows parameter (the number of groups).

difference:
<http://lucene.472066.n3.nabble.com/file/t494000/difference.jpeg>

When rows is bigger than 5, the returned docs are correct and stable; for
smaller values, the number of docs is smaller than the actual result.

Could you please explain why, and give me some suggestions on how to choose
the rows value?


--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html



Re: the number of docs in each group depends on rows

WebsterHomer
We do group queries with SolrCloud all the time. You must set up your
collection so that all documents with the same value for the field you are
grouping on are in the same shard.
This can easily be done with the composite router. Basically you do this by
creating a unique id field that contains the value of the field to group on,
followed by your unique id:
<groupfield>!<uniqueid>

See
https://lucene.apache.org/solr/guide/6_6/shards-and-indexing-data-in-solrcloud.html#ShardsandIndexingDatainSolrCloud-DocumentRouting
for more details.
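As a rough illustration of building such ids at indexing time, here is a small sketch. The helper name composite_id and the field names (id, category) are made up for this example, and it assumes the collection uses the compositeId router.

```python
def composite_id(group_value, unique_id):
    """Return an id of the form <groupfield>!<uniqueid>.

    The part before "!" is the routing prefix, so documents that share a
    group value get the same prefix.
    """
    group_value = str(group_value)
    if "!" in group_value:
        # "!" is the separator, so it must not appear inside the prefix.
        raise ValueError("group value must not contain '!'")
    return f"{group_value}!{unique_id}"


if __name__ == "__main__":
    # Hypothetical documents; "category" stands in for the field to group on.
    docs = [
        {"id": "1001", "category": "books"},
        {"id": "1002", "category": "books"},
        {"id": "2001", "category": "music"},
    ]
    for doc in docs:
        doc["id"] = composite_id(doc["category"], doc["id"])
        print(doc["id"])
```

Existing documents need to be reindexed with the new ids for the routing to take effect.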

SolrCloud does limit you to grouping on fields; you cannot group on
function queries.



RE: Re:the number of docs in each group depends on rows

Ian Caldwell
In reply to this post by Diego Ceccarelli (BLOOMBERG/ LONDON)
When I looked at this in Solr 5.5.3, the second phase of the query was only sent to the shards that returned documents in the first phase. The problem is that a shard may contain matching documents in a group that are ranked outside the top N results.

Fatduo, this solution won't help you unless you are willing to change some Solr code, but it supports Diego's point that maybe this could be fixed (take it as a starting point to look at, as the code may have changed in 7.0).

We changed the grouping code to search all shards in the second phase. (I think that this was all that was needed, but we also changed grouping to be two-level, so there are lots of changes in our grouping code.)
In the 5.5.3 code base we changed the method constructRequest(ResponseBuilder rb) in TopGroupsShardRequestFactory to always call createRequestForAllShards(rb).
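
To make the effect easier to see, here is a toy simulation in Python. It is not Solr code and only follows the description above: each shard reports its top N groups in the first phase, and the second phase goes only to shards that reported one of the merged top groups, unless all_shards is set. The shard contents and scores are invented.

```python
def grouped_counts(shards, rows, all_shards=False):
    """Toy model of two-phase grouping; returns {group: doc count}.

    `shards` is a list of dicts mapping group value -> list of doc scores.
    """
    # Phase 1: every shard reports its own top `rows` groups by best score.
    first_phase = []
    for docs in shards:
        best = {group: max(scores) for group, scores in docs.items()}
        top = sorted(best, key=lambda g: best[g], reverse=True)[:rows]
        first_phase.append({g: best[g] for g in top})

    # Merge the shard reports and keep the overall top `rows` groups.
    merged = {}
    for reported in first_phase:
        for group, score in reported.items():
            merged[group] = max(score, merged.get(group, score))
    top_groups = sorted(merged, key=lambda g: merged[g], reverse=True)[:rows]

    # Phase 2: count docs per top group, only on the shards that are queried.
    counts = {group: 0 for group in top_groups}
    for reported, docs in zip(first_phase, shards):
        if all_shards or any(g in reported for g in top_groups):
            for group in top_groups:
                counts[group] += len(docs.get(group, []))
    return counts


if __name__ == "__main__":
    # Invented data: group "A" has 4 docs spread over three shards.
    shards = [
        {"A": [9.0, 8.0], "B": [1.0]},
        {"B": [7.0], "A": [0.5]},
        {"C": [6.0], "A": [0.4], "B": [0.3]},
    ]
    print(grouped_counts(shards, rows=1))                   # {'A': 2}
    print(grouped_counts(shards, rows=1, all_shards=True))  # {'A': 4}
    print(grouped_counts(shards, rows=3))                   # {'A': 4, 'B': 3, 'C': 1}
```

With rows=1, only the first shard reports group A in the first phase, so its two low-ranked docs on the other shards are missed. Querying all shards in the second phase, or raising rows, gives the full count in this toy model.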


Ian
NLA
