Incorrect group.ngroups value

Incorrect group.ngroups value

Bryan Bende
Is there any known issue with using group.ngroups in distributed Solr
version 4.8.1?

I recently upgraded a cluster from 4.6.1 to 4.8.1, and I'm noticing several
queries where ngroups is greater than the number of groups actually returned
in the response. For example, ngroups will say 5, but there will be only 3
groups in the response. It does not happen on all queries, only some.

Re: Incorrect group.ngroups value

jim ferenczi
Hi Bryan,
This is a known limitation of grouping:
https://wiki.apache.org/solr/FieldCollapsing#RequestParameters

group.ngroups:


*WARNING: If this parameter is set to true on a sharded environment, all
the documents that belong to the same group have to be located in the same
shard, otherwise the count will be incorrect. If you are using SolrCloud
<https://wiki.apache.org/solr/SolrCloud>, consider using "custom hashing"*

Cheers,
Jim
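
A minimal sketch of why the count drifts, assuming the coordinating node simply adds up each shard's local count of distinct groups (a simplification of what Solr does; the shard contents below are made up for illustration):

```python
# Hypothetical data: shard name -> group values of the matching documents on that shard.
shards = {
    "shard1": ["a", "a", "b"],
    "shard2": ["b", "c", "c"],
}

def summed_ngroups(shards):
    """Add up each shard's local distinct-group count (assumed merge behaviour)."""
    return sum(len(set(groups)) for groups in shards.values())

def true_ngroups(shards):
    """Count distinct groups across all shards combined."""
    return len(set().union(*shards.values()))

print(summed_ngroups(shards))  # 4: group "b" lives on both shards and is counted twice
print(true_ngroups(shards))    # 3
```

When every group lives on exactly one shard, the two numbers agree, which is why co-locating a group's documents avoids the problem.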




Re: Incorrect group.ngroups value

Bryan Bende
Thanks Jim.

We've been using the composite id approach where we put group value as the
leading portion of the id (i.e. groupValue!documentid), so I was expecting
all of the documents for a given group to be in the same shard, but at
least this gives me something to look into. I'm still suspicious of
something changing between 4.6.1 and 4.8.1, because we've had the grouping
implemented this way for a while, and only on the exact day we upgraded did
someone bring this problem forward. I will keep investigating, thanks.
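
One place to start looking is whether every id really carries its document's group value as the routing prefix. A small sketch, assuming the documents are available as dicts with an `id` field and a grouping field here called `group_s` (the field name and the sample documents are hypothetical, and only the part before the first `!` is checked):

```python
def mismatched_prefixes(docs, group_field="group_s"):
    """Return ids whose part before '!' differs from the document's group value."""
    bad = []
    for doc in docs:
        prefix, sep, _ = doc["id"].partition("!")
        # No '!' at all means the id has no routing prefix.
        if not sep or prefix != doc[group_field]:
            bad.append(doc["id"])
    return bad

docs = [
    {"id": "g1!doc1", "group_s": "g1"},
    {"id": "G1!doc2", "group_s": "g1"},  # prefix differs in case
    {"id": "doc3", "group_s": "g2"},     # no routing prefix
]
print(mismatched_prefixes(docs))  # ['G1!doc2', 'doc3']
```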



RE: Incorrect group.ngroups value

Andrew Shumway
The Co-location section of this document (http://searchhub.org/2013/06/13/solr-cloud-document-routing/) might be of interest to you. It mentions the need to use SolrCloud routing to place the documents of a group in the same core so that grouping works properly.

--Andrew Shumway



Re: Incorrect group.ngroups value

Bryan Bende
It turns out there are in fact documents for the same group in different
shards, which must be causing this problem. It looks like we have a slight
flaw in how we were trying to use the composite id routing.

Thanks for putting me down the right path.
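
A check of this kind can be sketched as follows, assuming the (shard, group value) pairs have already been collected, for example by querying each shard's core separately; the sample pairs are hypothetical:

```python
from collections import defaultdict

def split_groups(pairs):
    """Map each group value found on more than one shard to its sorted shard list."""
    shards_by_group = defaultdict(set)
    for shard, group in pairs:
        shards_by_group[group].add(shard)
    return {g: sorted(s) for g, s in shards_by_group.items() if len(s) > 1}

pairs = [
    ("shard1", "g1"),
    ("shard1", "g2"),
    ("shard2", "g2"),
    ("shard2", "g3"),
]
print(split_groups(pairs))  # {'g2': ['shard1', 'shard2']}
```

Any group listed in the result would be counted once per shard under the assumption that per-shard group counts are summed.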



Re: Incorrect group.ngroups value

alxsss
In reply to this post by Andrew Shumway
Hi,

From the discussion, it is not clear whether the incorrect count is a fixable bug when documents of the same group are in different shards. If it is fixable, could someone please point me to the relevant part of the code so that I can investigate?

Thanks.
Alex.

 

 

 
