Grouping and result pagination

Shawn Heisey-2
We use pagination (start/rows) frequently with our queries.  Nothing
unusual there.

Now we have need to use grouping with a request like this, for a
set-mode search, where only one document from each set is returned:

http://idxb1.REDACTED.com:8981/solr/ncmain/lbcheck?q=*:*&group=true&group.field=set_name&group.sort=set_lead%20desc&group.limit=1&rows=50

We've worked through most of the problems encountered with this idea.
The first page of results works perfectly.

The remaining problem is that I cannot seem to paginate -- set the start
value to 50, 100, etc.  I found some information saying that
group.ngroups=true is required for pagination, so I added that.  I have
found that occasionally I can load page two (rows=50&start=50), but that
*most* of the time, I can't even get page two to load, and further pages
have never worked.  The response contains no documents.
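For reference, here is a minimal sketch of how the per-page request URLs could be built. It only uses the parameters already shown above; the base URL is a placeholder (the real host is redacted) and the helper name is made up for illustration. With grouping enabled, start and rows count groups rather than documents.

```python
from urllib.parse import urlencode

# Placeholder base URL: the real host is redacted in this post.
BASE_URL = "http://localhost:8981/solr/ncmain/lbcheck"


def grouped_page_url(page, rows=50, base_url=BASE_URL):
    """Build the URL for a zero-based page of the grouped query.

    Hypothetical helper, not part of Solr or SolrJ.
    """
    params = [
        ("q", "*:*"),
        ("group", "true"),
        ("group.field", "set_name"),
        ("group.sort", "set_lead desc"),
        ("group.limit", "1"),
        ("group.ngroups", "true"),
        ("rows", str(rows)),
        # With group=true, start is an offset into the list of groups.
        ("start", str(page * rows)),
    ]
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    for page in range(3):
        print(grouped_page_url(page))
```

Page 0 is the request that works; pages 1 and up are the ones that come back empty.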

The index is distributed (sharded), but not running SolrCloud.

The server where I am trying this is running a SNAPSHOT build of 4.9.  I
haven't had an opportunity yet to try a newer version -- we don't have
newer versions on any of the machines for this index.  I can only
upgrade as far as 5.3, because that's as far as we can go with a
third-party plugin we are using.

I found the following issue, which says it was fixed before 4.0 was
released:

https://issues.apache.org/jira/browse/SOLR-2207

Does anyone know whether pagination with grouping is expected to work,
and if so, how to do it?

Thanks,
Shawn

Re: Grouping and result pagination

Erick Erickson
I think the answer is that you have to co-locate the docs with the
same value you're grouping by on the same shard whether in SolrCloud
or not...

Hmmm: from: https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats

"group.ngroups and group.facet require that all documents in each
group must be co-located on the same shard in order for accurate
counts to be returned."

Best,
Erick

On Fri, Mar 17, 2017 at 8:00 AM, Shawn Heisey <[hidden email]> wrote:

> We use pagination (start/rows) frequently with our queries.  Nothing
> unusual there.
>
> Now we have need to use grouping with a request like this, for a
> set-mode search, where only one document from each set is returned:
>
> http://idxb1.REDACTED.com:8981/solr/ncmain/lbcheck?q=*:*&group=true&group.field=set_name&group.sort=set_lead%20desc&group.limit=1&rows=50
>
> We've worked through most of the problems encountered with this idea.
> The first page of results works perfectly.
>
> The remaining problem is that I cannot seem to paginate -- set the start
> value to 50, 100, etc.  I found some information saying that
> group.ngroups=true is required for pagination, so I added that.  I have
> found that occasionally I can load page two (rows=50&start=50), but that
> *most* of the time, I can't even get page two to load, and further pages
> have never worked.  The response contains no documents.
>
> The index is distributed (sharded), but not running SolrCloud.
>
> The server where I am trying this is running a SNAPSHOT build of 4.9.  I
> haven't had an opportunity yet to try a newer version -- we don't have
> newer versions on any of the machines for this index.  I can only
> upgrade as far as 5.3, because that's as far as we can go with a
> third-party plugin we are using.
>
> I found the following issue, which says it was fixed before 4.0 was
> released:
>
> https://issues.apache.org/jira/browse/SOLR-2207
>
> Does anyone know whether pagination with grouping is expected to work,
> and if so, how to do it?
>
> Thanks,
> Shawn
>

Re: Grouping and result pagination

Shawn Heisey-2
On 3/17/2017 9:07 AM, Erick Erickson wrote:
> I think the answer is that you have to co-locate the docs with the
> same value you're grouping by on the same shard whether in SolrCloud
> or not...
>
> Hmmm: from: https://cwiki.apache.org/confluence/display/solr/Result+Grouping#ResultGrouping-DistributedResultGroupingCaveats
>
> "group.ngroups and group.facet require that all documents in each
> group must be co-located on the same shard in order for accurate
> counts to be returned."

That is not how things work right now.  The index has 170 million
documents in it, split into six large cold shards and a very small hot
shard.  The routing I'm using for the cold shards is the CRC32 hash of
the database primary key (different field than Solr's uniqueKey) run
through a mod function to determine shard number (0-5).  The hash/mod
calculation is done in the MySQL query.
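
To illustrate the routing, here is a rough Python equivalent of the hash/mod step. The real calculation lives in the MySQL query, so treat this as a sketch: it assumes the key is hashed in its decimal string form, and the function name is made up.

```python
import zlib

NUM_COLD_SHARDS = 6  # cold shards are numbered 0-5


def cold_shard_for(primary_key, num_shards=NUM_COLD_SHARDS):
    """Map a database primary key to a cold shard number.

    Assumption: the key is hashed as its decimal string, which is
    how a CRC32 over a numeric column would typically be computed.
    """
    checksum = zlib.crc32(str(primary_key).encode("utf-8"))
    return checksum % num_shards


if __name__ == "__main__":
    for pk in (1, 2, 3, 1000001):
        print(pk, cold_shard_for(pk))
```

Because the hash input is the primary key, two documents with the same set_name have no particular reason to land on the same shard.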

Is pagination of a grouped query impossible with this index?

I suppose it's theoretically possible that I could hash the set name
instead of the DB primary key which would result in docs from a set
being co-located.  Would that help?  My worry with that approach is that
the cold shards would no longer have relatively uniform sizes.
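
One way to check the size worry before reindexing anything would be to simulate the alternative routing from per-set document counts. This is only a sketch: the function names and the sample counts are invented, and real counts would have to come from the database.

```python
import zlib


def shard_for_set(set_name, num_shards=6):
    """Route by set name so every doc in a set shares a shard."""
    return zlib.crc32(set_name.encode("utf-8")) % num_shards


def shard_sizes(set_sizes, num_shards=6):
    """Sum document counts per shard.

    set_sizes maps set name -> number of documents in that set.
    """
    sizes = [0] * num_shards
    for name, count in set_sizes.items():
        sizes[shard_for_set(name, num_shards)] += count
    return sizes


if __name__ == "__main__":
    # Invented sample data: one very large set and many small ones.
    sample = {"set-%d" % i: 10 for i in range(1000)}
    sample["huge-set"] = 50000
    sizes = shard_sizes(sample)
    print(sizes, "max/min ratio:", max(sizes) / max(1, min(sizes)))
```

If a few sets are much larger than the rest, whichever shards they hash to will be noticeably bigger, which is exactly the imbalance I'm worried about.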

Thanks,
Shawn

Re: Grouping and result pagination

Shawn Heisey-2
On 3/17/2017 9:26 AM, Shawn Heisey wrote:
> On 3/17/2017 9:07 AM, Erick Erickson wrote:
>> "group.ngroups and group.facet require that all documents in each
>> group must be co-located on the same shard in order for accurate
>> counts to be returned."
> That is not how things work right now. The index has 170 million
> documents in it, split into six large cold shards and a very small hot
> shard.

Restating the original problem:  I cannot paginate through the groups in
a grouped query.  The first page works, subsequent pages do not.  I have
a distributed index.  Co-locating documents in the same group onto the
same shard is going to require a complete redesign of indexing.  It's
something that could be done, but not without a LOT of work.

Should it be considered a bug that this doesn't work at all?  I call it
a bug.  I'd be OK with being told that performance of paginated queries
with grouping is terrible on a distributed index, but I'd like it to at
least function.

Thanks,
Shawn

Re: Grouping and result pagination

Shawn Heisey-2
On 3/21/2017 10:34 AM, Shawn Heisey wrote:
> Restating the original problem:  I cannot paginate through the groups
> in a grouped query.  The first page works, subsequent pages do not.  I
> have a distributed index.  Co-locating documents in the same group
> onto the same shard is going to require a complete redesign of
> indexing.  It's something that could be done, but not without a LOT of
> work.

Strange thing ... now when I try a paginated query, it works.  I have no
idea what I was doing differently before when it wasn't working.

solr-impl version:
4.9-SNAPSHOT 1680667 - solr - 2015-05-20 14:23:11

I have discovered that I can't get the query to work at all on 6.3.0
with my schema even without pagination.  I've encountered this bug again:

https://issues.apache.org/jira/browse/SOLR-8088

Thanks,
Shawn
