[jira] [Commented] (SOLR-11733) json.facet refinement fails to bubble up some long tail (overrequested) terms?


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-11733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16281258#comment-16281258 ]

Yonik Seeley commented on SOLR-11733:

As I mentioned in SOLR-11729, the refinement algorithm here is different (and, for a single-level facet field, simpler).
It can be explained as:
1) find buckets to return as if you weren't doing refinement
2) for those buckets, make sure all shards have contributed to the statistics
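
The two steps above can be sketched as a toy model. This is only an illustration and not Solr's actual code: shards are plain dicts of term to count, and `top_terms`, `simple_refine`, `limit`, and `overrequest` are hypothetical names chosen for the example. It also shows how a long-tail term can be missed.

```python
from collections import Counter

def top_terms(counts, n):
    """Return the n highest-count (term, count) pairs, ties broken by term."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def simple_refine(shards, limit, overrequest):
    """Toy model of the simple refinement described above (not Solr code)."""
    # First phase: each shard reports only its top (limit + overrequest) buckets.
    responses = [dict(top_terms(s, limit + overrequest)) for s in shards]
    merged = Counter()
    for r in responses:
        merged.update(r)  # adds counts term by term
    # Step 1: choose the buckets to return as if there were no refinement.
    chosen = [term for term, _ in top_terms(merged, limit)]
    # Step 2: for those buckets only, pull in counts from shards that omitted them.
    refined = {
        term: sum(r[term] if term in r else s.get(term, 0)
                  for r, s in zip(responses, shards))
        for term in chosen
    }
    return top_terms(refined, limit)

# Hypothetical data: "x" is never the top term on a shard but has the largest total.
shard1 = {"a": 10, "x": 9}
shard2 = {"b": 10, "c": 9, "x": 8, "a": 1}
print(simple_refine([shard1, shard2], limit=1, overrequest=1))  # [('a', 11)]
```

With this made-up data the true top term is "x" (9 + 8 = 17), but it is never among the buckets chosen in step 1, so it is never refined. A larger `overrequest` would have caught it here, which matches the point that smarter selection and more overrequest trade off against each other.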

I started with the simplest approach for obvious reasons: to get something out. From a correctness point of view, smarter faceting is equivalent to increasing the overrequest amount; we still can't make guarantees.
We could easily implement a mode for some field facets that does the "could this possibly be in the top N" logic to consider more buckets in the first phase... but only if it's not a sub-facet of another partial facet (a facet with something like a limit).
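
One possible reading of the "could this possibly be in the top N" check, for a single-level facet sorted by count descending, is sketched below. It is an assumption-laden illustration, not an existing Solr mode: `candidate_buckets` is a hypothetical name, and the bound assumes every shard truncated its list, so a term a shard did not report can have at most that shard's smallest reported count.

```python
def candidate_buckets(responses, limit):
    """responses: per-shard dicts (term -> count) holding each shard's top buckets.

    Returns the terms whose best-case total could still reach the top `limit`.
    """
    terms = set().union(*responses)
    # Assumption: a shard that omitted a term holds at most its smallest reported count.
    floors = [min(r.values()) if r else 0 for r in responses]
    # Lower bound: counts actually reported. Upper bound: fill gaps with the floor.
    lower = {t: sum(r.get(t, 0) for r in responses) for t in terms}
    upper = {
        t: sum(r[t] if t in r else f for r, f in zip(responses, floors))
        for t in terms
    }
    ranked = sorted(lower.values(), reverse=True)
    # The limit-th best known total is the bar a candidate has to be able to reach.
    threshold = ranked[limit - 1] if len(ranked) >= limit else 0
    return sorted(t for t in terms if upper[t] >= threshold)

# Hypothetical first-phase responses from two shards.
print(candidate_buckets([{"a": 10, "x": 9}, {"b": 10, "c": 9}], limit=1))
# ['a', 'b', 'c', 'x']
print(candidate_buckets([{"a": 100, "b": 5}, {"a": 90, "c": 4}], limit=1))
# ['a']
```

In the first call every term stays a candidate, so all of them would be refined. In the second, "b" and "c" can be excluded safely. This kind of exclusion is what becomes harder to reason about once the facet is nested under another partial facet.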
If a partial facet is a sub-facet of another partial facet, the logic of what one can exclude seems to get harder, and then sub-facets need to add new candidate buckets to parent facets (I think? I need to think about it more... but I guess that's part of my point ;-). Good ideas perhaps, but definitely more difficult to implement.

Other refinement implementations could range all the way to "exact"... guarantee that no buckets are missed, and there's more than one way to go about that too.

> json.facet refinement fails to bubble up some long tail (overrequested) terms?
> ------------------------------------------------------------------------------
>                 Key: SOLR-11733
>                 URL: https://issues.apache.org/jira/browse/SOLR-11733
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Hoss Man
> Something wonky is happening with {{json.facet}} refinement.
> "Long Tail" terms that may not be in the "top n" on every shard, but are in the "top n + overrequest" for at least 1 shard, aren't getting refined and included in the aggregated response in some cases.
> I don't understand the code enough to explain this, but I have some steps to reproduce that I'll post in a comment shortly.
