[jira] [Commented] (SOLR-13132) Improve JSON "terms" facet performance when sorted by relatedness

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-13132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16739938#comment-16739938 ]

Lucene/Solr QA commented on SOLR-13132:
---------------------------------------

| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
|| || || || {color:brown} Prechecks {color} ||
| {color:red}-1{color} | {color:red} test4tests {color} | {color:red}  0m  0s{color} | {color:red} The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch. {color} |
|| || || || {color:brown} master Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m 11s{color} | {color:green} master passed {color} |
|| || || || {color:brown} Patch Compile Tests {color} ||
| {color:green}+1{color} | {color:green} compile {color} | {color:green}  3m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} javac {color} | {color:green}  3m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Release audit (RAT) {color} | {color:green}  3m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Check forbidden APIs {color} | {color:green}  3m  9s{color} | {color:green} the patch passed {color} |
| {color:green}+1{color} | {color:green} Validate source patterns {color} | {color:green}  3m  9s{color} | {color:green} the patch passed {color} |
|| || || || {color:brown} Other Tests {color} ||
| {color:green}+1{color} | {color:green} unit {color} | {color:green} 63m 40s{color} | {color:green} core in the patch passed. {color} |
| {color:black}{color} | {color:black} {color} | {color:black} 75m  3s{color} | {color:black} {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Issue | SOLR-13132 |
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12954484/SOLR-13132.patch |
| Optional Tests |  compile  javac  unit  ratsources  checkforbiddenapis  validatesourcepatterns  |
| uname | Linux lucene1-us-west 4.4.0-137-generic #163~14.04.1-Ubuntu SMP Mon Sep 24 17:14:57 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux |
| Build tool | ant |
| Personality | /home/jenkins/jenkins-slave/workspace/PreCommit-SOLR-Build/sourcedir/dev-tools/test-patch/lucene-solr-yetus-personality.sh |
| git revision | master / 4d23ca2 |
| ant | version: Apache Ant(TM) version 1.9.3 compiled on July 24 2018 |
| Default Java | 1.8.0_191 |
|  Test Results | https://builds.apache.org/job/PreCommit-SOLR-Build/261/testReport/ |
| modules | C: solr/core U: solr/core |
| Console output | https://builds.apache.org/job/PreCommit-SOLR-Build/261/console |
| Powered by | Apache Yetus 0.7.0   http://yetus.apache.org |


This message was automatically generated.



> Improve JSON "terms" facet performance when sorted by relatedness
> ------------------------------------------------------------------
>
>                 Key: SOLR-13132
>                 URL: https://issues.apache.org/jira/browse/SOLR-13132
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: Facet Module
>    Affects Versions: 7.4, master (9.0)
>            Reporter: Michael Gibney
>            Priority: Major
>         Attachments: SOLR-13132.patch
>
>
> When sorting buckets by {{relatedness}}, JSON "terms" facet must calculate {{relatedness}} for every term. 
> The current implementation uses a standard uninverted approach (either {{docValues}} or {{UnInvertedField}}) to get facet counts over the domain base docSet. It then uses that initial pass as a pre-filter for a second-pass, inverted approach: fetching a docSet for each relevant term (i.e., {{count > minCount}}?) and calculating the size of the intersection of each of those sets with the domain base docSet.
> Over high-cardinality fields, the overhead of per-term docSet creation and set intersection operations increases request latency to the point where relatedness sort may not be usable in practice. For my use case, even after applying the patch for SOLR-13108, on a field with ~220k unique terms per core, QTimes for high-cardinality domain docSets were, e.g., 9000ms at cardinality 1816684 and 18000ms at cardinality 5032902.
> The attached patch brings the above example QTimes down to a manageable ~300ms and ~250ms respectively. The approach calculates uninverted facet counts over the domain base, foreground, and background docSets in parallel in a single pass. This allows us to take advantage of the efficiencies built into the standard uninverted {{FacetFieldProcessorByArray[DV|UIF]}} and avoids the per-term docSet creation and set intersection overhead.
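
The contrast described in the issue can be illustrated with a minimal, self-contained Java sketch. This is not the code from SOLR-13132.patch and does not use any Solr classes: the class name, the method names, and the use of {{java.util.BitSet}} as a stand-in for a docSet are all assumptions made for illustration. The first method builds a docSet per term and intersects it with each of the three sets (the inverted, per-term approach); the second walks the documents once and increments three counters per term ordinal (the uninverted, single-pass approach).

```java
import java.util.Arrays;
import java.util.BitSet;

// Hypothetical illustration only; none of these names come from Solr.
public class RelatednessCountSketch {

    // Helper: build a BitSet "docSet" from a list of doc ids.
    static BitSet bits(int... docs) {
        BitSet b = new BitSet();
        for (int d : docs) b.set(d);
        return b;
    }

    // Inverted approach: create one docSet per term, then intersect it
    // with the base, foreground, and background sets in turn.
    // docTermOrds[doc] holds the distinct term ordinals of that doc.
    static int[][] countsByPerTermIntersection(int[][] docTermOrds, int numTerms,
                                               BitSet base, BitSet fg, BitSet bg) {
        BitSet[] termDocs = new BitSet[numTerms];
        for (int t = 0; t < numTerms; t++) {
            termDocs[t] = new BitSet(docTermOrds.length);
        }
        for (int doc = 0; doc < docTermOrds.length; doc++) {
            for (int ord : docTermOrds[doc]) {
                termDocs[ord].set(doc);
            }
        }
        BitSet[] sets = {base, fg, bg};
        int[][] counts = new int[3][numTerms];
        for (int t = 0; t < numTerms; t++) {
            for (int s = 0; s < sets.length; s++) {
                // Per-term set copy and intersection: the overhead the issue describes.
                BitSet tmp = (BitSet) termDocs[t].clone();
                tmp.and(sets[s]);
                counts[s][t] = tmp.cardinality();
            }
        }
        return counts;
    }

    // Uninverted approach: visit each doc once and update the three
    // per-term counters according to which sets the doc belongs to.
    // Assumes each doc lists a given term ordinal at most once.
    static int[][] countsBySinglePass(int[][] docTermOrds, int numTerms,
                                      BitSet base, BitSet fg, BitSet bg) {
        int[][] counts = new int[3][numTerms];
        for (int doc = 0; doc < docTermOrds.length; doc++) {
            boolean inBase = base.get(doc);
            boolean inFg = fg.get(doc);
            boolean inBg = bg.get(doc);
            if (!inBase && !inFg && !inBg) {
                continue; // doc contributes to none of the three sets
            }
            for (int ord : docTermOrds[doc]) {
                if (inBase) counts[0][ord]++;
                if (inFg) counts[1][ord]++;
                if (inBg) counts[2][ord]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Four docs, three terms (ordinals 0..2); made-up data.
        int[][] docTermOrds = {{0, 1}, {1}, {0, 2}, {2}};
        BitSet base = bits(0, 1, 2);
        BitSet fg = bits(0, 2);
        BitSet bg = bits(0, 1, 2, 3);
        int[][] a = countsByPerTermIntersection(docTermOrds, 3, base, fg, bg);
        int[][] b = countsBySinglePass(docTermOrds, 3, base, fg, bg);
        System.out.println(Arrays.deepToString(a));
        System.out.println(Arrays.deepEquals(a, b));
    }
}
```

Both methods return the same counts (row 0 for base, row 1 for foreground, row 2 for background). The difference is the cost profile: the first does work proportional to the number of terms times the size of a docSet, while the second does work proportional to the number of (doc, term) pairs, independent of how many distinct terms the field has.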



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
