[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow

JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478170#comment-16478170 ]

David Smiley commented on SOLR-12366:

 * Adds a new {{SolrIndexSearcher.getLiveDocsBits()}} method that works like {{LeafReader.getLiveDocs}}.  I don't actually like the name of this method; IMO it ought to be simply {{getLiveDocs}}, but that conflicts with an existing method that I think ought to be named something like {{getLiveDocSet}}.  Since these are internal methods, I think we should just rename it, but I'm okay with renaming in master.
 * Affects SimpleFacets.getFacetTermEnumCounts (classic faceting), FacetFieldProcessorByEnumTermsStream (JSON facets), UnInvertedField, GraphTermsQParser, JoinQParser, and SolrIndexSearcher.getFirstMatch.
 * In GraphTermsQParser I further noticed that the non-SolrIndexSearcher fallback logic was broken, as it didn't check for a null liveDocs.  Will we ever even get to this code?  Anyway, I decided to replace these many lines with something simpler.
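
For anyone unfamiliar with the null liveDocs case mentioned in the last bullet: in Lucene, {{getLiveDocs}} returns null when there are no deletions, which means every document is live. Here is a minimal, self-contained sketch of a null-safe check. It is not the code from the patch; {{IntPredicate}} stands in for Lucene's {{Bits}}, and the class and method names are hypothetical.

```java
import java.util.function.IntPredicate;

/** Illustration only: IntPredicate is a stand-in for Lucene's Bits interface. */
class NullSafeLiveDocsSketch {

    /** A null liveDocs means "no deletions", so every doc counts as live. */
    static boolean isLive(IntPredicate liveDocs, int doc) {
        return liveDocs == null || liveDocs.test(doc);
    }

    /** Counts live docs in [0, maxDoc) without tripping over a null liveDocs. */
    static int countLive(IntPredicate liveDocs, int maxDoc) {
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (isLive(liveDocs, doc)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        // No deletions at all: liveDocs is null.
        System.out.println(countLive(null, 4));
        // Doc 2 is deleted.
        System.out.println(countLive(doc -> doc != 2, 4));
    }
}
```

Calling {{liveDocs.get(doc)}} directly without the null test is the kind of bug described above: it only shows up on an index that has no deleted documents.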

IMO some callers of {{SolrIndexSearcher.getSlowAtomicReader}} should change to use {{MultiFields}} to avoid the temptation to have a LeafReader that has many slow methods.  I made this change in SimpleFacets.getFacetTermEnumCounts.  This could be a follow-up issue.

IMO {{SolrIndexSearcher.getFirstMatch}} should be removed in favor of {{lookupId}} so there's less code to maintain.  Admittedly the latter is more verbose, but we could add a utility method for callers who don't care about the segment ordinal and only want the global ID.
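
A rough sketch of what such a utility method might look like, written as plain Java so it runs on its own. It assumes {{lookupId}} returns -1 when nothing matches and otherwise a long with the segment ordinal in the upper 32 bits and the segment-local doc ID in the lower 32 bits; treat that encoding as an assumption to verify against the actual method. The class, the method names, and the {{docBases}} array (standing in for each leaf's docBase) are hypothetical.

```java
/** Illustration only: models a packed (segment ordinal, local doc ID) result. */
class GlobalDocIdSketch {

    /** Packs a segment ordinal and a segment-local doc ID into one long (assumed encoding). */
    static long pack(int segmentOrd, int localDoc) {
        return ((long) segmentOrd << 32) | (localDoc & 0xFFFFFFFFL);
    }

    /**
     * Converts a packed result to a global doc ID.
     * docBases[i] is the global ID of the first doc in segment i.
     * Returns -1 when the packed value signals "not found".
     */
    static int toGlobalDocId(long packed, int[] docBases) {
        if (packed < 0) {
            return -1;
        }
        int segmentOrd = (int) (packed >>> 32); // upper 32 bits
        int localDoc = (int) packed;            // lower 32 bits
        return docBases[segmentOrd] + localDoc;
    }

    public static void main(String[] args) {
        int[] docBases = {0, 3, 4}; // three segments holding 3, 1, and some docs
        System.out.println(toGlobalDocId(pack(2, 1), docBases));
        System.out.println(toGlobalDocId(-1L, docBases));
    }
}
```

Callers that only need the global ID would then use one call instead of unpacking the long themselves.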

[~[hidden email]] could you please review?  This touches stuff you have been involved with.


> Avoid SlowAtomicReader.getLiveDocs -- it's slow
> -----------------------------------------------
>                 Key: SOLR-12366
>                 URL: https://issues.apache.org/jira/browse/SOLR-12366
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: SOLR-12366.patch, SOLR-12366.patch
> SlowAtomicReader is of course slow, and its getLiveDocs (based on MultiBits) is slow as it uses a binary search for each lookup.  There are various places in Solr that use SolrIndexSearcher.getSlowAtomicReader and then get the liveDocs.  Most of these places ought to work with SolrIndexSearcher's getLiveDocs method.
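
To make the per-lookup cost in the description concrete, here is a simplified model in plain Java. It is not Lucene or Solr code: all names are hypothetical, segments are plain boolean arrays, and every segment is assumed to hold at least one document. The first lookup style does a binary search over segment start offsets on every call, as a composite view must. The second flattens the flags once into a single bit set over global doc IDs so each lookup is one bit test; whether the patch does exactly this is not something this sketch claims.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Simplified stand-in for a multi-segment index; none of this is Lucene or Solr API. */
class CompositeLiveDocsSketch {
    private final boolean[][] segments; // segments[i][localDoc] is true when that doc is live
    private final int[] starts;         // starts[i] is the global ID of segment i's first doc

    CompositeLiveDocsSketch(boolean[][] segments) {
        this.segments = segments;       // assumes every segment holds at least one doc
        this.starts = new int[segments.length + 1];
        for (int i = 0; i < segments.length; i++) {
            starts[i + 1] = starts[i] + segments[i].length;
        }
    }

    int maxDoc() {
        return starts[segments.length];
    }

    /** Composite-style lookup: a binary search over segment starts on every call. */
    boolean isLiveViaSearch(int globalDoc) {
        int idx = Arrays.binarySearch(starts, 0, segments.length, globalDoc);
        if (idx < 0) {
            idx = -idx - 2; // insertion point minus one is the owning segment
        }
        return segments[idx][globalDoc - starts[idx]];
    }

    /** Pays for the segment walk once; afterwards each lookup is a single bit test. */
    BitSet flatten() {
        BitSet live = new BitSet(maxDoc());
        for (int seg = 0; seg < segments.length; seg++) {
            for (int local = 0; local < segments[seg].length; local++) {
                if (segments[seg][local]) {
                    live.set(starts[seg] + local);
                }
            }
        }
        return live;
    }

    public static void main(String[] args) {
        CompositeLiveDocsSketch index = new CompositeLiveDocsSketch(new boolean[][] {
            {true, false, true}, {true}, {false, false, true, true}
        });
        BitSet flat = index.flatten();
        for (int doc = 0; doc < index.maxDoc(); doc++) {
            // Both lookups agree; only the per-call cost differs.
            System.out.println(doc + " " + index.isLiveViaSearch(doc) + " " + flat.get(doc));
        }
    }
}
```

With many segments and a loop over every document, the search-based lookup adds a logarithmic factor per document, which is the overhead the issue is about.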
