[jira] [Commented] (SOLR-12366) Avoid SlowAtomicReader.getLiveDocs -- it's slow


JIRA jira@apache.org

    [ https://issues.apache.org/jira/browse/SOLR-12366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16478170#comment-16478170 ]

David Smiley commented on SOLR-12366:
-------------------------------------

* adds new {{SolrIndexSearcher.getLiveDocsBits()}} method that works like {{LeafReader.getLiveDocs}} does.  I don't actually like the name of this method; IMO it ought to be simply {{getLiveDocs}}, but that conflicts with an existing method that I think ought to be named something like {{getLiveDocSet}}.  Since these are internal methods, I think we should just rename them, but I'm okay with doing the renaming in master.
 * affects SimpleFacets.getFacetTermEnumCounts (classic faceting), FacetFieldProcessorByEnumTermsStream (JSON facets), UnInvertedField, GraphTermsQParser, JoinQParser, SolrIndexSearcher.getFirstMatch
 * In GraphTermsQParser I further noticed that the non-SolrIndexSearcher fallback logic was broken, as it didn't check for a null liveDocs.  Will we ever even get to this code?  Anyway, I decided to replace these many lines with something simpler.
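
To illustrate the null-liveDocs pitfall mentioned in the last item: in Lucene, a null liveDocs means there are no deleted documents, so callers must treat null as "everything is live". The following is only a minimal sketch of that check. {{LiveBits}} and {{LiveDocsSketch}} are hypothetical stand-ins invented for this example; they are not the classes touched by the patch.

```java
import java.util.BitSet;

/** Hypothetical stand-in for a liveDocs bit view; not Lucene's actual Bits type. */
interface LiveBits {
    boolean get(int index);
    int length();
}

final class LiveDocsSketch {
    private LiveDocsSketch() {}

    /** A null liveDocs is taken to mean "no deletions", so every doc is live. */
    static boolean isLive(LiveBits liveDocs, int docId) {
        return liveDocs == null || liveDocs.get(docId);
    }

    /** Counts live docs in [0, maxDoc), handling the null case in one place. */
    static int countLive(LiveBits liveDocs, int maxDoc) {
        int count = 0;
        for (int doc = 0; doc < maxDoc; doc++) {
            if (isLive(liveDocs, doc)) {
                count++;
            }
        }
        return count;
    }

    /** Wraps a java.util.BitSet for demonstration purposes only. */
    static LiveBits fromBitSet(BitSet bits, int maxDoc) {
        return new LiveBits() {
            @Override public boolean get(int index) { return bits.get(index); }
            @Override public int length() { return maxDoc; }
        };
    }
}
```

Funnelling every lookup through one helper such as {{isLive}} keeps the null check from being forgotten at individual call sites.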

IMO some callers of {{SolrIndexSearcher.getSlowAtomicReader}} should change to use {{MultiFields}} to avoid the temptation to have a LeafReader that has many slow methods.  I made this change in SimpleFacets.getFacetTermEnumCounts.  This could be a follow-up issue.

IMO {{SolrIndexSearcher.getFirstMatch}} should be removed in favor of {{lookupId}} so there's less code to maintain.  Admittedly the latter is more verbose, but we could add a utility method for callers who don't care about the segment ordinal and only want the global ID.
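
A rough sketch of what such a utility method might look like follows. It assumes that {{lookupId}} returns a long with the segment ordinal in the upper 32 bits and the segment-local doc ID in the lower 32 bits, or -1 when nothing matches; that encoding is an assumption of this example. {{DocIdPairSketch}}, {{pack}}, and the {{docBases}} array are hypothetical names, not part of the patch.

```java
final class DocIdPairSketch {
    private DocIdPairSketch() {}

    /** Packs a segment ordinal and a segment-local doc ID into one long (assumed encoding). */
    static long pack(int segIndex, int segDocId) {
        return ((long) segIndex << 32) | (segDocId & 0xFFFFFFFFL);
    }

    /**
     * Converts a packed (segment ordinal, local doc ID) pair to a global doc ID.
     * docBases[i] is the global doc ID of the first document in segment i.
     * Returns -1 if the pair is -1, meaning "not found".
     */
    static int toGlobalDocId(long pair, int[] docBases) {
        if (pair == -1L) {
            return -1;
        }
        int segIndex = (int) (pair >> 32); // upper 32 bits
        int segDocId = (int) pair;         // lower 32 bits
        return docBases[segIndex] + segDocId;
    }
}
```

Callers that only want the global ID would call the helper once on the packed result and never look at the segment ordinal.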

[~[hidden email]] could you please review?  This touches stuff you have been involved with.

 

> Avoid SlowAtomicReader.getLiveDocs -- it's slow
> -----------------------------------------------
>
>                 Key: SOLR-12366
>                 URL: https://issues.apache.org/jira/browse/SOLR-12366
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public)
>          Components: search
>            Reporter: David Smiley
>            Assignee: David Smiley
>            Priority: Major
>         Attachments: SOLR-12366.patch, SOLR-12366.patch
>
>
> SlowAtomicReader is of course slow, and its getLiveDocs (based on MultiBits) is slow as it uses a binary search for each lookup.  There are various places in Solr that use SolrIndexSearcher.getSlowAtomicReader and then get the liveDocs.  Most of these places ought to work with SolrIndexSearcher's getLiveDocs method.
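
The per-lookup cost described in the issue can be illustrated with a simplified model. This is not Lucene's MultiBits code; {{CompositeLiveDocsSketch}} and its fields are hypothetical, and segments are assumed to be non-empty. The top-level {{get}} must first binary-search for the owning segment on every call, whereas the per-segment loop indexes each segment's bits directly.

```java
import java.util.Arrays;

final class CompositeLiveDocsSketch {
    // Per-segment live flags; a null entry means that segment has no deletions.
    private final boolean[][] liveBySegment;
    // starts[i] is the global doc ID where segment i begins; the last entry is the total maxDoc.
    private final int[] starts;

    CompositeLiveDocsSketch(int[] maxDocs, boolean[][] liveBySegment) {
        this.liveBySegment = liveBySegment;
        this.starts = new int[maxDocs.length + 1];
        for (int i = 0; i < maxDocs.length; i++) {
            starts[i + 1] = starts[i] + maxDocs[i];
        }
    }

    /** Top-level lookup: a binary search for the owning segment on every call. */
    boolean get(int globalDoc) {
        int idx = Arrays.binarySearch(starts, 0, liveBySegment.length, globalDoc);
        if (idx < 0) {
            idx = -idx - 2; // insertion point minus one is the owning segment
        }
        boolean[] live = liveBySegment[idx];
        return live == null || live[globalDoc - starts[idx]];
    }

    /** Counts live docs through the top-level view (one search per doc). */
    int countTopLevel() {
        int count = 0;
        for (int doc = 0; doc < starts[liveBySegment.length]; doc++) {
            if (get(doc)) count++;
        }
        return count;
    }

    /** Counts live docs segment by segment, with no search at all. */
    int countPerSegment() {
        int count = 0;
        for (int seg = 0; seg < liveBySegment.length; seg++) {
            boolean[] live = liveBySegment[seg];
            int len = starts[seg + 1] - starts[seg];
            if (live == null) {
                count += len; // no deletions in this segment
            } else {
                for (int local = 0; local < len; local++) {
                    if (live[local]) count++;
                }
            }
        }
        return count;
    }
}
```

Both counting methods return the same result; the difference is only in how much work each lookup does.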



