[jira] Created: (LUCENE-2724) BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets

Soren Daugaard (Jira)
BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets
----------------------------------------------------------------------

                 Key: LUCENE-2724
                 URL: https://issues.apache.org/jira/browse/LUCENE-2724
             Project: Lucene - Java
          Issue Type: Improvement
          Components: contrib/*
    Affects Versions: 3.0.2
            Reporter: Fatih Uzdilli


In line 65 of the BooleanFilter class there is an optimization for OpenBitSets, but the same optimization is missing in line 62.

I would replace the existing line:
{code}
res = new OpenBitSetDISI(getDISI(shouldFilters, i, reader), reader.maxDoc());
{code}

with the following code:
{code}
DocIdSet docIdSet = shouldFilters.get(i).getDocIdSet(reader);
if (docIdSet instanceof OpenBitSet) {
    res = new OpenBitSetDISI(reader.maxDoc());
    res.or((OpenBitSet) docIdSet);
} else {
    res = new OpenBitSetDISI(docIdSet.iterator(), reader.maxDoc());
}
{code}

The same applies to lines 78 and 95, adjusted for the not and must filters.

Without this optimization, the AND-combination was up to 5 times slower in my test, in which two filters were AND-combined, each returning a cached OpenBitSet: one with a cardinality of 15000 and the other with a cardinality of 13000. The result had a cardinality of 8300. That matters if you do this 1000 times with many more documents.
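
To illustrate the difference between the two paths, here is a minimal sketch that runs without Lucene. It assumes java.util.BitSet as a stand-in for OpenBitSet and nextSetBit() as a stand-in for stepping a DocIdSetIterator; the class, the method names and the test data are hypothetical and not taken from BooleanFilter.

```java
import java.util.BitSet;

// Sketch only: java.util.BitSet stands in for Lucene's OpenBitSet, and
// nextSetBit() stands in for stepping a DocIdSetIterator.
class InitialResultSketch {

    // Iterator-style path: copy the filter's documents one doc ID at a time.
    static BitSet viaIteration(BitSet filterBits, int maxDoc) {
        BitSet res = new BitSet(maxDoc);
        for (int doc = filterBits.nextSetBit(0); doc >= 0; doc = filterBits.nextSetBit(doc + 1)) {
            res.set(doc);
        }
        return res;
    }

    // Bit-set path: OR the backing words in bulk, analogous to res.or(...) in the proposal.
    static BitSet viaBulkOr(BitSet filterBits, int maxDoc) {
        BitSet res = new BitSet(maxDoc);
        res.or(filterBits);
        return res;
    }

    // Hypothetical test data: every step-th doc ID below maxDoc.
    static BitSet everyNth(int step, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (int doc = 0; doc < maxDoc; doc += step) {
            bits.set(doc);
        }
        return bits;
    }

    public static void main(String[] args) {
        int maxDoc = 100;
        BitSet first = everyNth(2, maxDoc);
        BitSet second = everyNth(3, maxDoc);

        BitSet slow = viaIteration(first, maxDoc);
        BitSet fast = viaBulkOr(first, maxDoc);
        System.out.println("same initial result: " + slow.equals(fast));

        fast.and(second); // bulk AND with the second cached bit set
        System.out.println("AND cardinality: " + fast.cardinality());
    }
}
```

Both paths produce the same bits; the bulk path touches one 64-bit word per operation instead of one document per operation, which is presumably where the speedup comes from.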

The same should also be done for ChainedFilter in the method initialResult(..).

Also, the getDISI method in BooleanFilter should be replaced by a getDocIdSet(..) method. This is useful because the docIdSet is retrieved in line 87 and then again in line 92 when it is not of type OpenBitSet. This may also cause a performance issue if the getDocIdSet method of a sub-filter is not very fast.
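
The point about retrieving the doc ID set only once can be sketched as follows. This is not the BooleanFilter code: it assumes java.util.BitSet in place of OpenBitSet and a sorted int[] in place of any other DocIdSet, and CountingFilter and andInto are hypothetical names introduced only for this illustration.

```java
import java.util.BitSet;

// Sketch only: BitSet stands in for OpenBitSet, and a sorted int[] stands in
// for any other DocIdSet implementation. All names here are hypothetical.
class FetchOnceSketch {

    // Stand-in sub-filter that counts how often its doc ID set is requested.
    static final class CountingFilter {
        private final Object docIdSet;
        int calls = 0;

        CountingFilter(Object docIdSet) {
            this.docIdSet = docIdSet;
        }

        Object getDocIdSet() {
            calls++;
            return docIdSet;
        }
    }

    // AND one filter into the running result, asking the filter only once.
    static void andInto(BitSet res, CountingFilter filter) {
        Object docIdSet = filter.getDocIdSet(); // single retrieval, reused by both branches
        if (docIdSet instanceof BitSet) {
            res.and((BitSet) docIdSet); // bulk path
        } else {
            BitSet tmp = new BitSet();
            for (int doc : (int[]) docIdSet) {
                tmp.set(doc); // fallback: per-document path
            }
            res.and(tmp);
        }
    }

    public static void main(String[] args) {
        BitSet res = new BitSet();
        res.set(0, 10); // docs 0..9

        CountingFilter other = new CountingFilter(new int[] {1, 3, 5, 20});
        andInto(res, other);

        System.out.println("result: " + res + ", calls: " + other.calls);
    }
}
```

Because the type check and the fallback both work on the same local variable, a slow sub-filter is evaluated once per combination rather than twice.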

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]

[jira] Commented: (LUCENE-2724) BooleanFilter and ChainedFilter miss to fully optimize for OpenBitSets

Soren Daugaard (Jira)

    [ https://issues.apache.org/jira/browse/LUCENE-2724?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12925021#action_12925021 ]

Paul Elschot commented on LUCENE-2724:
--------------------------------------

Indeed that should speed up things.

The first case in the replacing code is actually only a copy of the underlying OpenBitSet, so perhaps it could be simplified to do just that.

And some common code for this between ChainedFilter and BooleanFilter could perhaps be moved to OpenBitSetDISI.
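
A rough sketch of both suggestions, a plain copy for the bit-set case and a single shared helper, could look like the following. It assumes java.util.BitSet in place of OpenBitSet and a sorted int[] in place of other DocIdSet types; the helper name initialResult is borrowed from the ChainedFilter method mentioned in the issue, but this signature is hypothetical and is not an existing OpenBitSetDISI method.

```java
import java.util.BitSet;

// Sketch only: BitSet stands in for OpenBitSet; the helper signature is hypothetical.
class SharedInitialResultSketch {

    // One helper that both filter classes could call to start a result.
    static BitSet initialResult(Object docIdSet, int maxDoc) {
        if (docIdSet instanceof BitSet) {
            // Copy the cached bits so that later and/or/andNot calls
            // never modify the filter's cached set.
            return (BitSet) ((BitSet) docIdSet).clone();
        }
        BitSet res = new BitSet(maxDoc);
        for (int doc : (int[]) docIdSet) {
            res.set(doc); // fallback: per-document path
        }
        return res;
    }

    public static void main(String[] args) {
        BitSet cached = new BitSet(10);
        cached.set(2);
        cached.set(4);
        cached.set(6);

        BitSet res = initialResult(cached, 10);
        res.clear(4); // modify only the copy

        System.out.println("cached still has doc 4: " + cached.get(4));
        System.out.println("result has doc 4: " + res.get(4));
    }
}
```

The copy matters because the result is modified in place by the subsequent combinations, while the cached set has to stay intact for the next search.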

