EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1


Yonghui Zhao
Hi,

I noticed that EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1.

The Javadoc says: "Pass trackTotalHits=false to {@link TopFieldCollector}
instead of using this class."

But I find that TopFieldCollector cannot fully replace
EarlyTerminatingSortingCollector.

EarlyTerminatingSortingCollector has a numDocsToCollect parameter, while
TopFieldCollector does not.

Usually we want to terminate collection early, once a reasonably large number
of documents has been collected.

Let's say I want the top 10 results from the top 1,000,000 documents in sort
order, while the real total hit count may be much larger.

With TopFieldCollector, if I pass numHits as 1,000,000, the priority queue
will hold 1,000,000 entries, which is a waste of memory.

With EarlyTerminatingSortingCollector, I can set numDocsToCollect to
1,000,000, while the priority queue of the collector it wraps can be as
small as 10.

Is that right?
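The memory argument above can be illustrated with a plain-Java sketch. It uses no Lucene types, and the class and method names are hypothetical. Docs of a single segment are assumed to already be in index-sort order (by some field B). Collection stops after numDocsToCollect docs, and a min-heap bounded by numHits keeps the best hits by a second value A. The heap never holds more than numHits entries, however large numDocsToCollect is.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

public class EarlyTerminationSketch {

    /**
     * Hypothetical helper, not a Lucene API. aValues[doc] is the value of field A
     * for each doc of ONE segment; doc ids are assumed to be in index-sort order (by B).
     * Visits at most numDocsToCollect docs and keeps the numHits docs with the largest A.
     */
    static List<Integer> topByA(int[] aValues, int numDocsToCollect, int numHits) {
        if (numHits <= 0) {
            return Collections.emptyList();
        }
        // Min-heap on A: the root is the weakest of the current top hits.
        // Its size never exceeds numHits, however large numDocsToCollect is.
        PriorityQueue<Integer> heap =
                new PriorityQueue<>((x, y) -> Integer.compare(aValues[x], aValues[y]));
        int limit = Math.min(numDocsToCollect, aValues.length);
        for (int doc = 0; doc < limit; doc++) { // stop after `limit` docs: early termination
            if (heap.size() < numHits) {
                heap.add(doc);
            } else if (aValues[doc] > aValues[heap.peek()]) {
                heap.poll(); // evict the weakest current hit
                heap.add(doc);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        result.sort((x, y) -> Integer.compare(aValues[y], aValues[x])); // best A first
        return result;
    }

    public static void main(String[] args) {
        int[] aValues = {5, 9, 1, 7, 3, 8, 2}; // docs 0..6, assumed already sorted by B
        // Visit only the first 4 docs by B, keep the top 2 by A.
        System.out.println(topByA(aValues, 4, 2)); // prints [1, 3]
    }
}
```

Doc 5 (A = 8) would beat doc 3 (A = 7) on A alone, but it lies beyond the first 4 docs in B order, so early termination never visits it.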

Re: EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1

Adrien Grand
You are right that TopFieldCollector doesn't address some expert use cases
that EarlyTerminatingSortingCollector used to address. If you need to do
something like this, I think it's fine for you to fork
EarlyTerminatingSortingCollector.

Do I understand correctly that you have two fields, A and B, and want the top
10 documents sorted by A among the top 1M documents when sorting by B? If so,
beware that EarlyTerminatingSortingCollector does not do exactly that: since
it works on a per-segment basis, you could get some hits in your results that
are not within the top 1M hits when sorting by B.
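The per-segment caveat can be made concrete with another plain-Java sketch (hypothetical names, no Lucene types). It assumes two segments that are each index-sorted by B in descending order, and that the collection cap applies to each segment separately, as described above. With a cap of 2, the docs visited in the second segment (B = 50 and 40) are not in the global top 2 by B, so a top-by-A selection over the visited docs could still return them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class PerSegmentCapSketch {

    /** B values of the docs visited when every segment stops after `cap` docs. */
    static List<Integer> visitedPerSegment(int[][] segments, int cap) {
        List<Integer> visited = new ArrayList<>();
        for (int[] segment : segments) {
            // Each segment is assumed sorted by B descending, so its first
            // `cap` docs are only that segment's LOCAL top `cap`.
            for (int i = 0; i < Math.min(cap, segment.length); i++) {
                visited.add(segment[i]);
            }
        }
        return visited;
    }

    /** The global top `n` B values across all segments, largest first. */
    static List<Integer> globalTop(int[][] segments, int n) {
        List<Integer> all = new ArrayList<>();
        for (int[] segment : segments) {
            for (int b : segment) {
                all.add(b);
            }
        }
        all.sort(Collections.reverseOrder());
        return new ArrayList<>(all.subList(0, Math.min(n, all.size())));
    }

    public static void main(String[] args) {
        int[][] segments = {{100, 90, 80}, {50, 40, 30}}; // B values per segment
        System.out.println(visitedPerSegment(segments, 2)); // prints [100, 90, 50, 40]
        System.out.println(globalTop(segments, 2));         // prints [100, 90]
    }
}
```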

Re: EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1

Yonghui Zhao
Thanks Adrien!
Yes, I am aware that "EarlyTerminatingSortingCollector does not exactly do
that since it works on a per-segment basis".

I use EarlyTerminatingSortingCollector for performance when there are too
many hits.

(Replying to Adrien Grand's message of 2018-06-04 19:09 GMT+08:00.)


Re: EarlyTerminatingSortingCollector is deprecated in Lucene 7.2.1

Adrien Grand
Cool. Then my advice would be to fork this collector in your code base.

(Replying to Yonghui Zhao's message of Wed, 6 June 2018, 12:45.)
