Legacy filter strategy in Lucene 6.0

Legacy filter strategy in Lucene 6.0

alex stark
Since FilteredQuery was removed in Lucene 6.0, we should use a BooleanQuery to do the filtering. What about the legacy filter strategies such as LEAP_FROG_FILTER_FIRST_STRATEGY or QUERY_FIRST_FILTER_STRATEGY? What is the current filter strategy? Thanks,

Re: Legacy filter strategy in Lucene 6.0

Adrien Grand
Hi Alex,

These strategies still exist internally, but BooleanQuery decides which one
to use automatically based on the cost API (cheaper clauses run first) and
whether sub clauses produce bitset-based or postings-based iterators.


Re: Legacy filter strategy in Lucene 6.0

alex stark
Thanks Adrien,

I want to filter out docs based on conditions that are stored in doc values (those conditions are unselective ranges, which are not appropriate to put into the inverted index), so I plan to use some selective term conditions for a first-round search and then filter in a second phase.

I see there is a two-phase iterator, but I did not find out how to use it. Is this an appropriate scenario for a two-phase iterator, or is it better to do it in a collector? Is there any guide to the two-phase iterator?

Best Regards

RE: Legacy filter strategy in Lucene 6.0

Uwe Schindler
Hi,

IMHO: I'd split the whole thing into a BooleanQuery with two filter clauses. The inverted-index-based condition (the term condition, e.g., a TermInSetQuery) gets added as one Occur.FILTER clause and the doc-values condition as a separate Occur.FILTER clause. When Lucene executes such a query, it uses the more selective condition (based on cost) to lead the execution, which should be the terms condition. The doc-values condition is then only checked for documents that match the first.
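To make that execution order concrete, here is a minimal Java sketch of a conjunction in which the cheapest clause leads and the other clause is only checked for the lead's candidates. It is a toy model, not Lucene code: the Clause class, the cost estimate (simply the number of candidate docs), and the sample postings and price values are all assumptions made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

/** Toy model of a cost-ordered conjunction. This is not Lucene code. */
public class FilterConjunctionSketch {

    /** Hypothetical clause: sorted candidate doc IDs plus an exact per-document check. */
    static final class Clause {
        final String name;
        final int[] candidates;      // docs this clause would enumerate if it led
        final IntPredicate matches;  // exact check for a single doc ID

        Clause(String name, int[] candidates, IntPredicate matches) {
            this.name = name;
            this.candidates = candidates;
            this.matches = matches;
        }

        /** Assumed cost estimate: how many docs the clause would visit as the lead. */
        long cost() {
            return candidates.length;
        }
    }

    /** Returns the doc IDs accepted by every clause, letting the cheapest clause lead. */
    public static List<Integer> search(List<Clause> clauses) {
        List<Clause> byCost = new ArrayList<>(clauses);
        byCost.sort(Comparator.comparingLong(Clause::cost));
        Clause lead = byCost.get(0);

        List<Integer> hits = new ArrayList<>();
        for (int doc : lead.candidates) {
            boolean accepted = true;
            // Cheapest checks first; stop at the first clause that rejects the doc.
            for (Clause clause : byCost) {
                if (!clause.matches.test(doc)) {
                    accepted = false;
                    break;
                }
            }
            if (accepted) {
                hits.add(doc);
            }
        }
        return hits;
    }

    public static List<Integer> demo() {
        int maxDoc = 8;
        long[] price = {5, 120, 40, 999, 75, 60, 10, 300}; // made-up doc values
        int[] postings = {1, 3, 6};                         // made-up postings for one term

        Clause term = new Clause("term", postings,
                doc -> Arrays.binarySearch(postings, doc) >= 0);
        Clause range = new Clause("price in [10, 500]",
                IntStream.range(0, maxDoc).toArray(),
                doc -> price[doc] >= 10 && price[doc] <= 500);

        // Input order does not matter; the term clause leads because it is cheaper.
        return search(Arrays.asList(range, term));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [1, 6]
    }
}
```

In this model the term clause supplies three candidates, so the range check runs three times instead of once for each of the eight docs.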

You can still go and implement a two-phase iterator, but I would not do that.

Uwe

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
http://www.thetaphi.de
eMail: [hidden email]


RE: Legacy filter strategy in Lucene 6.0

alex stark
Thanks Uwe, I think you are recommending IndexOrDocValuesQuery/DocValuesRangeQuery, as described in Adrien's article: https://www.elastic.co/blog/better-query-planning-for-range-queries-in-elasticsearch

It looks promising for my requirement; I will try that.

Re: Legacy filter strategy in Lucene 6.0

Adrien Grand
Hi Alex,

IndexOrDocValuesQuery builds on the same blocks but I don't think you need
it here. Uwe's idea is to put both your selective term queries and
unselective doc-value queries in the same BooleanQuery. Lucene will know
that it needs to run the selective clauses first thanks to the cost API.
