Filters Vs queries - for terms more than 1024

Kumaran Ramasubramanian
Hi All,

I am using Lucene 4.10.4.

I know that BooleanQuery has a default limit of 1024 clauses, and that this
limit can be increased. But I want to understand queries vs. filters in
Lucene 4.10.4.

I need to run queries with more than 1024 terms. Relevance is not needed for
me. What are the best possible options?

1. Using BooleanFilter works even with 1 lakh (100,000) filter clauses. Is
there any consequence of using filters in this case? Shall I proceed with
this?

2. If I give very little memory to the filters, the search still manages to
complete after many GC cycles. Why can't we do the same for query clauses
too? What is the actual technical reason for the 1024 limit in BooleanQuery?

3. If I disable scoring using ConstantScoreQuery, is it possible to use
more than 1024 query clauses? I tried this, but I still get
java.lang.OutOfMemoryError. Why?

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
    at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
    at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)

Any pointers are much appreciated. Thank you.

--
Kumaran R
Re: Filters Vs queries - for terms more than 1024

Adrien Grand
Could you use TermInSetQuery (TermsQuery in older Lucene versions)? It is
worse at skipping over matches than a BooleanQuery, but unlike a large
BooleanQuery it keeps memory usage low and disk access sequential.

Otherwise you would probably need to rethink how you design your documents
so that you can run simpler queries.
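
To make the memory argument concrete, here is a minimal plain-Java sketch. It is not Lucene code: the names `DisjunctionMemoryModel`, `PostingsCursor`, `unionAllOpen` and `unionOneAtATime` are hypothetical, and so is the 128-int buffer size. It assumes that every open postings iterator carries its own decode buffer, and only models why a disjunction that opens one iterator per clause needs memory proportional to the number of clauses, while a term-set style union that visits one term at a time and ORs doc IDs into a single bit set does not.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

/** Toy model (not Lucene code) of two ways to evaluate a large OR over terms. */
final class DisjunctionMemoryModel {

    /** Hypothetical per-cursor buffer size in ints; real values depend on the codec. */
    static final int BUFFER_INTS = 128;

    /** Stand-in for a postings iterator: sorted doc IDs plus a decode buffer. */
    static final class PostingsCursor {
        final int[] docs;
        final int[] buffer = new int[BUFFER_INTS]; // allocated as soon as the cursor is opened

        PostingsCursor(int[] docs) {
            this.docs = docs;
        }
    }

    /** Matching docs plus the largest number of cursors that were open at the same time. */
    static final class Result {
        final BitSet matches;
        final int peakOpenCursors;

        Result(BitSet matches, int peakOpenCursors) {
            this.matches = matches;
            this.peakOpenCursors = peakOpenCursors;
        }
    }

    /** Boolean-query style: open one cursor per clause up front, then combine them. */
    static Result unionAllOpen(List<int[]> postings, int maxDoc) {
        List<PostingsCursor> open = new ArrayList<>();
        for (int[] p : postings) {
            open.add(new PostingsCursor(p)); // every cursor stays reachable until the end
        }
        BitSet matches = new BitSet(maxDoc);
        for (PostingsCursor c : open) {
            for (int doc : c.docs) {
                matches.set(doc);
            }
        }
        return new Result(matches, open.size());
    }

    /** Term-set style: visit terms one at a time, OR-ing doc IDs into one bit set. */
    static Result unionOneAtATime(List<int[]> postings, int maxDoc) {
        BitSet matches = new BitSet(maxDoc);
        for (int[] p : postings) {
            PostingsCursor c = new PostingsCursor(p); // unreachable again after this iteration
            for (int doc : c.docs) {
                matches.set(doc);
            }
        }
        return new Result(matches, postings.isEmpty() ? 0 : 1);
    }

    public static void main(String[] args) {
        int clauses = 10_000; // scaled-down stand-in for "1 lakh" terms, one matching doc each
        List<int[]> postings = new ArrayList<>();
        for (int i = 0; i < clauses; i++) {
            postings.add(new int[] { i });
        }
        Result allOpen = unionAllOpen(postings, clauses);
        Result oneAtATime = unionOneAtATime(postings, clauses);
        System.out.println("same matches: " + allOpen.matches.equals(oneAtATime.matches));
        System.out.println("peak cursors, all open: " + allOpen.peakOpenCursors);
        System.out.println("peak cursors, one at a time: " + oneAtATime.peakOpenCursors);
    }
}
```

Under this model's assumed buffer size, 100,000 open cursors would hold about 100,000 × 128 × 4 bytes, roughly 51 MB, in buffers alone. That is consistent with the shape of the stack trace above, where the OutOfMemoryError is thrown while a postings enum is being constructed for one clause of the BooleanQuery.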

On Mon, 17 Jul 2017 at 16:28, Kumaran Ramasubramanian <[hidden email]> wrote:

Re: Filters Vs queries - for terms more than 1024

Kumaran Ramasubramanian
Hi Adrien,

Thanks for your input.

> 1. Using BooleanFilter works even with 1 lakh (100,000) filter clauses. Is
> there any consequence of using filters in this case? Shall I proceed with
> this?

This is the code snippet I used for point 1 above:

    // boolFilter is an org.apache.lucene.queries.BooleanFilter created beforehand
    for (int i = 0; i < 100000; i++)
    {
        // one single-term filter per clause; note that the field name also varies with i
        Term term = new Term("key" + i, "value" + i);
        TermsFilter filter = new TermsFilter(term);
        FilterClause filterClause = new FilterClause(filter, BooleanClause.Occur.SHOULD);
        boolFilter.add(filterClause);
    }

Do you see any problem in using TermsFilter instead of TermsQuery?

By the way, I will test with TermsQuery and let you know.

--
Kumaran R




On Tue, Jul 18, 2017 at 1:59 AM, Adrien Grand <[hidden email]> wrote:

Re: Filters Vs queries - for terms more than 1024

Adrien Grand
Sorry for the confusion. I keep saying "query" in all cases because queries
and filters were merged in Lucene 5.0. If you are using BooleanFilter rather
than BooleanQuery with Lucene 4, then things should be mostly OK even with
many clauses. But like TermsQuery, BooleanFilter always consumes all
matching documents from all of its clauses, so intersecting it with a
selective query is wasteful.
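
The cost of consuming every clause can be illustrated with another small plain-Java sketch. Again this is a toy model and not Lucene code: `FilterIntersectionModel` and its methods are hypothetical names, postings are plain sorted `int[]` arrays, and a binary search stands in for an iterator's skip operation. It counts how many postings entries each approach touches when a broad multi-clause filter is intersected with a query that matches only a few documents.

```java
import java.util.Arrays;
import java.util.BitSet;

/** Toy model (not Lucene code): intersecting a broad filter with a selective query. */
final class FilterIntersectionModel {

    /**
     * Filter style: read every doc of every clause into a bit set first,
     * then keep the selective query's docs that are set.
     * docsRead[0] is incremented once per postings entry consumed.
     */
    static int[] materializeThenIntersect(int[][] clauses, int[] selective,
                                          int maxDoc, long[] docsRead) {
        BitSet bits = new BitSet(maxDoc);
        for (int[] clause : clauses) {
            for (int doc : clause) {
                bits.set(doc);
                docsRead[0]++;
            }
        }
        return Arrays.stream(selective).filter(bits::get).toArray();
    }

    /**
     * Skipping style: let the selective query drive, and for each of its docs
     * probe the clauses until one matches.
     * docsRead[0] is incremented once per probe (a stand-in for a skip call).
     */
    static int[] skipToCandidates(int[][] clauses, int[] selective, long[] docsRead) {
        return Arrays.stream(selective).filter(doc -> {
            for (int[] clause : clauses) {
                docsRead[0]++;
                if (Arrays.binarySearch(clause, doc) >= 0) {
                    return true;
                }
            }
            return false;
        }).toArray();
    }

    public static void main(String[] args) {
        // Three clauses with 1,000 docs each; the selective query matches two docs.
        int[][] clauses = new int[3][1000];
        for (int c = 0; c < 3; c++) {
            for (int i = 0; i < 1000; i++) {
                clauses[c][i] = i * 3 + c; // sorted, disjoint postings
            }
        }
        int[] selective = { 42, 2999 };
        long[] readAll = { 0 };
        long[] readSkip = { 0 };
        int[] a = materializeThenIntersect(clauses, selective, 3000, readAll);
        int[] b = skipToCandidates(clauses, selective, readSkip);
        System.out.println("same result: " + Arrays.equals(a, b));
        System.out.println("entries touched, materialize: " + readAll[0]);
        System.out.println("probes, skipping: " + readSkip[0]);
    }
}
```

In this model the first approach always touches every entry of every clause, no matter how few documents the other query matches. The second does work proportional to the selective query's matches times the number of clauses, so it is only attractive when the clause count is modest.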

On Tue, 18 Jul 2017 at 11:42, Kumaran Ramasubramanian <[hidden email]> wrote:

Re: Filters Vs queries - for terms more than 1024

Kumaran Ramasubramanian
Thank you Adrien :-)



On 18-Jul-2017 3:21 PM, "Adrien Grand" <[hidden email]> wrote:

Re: Filters Vs queries - for terms more than 1024

Kumaran Ramasubramanian
Hi Adrien,

I have tried BooleanQuery wrapped in ConstantScoreQuery, based on the
suggestion from this link:
http://lucene.472066.n3.nabble.com/BooleanFilter-vs-BooleanQuery-performance-td4106920.html

> If you want it fast, use BooleanQuery and wrap it with ConstantScoreQuery.
> Then there is also no scoring done (in most cases, older BooleanQuery
> sometimes still calculated the score).

From my first message:

> 3. If I disable scoring using ConstantScoreQuery, is it possible to use
> more than 1024 query clauses? I tried this, but I still get
> java.lang.OutOfMemoryError. Why?

java.lang.OutOfMemoryError: Java heap space
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader$BlockDocsEnum.<init>(Lucene41PostingsReader.java:345)
    at org.apache.lucene.codecs.lucene41.Lucene41PostingsReader.docs(Lucene41PostingsReader.java:254)
    at org.apache.lucene.codecs.blocktree.SegmentTermsEnum.docs(SegmentTermsEnum.java:999)
    at org.apache.lucene.index.TermsEnum.docs(TermsEnum.java:149)
    at org.apache.lucene.search.TermQuery$TermWeight.scorer(TermQuery.java:84)
    at org.apache.lucene.search.BooleanQuery$BooleanWeight.scorer(BooleanQuery.java:356)
    at org.apache.lucene.search.ConstantScoreQuery$ConstantWeight.scorer(ConstantScoreQuery.java:164)
    at org.apache.lucene.search.FilteredQuery$RandomAccessFilterStrategy.filteredScorer(FilteredQuery.java:542)
    at org.apache.lucene.search.FilteredQuery$FilterStrategy.filteredBulkScorer(FilteredQuery.java:504)
    at org.apache.lucene.search.FilteredQuery$1.bulkScorer(FilteredQuery.java:150)

If I wrap a BooleanQuery in ConstantScoreQuery, can I use 1 lakh (100,000)
boolean clauses in the BooleanQuery?

--
Kumaran R



On Wed, Jul 19, 2017 at 8:26 AM, Kumaran Ramasubramanian <[hidden email]> wrote:

Re: Filters Vs queries - for terms more than 1024

Adrien Grand
BooleanQuery is subject to the 1024 limit on the number of clauses, so you
can't use it in that case. You should use TermsQuery/TermsFilter instead.

On Wed, 19 Jul 2017 at 13:52, Kumaran Ramasubramanian <[hidden email]> wrote:
