Too many clauses

Too many clauses

Sharma, Siddharth
Query:  caught a class org.apache.lucene.queryParser.ParseException
 with message: Too many boolean clauses

I realize why this is happening (the 1024 clauses limit for BooleanQuery).
My question is more design related.

During customer registration, the customer defines a set of skus/products
that we should never display to them. These products are part of our catalog
offering but we are forbidden to make them available to this customer. This
list is called the block list and can potentially be large (6 to 7
thousand).

When a customer logs in, this block list is identified and currently I am
using QueryParser to parse these skus to block/exclude. That is why I am
hitting against the 1024 upper bound.

To circumvent it, here are a few options that I have thought of:
1. Chunk it up:
  a. Create a filter based on a query that has a maximum of 1024.
  b. Get its bits.
  c. Get the next 1024 blocked skus and create a filter out of it and get
     its bits.
  d. AND the two BitSets.
  e. Do this till all blocked skus and other filters are ANDed together for
     the final BitSet.

2. Build the block list into the index somehow
  a. My index is based on SKUs, not on customer.
  b. I could add a field in each SKU document that contains the customer-ids
     who want this SKU blocked.
  c. But this field's value could be very large.

3. Some other obvious way that I am stupid enough not to be able to
   visualize.
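
A rough sketch of option 1, to make the BitSet bookkeeping concrete. It does not use Lucene at all: the "index" here is a plain map from SKU to doc ids, and the class name, method names, and sample SKUs are all made up for illustration. Each chunk yields the set of docs it still allows, and the per-chunk sets are ANDed together.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model (an assumption, not Lucene): the index is a map from SKU to the doc ids holding it.
final class ChunkedBlockList {

    /** Docs allowed by one chunk: every doc that matches none of the chunk's SKUs. */
    static BitSet allowedByChunk(Map<String, int[]> index, List<String> chunk, int maxDoc) {
        BitSet allowed = new BitSet(maxDoc);
        allowed.set(0, maxDoc);                  // start with every doc allowed
        for (String sku : chunk) {
            int[] docs = index.get(sku);
            if (docs == null) continue;          // SKU not present in the index
            for (int doc : docs) allowed.clear(doc);
        }
        return allowed;
    }

    /** Splits the block list into chunks of at most chunkSize and ANDs the per-chunk results. */
    static BitSet allowedDocs(Map<String, int[]> index, List<String> blocked,
                              int chunkSize, int maxDoc) {
        BitSet result = new BitSet(maxDoc);
        result.set(0, maxDoc);
        for (int start = 0; start < blocked.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, blocked.size());
            result.and(allowedByChunk(index, blocked.subList(start, end), maxDoc));
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, int[]> index = new HashMap<>();
        index.put("sku-a", new int[] {0});
        index.put("sku-b", new int[] {1});
        index.put("sku-c", new int[] {2});
        index.put("sku-d", new int[] {3});
        // A chunk size of 2 stands in for the 1024-clause limit.
        List<String> blocked = Arrays.asList("sku-a", "sku-c", "sku-d");
        System.out.println(allowedDocs(index, blocked, 2, 4));
    }
}
```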

Thanks in advance
Sid





---------------------------------------------------------------------
To unsubscribe, e-mail: [hidden email]
For additional commands, e-mail: [hidden email]


RE: Too many clauses

Aigner, Thomas
Another way around it is to increase the max clause count.

//Setting the clause Count
 BooleanQuery.setMaxClauseCount(int);

You can use Integer.MAX_VALUE or some smaller number. When I set this high,
I have had to set the java pool higher for memory as well.

Tom


RE: Too many clauses

Sharma, Siddharth
In reply to this post by Sharma, Siddharth
I thought of that but I had that listed as a last fallback option because I
was not sure what it meant in terms of performance since I am a newbie to
Lucene.
So if I bump up my heap (I assume that's what you are referring to when you
say java pool) it'll be ok?
Are there metrics around this? For example:
  At x max_clauses, JVM heap should be y MB
  At x + 1024, it should be z MB





Re: Too many clauses

Chris Hostetter-3
In reply to this post by Sharma, Siddharth
:
: To circumvent it, here are a few options that I have thought of:
: 1. Chunk it up:
:   a. Create a filter based on a query that has a maximum of 1024.
:   b. Get its bits.
:   c. Get the next 1024 blocked skus and create a filter out of it and get
:      its bits.
:   d. AND the two BitSets.
:   e. Do this till all blocked skus and other filters are ANDed together for
:      the final BitSet.

Instead of building up your filter based on a query, why not build up your
filter directly? Using a QueryFilter requires that scoring happen --
but you don't care about the scoring, you just want to know whether a doc
matches a keyword or not.  Take a look at the way RangeFilter is
implemented.  It should serve as a good example of how you can
write a "SetFilter" that takes in a field name and a set of keywords, and
only "passes" documents where one of the keywords shows up as an indexed
value for that field.  Now you don't have to worry about the 1024 limit,
you don't have to "chunk" anything, and your searches will be faster because
you don't need to worry about the scoring aspects of the BooleanQueries.


Hint: you can sort the input Set, and then iterate over it, pulling out
the TermDocs for each keyword and setting a bit for each doc in each
TermDocs.  Now your Filter indicates all the products that do match those
skus, and you'll want an "InverseFilter" to wrap it and indicate all the
products that *don't* match those skus.
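
A minimal sketch of the idea, under loud assumptions: it is plain Java rather than a real Lucene Filter, the postings map stands in for what TermDocs would return for each keyword, and the names SetFilterSketch, matching, and inverse are hypothetical. It only shows the bit-setting and the inversion step.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Toy stand-in for an index: keyword -> ids of the docs that have it in the SKU field.
final class SetFilterSketch {

    /** Sets a bit for every doc that contains at least one of the given keywords. */
    static BitSet matching(Map<String, int[]> postings, SortedSet<String> keywords, int maxDoc) {
        BitSet bits = new BitSet(maxDoc);
        for (String keyword : keywords) {          // sorted iteration over the input set
            int[] docs = postings.get(keyword);    // plays the role of the keyword's TermDocs
            if (docs == null) continue;            // keyword not indexed
            for (int doc : docs) bits.set(doc);    // membership only, no scoring
        }
        return bits;
    }

    /** Inverts a filter result: every doc in [0, maxDoc) that the wrapped result did not pass. */
    static BitSet inverse(BitSet bits, int maxDoc) {
        BitSet flipped = (BitSet) bits.clone();
        flipped.flip(0, maxDoc);
        return flipped;
    }

    public static void main(String[] args) {
        Map<String, int[]> postings = new HashMap<>();
        postings.put("sku-1", new int[] {0});
        postings.put("sku-2", new int[] {1, 3});
        postings.put("sku-3", new int[] {2});

        SortedSet<String> blocked = new TreeSet<>();
        blocked.add("sku-2");

        BitSet blockedDocs = matching(postings, blocked, 4);
        System.out.println("blocked: " + blockedDocs);
        System.out.println("allowed: " + inverse(blockedDocs, 4));
    }
}
```

The size of the block list never enters into any clause limit here, since no BooleanQuery is built; the cost is one pass over the postings of each blocked keyword.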



-Hoss



need help for generating Query String

jibu mathew
In reply to this post by Aigner, Thomas
Hi all,

I need urgent help with the following issues.

1. What is the query string to retrieve all the indexed documents
   (something similar to *.*)?
2. In a program I have indexed 10 files. When I search using the query
   "contents:java", it returns 2 documents. But when I give
   "-contents:java", it returns an empty result set. Does anyone know
   the right query string for this, i.e., to retrieve all documents
   that do not contain the word 'java'?
3. What is the query string to retrieve all the documents whose content
   is empty?

Please help me as soon as possible.

 

Thanks

Jibu



Re: need help for generating Query String

Koji Sekiguchi-4
Hi,

> In a program I have indexed 10 files. When I do a search using the query
> "contents:java", it will return 2 documents. But when I give
> "-contents:java", then it will return an empty result set. Does anyone
> know what the right query string for this? I.e., to retrieve all
> documents that does not contain the word 'java'.

Please see FAQ of Lucene:

How does one determine which documents do not have a certain term?
http://wiki.apache.org/jakarta-lucene/LuceneFAQ#head-0cda565d913389773ca9c3246bde894c3e99084e
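
To illustrate why the purely negative query comes back empty: a prohibited clause only removes documents from what the other clauses matched, so with no other clause there is nothing to remove from. The sketch below models that with plain BitSets rather than Lucene itself; the class and method names are made up, and the "match everything" set stands for whatever clause you add that matches every document (for example a field that holds the same term in every document, which is an assumption about how your index is built).

```java
import java.util.BitSet;

// Plain-Java model of "required minus prohibited"; not Lucene code.
final class ProhibitedClauseSketch {

    /** Docs matched by the required clauses, minus docs matched by the prohibited clause. */
    static BitSet evaluate(BitSet required, BitSet prohibited) {
        BitSet result = (BitSet) required.clone();
        result.andNot(prohibited);
        return result;
    }

    public static void main(String[] args) {
        int maxDoc = 10;                           // 10 indexed files, as in the question
        BitSet containsJava = new BitSet(maxDoc);
        containsJava.set(3);                       // two docs contain "java"
        containsJava.set(7);

        // "-contents:java" alone: nothing is required, so nothing survives.
        BitSet nothingRequired = new BitSet(maxDoc);
        System.out.println(evaluate(nothingRequired, containsJava).cardinality());

        // A clause matching every doc, combined with "-contents:java".
        BitSet allDocs = new BitSet(maxDoc);
        allDocs.set(0, maxDoc);
        System.out.println(evaluate(allDocs, containsJava).cardinality());
    }
}
```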

Thank you,

Koji


RE: need help for generating Query String

jibu mathew
Thanks Koji. It worked for me.
jibu


RE: Too many clauses

Sharma, Siddharth
In reply to this post by Sharma, Siddharth
Thanks Chris

I haven't tried it yet, but I think I understand your idea now (after 24
hours, man I'm slow on the uptake;)
I'll try it today.
-Sid

