GET or POST for large queries?

mrw
We are running into some issues with large queries.  Initially, they were ostensibly header buffer overruns, because increasing Jetty's headerBufferSize value to 65536 resolved them. This seems like a kludge, but it does solve the problem for 95% of our users.

However, we do have queries that are physically larger than that, and for those, increasing the headerBufferSize to 65536 does not work.  This is due to security requirements:  security descriptors are baked into the index, and then potentially thousands of them (depending on the user context) are passed in with each query.  These oversized queries are only a problem for approximately 5% of users, who are highly entitled, but the number of security descriptors is likely to increase, and we won't have a workaround for this security policy any time soon.

After a lot of Googling, it seems to me that it's common to increase the headerBufferSize, but I don't see any other strategies.  Is it possible/feasible to switch to using POST for querying?

Thanks!
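
To get a rough feel for the sizes involved, here is a minimal Python sketch that estimates how many bytes the URL-encoded query string of a GET would occupy for a given number of security descriptors, and compares that with the 65536-byte headerBufferSize mentioned above. The field name "acl", the made-up descriptor values, and the single OR filter are assumptions for illustration; the thread does not show the real query. The buffer also has to hold the rest of the request line and the headers, so the figure is only a lower bound.

```python
from urllib.parse import urlencode

HEADER_BUFFER_SIZE = 65536  # the Jetty headerBufferSize value from the post


def build_acl_filter(descriptors, field="acl"):
    """OR together one term per descriptor ("acl" is a hypothetical field name)."""
    return field + ":(" + " OR ".join(descriptors) + ")"


def encoded_query_size(descriptors):
    """Bytes taken by the URL-encoded query string a GET request would carry."""
    params = {"q": "*:*", "fq": build_acl_filter(descriptors)}
    return len(urlencode(params).encode("ascii"))


if __name__ == "__main__":
    for count in (100, 1000, 10000):
        # Made-up descriptor values; real ones may be longer or shorter.
        descriptors = ["SD%06d" % i for i in range(count)]
        size = encoded_query_size(descriptors)
        verdict = "fits" if size < HEADER_BUFFER_SIZE else "exceeds the buffer"
        print(count, "descriptors:", size, "bytes,", verdict)
```

With descriptors of this length, a user with around ten thousand of them would overflow the buffer on the query string alone, which matches the situation described for the highly entitled users.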

Re: GET or POST for large queries?

Erik Hatcher-4
Yes, you may use POST to make search requests to Solr.

        Erik
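
As a rough illustration of the POST approach, the Python sketch below puts the query parameters in a form-encoded request body instead of the URL. The URL http://localhost:8983/solr/select, the field name "acl", and the parameter values are assumptions; adjust them to your own installation.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumed location of the Solr select handler; change host, port, and path as needed.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_post_request(params, url=SOLR_SELECT_URL):
    """Return a POST request that carries the query parameters in its body."""
    body = urlencode(params, doseq=True).encode("utf-8")
    headers = {"Content-Type": "application/x-www-form-urlencoded; charset=UTF-8"}
    # Passing data makes urllib send a POST instead of a GET.
    return Request(url, data=body, headers=headers)


def run_query(params):
    """Send the query and return the raw response text (needs a running Solr)."""
    with urlopen(build_post_request(params)) as response:
        return response.read().decode("utf-8")
```

Calling run_query({"q": "*:*", "fq": "acl:(SD1 OR SD2)"}) against a running instance would send the same parameters a GET would, only in the body, so the header buffer no longer limits the query size.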

On Feb 17, 2011, at 14:27 , mrw wrote:

> We are running into some issues with large queries. [...]

Re: GET or POST for large queries?

mrw
Yeah, I tried switching to POST.

It seems to be handling the size, but apparently Solr has a limit on the number of boolean clauses in a query. I'm now getting "too many boolean clauses" errors emanating from org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:108).  :)

Thanks for responding.



Re: GET or POST for large queries?

Jonathan Rochkind
Yes, I think the limit (maxBooleanClauses) is 1024 by default.  I think you
can raise it in your solrconfig.xml, but your performance may suffer.

The best approach would be to try to find a better way to do what you want
without using thousands of clauses. This might require some custom Java
plugins to Solr, though.
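
A minimal Python sketch of a client-side check against that limit, assuming each distinct security descriptor turns into exactly one boolean clause. That is a simplification: field analysis can split a value into several terms, and the rest of the query adds clauses of its own.

```python
DEFAULT_MAX_BOOLEAN_CLAUSES = 1024  # the default mentioned in the reply above


def clauses_needed(descriptors):
    """Clauses for an OR over the descriptors, assuming one clause per distinct value."""
    return len(set(descriptors))


def exceeds_clause_limit(descriptors, limit=DEFAULT_MAX_BOOLEAN_CLAUSES):
    """True if the OR filter would need more clauses than the limit allows."""
    return clauses_needed(descriptors) > limit


if __name__ == "__main__":
    # Made-up descriptor values standing in for one highly entitled user.
    descriptors = ["SD%06d" % i for i in range(5000)]
    if exceeds_clause_limit(descriptors):
        print("query needs", clauses_needed(descriptors), "clauses, over the limit")
```

A check like this lets the application detect the problem before sending the query, rather than waiting for the "too many boolean clauses" error to come back.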

On 2/17/2011 3:52 PM, mrw wrote:

> Yeah, I tried switching to POST. [...]

Re: GET or POST for large queries?

gearond
In reply to this post by mrw
Probably you could do it, and solving a business problem supersedes
'rightness' concerns, much to the dismay of geeks and 'those who like rightness
and say the word "Neemph!"'.


What is not right about this is that:
POST, PUT, and DELETE are assumed to make changes to the resource behind the URL.
GET is assumed NOT to make changes.

So if your POST does not make a change . . . it breaks convention. But if it
solves the problem . . . :-)

Another way would be to GET with a 'query file' location, and then have the
server fetch that query file and execute it.

Boy!!! I'd love to see one of your queries!!! You must have a few ANDs/ORs in
them :-)

 Dennis Gearon



Re: GET or POST for large queries?

mrw
Thanks for the response.

Yes, the queries are fairly large.  Basically, the corporate security policy dictates that we use row-level security attributes from the DB for access control in Solr.  So we bake the row-level security attributes from the database into the index, and then, at query time, fetch those same attributes from the DB and pass them as part of the Solr query.  Imagine a bank VP with access to tens of thousands of customer records and transactions: all those access attributes get sent to Solr.  The system works well for the low-level account managers and low-entitlement users, but cannot scale for the high-level folks.

POSTing the data appears to avoid the header threshold issue, but it breaks because of the "too many boolean clauses" error.




Re: GET or POST for large queries?

mrw
In reply to this post by Jonathan Rochkind
Thanks for the response and info.

I'll try that.  


Re: GET or POST for large queries?

Markus Jelsma-2
In reply to this post by mrw
Increase the maxBooleanClauses setting in solrconfig.xml.
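
For illustration, here is a small Python sketch that rewrites the maxBooleanClauses value inside the query section of a solrconfig.xml document. The sample XML is a trimmed stand-in for a real config, and the new value of 20000 is an arbitrary assumption, not a recommendation. In practice most people simply edit the file by hand; ElementTree also drops XML comments when it parses, so do not run this blindly over a commented config.

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for a real solrconfig.xml; only the relevant element is shown.
SAMPLE_SOLRCONFIG = """<config>
  <query>
    <maxBooleanClauses>1024</maxBooleanClauses>
  </query>
</config>"""


def raise_max_boolean_clauses(config_xml, new_limit):
    """Return config_xml with the maxBooleanClauses value set to new_limit."""
    root = ET.fromstring(config_xml)
    element = root.find("./query/maxBooleanClauses")
    if element is None:
        raise ValueError("no query/maxBooleanClauses element found")
    element.text = str(new_limit)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # 20000 is an arbitrary example value.
    print(raise_max_boolean_clauses(SAMPLE_SOLRCONFIG, 20000))
```

Solr typically has to be restarted, or the core reloaded, before a changed value takes effect, and as noted earlier in the thread, performance may suffer with very large clause counts.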

On Friday 18 February 2011 15:30:11 mrw wrote:

> Thanks for the response.
>
> POSTing the data appears to avoid the header threshold issue, but it breaks
> because of the "too many boolean clauses" error.

--
Markus Jelsma - CTO - Openindex

Re: GET or POST for large queries?

Jan Høydahl / Cominvent
In reply to this post by mrw
Hi,

There are better ways to handle row-level security in search than sending huge lists of users over the wire.

Have you checked out the ManifoldCF project, which lets you integrate security with Solr? http://incubator.apache.org/connectors/

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. feb. 2011, at 15.30, mrw wrote:

> Thanks for the response.
>
> Yes, the queries are fairly large. [...]

Re: GET or POST for large queries?

mrw
Thanks for the tip.  No, I did not know about that.  Unfortunately, we use Oracle OLS, which does not appear to be supported.


Re: GET or POST for large queries?

Jan Høydahl / Cominvent
OK.

I would ask on the ManifoldCF mailing list whether they have any experience with OLS.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

On 18. feb. 2011, at 17.29, mrw wrote:

> Thanks for the tip.  No, I did not know about that.  Unfortunately, we use
> Oracle OLS which does not appear to be supported.