switch query parser and solr cloud

switch query parser and solr cloud

Dwane Hall
Good afternoon, Solr brains trust. I'm seeking some community advice if somebody can spare a minute from their busy schedule.

I'm attempting to use the switch query parser to influence client search behaviour based on a client-specified request parameter.

Essentially I want the following to occur:

- A user has the option to pass an optional request parameter, "allResults", to Solr.
- If "allResults" is true, return all matching records by appending a filter query that matches all records (fq=*:*).
- If "allResults" is false or not supplied, apply a filter using the collapse query parser ({!collapse field=SUMMARY_FIELD}).

Environment
Solr 7.3.1 (1 Solr node in DEV, 4 Solr nodes in PTST)
4-shard collection

My Implementation
I'm using the switch query parser to choose client behaviour by appending a filter query to the user request, very similar to what is documented in the Solr Reference Guide here (https://lucene.apache.org/solr/guide/7_4/other-parsers.html#switch-query-parser).

The request uses the Request Parameters API (useParams=firstParams,secondParams). The pertinent line below is the filter query in the _appends_ section.

{
  "set":{
    "firstParams":{
      "op":"AND",
      "wt":"json",
      "start":0,
      "allResults":"false",
      "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
      "_appends_":{
        "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}"
      },
      "_invariants_":{
        "defType":"edismax",
        "timeAllowed":20000,
        "rows":"30",
        "echoParams":"none"
      }
    },
    "secondParams":{
      "df":"FIELD_1",
      "q":"{!edismax v=${searchString} df=FIELD_1 q.op=${op}}",
      "_invariants_":{
        "qf":"FIELD_1,FIELD_2,SUMMARY_FIELD"
      }
    }
  }
}
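As a rough illustration of how the filter query above is meant to resolve, here is a simplified Python model (not Solr's implementation). The `${allResults}` macro is expanded from the request parameters, with firstParams supplying "false" when the client omits it, and the switch parser then picks `case.<value>` if one exists or falls back to `default`. The function names are hypothetical; the sketch assumes a missing macro expands to an empty string and ignores the switch parser's special handling of blank values.

```python
import re
from typing import Dict, Optional

# Local params from the fq in firstParams, written out as Python values.
CASES = {"true": "*:*"}                       # case.true=*:*
DEFAULT = "{!collapse field=SUMMARY_FIELD}"   # default="{!collapse field=SUMMARY_FIELD}"
PARAMSET_DEFAULTS = {"allResults": "false"}   # "allResults":"false" in firstParams


def expand_macros(template: str, params: Dict[str, str]) -> str:
    """Replace each ${name} with the matching parameter value (missing -> "" in this model)."""
    return re.sub(r"\$\{(\w+)\}", lambda m: params.get(m.group(1), ""), template)


def resolve_switch(v: str, cases: Dict[str, str], default: Optional[str]) -> str:
    """Pick case.<v> if present, otherwise the default (blank-v handling not modelled)."""
    chosen = cases.get(v.strip(), default)
    if chosen is None:
        # Modelled as an error: nothing matched and there is no default to fall back on.
        raise ValueError(f"no default and no case matching {v!r}")
    return chosen


def effective_fq(request: Dict[str, str]) -> str:
    """Return the sub-query the switch filter would select for a given request."""
    params = {**PARAMSET_DEFAULTS, **request}  # request values override paramset defaults
    return resolve_switch(expand_macros("${allResults}", params), CASES, DEFAULT)


print(effective_fq({}))                      # {!collapse field=SUMMARY_FIELD}
print(effective_fq({"allResults": "true"}))  # *:*
```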

Everything works nicely until I move from a single-node Solr instance (DEV) to a clustered Solr instance (PTST), where I receive a NullPointerException from Solr that I'm having trouble picking apart.  I've co-located the Solr documents using document routing, which appears to be the only requirement for using the collapse query parser.
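For context on the co-location step: with SolrCloud's default compositeId router, an id of the form `prefix!docId` has its shard chosen from a hash of the prefix, so documents that share a prefix end up on the same shard. Below is a minimal Python sketch of building such ids, assuming the collection uses compositeId routing and that the SUMMARY_FIELD value serves as the prefix. The post doesn't say exactly how routing was set up, and `routed_id` is a hypothetical helper.

```python
from typing import List, Tuple


def routed_id(group_value: str, doc_id: str) -> str:
    """Build a compositeId-style id '<group_value>!<doc_id>' (hypothetical helper)."""
    if "!" in group_value:
        # An extra '!' would be read as an additional routing level, changing the route key.
        raise ValueError("group value must not contain '!'")
    return f"{group_value}!{doc_id}"


docs: List[Tuple[str, str]] = [("report-a", "1"), ("report-a", "2"), ("report-b", "3")]
print([routed_id(group, doc) for group, doc in docs])
# ['report-a!1', 'report-a!2', 'report-b!3']
```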

Does anyone know whether the switch query parser has any limitations in a sharded SolrCloud environment, or can anyone offer troubleshooting advice?

Any community recommendations would be greatly appreciated.

Solr stack trace
2018-09-12 12:16:12,918 4064160860 ERROR : [c:my_collection s:shard1 r:core_node3 x:my_collection_ptst_shard1_replica_n1] org.apache.solr.common.SolrException : org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2: java.lang.NullPointerException
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:643)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:255)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:244)
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1219)
        at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:172)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:176)
        at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:188)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

Thanks for taking the time to assist,

Dwane

Re: switch query parser and solr cloud

Shawn Heisey
On 9/12/2018 5:47 AM, Dwane Hall wrote:
> Good afternoon Solr brains trust I'm seeking some community advice if somebody can spare a minute from their busy schedules.
>
> I'm attempting to use the switch query parser to influence client search behaviour based on a client specified request parameter.
>
> Essentially I want the following to occur:
>
> -A user has the option to pass through an optional request parameter "allResults" to solr
> -If "allResults" is true then return all matching query records by appending a filter query for all records (fq=*:*)
> -If "allResults" is empty then apply a filter using the collapse query parser ({!collapse field=SUMMARY_FIELD})

I'm looking at the documentation for the switch parser and I'm having
difficulty figuring out what it actually does.

This is the kind of thing that is better to handle in your client
instead of asking Solr to do it for you.  You'd have to have your code
construct the complex localparam for the switch parser ... it would be
much easier to write code to insert your special collapse filter when it
is required.
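A minimal sketch of the client-side approach Shawn describes, in Python using only the standard library: the client decides whether to add the collapse filter, so no switch parser is involved. The function name and the `all_results` flag are hypothetical; the collapse filter and `rows=30` come from the original configuration.

```python
from typing import List, Tuple
from urllib.parse import urlencode

COLLAPSE_FQ = "{!collapse field=SUMMARY_FIELD}"


def build_search_params(search_string: str, all_results: bool = False) -> List[Tuple[str, str]]:
    """Return request parameters, adding the collapse filter only when it is wanted."""
    params = [("q", search_string), ("rows", "30")]
    if not all_results:
        # Collapse on SUMMARY_FIELD unless the caller asked for every matching record.
        params.append(("fq", COLLAPSE_FQ))
    return params


# The encoded string would be appended to the collection's /select URL.
print(urlencode(build_search_params("solr cloud")))
print(urlencode(build_search_params("solr cloud", all_results=True)))
```

Using a list of pairs rather than a dict leaves room for several fq parameters, which Solr accepts as repeated keys.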

> Everything works nicely until I move from a single node solr instance (DEV) to a clustered solr instance (PTST) in which I receive a null pointer exception from Solr which I'm having trouble picking apart.  I've co-located the solr documents using document routing which appear to be the only requirement for the collapse query parser's use.

Some features break down when working with sharded indexes.  This is one
of the reasons that sharding should only be done when it is absolutely
required.  A single-shard index tends to perform better anyway, unless
it's really, really huge.

The error is a remote exception from
https://myserver:1234/solr/my_collection_ptst_shard2_replica_n2, which
suggests that maybe not all your documents are co-located on the same
shard the way you think they are.  Is this a remote server/shard?  I am
completely guessing here.  It's always possible that you've encountered
a bug.  Does this one (not fixed) look like it might apply?

https://issues.apache.org/jira/browse/SOLR-9104
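One way to test the co-location hypothesis is to collect the distinct SUMMARY_FIELD values held by each shard (for example, with a `distrib=false` facet query against each replica) and look for values that appear on more than one shard, since those would break the collapse parser's co-location requirement. The helper below is a hypothetical Python sketch of the comparison step only; fetching the per-shard values is left out.

```python
from collections import defaultdict
from typing import Dict, List, Set


def groups_split_across_shards(values_by_shard: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each group value found on two or more shards to the sorted shard names holding it."""
    shards_by_value: Dict[str, Set[str]] = defaultdict(set)
    for shard, values in values_by_shard.items():
        for value in values:
            shards_by_value[value].add(shard)
    return {value: sorted(shards) for value, shards in shards_by_value.items() if len(shards) > 1}


sample = {"shard1": {"a", "b"}, "shard2": {"b", "c"}, "shard3": {"d"}}
print(groups_split_across_shards(sample))  # {'b': ['shard1', 'shard2']}
```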

There should be a server-side error logged by the Solr instance running
on myserver:1234 as well.  Have you looked at that?

I do not know what PTST means.  Is that important for me to understand?

Thanks,
Shawn


Re: switch query parser and solr cloud

Erick Erickson
You will run into significant problems if, when returning "all
results", you return large result sets. For regular queries I like to
limit the return to 100, although 1,000 is sometimes OK.

Millions will blow you out of the water; use CursorMark or Streaming
for very large result sets. CursorMark gets you a page at a time, but
efficiently, and Streaming doesn't consume huge amounts of memory.
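A sketch of the CursorMark loop Erick mentions, in Python with the HTTP call hidden behind a hypothetical `fetch` callable so it runs standalone. Per the Solr Reference Guide's description of cursors, the first request sends `cursorMark=*`, the sort must include the uniqueKey field, and paging stops when the returned nextCursorMark equals the mark that was sent. The in-memory `make_fake_fetch` exists only for demonstration; real cursor marks are opaque strings, not offsets.

```python
from typing import Callable, Dict, Iterator, List, Tuple

Doc = Dict[str, str]
# fetch(cursor_mark, rows) -> (docs, next_cursor_mark). A real implementation would send
# q=..., sort="<fields> id asc", rows=rows and cursorMark=cursor_mark to Solr.
Fetch = Callable[[str, int], Tuple[List[Doc], str]]


def iterate_with_cursor(fetch: Fetch, rows: int = 100) -> Iterator[Doc]:
    """Yield every matching document, one page at a time."""
    cursor = "*"  # initial cursorMark
    while True:
        docs, next_cursor = fetch(cursor, rows)
        yield from docs
        if next_cursor == cursor:
            # An unchanged mark means there are no more results.
            break
        cursor = next_cursor


def make_fake_fetch(all_docs: List[Doc]) -> Fetch:
    """Demo-only fetcher that uses list offsets as cursor marks."""
    def fetch(cursor: str, rows: int) -> Tuple[List[Doc], str]:
        start = 0 if cursor == "*" else int(cursor)
        page = all_docs[start:start + rows]
        return page, (str(start + len(page)) if page else cursor)
    return fetch


docs = [{"id": str(i)} for i in range(5)]
print([d["id"] for d in iterate_with_cursor(make_fake_fetch(docs), rows=2)])
# ['0', '1', '2', '3', '4']
```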

And assuming you could possibly return 1M rows, say, what would the
user do with it? Displaying it in a browser is problematic, for instance.

Best,
Erick
On Wed, Sep 12, 2018 at 5:54 AM Shawn Heisey <[hidden email]> wrote:


Re: switch query parser and solr cloud

Dwane Hall
Thanks for the suggestions and responses, Erick and Shawn.  Erick, I only return 30 records irrespective of the query (not the entire result set); I removed some of my configuration settings for readability. The parameter name "allResults" was a little misleading, and I apologise for that, but I appreciate your input.

Shawn, thanks for your comments. Regarding the switch query parser, the Hossman has a great description of its use and application here (https://lucidworks.com/2013/02/20/custom-solr-request-params/).  PTST is just our performance testing environment and isn't important in the context of the question, other than being a multi-node Solr environment.  The server-side error was the same NullPointerException, which is why I was having difficulty debugging it; there wasn't much information to troubleshoot with.  I'll keep experimenting and explore the client-side filter option for addressing this issue.

Thanks again to you both for your input.

Cheers,

Dwane
________________________________
From: Erick Erickson <[hidden email]>
Sent: Thursday, 13 September 2018 12:20 AM
To: solr-user
Subject: Re: switch query parser and solr cloud


Re: switch query parser and solr cloud

Dwane Hall
Afternoon all,

Just to add some closure to this topic, in case anybody else stumbles across a similar problem: I've managed to resolve my issue by moving the switch query parser filter out of the _appends_ section of the parameter set and making it a regular parameter.

So the parameter set changes from this:

  "set":{
    "firstParams":{
      "op":"AND",
      "wt":"json",
      "start":0,
      "allResults":"false",
      "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
      "_appends_":{
        "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}"
      },

to just a regular old filter query:

  "set":{
    "firstParams":{
      "op":"AND",
      "wt":"json",
      "start":0,
      "allResults":"false",
      "fl":"FIELD_1,FIELD_2,SUMMARY_FIELD",
      "fq":"{!switch default=\"{!collapse field=SUMMARY_FIELD}\" case.true=*:* v=${allResults}}",

Somewhat odd.
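This doesn't explain the NullPointerException, but it is worth knowing what the change does to parameter handling. Below is a simplified Python model, assuming the paramset semantics described in the Solr Reference Guide: plain keys act as defaults that apply only when the request doesn't set that parameter, `_appends_` values are added to whatever the request sends, and `_invariants_` always override the request. `merge_params` is a hypothetical name, and this is not Solr's code.

```python
from typing import Dict, List

Params = Dict[str, List[str]]
SWITCH_FQ = '{!switch default="{!collapse field=SUMMARY_FIELD}" case.true=*:* v=${allResults}}'


def merge_params(request: Params, defaults: Params, appends: Params, invariants: Params) -> Params:
    """Combine request parameters with paramset sections (simplified model)."""
    merged = {key: list(values) for key, values in request.items()}
    for key, values in defaults.items():
        # Defaults are used only when the request does not already set the parameter.
        merged.setdefault(key, list(values))
    for key, values in appends.items():
        # Appended values are added alongside whatever is already present.
        merged.setdefault(key, []).extend(values)
    for key, values in invariants.items():
        # Invariants replace anything the client sent.
        merged[key] = list(values)
    return merged


client = {"fq": ["FIELD_1:foo"]}  # hypothetical client-supplied filter
as_append = merge_params(client, {}, {"fq": [SWITCH_FQ]}, {"rows": ["30"]})
as_default = merge_params(client, {"fq": [SWITCH_FQ]}, {}, {"rows": ["30"]})
print(as_append["fq"])   # client filter plus the switch filter
print(as_default["fq"])  # client filter only; the switch filter is dropped
```

Under this model, the switch filter still applies after the change whenever the client sends no fq of its own, but a client that does send an fq silently replaces it.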

Thanks again to Erick and Shawn for taking the time to assist and talk this through.

Dwane
________________________________
From: Dwane Hall <[hidden email]>
Sent: Thursday, 13 September 2018 6:42 AM
To: Erick Erickson; [hidden email]
Subject: Re: switch query parser and solr cloud
