using HttpSolrServer with PoolingHttpClientConnectionManager

Renee Sun
First of all, I apologize for the length of this message. There are a few questions I would appreciate your help with:

1. Originally I wanted to use SolrJ in my application layer (a webapp deployed with Tomcat) to query the Solr server(s) in a multi-core, non-cloud setup.

Since I need to send XML back to my client, I realize this is not a use case for SolrJ, so I should abandon the idea (correct?)

2. I also looked into CommonsHttpSolrServer to query Solr directly, which supposedly allows me to set XMLResponseParser as the ResponseParser. However, CommonsHttpSolrServer seems to be deprecated; with HttpClient 4.x I think I should use HttpSolrServer. I need a way to get the returned data in XML format, and I want to use a pooled HTTP connection manager to support queries from multiple threads. I thought I could do all this with HttpSolrServer (yes?), as below:

PoolingHttpClientConnectionManager connManager = new PoolingHttpClientConnectionManager();
connManager.setMaxTotal(5);
connManager.setDefaultMaxPerRoute(4);
... ...
CloseableHttpClient httpclient = HttpClients.custom().setConnectionManager(connManager).build();
... ...
ResponseParser parser = new XMLResponseParser();
... ...
HttpSolrServer server = new HttpSolrServer(myUrl, httpclient, parser);

... ...
SolrQuery query = new SolrQuery();
query.setQuery(q);
query.setParam("wt", "xml"); // not needed?
... ...
QueryResponse response = server.query(query);
SolrDocumentList sdl = response.getResults();

At this point, will the documents in sdl be in XML format if I loop through them and call toString()? Will there be overhead if this works at all? Will SolrJ skip the XML parsing and simply return the raw results, since I requested the XML parser?

Somehow this feels very fishy, and I might be better off not using SolrJ at all. What is the best practice here?

3. My next question may be more of an HttpClient question, but it does relate to Solr cores, so I hope someone can help:

When I configure PoolingHttpClientConnectionManager (the per-route connection limit, etc.), will the following URLs be considered different routes, or, since they hit the same server, will the collection/core part be ignored?

String myUrl = "http://localhost:8983/solr/core1";

and

String myUrl = "http://localhost:8983/solr/core2";


Thanks!
Renee

Re: using HttpSolrServer with PoolingHttpClientConnectionManager

Shawn Heisey-2
On 3/1/2017 3:13 PM, Renee Sun wrote:

> first of all I apologize for the length of this message ... there are few
> questions I would appreciate your help please:
>
> 1. originally I wanted to use solrj in my application layer (webapp deployed
> with tomcat), to query the solr server(s) with multi-cores, non-cloud setup.
>
> Since I need send back XML format to my client, I realize it is not an use
> case for solrj, so I should abandon the idea (correct?)
>
> 2. I also looked into CommonsHttpSolrServer trying to query solr directly,
> which supposedly allow me to set XMLResponseParser as  ResponseParser.
> however, it seems CommonsHttpSolrServer is deprecated, with httpclient 4.x I
> think I should use HttpSolrServer. I do need to have a way to set the
> returned data in xml format, and I want to use pooled http conn manager to
> support multiple thread for queries. I thought I could do all this with
> HttpSolrServer, (yes?) as below:
>
> PoolingHttpClientConnectionManager connManager = new
> PoolingHttpClientConnectionManager();
> connManager.setMaxTotal(5);
> connManager.setDefaultMaxPerRoute(4);
> ... ...
> CloseableHttpClient httpclient =
> HttpClients.custom().setConnectionManager(connManager).build();
> ... ...
> ResponseParser parser = new XMLResponseParser();
> ... ...
> HttpSolrServer server = new HttpSolrServer(myUrl, httpclient, parser);
>
> ... ...
> SolrQuery query = new SolrQuery();
> query.setQuery(q);
> query.setParam("wt", "xml"); // not needed?
> ... ...
> QueryResponse response = server.query(query);
> SolrDocumentList sdl = response.getResults();

In the newest versions, both CommonsHttpSolrServer and HttpSolrServer
are completely gone.  The class to use has been HttpSolrClient since
5.0.  If you are using a 4.x version, then you would use HttpSolrServer.

> at this point will the documents in sdl be in xml format if I use toString()
> looping through them? will there be overhead if this works at all? will
> solrj skip the xml parsing and simply return the results as I requested xml
> parser?

No, this will not produce XML format.  At best, it will produce a
human-readable representation of the document contents, which likely
does not conform to any specific well-known format.  SolrJ is designed
to return Java objects from Solr that can easily be used by Java code
without any need to convert from the wire format.  Setting the XML
response parser will control what format of response the Solr server
sends to SolrJ, but regardless of the wire format, the response object
will be the same.  The default binary writer is faster than XML.

If you want all the docs in XML format, you have two choices:  1) Don't
use SolrJ at all, and specify xml format when making the HTTP request.
2) Write code to convert from solr document objects to xml.  Either way
you're going to be handling a lot of it yourself.
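
A minimal sketch of the first choice, assuming a plain /select handler on the core and using only the JDK (Java 9+ for readAllBytes). The class name RawXmlQuery, the helper names, and the timeout values are hypothetical, chosen for illustration only:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class RawXmlQuery {

    /** Builds a select URL for the given core that asks Solr for an XML response. */
    static String selectUrl(String coreUrl, String q) throws UnsupportedEncodingException {
        // wt=xml selects the XML response writer; q must be URL-encoded.
        return coreUrl + "/select?q=" + URLEncoder.encode(q, "UTF-8") + "&wt=xml";
    }

    /** Sends the query and returns the response body as-is, with no SolrJ parsing. */
    static String fetchXml(String coreUrl, String q) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(selectUrl(coreUrl, q)).toURL().openConnection();
        conn.setConnectTimeout(15000); // assumed values, tune for your setup
        conn.setReadTimeout(60000);
        try (InputStream in = conn.getInputStream()) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } finally {
            conn.disconnect();
        }
    }
}
```

The returned string is whatever Solr wrote on the wire, so it can be passed straight through to the client. The same request could equally be sent through a pooled HttpClient instance; HttpURLConnection is used here only to keep the sketch dependency-free.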

When it comes to making sure SolrJ can handle many threads at once, this
is what I do:

  RequestConfig rc = RequestConfig.custom().setConnectTimeout(15000)
    .setSocketTimeout(Const.SOCKET_TIMEOUT).build();
  httpClient = HttpClients.custom().setDefaultRequestConfig(rc)
    .setMaxConnPerRoute(300).setMaxConnTotal(5000).disableAutomaticRetries()
    .build();
  client = new HttpSolrClient(serverBaseUrl, httpClient);

This code uses SolrJ 5.5.3, which depends on HttpClient 4.4.1.

I do not explicitly specify the connection manager.  I specify the
maximum number of connections and let HttpClient handle it in the way
that it thinks is best.

I do not know exactly how HttpClient determines what URLs are the same
route.  You would need to consult HttpClient documentation.
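
For reference, HttpClient 4.x models a route (HttpRoute) as the target host, meaning scheme, host name, and port, plus any proxy hops; the request path is not part of it. Under that model the core1 and core2 URLs from the question would share one route, and therefore one per-route connection limit. A minimal JDK-only sketch of the idea follows; routeKey is a hypothetical helper for illustration, not HttpClient's own code, and it ignores proxies:

```java
import java.net.URI;

class RouteKey {

    /** Reduces a URL to scheme://host:port, dropping the path (and so the core name). */
    static String routeKey(String url) {
        URI u = URI.create(url);
        String scheme = u.getScheme().toLowerCase();
        int port = u.getPort();
        if (port == -1) {
            // No explicit port: fall back to the scheme default.
            port = scheme.equals("https") ? 443 : 80;
        }
        return scheme + "://" + u.getHost().toLowerCase() + ":" + port;
    }
}
```

Both http://localhost:8983/solr/core1 and http://localhost:8983/solr/core2 reduce to the same key here, so a setting such as setDefaultMaxPerRoute(4) would cap the two cores together rather than each one separately.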

Thanks,
Shawn

Re: using HttpSolrServer with PoolingHttpClientConnectionManager

Renee Sun
Thank you, Shawn! This is very helpful.

Renee