Remote Searcher performance and Document retrieval
I have a question about the performance of the following Lucene setup:
I have a MultiSearcher that contains 3 RemoteSearchables that point to IndexSearchables on 3 different boxes, and each index contains about 20,000 docs. When an end user runs a search, results are displayed with 20 hits per page, and each hits shows about 10 fields from the document. My issue is that the remote protocol used to retrieve the Hits appears to be very chatty, requiring one remote network call for each Hit retrieved (that is, each call to Hits.doc(n) ends up making an individual remote call). Thus, to display a result page to an end user, there are 3 remote search() calls (one for each index) and 20 remote doc() calls. When I added some performance monitoring logging to my code, I found that for each results page retrieving the Documents took more than twice as long as the actual search() call.
It seems like things would be much more performant if instead there were a method like "Document Hits.docs(int docIds)" (and a corresponding method on the Searcher "Document Searcher.docs(int docIds)") that could return a bulk set of Documents at once. This way I would only need to make one remote call to each index for each page of results.
Can anyone comment on this performance issue? Am I doing something wrong, or there is some other way I can retrieve Documents faster when using a Remote index?
Re: Remote Searcher performance and Document retrieval
No, I haven't compared it to a local index, but when working on other projects in the past I've always seen that "chatty" remote access can really hurt performance, and that batching up multiple calls into a single remote access saves a lot of time.