Commit strategies

Commit strategies

James Brady-3
Hi all,
So the Solr tutorial recommends batching operations to improve
performance by avoiding multiple costly commits.

To implement this, I originally had a couple of methods in my Python
app reading from or writing to Solr, with a scheduled task blindly
committing every 15 seconds.
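
For illustration, the setup was roughly the following shape (a simplified
sketch, not the real code; the host, the field names and the
shared-connection pattern are assumptions based on the traceback below):

  import threading
  import httplib

  HEADERS = {'Content-Type': 'text/xml; charset=utf-8'}

  # one shared connection, used by both request handlers and the commit task
  conn = httplib.HTTPConnection('localhost', 8983)

  def post_update(xml):
      conn.request('POST', '/solr/update', xml, HEADERS)
      return conn.getresponse().read()

  def add_doc(doc_id, text):
      post_update('<add><doc><field name="id">%s</field>'
                  '<field name="text">%s</field></doc></add>' % (doc_id, text))

  def commit_every_15s():
      # blindly commit whatever has been added since the last run
      post_update('<commit/>')
      threading.Timer(15.0, commit_every_15s).start()

  threading.Timer(15.0, commit_every_15s).start()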

However, my logs were chock full of errors such as:
   File "/mnt/yelteam/server_dev/YelServer/yel/yel_search.py", line  
73, in __add
     self.conn.add(**params)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in  
add
     return self.doUpdateXML(xstr)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in  
doUpdateXML
     rsp = self.doPost(self.solrBase+'/update', request,  
self.xmlheaders)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 94, in  
doPost
     return self.__errcheck(self.conn.getresponse())
   File "/usr/lib64/python2.4/httplib.py", line 856, in getresponse
     raise ResponseNotReady()
ResponseNotReady

and:
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 159, in  
add
     return self.doUpdateXML(xstr)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 106, in  
doUpdateXML
     rsp = self.doPost(self.solrBase+'/update', request,  
self.xmlheaders)
   File "/mnt/yelteam/server_dev/YelServer/yel/solr.py", line 102, in  
doPost
     return self.__errcheck(self.conn.getresponse())
   File "/usr/lib64/python2.4/httplib.py", line 866, in getresponse
     response.begin()
   File "/usr/lib64/python2.4/httplib.py", line 336, in begin
     version, status, reason = self._read_status()
   File "/usr/lib64/python2.4/httplib.py", line 294, in _read_status
     line = self.fp.readline()
   File "/usr/lib64/python2.4/socket.py", line 317, in readline
     data = recv(1)
error: (104, 'Connection reset by peer')

and a few other variations.

I thought it might be to do with commit operations conflicting with
reads or writes, so I wrote an even dumber queueing system to hold
onto pending reads/writes while a commit went through.
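
(Roughly, that guard was just a lock serialising commits against other
update operations; the names here are made up, and post_update() is the
same hypothetical helper as in the sketch above:)

  import threading

  index_lock = threading.Lock()

  def guarded_commit():
      # hold the lock for the whole commit so nothing else talks to Solr
      index_lock.acquire()
      try:
          post_update('<commit/>')
      finally:
          index_lock.release()

  def guarded_add(xml):
      # adds wait until any in-progress commit has released the lock
      index_lock.acquire()
      try:
          post_update(xml)
      finally:
          index_lock.release()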

However, my logs are still full of those errors :) I doubt that
either Python's httplib library or Solr is buggy, so is it something
to do with the way I'm using the API?

How do people generally approach the deferred commit issue? Do I need  
to queue index and search requests myself or does Solr handle it? My  
app indexes about 100 times more than it searches, but searching is  
more time critical. Does that change anything?

Thanks!
James

Re: Commit strategies

hossman

If you just want commits to happen at a regular frequency, take a look at
the autoCommit options.
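
(For reference, the example solrconfig.xml that ships with Solr has a
commented-out block along these lines inside <updateHandler>; the values
below are just placeholders:)

  <autoCommit>
    <!-- commit automatically after this many uncommitted documents ... -->
    <maxDocs>10000</maxDocs>
    <!-- ... or after this many milliseconds, whichever comes first -->
    <maxTime>15000</maxTime>
  </autoCommit>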

As for the specific errors you are getting, I don't know enough Python to
understand them, but it may just be that your commits are taking too long
and your client is timing out while waiting for the commit to finish.

Have you tried increasing the timeout?

: How do people generally approach the deferred commit issue? Do I need to queue
: index and search requests myself or does Solr handle it? My app indexes about
: 100 times more than it searches, but searching is more time critical. Does
: that change anything?

Searches can go on happily while commits/adds are happening, and multiple
adds can happen in parallel ... but all adds block while a commit is
taking place.  I just give all of the clients that update the index a really
large timeout value (i.e. 60 seconds or so) and don't worry about queueing up
indexing requests.  The only intelligence you typically need to worry about
is that there's very little reason to ever do a commit if you know you've
got more adds ready to go.
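
In Python terms that works out to roughly the sketch below (it reuses the
same made-up helper names as earlier in the thread, so it is not drop-in
code): a long socket timeout, batch the adds, and one commit at the end.

  import socket
  import httplib

  # Python 2.4's httplib has no per-connection timeout argument, so set a
  # generous default at the socket level before opening the connection.
  socket.setdefaulttimeout(60)

  conn = httplib.HTTPConnection('localhost', 8983)

  def post_update(xml):
      conn.request('POST', '/solr/update', xml,
                   {'Content-Type': 'text/xml; charset=utf-8'})
      return conn.getresponse().read()

  def index_batch(docs):
      # add everything first ...
      for doc_id, text in docs:
          post_update('<add><doc><field name="id">%s</field>'
                      '<field name="text">%s</field></doc></add>' % (doc_id, text))
      # ... and commit exactly once, when there is nothing left to add
      post_update('<commit/>')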




-Hoss