Deadlock with DirectUpdateHandler2

classic Classic list List threaded Threaded
8 messages Options
Reply | Threaded
Open this post in threaded view
|

Deadlock with DirectUpdateHandler2

Toby Cole-2
Has anyone else experienced a deadlock when the DirectUpdateHandler2  
does an autocommit?
I'm using a recent snapshot from hudson (apache-
solr-2008-11-12_08-06-21), and quite often when I'm loading data the  
server (tomcat 6) gets stuck at line 469 of DirectUpdateHandler2:

       // Check if there is a commit already scheduled for longer then  
this time
       if( pending != null &&
           pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )

Anyone got any enlightening tips?
Cheers,

Toby Cole
Software Engineer

Semantico
Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
T: +44 (0)1273 358 238
F: +44 (0)1273 723 232
E: [hidden email]
W: www.semantico.com

Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Mark Miller-3
Toby Cole wrote:

> Has anyone else experienced a deadlock when the DirectUpdateHandler2
> does an autocommit?
> I'm using a recent snapshot from hudson
> (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
> data the server (tomcat 6) gets stuck at line 469 of
> DirectUpdateHandler2:
>
>       // Check if there is a commit already scheduled for longer then
> this time
>       if( pending != null &&
>           pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )
>
> Anyone got any enlightening tips?
> Cheers,
>
> Toby Cole
> Software Engineer
>
> Semantico
> Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
> T: +44 (0)1273 358 238
> F: +44 (0)1273 723 232
> E: [hidden email]
> W: www.semantico.com
>
There is some inconsistent synchronization I think. Especially involving
pending. Yuck <g>
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Mark Miller-3
Mark Miller wrote:

> Toby Cole wrote:
>> Has anyone else experienced a deadlock when the DirectUpdateHandler2
>> does an autocommit?
>> I'm using a recent snapshot from hudson
>> (apache-solr-2008-11-12_08-06-21), and quite often when I'm loading
>> data the server (tomcat 6) gets stuck at line 469 of
>> DirectUpdateHandler2:
>>
>>       // Check if there is a commit already scheduled for longer then
>> this time
>>       if( pending != null &&
>>           pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )
>>
>> Anyone got any enlightening tips?
>> Cheers,
>>
>> Toby Cole
>> Software Engineer
>>
>> Semantico
>> Lees House, Floor 1, 21-23 Dyke Road, Brighton BN1 3FE
>> T: +44 (0)1273 358 238
>> F: +44 (0)1273 723 232
>> E: [hidden email]
>> W: www.semantico.com
>>
> There is some inconsistent synchronization I think. Especially
> involving pending. Yuck <g>
I would say there are problems with pending, autoCommitCount, and
lastAddedTime. That alone could probably cause a deadlock (who knows),
but it also seems somewhat possible that there is an issue with the
heavy intermingling of locks (there a bunch of locks to be had in that
class). I havn't looked for evidence of that though - prob makes sense
to fix those 3 guys and see if you get reports from there.
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Mike Klaas

On 18-Nov-08, at 8:54 AM, Mark Miller wrote:

> Mark Miller wrote:
>> Toby Cole wrote:
>>> Has anyone else experienced a deadlock when the  
>>> DirectUpdateHandler2 does an autocommit?
>>> I'm using a recent snapshot from hudson (apache-
>>> solr-2008-11-12_08-06-21), and quite often when I'm loading data  
>>> the server (tomcat 6) gets stuck at line 469 of  
>>> DirectUpdateHandler2:
>>>
>>>      // Check if there is a commit already scheduled for longer  
>>> then this time
>>>      if( pending != null &&
>>>          pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )
>>>
>>> Anyone got any enlightening tips?
>>>

>> There is some inconsistent synchronization I think. Especially  
>> involving pending. Yuck <g>
> I would say there are problems with pending, autoCommitCount, and  
> lastAddedTime. That alone could probably cause a deadlock (who  
> knows), but it also seems somewhat possible that there is an issue  
> with the heavy intermingling of locks (there a bunch of locks to be  
> had in that class). I havn't looked for evidence of that though -  
> prob makes sense to fix those 3 guys and see if you get reports from  
> there.


autoCommitCount is written in a CommitTracker.synchronized block  
only.  It is read to print stats in an unsynchronized fashion, which  
perhaps could be fixed, though I can't see how it could cause a problem

lastAddedTime is only written in a call path within a  
DirectUpdateHandler2.synchronized block.  It is only read in a  
CommitTracker.synchronized block.  It could read the wrong value, but  
I also don't see this causing a problem (a commit might fail to be  
scheduled).  This could probably also be improved, but doesn't seem  
important.

pending seems to be the issue.  As long as commit are only triggered  
by autocommit, there is no issue as manipulation of pending is always  
performed inside CommitTracker.synchronized.  But didCommit()/
didRollback() could be called via manual commit, and pending is  
directly manipulated during DUH2.close().  I'm having trouble coming  
up with a plausible deadlock scenario, but this needs to be fixed.  It  
isn't as easy as synchronizing didCommit/didRollback, though--this  
would introduce definite deadlock scenarios.

Mark, is there any chance you could post the thread dump for the  
deadlocked process?  Do you issue manual commits during insertion?

-Mike
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Mark Miller-3
Mike Klaas wrote:

>
>
> autoCommitCount is written in a CommitTracker.synchronized block
> only.  It is read to print stats in an unsynchronized fashion, which
> perhaps could be fixed, though I can't see how it could cause a problem
>
> lastAddedTime is only written in a call path within a
> DirectUpdateHandler2.synchronized block.  It is only read in a
> CommitTracker.synchronized block.  It could read the wrong value, but
> I also don't see this causing a problem (a commit might fail to be
> scheduled).  This could probably also be improved, but doesn't seem
> important.
Right. I don't see these as causing a deadlock either, but whatever
happens, its pretty much JVM undefined right, hence 'who knows' (I'll go
with pretty doubtful <g>). I am not so sure its safe to read a value
from an unsynced method whether you care about the result or not though.
Its prob safe for atomic types and volatiles, but I'm fairly sure your
playing with fire doing read/write in and out of sync. I don't think its
just about stale values. But then again, it probably works 99.9% of the
time or something.

>
> pending seems to be the issue.  As long as commit are only triggered
> by autocommit, there is no issue as manipulation of pending is always
> performed inside CommitTracker.synchronized.  But
> didCommit()/didRollback() could be called via manual commit, and
> pending is directly manipulated during DUH2.close().  I'm having
> trouble coming up with a plausible deadlock scenario, but this needs
> to be fixed.  It isn't as easy as synchronizing didCommit/didRollback,
> though--this would introduce definite deadlock scenarios.
>
> Mark, is there any chance you could post the thread dump for the
> deadlocked process?  Do you issue manual commits during insertion?
Toby reported it. Thread dump Toby?
>
> -Mike

Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Toby Cole-2
On 18 Nov 2008, at 20:18, Mark Miller wrote:

> Mike Klaas wrote:
>>
>>
>> autoCommitCount is written in a CommitTracker.synchronized block  
>> only.  It is read to print stats in an unsynchronized fashion,  
>> which perhaps could be fixed, though I can't see how it could cause  
>> a problem
>>
>> lastAddedTime is only written in a call path within a  
>> DirectUpdateHandler2.synchronized block.  It is only read in a  
>> CommitTracker.synchronized block.  It could read the wrong value,  
>> but I also don't see this causing a problem (a commit might fail to  
>> be scheduled).  This could probably also be improved, but doesn't  
>> seem important.
> Right. I don't see these as causing a deadlock either, but whatever  
> happens, its pretty much JVM undefined right, hence 'who  
> knows' (I'll go with pretty doubtful <g>). I am not so sure its safe  
> to read a value from an unsynced method whether you care about the  
> result or not though. Its prob safe for atomic types and volatiles,  
> but I'm fairly sure your playing with fire doing read/write in and  
> out of sync. I don't think its just about stale values. But then  
> again, it probably works 99.9% of the time or something.
>>
>> pending seems to be the issue.  As long as commit are only  
>> triggered by autocommit, there is no issue as manipulation of  
>> pending is always performed inside CommitTracker.synchronized.  But  
>> didCommit()/didRollback() could be called via manual commit, and  
>> pending is directly manipulated during DUH2.close().  I'm having  
>> trouble coming up with a plausible deadlock scenario, but this  
>> needs to be fixed.  It isn't as easy as synchronizing didCommit/
>> didRollback, though--this would introduce definite deadlock  
>> scenarios.
>>
>> Mark, is there any chance you could post the thread dump for the  
>> deadlocked process?  Do you issue manual commits during insertion?
> Toby reported it. Thread dump Toby?
>>
>> -Mike

I'll try and post a thread dump when I get to work, can't remote in  
from here.
I don't mind helping out with the fix, I've been getting to know  
solr's internals quite intimately recently after writing a few  
handlers/components for internal projects.

T
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Mike Klaas
In reply to this post by Mark Miller-3
On 18-Nov-08, at 12:18 PM, Mark Miller wrote:

> Mike Klaas wrote:
>>
>>
>> autoCommitCount is written in a CommitTracker.synchronized block  
>> only.  It is read to print stats in an unsynchronized fashion,  
>> which perhaps could be fixed, though I can't see how it could cause  
>> a problem
>>
>> lastAddedTime is only written in a call path within a  
>> DirectUpdateHandler2.synchronized block.  It is only read in a  
>> CommitTracker.synchronized block.  It could read the wrong value,  
>> but I also don't see this causing a problem (a commit might fail to  
>> be scheduled).  This could probably also be improved, but doesn't  
>> seem important.
> Right. I don't see these as causing a deadlock either, but whatever  
> happens, its pretty much JVM undefined right, hence 'who  
> knows' (I'll go with pretty doubtful <g>). I am not so sure its safe  
> to read a value from an unsynced method whether you care about the  
> result or not though. Its prob safe for atomic types and volatiles,  
> but I'm fairly sure your playing with fire doing read/write in and  
> out of sync. I don't think its just about stale values. But then  
> again, it probably works 99.9% of the time or something.

Yeah, something like lastAddedTime is worth getting right, even if it  
works most of the time.  Things like autoCommitCount that read a long  
unsynchronized only to display on an admin screen don't concern me as  
much.

-Mike
Reply | Threaded
Open this post in threaded view
|

Re: Deadlock with DirectUpdateHandler2

Ryan McKinley
In reply to this post by Mike Klaas
I'm also hitting some threading issues with autocommit -- JConsole  
does not show deadlock, but it shows some threads 'BLOCKED' on  
scheduleCommitWithin

Perhaps this has something to do with the changes we made for: SOLR-793

I am able to fix this (at least I don't see the blocking with the data  
I am using) by not calling the synchronized function from within  
CommitTracker#addedDocument

Mike, can you look over the changes in:
  http://svn.apache.org/viewvc?view=rev&revision=719347

and make sure it makes sense to you?

Essentially it is not calling a synchronized scheduleCommitWithin()  
from within the (already blocked) addedDocument callback

ryan



On Nov 18, 2008, at 3:10 PM, Mike Klaas wrote:

>
> On 18-Nov-08, at 8:54 AM, Mark Miller wrote:
>
>> Mark Miller wrote:
>>> Toby Cole wrote:
>>>> Has anyone else experienced a deadlock when the  
>>>> DirectUpdateHandler2 does an autocommit?
>>>> I'm using a recent snapshot from hudson (apache-
>>>> solr-2008-11-12_08-06-21), and quite often when I'm loading data  
>>>> the server (tomcat 6) gets stuck at line 469 of  
>>>> DirectUpdateHandler2:
>>>>
>>>>     // Check if there is a commit already scheduled for longer  
>>>> then this time
>>>>     if( pending != null &&
>>>>         pending.getDelay(TimeUnit.MILLISECONDS) >= commitMaxTime )
>>>>
>>>> Anyone got any enlightening tips?
>>>>
>
>>> There is some inconsistent synchronization I think. Especially  
>>> involving pending. Yuck <g>
>> I would say there are problems with pending, autoCommitCount, and  
>> lastAddedTime. That alone could probably cause a deadlock (who  
>> knows), but it also seems somewhat possible that there is an issue  
>> with the heavy intermingling of locks (there a bunch of locks to be  
>> had in that class). I havn't looked for evidence of that though -  
>> prob makes sense to fix those 3 guys and see if you get reports  
>> from there.
>
>
> autoCommitCount is written in a CommitTracker.synchronized block  
> only.  It is read to print stats in an unsynchronized fashion, which  
> perhaps could be fixed, though I can't see how it could cause a  
> problem
>
> lastAddedTime is only written in a call path within a  
> DirectUpdateHandler2.synchronized block.  It is only read in a  
> CommitTracker.synchronized block.  It could read the wrong value,  
> but I also don't see this causing a problem (a commit might fail to  
> be scheduled).  This could probably also be improved, but doesn't  
> seem important.
>
> pending seems to be the issue.  As long as commit are only triggered  
> by autocommit, there is no issue as manipulation of pending is  
> always performed inside CommitTracker.synchronized.  But didCommit()/
> didRollback() could be called via manual commit, and pending is  
> directly manipulated during DUH2.close().  I'm having trouble coming  
> up with a plausible deadlock scenario, but this needs to be fixed.  
> It isn't as easy as synchronizing didCommit/didRollback, though--
> this would introduce definite deadlock scenarios.
>
> Mark, is there any chance you could post the thread dump for the  
> deadlocked process?  Do you issue manual commits during insertion?
>
> -Mike