[jira] [Resolved] (HADOOP-17462) Hadoop Client getRpcResponse May Return Wrong Result


Steve Loughran (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-17462?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor resolved HADOOP-17462.
-------------------------------------
    Resolution: Not A Problem

Thanks [~sjlee0] for pointing out that the {{call}} object is synchronized on.

The reference docs implement this a bit differently: there, the guarded block lives inside the same class that owns the flag, whereas here an external class synchronizes on the object and reads its field directly. I'm thinking the two should behave the same.

https://docs.oracle.com/javase/tutorial/essential/concurrency/guardmeth.html
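
As a rough illustration of why the two should be equivalent, here is a minimal, self-contained sketch. It is not the Hadoop code: {{GuardedCallDemo}}, {{complete}}, {{await}} and the {{response}} field are hypothetical names chosen for the example, and only the overall shape mirrors {{getRpcResponse}}. The waiter synchronizes on the {{call}} object from outside the class, and the non-volatile {{done}} flag is only read and written while that monitor is held. Under the Java memory model, releasing a monitor happens-before the next acquisition of the same monitor, so the waiter should see the update.

```java
import java.util.concurrent.TimeUnit;

class GuardedCallDemo {

  /** Hypothetical stand-in for Client.Call; 'done' is deliberately NOT volatile. */
  static class Call {
    boolean done;      // only touched while holding this object's monitor
    String response;   // hypothetical payload, written before 'done' is set

    /** Invoked by the (hypothetical) receiver thread when the reply arrives. */
    void complete(String value) {
      synchronized (this) {
        response = value;
        done = true;
        notifyAll();   // wake any thread waiting on this call
      }
    }
  }

  /** Returns the response, or null on timeout; synchronizes on 'call' externally. */
  static String await(Call call, long timeout, TimeUnit unit)
      throws InterruptedException {
    long deadline = System.nanoTime() + unit.toNanos(timeout);
    synchronized (call) {
      while (!call.done) {                  // re-checked after every wakeup
        long remainingNanos = deadline - System.nanoTime();
        if (remainingNanos <= 0) {
          return null;                      // timed out before completion
        }
        // wait() releases the monitor and re-acquires it before returning
        TimeUnit.NANOSECONDS.timedWait(call, remainingNanos);
      }
      return call.response;
    }
  }

  public static void main(String[] args) throws Exception {
    Call call = new Call();
    Thread receiver = new Thread(() -> {
      try {
        Thread.sleep(50);                   // simulate network latency
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        return;
      }
      call.complete("ok");
    });
    receiver.start();
    System.out.println(await(call, 5, TimeUnit.SECONDS));
    receiver.join();
    // Nobody completes this call, so the wait times out and null is returned.
    System.out.println(await(new Call(), 10, TimeUnit.MILLISECONDS));
  }
}
```

In this sketch the flag would only need to be {{volatile}} if some thread read it without holding the monitor.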

I'll test in more depth, and if I can demonstrate the problem definitively, I'll re-open this ticket.

> Hadoop Client getRpcResponse May Return Wrong Result
> ----------------------------------------------------
>
>                 Key: HADOOP-17462
>                 URL: https://issues.apache.org/jira/browse/HADOOP-17462
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: common
>            Reporter: David Mollitor
>            Assignee: David Mollitor
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> {code:java|Title=Client.java}
>   /** @return the rpc response or, in case of timeout, null. */
>   private Writable getRpcResponse(final Call call, final Connection connection,
>       final long timeout, final TimeUnit unit) throws IOException {
>     synchronized (call) {
>       while (!call.done) {
>         try {
>           AsyncGet.Util.wait(call, timeout, unit);
>           if (timeout >= 0 && !call.done) {
>             return null;
>           }
>         } catch (InterruptedException ie) {
>           Thread.currentThread().interrupt();
>           throw new InterruptedIOException("Call interrupted");
>         }
>       }
> ...
>   static class Call {
>     final int id;               // call id
>     final int retry;           // retry count
> ...
>     boolean done;               // true when call is done
> ...
> }
> {code}
> The {{done}} variable is not marked {{volatile}}, so the thread checking its status is free to cache the value and never reload it, even though another thread is expected to change it.  The while loop may then keep waiting on a stale cached value; if that happens, the timeout expires and the method returns {{null}}.
> In previous versions of Hadoop there was no timeout at this level, so this would cause an endless loop.  It is a really tough error to track down if it happens.


