[jira] Created: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)
Speculative execution used when property set to false
-----------------------------------------------------

                 Key: HADOOP-1542
                 URL: https://issues.apache.org/jira/browse/HADOOP-1542
             Project: Hadoop
          Issue Type: Bug
          Components: mapred
            Reporter: Nigel Daley


Speculative execution is now on by default.  When running TestDFSIO, I set speculative execution off in my mapred-default.xml since this test has maps that create files in DFS (side-effects).  

However, it seems that speculative tasks get started even though I have set speculation off.  I'll attach the NN and JT logs.

--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-1542:
--------------------------------

    Attachment: jobtracker.log

Here's the jobtracker log.  Follow the life cycle of task_0005_m_000004.  This is a TestDFSIO map task that should be writing data.

Note that task_0005_m_000004_1 is created right after task_0005_m_000004_0 even though speculative execution should be off.  task_0005_m_000004_0 seems to complete fine (task_0005_m_000004_1 fails with AlreadyBeingCreatedException -- see namenode.log) but the file it creates (/benchmarks/TestDFSIO/io_data/test_io_12) seems to get lost.

[jira] Updated: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nigel Daley updated HADOOP-1542:
--------------------------------

    Attachment: namenode.log

Attaching namenode.log.  

[jira] Commented: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508733 ]

Nigel Daley commented on HADOOP-1542:
-------------------------------------

Is it possible that setting speculation off in mapred-default.xml on my job submission host has no effect and that I really need to set it off in hadoop-site.xml?

If that is the case, then speculation would be on -- which explains perfectly the immediate execution of task_0005_m_000004_1.  In which case this should be filed as a dfs bug since the file created by the first map (task_0005_m_000004_0) is getting lost.
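
The question above can be pictured with a minimal sketch of layered configuration lookup. Everything in it is an assumption made for illustration: the class is a toy and not Hadoop's Configuration class, the property name mapred.speculative.execution is not given in this thread, and the layering is simplified to "later files override earlier ones". It only shows that a process which never reads mapred-default.xml keeps seeing the default value.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of layered configuration lookup. This is NOT Hadoop's Configuration
// class; the class, the layering, and the property name are assumptions.
public class LayeredConfigSketch {
    // Layers are consulted in insertion order; a later layer overrides an earlier one.
    private final Map<String, Map<String, String>> layers = new LinkedHashMap<>();

    public void addLayer(String fileName, Map<String, String> values) {
        layers.put(fileName, values);
    }

    // Returns the value from the last layer that defines the key, else the fallback.
    public String get(String key, String fallback) {
        String result = fallback;
        for (Map<String, String> layer : layers.values()) {
            if (layer.containsKey(key)) {
                result = layer.get(key);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String key = "mapred.speculative.execution"; // assumed property name

        // Process that reads the job-level file: the override is visible.
        LayeredConfigSketch withJobFile = new LayeredConfigSketch();
        withJobFile.addLayer("hadoop-default.xml", Collections.singletonMap(key, "true"));
        withJobFile.addLayer("mapred-default.xml", Collections.singletonMap(key, "false"));

        // Process that never reads mapred-default.xml: only the default is visible.
        LayeredConfigSketch withoutJobFile = new LayeredConfigSketch();
        withoutJobFile.addLayer("hadoop-default.xml", Collections.singletonMap(key, "true"));

        System.out.println("with mapred-default.xml:    " + withJobFile.get(key, "true"));
        System.out.println("without mapred-default.xml: " + withoutJobFile.get(key, "true"));
    }
}
```

In this toy setup the two processes disagree about the same property, which is the kind of mismatch the question describes.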

[jira] Commented: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12508739 ]

Devaraj Das commented on HADOOP-1542:
-------------------------------------

It looks like the JobTracker is ignoring the mapred-default.xml config items, and hence the speculative execution setting in mapred-default.xml is not reflected. However, on the tasktrackers, mapred-default.xml does override the config in hadoop-site/hadoop-default.xml, and hence they see speculative execution switched off.
This could be the cause of the dfs file-lost problem. Here's the theory (not yet validated from the source code): When the maps try to create files on dfs, they try to create the "final" files (as opposed to the speculative case, where the output path for the files would point to task-specific directories). Hence the spec instance of a map gets the AlreadyBeingCreatedException. Finally the JT, which thinks that spec exec is turned on, tries to rename the empty file path to its final destination, and that overwrites the real file that the task originally created.
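
The theory above can be walked through with a toy in-memory file system. This is only a sketch of the unvalidated theory, not of actual Hadoop code: the class, its methods, the overwrite-on-rename behavior, and the attempt-specific path are all hypothetical; only the final path comes from the jobtracker log discussion.

```java
import java.util.HashMap;
import java.util.Map;

// Toy in-memory file system illustrating the unvalidated theory above.
// Nothing here is Hadoop API; names and the attempt path are hypothetical.
public class RenameOverwriteSketch {
    private final Map<String, String> files = new HashMap<>();

    void write(String path, String data) {
        files.put(path, data);
    }

    // Assumed behavior: rename silently replaces an existing destination.
    void rename(String src, String dst) {
        String data = files.remove(src);
        files.put(dst, data == null ? "" : data);
    }

    String read(String path) {
        return files.get(path);
    }

    // Runs the sequence described in the theory and returns the final file's contents.
    static String simulate() {
        RenameOverwriteSketch fs = new RenameOverwriteSketch();
        String finalPath = "/benchmarks/TestDFSIO/io_data/test_io_12";
        String attemptPath = "/hypothetical/task_0005_m_000004_0/test_io_12";

        // The task, believing speculation is off, writes straight to the final path.
        fs.write(finalPath, "real data");
        // The attempt-specific path holds only an empty file.
        fs.write(attemptPath, "");
        // The JT, believing speculation is on, promotes the attempt path.
        fs.rename(attemptPath, finalPath);

        return fs.read(finalPath);
    }

    public static void main(String[] args) {
        System.out.println("final file contents: '" + simulate() + "'");
    }
}
```

Under these assumptions the final file ends up empty, which would look like the file being lost.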

[jira] Updated: (HADOOP-1542) Speculative execution used when property set to false

Parth (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1542:
----------------------------------

    Assignee: Devaraj Das
    Priority: Blocker  (was: Major)

This seems like a blocker until we understand what is happening.

[jira] Updated: (HADOOP-1542) Incorrect task/tip being scheduled (looks like speculative execution)

Parth (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley updated HADOOP-1542:
----------------------------------

        Fix Version/s: 0.14.0
             Assignee: Owen O'Malley  (was: Devaraj Das)
          Description:
The change in HADOOP-1440 broke map/reduce by breaking the assumption that Task.getPartition() corresponded to the JobInProgress.map[] order.

Currently JobInProgress.findNewTask uses Task.getPartition as the index of the map to run. This can be a completely different tip, which will cause incorrect tasks to be run, including duplicates of tasks that are already running.

  was:
Speculative execution is now on by default.  When running TestDFSIO, I set speculative execution off in my mapred-default.xml since this test has maps that create files in DFS (side-effects).  

However, it seems that speculative tasks get started even though I have set speculation off.  I'll attach the NN and JT logs.

    Affects Version/s: 0.14.0
              Summary: Incorrect task/tip being scheduled (looks like speculative execution)  (was: Speculative execution used when property set to false)
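
The indexing problem in the new description can be sketched as follows. This is a toy model under stated assumptions, not the JobInProgress code: the Tip class, both lookup methods, and the particular reordering of the array are hypothetical. It only shows why using a partition number as an array index picks a different tip once the array order no longer matches partition order.

```java
// Toy model of the index mismatch; none of these types are Hadoop classes.
public class PartitionIndexSketch {
    // Hypothetical stand-in for a task-in-progress (tip).
    static final class Tip {
        final int partition;
        final String name;

        Tip(int partition, String name) {
            this.partition = partition;
            this.name = name;
        }
    }

    // Fragile lookup: treats the partition number as an index into the array.
    static Tip findByIndex(Tip[] maps, int partition) {
        return maps[partition];
    }

    // Order-independent lookup: searches for the tip with the matching partition.
    static Tip findByPartition(Tip[] maps, int partition) {
        for (Tip tip : maps) {
            if (tip.partition == partition) {
                return tip;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Assumed reordering: array position no longer equals partition number.
        Tip[] maps = {
            new Tip(2, "tip_for_partition_2"),
            new Tip(0, "tip_for_partition_0"),
            new Tip(1, "tip_for_partition_1"),
        };

        System.out.println("by index:     " + findByIndex(maps, 0).name);
        System.out.println("by partition: " + findByPartition(maps, 0).name);
    }
}
```

With this ordering the index-based lookup for partition 0 returns the tip for partition 2, so a scheduler relying on it could start the wrong task or a second copy of one that is already running.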

[jira] Commented: (HADOOP-1542) Incorrect task/tip being scheduled (looks like speculative execution)

Parth (Jira)

    [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12509159 ]

Owen O'Malley commented on HADOOP-1542:
---------------------------------------

For now, I've reverted HADOOP-1440, which fixes this problem.

[jira] Resolved: (HADOOP-1542) Incorrect task/tip being scheduled (looks like speculative execution)

Parth (Jira)

     [ https://issues.apache.org/jira/browse/HADOOP-1542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Owen O'Malley resolved HADOOP-1542.
-----------------------------------

    Resolution: Fixed

HADOOP-1440 needs to be re-created as a new issue, but this bug is fixed.
