Lack of progress info in map tasks

Lack of progress info in map tasks

Andrzej Białecki
Hi all,

Is it just me, or is there something strange going on with Hadoop since
~0.10 or thereabouts? With older versions of Hadoop I would get a nice,
frequently updated progress status for each map task. What I'm seeing now
is that map tasks stay at 0.0% and then finally jump to 100.0% and finish.
Consequently, for jobs with a small number of long-running map tasks, the
progress updates are very coarse.

As I understand it, this progress meter (in the absence of map tasks
explicitly setting the progress) was based on the RecordReader reporting
how much of the current split has been read. Did this get broken along
the way? If not, what's the reason for this, and how can it be fixed?
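The split-based progress described above amounts to the fraction of the split's byte range consumed so far. A minimal sketch, assuming the reader knows the split's start and end byte offsets and its current position; the class and method names are hypothetical and this is not Hadoop's own code:

```java
/** Hypothetical helper: fraction of a split [start, end) that has been read. */
public final class SplitProgress {
    private SplitProgress() {}

    /**
     * Returns a value in [0.0, 1.0]. The division is done in floating point;
     * dividing two longs would truncate to 0 until pos reaches end.
     */
    public static float fraction(long start, long end, long pos) {
        if (end <= start) {
            return 0.0f; // empty split: this sketch reports 0 and avoids dividing by zero
        }
        float f = (pos - start) / (float) (end - start);
        return Math.max(0.0f, Math.min(1.0f, f)); // clamp to [0, 1]
    }

    public static void main(String[] args) {
        // A split covering bytes [1000, 2000), half of which has been read.
        System.out.println(fraction(1000L, 2000L, 1500L)); // prints 0.5
    }
}
```

Clamping keeps the value sane if a reader consumes a record that straddles the end of its split.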

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com


Re: Lack of progress info in map tasks

Andrzej Białecki
Does anyone have a suggestion about this problem? It's rather irritating:
long-running tasks seem to be stuck at 0% and only jump to 100% at the
end of the task. This happens with 0.11.2 as well.

Re: Lack of progress info in map tasks

Owen O'Malley

On Mar 15, 2007, at 2:36 PM, Andrzej Bialecki wrote:

> Does anyone have a suggestion about this problem? It's rather  
> irritating - long-running tasks seem to be stuck at 0%, and only  
> jump to 100% at the end of the task. This happens with 0.11.2 as well.

Most of my maps finish so quickly that it isn't easy to watch
individual ones. The "progress" is based on the getPos() of the
RecordReader feeding the maps. How long do your maps run?

-- Owen

Re: Lack of progress info in map tasks

Espen Amble Kolstad
In reply to this post by Andrzej Białecki
It's a tiny bug in SequenceFileRecordReader. A cast to float is needed here:
      return (in.getPosition() - start) / (end - start);
should be
      return (in.getPosition() - start) / (float) (end - start);

Also, start must be assigned in the constructor:
    this.start = split.getStart();

- Espen

(Sorry about this not being a patch ... windoze ... arg)
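Why the original expression pins progress at 0%: both operands are longs, so Java performs integer division, which truncates every fraction below 1 to 0 and only yields 1 once the position reaches the end of the split. Separately, if start is never assigned it keeps its default value of 0, so for a split that does not begin at byte 0 the reported progress starts at start/end instead of 0. A small self-contained demonstration with made-up offsets (the class name and offsets are hypothetical):

```java
public class ProgressBugDemo {
    // Buggy: long / long is integer division, so the result is 0 until pos == end.
    static float buggy(long start, long end, long pos) {
        return (pos - start) / (end - start);
    }

    // Fixed: casting the divisor to float forces floating-point division.
    static float fixed(long start, long end, long pos) {
        return (pos - start) / (float) (end - start);
    }

    public static void main(String[] args) {
        long start = 64L * 1024 * 1024; // split begins at 64 MB (made-up offset)
        long end = 2 * start;           // and ends at 128 MB
        long[] positions = {start, start + start / 4, start + start / 2, end};
        for (long pos : positions) {
            System.out.printf("pos=%d buggy=%.2f fixed=%.2f unassignedStart=%.2f%n",
                    pos,
                    buggy(start, end, pos),
                    fixed(start, end, pos),
                    // start left at its default 0: progress is pos/end, not (pos-start)/(end-start)
                    fixed(0L, end, pos));
        }
    }
}
```

The buggy column stays at 0.00 for every position except the last, which is exactly the 0% to 100% jump described in the thread.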

Re: Lack of progress info in map tasks

Andrzej Białecki
In reply to this post by Owen O'Malley
Owen O'Malley wrote:

> Most of my maps happen so fast that it isn't that easy to watch
> individual ones. The "progress" is based on the getPos() of the
> RecordReader feeding the maps. How long do your maps run?

Several hours, up to 1-2 days (Nutch fetcher).

Re: Lack of progress info in map tasks

Andrzej Białecki
In reply to this post by Espen Amble Kolstad
Espen Amble Kolstad wrote:
> It's tiny bug in SequenceFileRecordReader. A cast to float is needed here
>      return (in.getPosition() - start) / (end - start);
> gives
>      return (in.getPosition() - start) / (float) (end - start);
>
> As well as assigning start in the constructor:
>    this.start = split.getStart();

Thanks, Espen, that's exactly the issue! I found that the same bug is
also present in LineRecordReader (which is used by TextInputFormat).
I'll create a patch and submit it.
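A line-oriented reader computes progress the same way, except that its position advances by the number of bytes consumed for each line. A simplified, self-contained sketch of that pattern over an in-memory byte array, assuming '\n' line endings and a split that starts on a line boundary; the class is hypothetical and omits the split-boundary handling a real LineRecordReader needs:

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical line reader over bytes [start, start + length) of a buffer, tracking progress. */
public class SimpleLineReader {
    private final byte[] data;
    private final long start; // assigned in the constructor (the second half of the fix)
    private final long end;
    private long pos;

    public SimpleLineReader(byte[] data, long start, long length) {
        this.data = data;
        this.start = start;
        this.end = start + length;
        this.pos = start;
    }

    /** Returns the next line without its '\n', or null when the split is exhausted. */
    public String next() {
        if (pos >= end) {
            return null;
        }
        int i = (int) pos;
        while (i < data.length && data[i] != '\n') {
            i++;
        }
        String line = new String(data, (int) pos, i - (int) pos, StandardCharsets.UTF_8);
        pos = Math.min(i + 1, data.length); // advance past the line and its terminator
        return line;
    }

    /** Fraction of the split consumed, using floating-point division. */
    public float getProgress() {
        if (end == start) {
            return 0.0f;
        }
        return Math.min(1.0f, (pos - start) / (float) (end - start));
    }

    public static void main(String[] args) {
        byte[] text = "aaa\nbbb\nccc\nddd\n".getBytes(StandardCharsets.UTF_8);
        SimpleLineReader reader = new SimpleLineReader(text, 0, text.length);
        String line;
        while ((line = reader.next()) != null) {
            System.out.println(line + " -> " + reader.getProgress());
        }
    }
}
```

With four 4-byte lines, progress advances in steps of 0.25 instead of jumping from 0 to 1.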
