Nutch 2.2.1 Hadoop map tasks

Ásgeir Halldórsson
Hello,

I am using Nutch with a Hadoop cluster of 5 servers. The reduce phase is split into as many tasks as my configuration specifies, but the map phase always runs as a single task.



Running Map Tasks        1
Running Reduce Tasks     0
Total Submissions        213
Nodes                    5
Occupied Map Slots       1
Occupied Reduce Slots    0
Reserved Map Slots       0
Reserved Reduce Slots    0
Map Task Capacity        20
Reduce Task Capacity     20
Avg. Tasks/Node          8.00
Blacklisted Nodes        0
Graylisted Nodes         0
Excluded Nodes           0


        <property>
                <name>mapred.map.tasks</name>
                <value>20</value>
        </property>

        <property>
                <name>mapred.reduce.tasks</name>
                <value>15</value>
        </property>

Regards,
                Ásgeir Halldórsson

Re: Nutch 2.2.1 Hadoop map tasks

Talat Uyarer
Hi,

Which jobs generate only one map task? I think your map input is very
small. If the input were larger than the split size limit, it would be
split into more than one map task.
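To make the split-size point concrete: for file-based input read through Hadoop's new `org.apache.hadoop.mapreduce` API, `FileInputFormat` picks a split size of max(minSize, min(maxSize, blockSize)) and creates one map task per split, so an input smaller than one HDFS block normally yields a single map, and `mapred.map.tasks` plays no part in the calculation. The minimum and maximum come from the split-size settings (on Hadoop 1.x these are, to my knowledge, `mapred.min.split.size` and `mapred.max.split.size`). The class below is a simplified sketch of that arithmetic only, assuming one splittable, non-empty file and a 64 MB block size; `SplitMath` is a hypothetical name, not Hadoop code. It does not describe Gora-backed jobs, whose splits come from the data store rather than from files.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of how a new-API FileInputFormat splits ONE splittable,
// non-empty file. Block locations, compression and multiple files are ignored.
// Hypothetical class for illustration; not Hadoop's implementation.
class SplitMath {
    // The last split may be up to 10% larger instead of leaving a tiny tail split.
    static final double SPLIT_SLOP = 1.1;

    // max(minSize, min(maxSize, blockSize)): the map-task hint is not an input.
    static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Lengths of the splits (one map task each) for a file of fileLength bytes.
    static List<Long> splitLengths(long fileLength, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long bytesRemaining = fileLength;
        while ((double) bytesRemaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits.add(bytesRemaining);
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long blockSize = 64 * mb; // assumed HDFS block size

        // 10 MB input with default limits: smaller than one block, so one map task.
        long defaultSize = computeSplitSize(blockSize, 1, Long.MAX_VALUE);
        System.out.println(splitLengths(10 * mb, defaultSize).size()); // 1

        // Same input with a hypothetical 1 MB maximum split size: ten map tasks.
        long cappedSize = computeSplitSize(blockSize, 1, 1 * mb);
        System.out.println(splitLengths(10 * mb, cappedSize).size()); // 10

        // 640 MB input with default limits: one split per 64 MB block.
        System.out.println(splitLengths(640 * mb, defaultSize).size()); // 10
    }
}
```

In this model, the only levers for more map tasks on a small file are a lower maximum split size or a smaller block size.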

Thanks



2014-03-19 11:11 GMT+02:00 Ásgeir Halldórsson <[hidden email]>:

> Hello,
>
>                 I am using Nutch with a Hadoop cluster of 5 servers.  The
> Reduce job is split into many jobs like my config sets but the map only
> uses one job always.
>
>
>
> Running Map Tasks
>
> Running Reduce Tasks
>
> Total Submissions
>
> Nodes
>
> Occupied Map Slots
>
> Occupied Reduce Slots
>
> Reserved Map Slots
>
> Reserved Reduce Slots
>
> Map Task Capacity
>
> Reduce Task Capacity
>
> Avg. Tasks/Node
>
> Blacklisted Nodes
>
> Graylisted Nodes
>
> Excluded Nodes
>
> 1
>
> 0
>
> 213
>
> 5
>
> 1
>
> 0
>
> 0
>
> 0
>
> 20
>
> 20
>
> 8.00
>
> 0
>
> 0
>
> 0
>
>
>         <property>
>                 <name>mapred.map.tasks</name>
>                 <value>20</value>
>         </property>
>
>         <property>
>                 <name>mapred.reduce.tasks</name>
>                 <value>15</value>
>         </property>
>
> Regards,
>                 Ásgeir Halldórsson
>



--
Talat UYARER
Website: http://talat.uyarer.com
Twitter: http://twitter.com/talatuyarer
Linkedin: http://tr.linkedin.com/pub/talat-uyarer/10/142/304

RE: Nutch 2.2.1 Hadoop map tasks

Ásgeir Halldórsson
Hi Talat,

At the moment it's the parse job that is causing me problems. Its map phase has been running as a single task for a few hours now. I googled a bit, but I can't find a map input size parameter.

By the way, I am using Gora with Cassandra (2.x branch).

Ásgeir Halldórsson

-----Original Message-----
From: Talat Uyarer [mailto:[hidden email]]
Sent: 19 March 2014 12:04
To: [hidden email]
Subject: Re: Nutch 2.2.1 Hadoop map tasks

Re: Nutch 2.2.1 Hadoop map tasks

Julien Nioche
Your problem might come from the Cassandra module in Gora; see the discussion at
http://mail-archives.apache.org/mod_mbox/gora-dev/201310.mbox/%3CCAHYFERDpgvSFcHnw=ter25eoc6wooY81iZ2p2zSTK4s=7AgWEg@...%3E
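A sketch of why the map count can be stuck at one: as far as I can tell, Nutch 2.x jobs read their input through Gora's `GoraInputFormat`, which asks the data store to partition the query (`DataStore.getPartitions`) and creates one input split, and therefore one map task, per returned partition. If the Cassandra store returns the whole query as a single partition, the job gets one map task regardless of `mapred.map.tasks` or any split-size setting. The types below (`PartitionedStore`, `SplitPerPartition`) are hypothetical and only model that relationship with numeric key ranges; they are not Gora's API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model, NOT Gora's API: a store cuts a key range [start, end)
// into partitions, and the job gets one input split (one map task) per partition.
interface PartitionedStore {
    List<long[]> partitions(long startKey, long endKey);
}

class SplitPerPartition {
    // Splits come from the store, so mapred.map.tasks never enters the calculation.
    static int mapTasks(PartitionedStore store, long startKey, long endKey) {
        return store.partitions(startKey, endKey).size();
    }

    // A store without real partitioning support: the whole range is one partition.
    static PartitionedStore singlePartitionStore() {
        return (start, end) -> {
            List<long[]> parts = new ArrayList<>();
            parts.add(new long[] {start, end});
            return parts;
        };
    }

    // A store that cuts the range into at most n contiguous chunks.
    static PartitionedStore evenlyPartitionedStore(int n) {
        return (start, end) -> {
            List<long[]> parts = new ArrayList<>();
            long width = (end - start + n - 1) / n; // ceiling division
            for (long lo = start; lo < end; lo += width) {
                parts.add(new long[] {lo, Math.min(lo + width, end)});
            }
            return parts;
        };
    }

    public static void main(String[] args) {
        System.out.println(mapTasks(singlePartitionStore(), 0, 1_000_000));     // 1
        System.out.println(mapTasks(evenlyPartitionedStore(20), 0, 1_000_000)); // 20
    }
}
```

Under this model, the only way to get more map tasks is a store that reports more partitions; tuning the job configuration alone has no effect.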

Julien


On 19 March 2014 11:15, Ásgeir Halldórsson <[hidden email]> wrote:




--

Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Nutch 2.2.1 Hadoop map tasks

Talat Uyarer
In reply to this post by Ásgeir Halldórsson
I am not sure, Asgeir. Julien may be right. We use HBase, so if you were
using HBase we could help you, but with Cassandra I don't know. :)
