
multioutput dfs.datanode.max.xcievers and too many open files

multioutput dfs.datanode.max.xcievers and too many open files

Marc Sturlese
Hey there,
I've been running a cluster for about a year (about 20 machines). I've run many concurrent jobs there, some of them with multiOutput, and never had any problem (the multiOutputs were creating just 3 or 4 different outputs).
Now I have a job with multiOutputs that creates 100 different outputs, and it always ends up with errors.
Tasks start throwing these errors:

java.io.IOException: Bad connect ack with firstBadLink 10.2.0.154:50010
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2963)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)


or:
java.io.EOFException
        at java.io.DataInputStream.readByte(DataInputStream.java:250)
        at org.apache.hadoop.io.WritableUtils.readVLong(WritableUtils.java:298)
        at org.apache.hadoop.io.WritableUtils.readVInt(WritableUtils.java:319)
        at org.apache.hadoop.io.Text.readString(Text.java:400)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.createBlockOutputStream(DFSClient.java:2961)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.nextBlockOutputStream(DFSClient.java:2888)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$1900(DFSClient.java:2139)
        at org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:2329)


Checking the datanode log, I see this error hundreds of times:
2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Reopen already-open Block for append blk_3368446040000470452_29464903
2012-02-23 14:22:56,008 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_3368446040000470452_29464903 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,008 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)
2012-02-23 14:22:56,034 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block blk_-2698946892792040969_29464904 src: /10.2.0.156:40969 dest: /10.2.0.156:50010
2012-02-23 14:22:56,035 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: writeBlock blk_-2698946892792040969_29464904 received exception java.net.SocketException: Too many open files
2012-02-23 14:22:56,035 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(10.2.0.156:50010, storageID=DS-1194175480-10.2.0.156-50010-1329304363220, infoPort=50075, ipcPort=50020):DataXceiver
java.net.SocketException: Too many open files
        at sun.nio.ch.Net.socket0(Native Method)
        at sun.nio.ch.Net.socket(Net.java:97)
        at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:84)
        at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:37)
        at java.nio.channels.SocketChannel.open(SocketChannel.java:105)
        at org.apache.hadoop.hdfs.server.datanode.DataNode.newSocket(DataNode.java:429)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.writeBlock(DataXceiver.java:296)
        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:118)


I've always had this configured in hdfs-site.xml:
        <property>
                <name>dfs.datanode.max.xcievers</name>
                <value>4096</value>
        </property>

But I think it's now not enough to handle that many multipleOutputs. If I increase max.xcievers even more, what are the side effects? Which value should be considered the maximum (I suppose it depends on CPU and RAM, but approximately)?

Thanks in advance.

Re: multioutput dfs.datanode.max.xcievers and too many open files

Harsh J
Hi Marc,

Your error is not related to the transfer thread limit (xceivers).
You're hitting a "ulimit -n" cap at your DataNode, i.e. the system
maximum number of open files allowed for the user running the DN process.

Check what your limits say for 'Max open files' in /proc/<DN PID>/limits
and raise it if it's proving insufficient today. Also do try to upgrade
your cluster to a more recent release, as there were some improvements
on this front you can benefit from.
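
For reference, a minimal sketch of how to check that on a live node. It assumes a Linux /proc filesystem and that the DataNode main class appears on the process command line; the pgrep pattern and the way the PID is captured are only illustrative:

    # find the DataNode process id (the pattern is an assumption, adjust for your setup)
    DN_PID=$(pgrep -f org.apache.hadoop.hdfs.server.datanode.DataNode | head -n 1)

    # the open-files limit the running process actually has
    grep 'Max open files' /proc/$DN_PID/limits

    # how many descriptors it currently holds open (run as root or as the DN user)
    ls /proc/$DN_PID/fd | wc -l

If the descriptor count approaches the 'Max open files' value while the job is writing its 100 outputs, the ulimit is the bottleneck rather than the xceiver count.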

--
Harsh J
Customer Ops. Engineer
Cloudera | http://tiny.cloudera.com/about

Re: multioutput dfs.datanode.max.xcievers and too many open files

Doug Judd
In reply to this post by Marc Sturlese
Hi Marc,

Take a look at How to Increase Open File Limit
<http://www.hypertable.com/documentation/misc/how_to_increase_open_file_limit/>
for instructions on how to increase the file limit.
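
In rough terms, that comes down to raising the nofile limit for the user the DataNode runs as, then restarting the DataNode from a session that picks up the new value. A sketch of the typical /etc/security/limits.conf entries, where the "hdfs" user name and the 32768 value are only placeholders to adjust for your cluster:

    # /etc/security/limits.conf (example entries for the user running the DataNode)
    # "hdfs" and 32768 are placeholders, not recommendations
    hdfs    soft    nofile    32768
    hdfs    hard    nofile    32768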

- Doug
