Issue while running sqoop script parallel

Issue while running sqoop script parallel

Saravanan Nagarajan
Hi,

We need your expert guidance to resolve a Sqoop script error. We are using Sqoop to invoke TDCH (Teradata Connector for Hadoop) to archive data from Teradata into Hive tables on Hadoop.

We have created a generic Sqoop script that accepts the source DB, view name, and target name as input parameters and loads the data into Hive tables. If I invoke the same script in parallel with different sets of parameters, both instances of the script fail with the error below.

 

"

Error: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.protocol.AlreadyBeingCreatedException): Failed to create file [/user/svc_it/temp_055830/part-m-00000] for [DFSClient_attempt_1439568235974_1009_m_000000_1_909694314_1] for client [39.6.64.13], because this file is already being created by [DFSClient_attempt_1439568235974_1010_m_000000_0_-997224822_1] on [39.6.64.13]

"


The issue occurs in the map step, when it tries to write the file to HDFS. It looks like one instance is trying to overwrite the files created by the other instance, because the temp folder created by the mapper has the same name in both runs (/user/svc_it/temp_05583 ).



Please let me know how to fix this issue. 



Thanks,
NS Saravanan

Re: Issue while running sqoop script parallel

Chris Nauroth
Hello Saravanan,

HDFS implements a single-writer model for its files, so if 2 clients concurrently try to open the same file path for write or append, then one of them will receive an error.  It looks to me like tasks from 2 different job submissions collided on the same path.  I think you're on the right track investigating why this application used the same temp directory.  Is the temp directory something that is controlled by the parameters that you pass to your script?  Do you know how the "055830" gets determined in this example?
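
To make the collision concrete, here is a minimal Python sketch. It is only a local-filesystem analogy, not HDFS and not the Sqoop or TDCH API: an exclusive create (O_EXCL) stands in for the HDFS single-writer check, which is enforced through leases on the NameNode rather than through file existence. The helper names (create_exclusive, unique_temp_dir, simulate) and the idea of deriving the temp directory from a per-run identifier are assumptions for illustration. Whether and how the temp directory name can be set in this script is still the open question above.

```python
import os
import tempfile
import uuid


def create_exclusive(path):
    """Create path for writing, failing if it already exists (O_EXCL)."""
    return os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)


def unique_temp_dir(base, run_id=None):
    """Hypothetical helper: one temp directory name per script invocation."""
    run_id = run_id or uuid.uuid4().hex
    return os.path.join(base, "temp_" + run_id)


def simulate(base):
    # Two invocations sharing one temp directory: the second create fails.
    shared = os.path.join(base, "temp_055830")
    os.makedirs(shared)
    first = create_exclusive(os.path.join(shared, "part-m-00000"))
    try:
        create_exclusive(os.path.join(shared, "part-m-00000"))
        collided = False
    except FileExistsError:
        collided = True
    finally:
        os.close(first)

    # Two invocations with distinct temp directories: both creates succeed.
    handles = []
    for _ in range(2):
        run_dir = unique_temp_dir(base)
        os.makedirs(run_dir)
        handles.append(create_exclusive(os.path.join(run_dir, "part-m-00000")))
    for handle in handles:
        os.close(handle)
    return collided, len(handles)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as base:
        print(simulate(base))
```

In the shared-directory case both writers target the same part-m-00000 path, which mirrors the two attempts (jobs 1009 and 1010) named in the error. With a distinct directory per run, the task output paths no longer overlap.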

--Chris Nauroth

From: Saravanan Nagarajan <[hidden email]>
Date: Thursday, December 17, 2015 at 9:18 PM
To: "[hidden email]" <[hidden email]>, "[hidden email]" <[hidden email]>
Subject: Issue while running sqoop script parallel
