Naveen's Weblog

Bridge to future

Bulk Insert and Update MSSQL

Posted by codingsense on December 23, 2008


Bulk insert is a useful feature that loads data into a table from a file at great speed. But it only inserts; there is no option for updating existing rows. If we have a primary key on the table, we often face the error “Cannot insert duplicate key”. I faced the same problem while using bulk insert, so I worked out a manual way to handle updates during a bulk insert.

I will explain the whole process I followed, with a sample.

Create a CSV file:
Consider a CSV file containing the email, first name, last name and date of birth of clients as below, and save it as Test.csv in C:\


naveen@gmail.com,Naveen,Prabhu,08/04/1983
aatish@yahoo.com,Aatish,Sethi,10/08/1984
gautham@hotmail.com,Gautham,Nayak,10/08/1984
manas@rediff.com,Manas,Patnaik,10/08/1984
raghu@India.com,Raghu,Kini,10/08/1984

Create a table TestCSV:
For the above CSV file we need to create a table with the same fields. Here email will be the unique key, so the table can be designed as:

Create table TestCSV
	(
		Email Nvarchar(100),
		FirstName varchar(100),
		LastName varchar(100),
		DOB datetime,
		Constraint PK_TestCSV Primary Key(Email)
	);

Create a stored procedure for insert and update:

Create PROCEDURE InsertClientInfo( 
@CSVPath NVARCHAR(200))
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

	--Create a temporary table to load the data from the CSV file
	Create table #LookUpTable
	(
		Email Nvarchar(100),
		FirstName varchar(100),
		LastName varchar(100),
		DOB datetime
	);

	--Declare a variable to hold the command for Bulk insert (500 chars so a 200 char path still fits)
	Declare @UpdateQuery NVarchar(500)
	Set @UpdateQuery = 'bulk insert #LookUpTable
	from ''' + @CSVPath + ''' with(
	fieldterminator = '','',
	rowterminator =''\n'')';

	--Execute the Bulk Insert command
	Exec sp_executesql @UpdateQuery

	--Check which Emails are already in the database and delete those rows from TestCSV
	Delete from TestCSV where Email in (Select distinct Email from #LookUpTable);

	--Insert the Updated or New entries in TestCSV table
	Insert into TestCSV select Distinct * from #LookUpTable;

	--Drop the temporary table
	Drop table #LookUpTable;
END
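
As a side note, if you are on SQL Server 2008 or later, the delete-and-insert step above could also be written as a single MERGE statement. The sketch below is untested and simply reuses the same table and temp table names, so treat it as a starting point rather than as the exact method used here:

Merge TestCSV as Target
using (Select Distinct Email, FirstName, LastName, DOB from #LookUpTable) as Source
on Target.Email = Source.Email
when matched then
	update set FirstName = Source.FirstName, LastName = Source.LastName, DOB = Source.DOB
when not matched then
	insert (Email, FirstName, LastName, DOB)
	values (Source.Email, Source.FirstName, Source.LastName, Source.DOB);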

Run the Stored Procedure:

InsertClientInfo 'C:\Test.csv'

This will insert the new records and update the old ones from the file Test.csv into the TestCSV table.
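
To verify, you can run a simple select on the target table. If you then change, say, a last name in Test.csv and run the procedure again, that row should be updated instead of raising a duplicate key error:

Select * from TestCSV;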

Allow Duplicates
If your table does not have a primary key and can accept duplicates, the stored procedure is simpler and looks like this:

Create PROCEDURE InsertClientInfo( 
@CSVPath NVARCHAR(200))
AS
BEGIN
	-- SET NOCOUNT ON added to prevent extra result sets from
	-- interfering with SELECT statements.
	SET NOCOUNT ON;

	--Declare a variable to hold the command for Bulk insert
	Declare @UpdateQuery NVarchar(500)
	Set @UpdateQuery = 'bulk insert TestCSV
	from ''' + @CSVPath + ''' with(
	fieldterminator = '','',
	rowterminator =''\n'')';

	--Execute the Bulk Insert command
	Exec sp_executesql @UpdateQuery
END

I hope the article is clear. If you have any other good solution for this issue, kindly share it.

Happy Learning 🙂

14 Responses to “Bulk Insert and Update MSSQL”

  1. Irwin said

    Your method is really helpful.
    But there is an even simpler way to do this in MySQL.

    As we all know, we can use "insert into LookUpTable (Email, firstName, LastName, dob) values (?,?,?,?) on duplicate key update firstName=Values(firstName)" for a single insert.
    We can also use it for batch inserts by appending the parameter sets for the other rows. For example, if we want to insert 2 records at once, we just use
    "insert into LookUpTable (Email, firstName, LastName, dob) values (?,?,?,?),(?,?,?,?) on duplicate key update firstName=Values(firstName)"

    (?,?,?,?) is the data you want to insert.

    The batch insert is reliable and performs well in MySQL.

  2. Irwin said

    Hope you found it helpful:)

    • codingsense said

      Hi Irwin,

      Thanks for sharing the information, it was helpful.

      But is there a way to read directly from the CSV file by giving separators, as in MSSQL? Or, for inserting multiple records, do we have to write another function or procedure that reads the CSV file, builds the string and then uses the command you have specified?

      We also have to check whether there is an option like "duplicate key update" in MSSQL.

      Thanks,
      Naveen Prabhu

  3. Irwin said

    Your point is reasonable.
    Yes, we need to prepare the query string before executing that command, but I think the advantage of this method is that everything is done in a single operation on the database; it may improve performance and save time when dealing with bulk data.

    • codingsense said

      Hi,

      Both methods have their own advantages, but we still cannot declare one of them the best. If the two approaches were merged it would surely be the best ;). For now we can simply try inserting a million records using both methods and check which takes less time:
      1) Concatenating the string, or
      2) Removing the duplicate entries and inserting

      This will give us a clear idea of how the performance is affected.

      Thanks,
      Naveen Prabhu

  4. Jose said

    Is it possible to use Bulk Insert to only add new rows to the target table? I have a CSV file that’s constantly being updated and I’m only interested in new records.

    Thanks,
    Jose

    • codingsense said

      Hi Jose,

      View the page related to Bulk Insert options and usage at http://msdn.microsoft.com/en-us/library/ms188365.aspx

      On that page, check the CHECK_CONSTRAINTS usage description. There Microsoft says:
      “A situation in which you might want constraints disabled (the default behavior) is if the input data contains rows that violate constraints. With CHECK constraints disabled, you can import the data and then use Transact-SQL statements to remove the invalid data.”

      i.e. while inserting we can check for constraints, but as soon as a constraint fails the insertion stops and no later records are inserted, so Microsoft suggests inserting only valid data, or removing the invalid data after the insertion.

      So you can use the method I have used in this post to filter out the invalid data and insert only the valid rows. Or, if you are using LINQ, you can filter out the invalid data and insert only the valid data easily.
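
      For example, with the temp table approach from this post, the insert step can be changed to skip rows that already exist. This is only a rough sketch and is not tested:

      --Insert only the rows whose Email is not already present in TestCSV
      Insert into TestCSV
      select Distinct * from #LookUpTable L
      where not exists (Select 1 from TestCSV T where T.Email = L.Email);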

      Thanks,
      Naveen Prabhu

  5. JR said

    Hi Naveen,

    I was reading your blog regarding Bulk Insert and Updates. Cool.

    About three months ago I had the following problem (which I fixed with a VB Script utility; it worked, but was not as performant as I would have liked it to be).

    Premises:

    1) Imagine you have a fixed-width flat file that contains mixed data. One of the fields is a type identifier.

    01Company ABC Third Field Fourth Field
    01Company XYZ Third Field Fourth Field
    02Client01 Third Field Fourth Field
    01Company 123 Third Field Fourth Field
    etc, etc

    all records starting with 01 will go to MS SQL Table A
    all records starting with 02 will go to MS SQL Table B

    The columns for records of type 01 are fixed in length, and the columns for records of type 02 are also fixed in length, but the column lengths differ between the two record types.

    Would I be able to read the flat file and update the tables selectively according to the record type (the first two characters of each line) using BULK INSERT, as a combination of what you did?

    I had 3 million records in that format.

    What I did was write a script that would read the flat file and split it into one or more flat files according to TYPE.

    Then I wrote BULK INSERTs against each one of the new flat files.

    Any ideas on how to optimize it and make it more elegant?

    (The first pass, where I split the file, takes a while.) It would be great to do it all inside a BULK INSERT.

    Ciao

    JR

    • codingsense said

      Hi JR,

      Within bulk insert there is no option to handle such conditions; it just inserts the data without checking any criteria.

      I worked on a project with a similar requirement. I used bulk insert to load all the records into a temp table and then called a stored procedure that checked the first characters, moved the records into the respective tables and then cleared the temp table for the next input.

      Instead of doing the separation in VB Script, I think it would be better to perform the same logic in a stored procedure. Try it, and let me know the performance.
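
      Roughly, the stored procedure logic could look like the sketch below. The staging table, target tables, column names and column widths here are all made up, so adjust them to your actual layout:

      --Stage each raw line of the fixed width file in one wide column
      --(assumes the file contains no tab characters, since the default field terminator is a tab)
      Create table #RawLines (Line varchar(500));

      --Bulk insert the flat file, one row per line (the path is only an example)
      bulk insert #RawLines from 'C:\Data.txt' with (rowterminator = '\n');

      --Move type 01 records into TableA (positions and widths are hypothetical)
      Insert into TableA (Company, Field3, Field4)
      select substring(Line, 3, 15), substring(Line, 18, 12), substring(Line, 30, 12)
      from #RawLines where left(Line, 2) = '01';

      --Move type 02 records into TableB (type 02 uses different widths)
      Insert into TableB (ClientName, Field3, Field4)
      select substring(Line, 3, 10), substring(Line, 13, 12), substring(Line, 25, 12)
      from #RawLines where left(Line, 2) = '02';

      --Clear the staging table for the next input
      Drop table #RawLines;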

      Regards,
      Naveen Prabhu

  6. Bala said

    I have huge data in a table (nearly 63 million rows). I have an update query which updates nearly 1 million rows, and it takes nearly one day to complete. Any ideas to minimize the I/O operations?
    The query looks like this:
    UPDATE A SET field = 'Yes'
    FROM T1 A INNER JOIN T2 A1 with (nolock) ON A.[id] = A1.Srcitem_ID
    INNER JOIN T1 A2 with (nolock) ON A2.[id] = A1.DstITem_ID
    --T1 is the table having 63 million rows.
    --The [id] column is not a primary key; the combination of id and one more column is the primary key in T1. I also have an index on that.
    --A1 is a small table with 2400 entries

    • codingsense said

      Hi, can you give me more details and the structure of the tables?
      Can you send me the table creation script?
      I will try my best to help you out.

      Thanks,
      Codingsense

  7. Hi Naveen! I found your article very useful and I checked whether it would work for my project. It worked well, but if my table contains a primary key column with Identity turned on, the procedure fails. Even if I turn on identity insert in code it still fails.
    Here is how I've done it using the code above:

    set IDENTITY_INSERT dbo.TestCategory ON
    --Insert the Updated or New entries in TestCategory table
    Insert into Product select Distinct * from #LookUpTable;
    set IDENTITY_INSERT dbo.TestCategory OFF

    Can you give me an idea as to how I am going to solve this problem? Thanks!

    Aurel Cortez

    • codingsense said

      Hi Aurel,

      If you have an identity column then you need not insert anything into that field. Just ignore it and write the insert statement for the rest of the fields from the CSV file. The identity column will be filled automatically.
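
      For example, if your table had an identity column called CategoryID (just an assumed name) plus the same four columns as the temp table, the insert would list only the non-identity columns, something like:

      --CategoryID is the identity column, so it is left out of both column lists
      Insert into TestCategory (Email, FirstName, LastName, DOB)
      select Distinct Email, FirstName, LastName, DOB from #LookUpTable;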

      Thanks,
      Naveen Prabhu
