Wednesday, August 10, 2016

Bulk upsert into SQL Server

I recently worked through a situation where a bulk upsert (update if a matching record already exists; otherwise insert a new record) to SQL Server implemented using Entity Framework 6 was producing the correct results, but running very slowly. 

I was able to improve the performance of the bulk upsert by about 600x (!!!) by replacing the Entity Framework implementation with a an new approach based on a stored procedure taking a table-valued parameter.

Entity Framework implementation – very slow

The original code, using Entity Framework 6 (altered to use a table / object type of “Employee” instead of the actual type I was working with):

DateTime now = DateTime.Now;
using (MyCustomContext context = new MyCustomContext())
{
    foreach (Employee employee in employeeData)
    {
        // Get the matching Employee record from the database, if there is one.
        Employee employeeRecord = context.Employees
            .Where(e => e.EmployeeID == employee.EmployeeID)
            .SingleOrDefault();

        bool isNewRecord = false;
        if (employeeRecord == null)
        {
            // We don't have a record in the database for this employee yet, 
            // so we'll add a new one.
            isNewRecord = true;
            employeeRecord = new Employee()
            {
                EmployeeID = employee.EmployeeID,
                CreationDate = now
            };
        }

        employeeRecord.ModificationDate = now;
        employeeRecord.FirstName = employee.FirstName;
        // (Set the remaining attributes...)

        if (isNewRecord)
        {
            context.Employees.Add(employeeRecord);
        }

        context.SaveChanges();
    }
}

That approach was taking around 300 seconds per 1000 records inserted.  It was bogging down my application in situations where I needed to insert/update tens of thousands of records at once.

Stored Procedure + Table-Valued Parameter implementation – fast!

In my testing, the revised approach below, replacing the Entity Framework implementation with a stored procedure taking a table-valued parameter, was able to upsert the same 1000 records in 0.5 second – a huge improvement!

I needed to add a custom table type with columns matching the Employees table:

CREATE TYPE EmployeesTableType AS TABLE
(
    [EmployeeID] [int] NOT NULL,
    [CreationDate] [datetime] NOT NULL,
    [FirstName] [nvarchar(100)] NOT NULL,
    /* More fields here... */
);

Next, the stored procedure to perform the upsert, taking input of an instance of the custom EmployeesTableType, and using the SQL Server MERGE command:

CREATE PROCEDURE dbo.UpsertEmployees
    @UpdateRecords dbo.EmployeesTableType READONLY
AS
BEGIN
    MERGE INTO Employees AS Target
    USING @UpdateRecords AS Source
    ON Target.EmployeeID = Source.EmployeeID
    WHEN MATCHED THEN
        UPDATE SET Target.EmployeeID = Source.EmployeeID,
            Target.FirstName = Source.FirstName,
            /* More fields here... */
    WHEN NOT MATCHED THEN           
        INSERT (EmployeeID,
            FirstName,
            /* More field names here... */
            )
        VALUES (Source.EmployeeID,
            Source.FirstName,
            /* More field values here... */
            );
END

Finally, here’s the C# code I used to execute the stored procedure:

public static void UpsertEmployeeData(List<Employee> employees)
{
    string connectionString = MyCustomMethodToGetConnectionString();
    using(SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();

        using (SqlCommand command = connection.CreateCommand())
        {
            command.CommandText = "dbo.UpsertEmployees";
            command.CommandType = CommandType.StoredProcedure;

            SqlParameter parameter = command.Parameters.AddWithValue("@UpdateRecords",
                CreateUpdateRecordsSqlParam(employees));  // See implementation below
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.EmployeesTableType";

            command.ExecuteNonQuery();
        }
    }
}

private static DataTable CreateUpdateRecordsSqlParam(IEnumerable employees)
{
    DataTable table = new DataTable();
    table.Columns.Add("EmployeeID", typeof(Int32));
    table.Columns.Add("FirstName", typeof(string));
    // More fields here...

    foreach (Employee employee in employees)
    {
        table.Rows.Add(employee.EmployeeID,
            employee.FirstName,
            // More fields here...
    }

    return table;
}

Ideally, this post will serve as a useful reference the next time I, or you, need to code up a bulk upsert into SQL Server!

Credit to this StackOverflow answer by Ryan Prechel which was the primary inspiration for this approach.