I recently worked through a situation where a bulk upsert (update if a matching record already exists; otherwise insert a new record) to SQL Server implemented using Entity Framework 6 was producing the correct results, but running very slowly.
I was able to improve the performance of the bulk upsert by about 600x (!!!) by replacing the Entity Framework implementation with a an new approach based on a stored procedure taking a table-valued parameter.
Entity Framework implementation – very slow
The original code, using Entity Framework 6 (altered to use a table / object type of “Employee” instead of the actual type I was working with):
DateTime now = DateTime.Now;
using (MyCustomContext context = new MyCustomContext())
{
    foreach (Employee employee in employeeData)
    {
        // Get the matching Employee record from the database, if there is one.
        Employee employeeRecord = context.Employees
            .Where(e => e.EmployeeID == employee.EmployeeID)
            .SingleOrDefault();
        bool isNewRecord = false;
        if (employeeRecord == null)
        {
            // We don't have a record in the database for this employee yet, 
            // so we'll add a new one.
            isNewRecord = true;
            employeeRecord = new Employee()
            {
                EmployeeID = employee.EmployeeID,
                CreationDate = now
            };
        }
        employeeRecord.ModificationDate = now;
        employeeRecord.FirstName = employee.FirstName;
        // (Set the remaining attributes...)
        if (isNewRecord)
        {
            context.Employees.Add(employeeRecord);
        }
        context.SaveChanges();
    }
}
That approach was taking around 300 seconds per 1000 records inserted. It was bogging down my application in situations where I needed to insert/update tens of thousands of records at once.
Stored Procedure + Table-Valued Parameter implementation – fast!
In my testing, the revised approach below, replacing the Entity Framework implementation with a stored procedure taking a table-valued parameter, was able to upsert the same 1000 records in 0.5 second – a huge improvement!
I needed to add a custom table type with columns matching the Employees table:
CREATE TYPE EmployeesTableType AS TABLE
(
    [EmployeeID] [int] NOT NULL,
    [CreationDate] [datetime] NOT NULL,
    [FirstName] [nvarchar(100)] NOT NULL,
    /* More fields here... */
);
Next, the stored procedure to perform the upsert, taking input of an instance of the custom EmployeesTableType, and using the SQL Server MERGE command:
CREATE PROCEDURE dbo.UpsertEmployees 
    
    @UpdateRecords dbo.EmployeesTableType READONLY 
    
AS 
    
BEGIN 
    
    MERGE INTO Employees AS Target 
    
    USING @UpdateRecords AS Source 
    
    ON Target.EmployeeID = Source.EmployeeID 
    
    WHEN MATCHED THEN 
    
        UPDATE SET Target.EmployeeID = Source.EmployeeID, 
    
            Target.FirstName = Source.FirstName, 
    
            /* More fields here... */ 
    
    WHEN NOT MATCHED THEN            
        INSERT (EmployeeID, 
    
            FirstName, 
    
            /* More field names here... */ 
    
            ) 
    
        VALUES (Source.EmployeeID, 
    
            Source.FirstName, 
    
            /* More field values here... */ 
    
            ); 
    
END 
    
Finally, here’s the C# code I used to execute the stored procedure:
public static void UpsertEmployeeData(List<Employee> employees)
{
    string connectionString = MyCustomMethodToGetConnectionString();
    using(SqlConnection connection = new SqlConnection(connectionString))
    {
        connection.Open();
        using (SqlCommand command = connection.CreateCommand())
        {
            command.CommandText = "dbo.UpsertEmployees";
            command.CommandType = CommandType.StoredProcedure;
            SqlParameter parameter = command.Parameters.AddWithValue("@UpdateRecords",
                CreateUpdateRecordsSqlParam(employees));  // See implementation below
            parameter.SqlDbType = SqlDbType.Structured;
            parameter.TypeName = "dbo.EmployeesTableType";
            command.ExecuteNonQuery();
        }
    }
}
private static DataTable CreateUpdateRecordsSqlParam(IEnumerable employees)
{
    DataTable table = new DataTable();
    table.Columns.Add("EmployeeID", typeof(Int32));
    table.Columns.Add("FirstName", typeof(string));
    // More fields here...
    foreach (Employee employee in employees)
    {
        table.Rows.Add(employee.EmployeeID,
            employee.FirstName,
            // More fields here...
    }
    return table;
}
  Ideally, this post will serve as a useful reference the next time I, or you, need to code up a bulk upsert into SQL Server!
Credit to this StackOverflow answer by Ryan Prechel which was the primary inspiration for this approach.
Thanks Jon. This article has been very helpful.
ReplyDelete