Wednesday, April 01, 2009

Working Around DataTable.Merge() Not Changing DataRowState

Before I forget, while the documentation on the DataTable.Merge() methods would seem to indicate that the method is smart about using FK relationships and RowState. The reality is that Microsoft has acknowledged in the Connect forum that the Merge is much more like a batch load in that constraints are dropped, data is correlated, and constraints are re-enabled (possibly throwing exceptions) but *without* changing row states. There is a workaround suggested using LinQ which I can't work with right now as I'm restricted to the 2.0 framework.

I was hoping to use the merge functionality to perform some small ETL tasks. Instead, after researching for awhile, I came up with the following, relatively simple workaround. I'm using Strong-Typed DataSets which affects the nature of the implementation, but the approach should still work for the more generic form.
If it's not obvious, this sample is in C#.

public static void Upsert
(DataTable sourceTable, TableType targetTable)
where TableType : DataTable, new()
where RowType : DataRow
{
TableType inputTable = sourceTable as TableType;

//if we can't cast to the appropriate type we can run into
//data problems down below in LoadDataRow.
if (inputTable == null)
{
inputTable = new TableType();
inputTable.Merge(sourceTable, false, MissingSchemaAction.Add);
}

//pull in schema from candidate table
targetTable.Merge(sourceTable.Clone(), true, MissingSchemaAction.Add);

//apply resuling schema to candidate table
sourceTable.Merge(targetTable.Clone(), true, MissingSchemaAction.Add);

//now use LoadDataRow to "merge" the tables, together
foreach (RowType row in inputTable.Rows)
{
try
{
targetTable.LoadDataRow(row.ItemArray, LoadOption.Upsert);
}
catch (Exception ex)
{
//TODO: Tighten this exception down to what is actually thrown (not in MSDN Documentation?)
//exception will be thrown on FK validation failure
//this is ok and actually expected in the design of the load.
System.Diagnostics.Debug.WriteLine("Verify Expected Error: " + ex.Message);
}
}//next row
}


As you can see, this could use a little more work, but I've managed an Upsert (Insert or Update combined method) implementation. This method was placed in the DataSet's partial class and each DataTable partial class contains a typed version (non-generic) of the Upsert method that delegates to the above generic form for it's implementation.

public partial class MyDataTable
{
public void Upsert(DataTable table)
{
MyParentDS.Upsert(table, this);
}
}

The table Upsert does accept the base DataTable reference to allow the merge to come from a query result that was not filled into a MyParentDS instance. The generic method on the parent either casts or merges into the strong-typed form.

The gotcha in all this is that the item array from the row must match or you can end up with some seemingly unexpected type-mismatch errors from LoadDataRow. The safest way, I felt, was to make sure I "cast" to the MyParentDS before going into the upsert section itself.

Recorded here so I don't have to reinvent the wheel again.