Migrating Tabular Data between Schemas

Posted by Dave Bouwman | Posted in SQL Server | Posted on 30-04-2009

1

Despite being a rather “dry” topic, there are times when you simply need to change the schema of a database and migrate that data. In my current scenario the “core” schema relationships are reasonable, but the naming conventions (or lack there of) make it very difficult to work with. Also, the previous DBA’s fetish for GUIDs as primary keys simply must be remedied.

Thus I’ve been spending some time migrating data. Since I’m working with SQL Server, I have a couple of options.

SQL Server Business Intelligence Developer Studio

Aside from the terrible name, this Visual Studio add-in/package allows you to to all sorts of data migration & manipulation in a graphical environment. Think of it as ArcGIS Model Builder for your tabular database (along the same lines, please please please don’t use Model Builder to migrate tabular data!)

SQL Server Business Intelligence Studio

There are two levels – the “Control Flow”, which is the order in which the operations occur, and then the “Data Flow” which are the details. This next image shows the “Control Flow” for migrating the Projects section of the schema.

proj-ctl-flow

While some of these boxed represent execution of SQL code, most of them represent a data flow for migrating a particular table. This is one of those data flows…

data-flow

So this is relatively straight forward – lots of dragging and dropping, repetitive picking of values, and mapping of columns between tables.

Once you have a flow setup, you can execute it – and this is where things get a little more dicey.

Since this toolset is designed to do almost anything, it has some issues when doing everything. Take unicode conversions. As part of the migration, we are changing all the text fields to unicode equivalents – basically CHAR(3) becomes NCHAR(3). But since SSBIS does all this fun stuff in managed code, it sees these two types as being cosmically different, thus we have to add in a extra step to convert to unicode. However, at the database level, you can just directly insert a CHAR into a NCHAR without an explicit casting. Anyhow – it’s just an extra step that seems like a waste of time.

Another issue is that the error messaging is feeble at best. When something fails, it will let you know that it failed, and a whole bunch about the internal state of things, but almost nothing about WHY it failed – as in what value in the data caused the failure. Interesting, but not useful.

In addition, it seems that the packages fail randomly. Some days it works flawlessly, the next day it fails. The source data has not changed, and we wipe out the target data, so your guess is as good as mine as to why this occurs. Sunspots? Air Pressure? Who knows, but it does not inspire a whole lot of confidence.

The final issue I had was that while Brian Noyle and I both have the same version of this tool on our systems, we can not open the packages on each other’s systems. We are not sure why, but it’s problematic. I will say that we are using the SQL 2005 version of this tool, and apparently there is a SQL2008 / Visual Studio 2008 version is also available, so that may be better.

So I decided to check out my other option…

Raw SQL

Yep, hand coded(ish) raw SQL. I was concerned that this would be a giant time-suck, and be really painful, but it was not too bad. Will say that for those who don’t speak fluent SQL this is going to be much harder than SSBIS. Interestingly it turned out to be about the same amount of effort, but with some definite up-sides.

First, much of what I was doing was “templatable” – at a high-level, I’m really just creating a new table from an old one, changing the field names to something readable, doing the CHAR –> NCHAR change, and more or less copying the data over. And doing the same for associated lookup tables. This is my empty template…

-------------------------------------------------------
-- IMPORT-<TABLENAME>.sql
-------------------------------------------------------

use <TARGETDB>

-- ==================================
-- Load Lookup Tables
-- ==================================
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].<LUTNAME>') AND type in (N'U'))
    drop table <LUTNAME>
select lkp_key as <LUTNAME>_Code, lkp_Value as <LUTNAME>
into <LUTNAME>
from <OLDLUTNAME>
alter table <LUTNAME> add CONSTRAINT [PK_<LUTNAME>] PRIMARY KEY CLUSTERED (<LUTNAME>_Code ASC )

-- ==================================
-- Load Data Table
-- ==================================
IF  EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[dbo].<TABLENAME>') AND type in (N'U'))
    drop table <TABLENAME>
CREATE TABLE <TABLENAME>(
    ... FIELDS ...

PRIMARY KEY CLUSTERED
(
    [TABLENAME_ID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

I recreate the constraints in a separate master-blaster script.

The upsides with using SQL is that it’s wicked fast to execute, you have complete control, and when something fails, it gives you a very direct message… and it looks really bad ass when you’ve got three instances of SQL Management Studio open with all sorts of data/scripts up ;-)

3-sql

(click for full size)

Have fun with your data migrations!

Comments (1)

Nice blog, Dave. Also, have you looked at SQL Server’s BCP utility? It’s older, but pretty flexible (similar to Oracle’s SQL Loader). Once you get the hang of it, it’s pretty handy for things like this. No gui, so it’s nice for building scheduled tasks to import or export nightly with SQL Server.

Anyway, just thought I’d mention it in case you were unaware.