What does it mean to validate a master data migration?

Validating a migration means proving that the data in the new system matches the source you migrated from, not just that the load ran without errors. It is a reconciliation: row counts per entity, control totals on key fields, referential integrity on every reference, hierarchy member counts, business-key uniqueness, and a hand-checked sample of high-value records. A clean import log only confirms the rows that loaded did not break anything. It says nothing about the rows that were silently rejected.

Why is a clean import log not enough?

A clean log means the rows that made it in did not violate a constraint. It does not account for rows the loader dropped: a domain reference that did not resolve, a value truncated to fit a shorter column, duplicates the new system merged, or nulls quietly replaced with a default. None of those raise an error. They surface weeks later as a downstream report that comes up short, by which point the wrong numbers are already in production.

How many records should I spot-check in a migration?

Counts and checksums cover the whole set statistically, so the manual spot-check does not need to be large. Pull 30 to 50 of the highest-value or highest-risk records per domain and diff them field by field against the source by eye. The point is to catch the failure modes a count cannot see: a column mapped to the wrong field, a date format inverted, an encoding that mangled accented names.

Who signs off on a master data migration?

Reconciliation is a business sign-off, not an IT script. The DBA or integrator produces the counts, checksums, and integrity reports. The data steward who owns the domain decides whether the migrated data is good enough to trust and puts their name on it. Freeze the source during the cutover window so the reconciliation runs against a fixed point, and keep the evidence in case anyone asks later why a number changed.

How to Validate a Master Data Migration

The load finished. Nothing red in the log, the counts look about right, and there is a meeting on Thursday where someone wants to point the first downstream system at the new MDM. This is the most dangerous moment in the whole project, because "the import ran" and "the data is correct" are two completely different claims, and the gap between them is where a bad year starts.

I watched one sail through with a clean log and a confident sign-off, then a finance report came up eight suppliers short three weeks later. The rows had been rejected during the load: a country code that did not resolve against the target list, and the loader skipped those records rather than failing the batch. Nobody reconciled, so nobody knew until the numbers were already wrong in production and a buyer was chasing a payment run that never went out.

Why a clean log tells you almost nothing

A clean import log means one thing: the rows that made it in did not break a constraint. It says nothing about the rows that never made it in, and those are the ones that hurt you. The failures that matter in a master data migration are almost all silent.

–

Rejected rows

A reference does not resolve or a key collides, and the loader drops the record instead of failing. The batch reports success, minus the rows you cannot see.

–

Silent truncation

A description that was 600 characters in the source lands in an NVARCHAR(255) column. No error. Just a sentence that stops mid-word, on every long record.

–

Default substitution

A field that was optional becomes required, the loader fills the gap with a default, and you end up with 1,200 records in a country called "Unknown".

–

Quiet deduplication

Two records the old system tolerated share a business key. The new system merges them or rejects the second. Either way a record is gone and the count is one short.

–

Encoding damage

Accented names, currency symbols, anything outside plain ASCII gets mangled by the wrong code page. The load succeeds; every Müller is now a MÃ¼ller.

None of these raise an error. They are not bugs in the loader; they are the loader doing exactly what it was told. The only way to find them is to compare the target against the source on purpose, before anyone downstream starts reading from it.

The reconciliation that proves something

Reconciliation is boring, mechanical, and the single most valuable hour in the project. There are eight checks, and each catches a failure the others would miss. Run all of them, in this order:

Row counts per entity

Source versus target, exact, for every entity you migrated. A difference of even one record needs an explanation before anyone signs off. This is the cheapest check and the one most often skipped because the numbers looked "about right".

Control totals on key fields

Counts can match while contents are scrambled. Sum a numeric column, count distinct on a status code, hash the concatenated business keys. If the source and target totals diverge, rows got shuffled into the wrong columns even though the count is identical.

Every reference resolves

Walk every domain attribute and confirm it points at a member that exists in the target. Orphaned references are the most common silent loss in a migration: the row loaded, but the value it pointed at did not, so the dropdown is now blank and the rollup is short.

Business keys are unique

The old system may have tolerated two records with the same key for years. The new one will either reject the second or merge the pair. Either way a record vanishes unless you check uniqueness against the source count first.

Required fields are populated

Run a null count on every mandatory attribute. A field that was optional in the source and required in the target is a classic trap: the loader fills the gap with a default, the load succeeds, and now you have 1,200 suppliers in a country called "Unknown".

Hierarchy member counts

For every level of every hierarchy, the node and leaf counts have to reconcile against the source. An orphaned member does not raise an error. It just drops off the rollup, and a regional total comes up short three weeks later.

A hand-checked sample

Pull 30 to 50 of the highest-value records and diff them field by field against the source by eye. This is the only check that catches a column mapped to the wrong field, an inverted date, or an encoding that turned every accented name into mojibake.

Business-rule pass rate

Run your validation rules across the loaded set and record how many records pass clean. You are not aiming for 100 percent on day one; you are establishing the baseline you will improve against, and catching the rules the old system never enforced.

The first six are queries; you write them once and rerun them on every load until the gaps close. The seventh is the one people resist because it is manual, and it is the one that catches the mappings a count will never see. The eighth ties the migration back to your ongoing data-quality metrics, so the baseline you set at go-live is the number you measure against afterward.

Hierarchies are where the quiet damage hides

If you are coming off MDS, you lived in hierarchies, and they are the part of the migration most likely to look fine and be wrong. When you flatten derived and explicit hierarchies into parent references on a flat list, the rows can all load while the tree comes apart underneath.

A member whose parent did not resolve does not error. It just detaches, and now a region rolls up one site short. So reconcile the structure as well as the rows: node and leaf counts at every level against the source, no orphans, no cycles, and the same number of members rolling up under each parent. The total record count can match exactly while a third of your locations have dropped out of the regional tree.

Reconciliation is a sign-off, not a script

The counts and checksums are IT's job. Deciding the migrated data is good enough to trust is not. That call belongs to the data steward who owns the domain, the person who will field the call when a number looks wrong in March. They read the reconciliation pack and put their name on it. The DBA who ran the load should never be the one who declares it correct.

Two things make that sign-off real. Freeze the source during the cutover window, so you are reconciling against a fixed point instead of a list that is still changing under you. And keep the evidence. When someone asks in March why the supplier count moved, the reconciliation pack and a proper audit trail are the difference between a two-minute answer and a two-week investigation.

This is the step that turns a data load into a migration you can defend. It does not need a tool; it needs an hour, a handful of queries, and someone with the authority to say the data is good. Skip it, and the first 90 days after go-live become an unpaid debugging contract against a system everyone already assumed was correct.

Common questions

What does it mean to validate a migration?

Proving the new system matches the source, not just that the load ran. It is a reconciliation: counts per entity, control totals on key fields, referential and hierarchy integrity, business-key uniqueness, and a hand-checked sample. A clean log only confirms the rows that loaded did not break a constraint.

Isn't a clean import log enough?

No. A clean log means the rows that made it in did not break anything. It says nothing about the rows the loader dropped: an unresolved reference, a truncated value, a merged duplicate, a null replaced with a default. None of those error. They surface later as a report that comes up short.

How many records should I spot-check?

Counts and checksums cover the set statistically, so the manual check stays small. Pull 30 to 50 high-value or high-risk records per domain and diff them field by field. The point is to catch what a count cannot: a misaligned column, an inverted date, a mangled encoding.

Who signs off on the migration?

The data steward who owns the domain, not the DBA who ran the load. IT produces the counts and integrity reports; the business decides the data is good enough to trust and puts their name on it. Freeze the source during cutover and keep the reconciliation evidence.

Migrate, then prove it

Primentra imports your MDS models, entities, and hierarchies through a two-phase wizard that stages every row before it commits, so unresolved references and key collisions surface during validation instead of three weeks after go-live. Counts and integrity reports come straight from your SQL Server, an approval step sits in front of every change, and the audit trail records who signed off and when. It runs on your own database, deploys in a day, and costs €7,500 per year flat. The 60-day trial includes everything.

Start free trial →Try the demo →

How to validate a master data migration before you trust the new system

Why a clean log tells you almost nothing

The reconciliation that proves something

Hierarchies are where the quiet damage hides

Reconciliation is a sign-off, not a script

Common questions

More from the blog

Ready to migrate from Microsoft MDS?