Of all the decisions you make in the first month of an MDM project, none is harder to undo than the format of the master identifier. The data model can be refactored after launch. Validation rules and approval workflow are things teams cheerfully rebuild a year later. The ID is the one thing every downstream system embeds in a join, a foreign key, or an integration log. Once it is out the door, it is out forever.
This is one of the reasons MDS migrations hurt. Teams who picked the wrong identifier in 2014 are living inside the consequence right now, and the migration is the moment when the consequence becomes visible.
The three choices, briefly
There are three practical options for the primary identifier on a master record. Each is a defensible choice in some situation and a quiet disaster in some other.
A business key is something that already means something to a human. An ISO country code. A SAP supplier number. A SKU. Easy to type into a query and easy to recognize on a printout.
A surrogate integer is something the database made up. An auto-increment INT or BIGINT with no meaning outside its own table. Small, fast on lookups, and small enough that the index fits in memory on almost any server you would buy.
A UUID is something the application made up. 128 bits of randomness, globally unique, generated by anyone without coordination. Bigger, uglier, and immune to a lot of problems the other two are not.
| Property | Business key | Surrogate INT | UUID |
|---|---|---|---|
| Size on disk | Variable | 4 or 8 bytes | 16 bytes |
| Readable in a query | Yes | Yes | Awkward |
| Survives source-system renumber | No | Yes | Yes |
| Survives merger of two MDM instances | No | No without renumbering | Yes |
| Generated outside the database | Yes | No | Yes |
| Survives test-to-prod promotion | Sometimes | No | Yes |
Why the business key is the mistake almost everyone makes first
The pull toward business keys is strong, and the reason it is strong is that for the first six months they look like the obvious right answer. You can read the value. You can find a record by typing the number. Stewards do not have to translate between two columns. Everyone agrees on what the field means.
Then a few things start happening. None of them are exotic. They happen at every organization of a certain size eventually.
| The event | What happens to your master ID |
|---|---|
| Acquisition | The acquired company has its own SAP with its own supplier numbers. Supplier 4001 now means two suppliers. |
| Renumbering | Product team aligns SKUs to a new category structure. Twelve months of master records have to be re-keyed. |
| Reuse | Source system retires supplier 4001 in March. Finance reuses 4001 for a new supplier in September. Six months of postings point at the wrong master. |
| Split | A country splits, two-letter ISO code retires. Values you had on file in 2018 no longer exist as valid codes. |
| Rebrand | A product line gets renamed, and the customer-facing SKU prefix changes. Now your "primary key" disagrees with every label printed in the warehouse. |
A business key as a primary identifier turns every one of these into a migration. Every integration that joined on the old value has to be updated. Every report. Every external file format. Every contract that quoted the supplier number on a printed page.
I am not saying business keys should disappear. They are useful. People need them. They just should not be the join key.
Surrogate INT or UUID, when does each one win
With business keys ruled out as the primary identifier, the real choice is between a surrogate INT and a UUID. Both work. They win in different situations.
Surrogate INTs are smaller, faster on lookups, and easy on the eyes. A junior analyst can paste 4827 into a query and find the supplier. The index is half the size of the equivalent UUID index, which matters more than people think on a table that joins to every transaction your warehouse touches.
UUIDs cost you those things, and buy you others. Two systems can generate identifiers at the same time without colliding. A record created in a test environment keeps its identifier when promoted to production, which sounds small until you have spent a Friday afternoon translating IDs in a deployment script. If you ever merge two MDM instances after an acquisition, UUIDs make the merge a copy and INTs make it a re-keying project.
For most mid-market MDMs that run on a single SQL Server and create records in one place, the INT is fine. If your roadmap includes federated authoring, multiple regional instances, or external systems that need to mint master IDs and push them in, start with the UUID. Half a kilobyte of extra index per thousand records is cheap compared to the alternative.
The hybrid almost every long-lived MDM ends up with
Every MDM that survives ten years uses the same pattern. The master identifier is a surrogate, either an INT or a UUID. The business keys live as attributes on the record, or in a cross-reference table next to it. Stewards can search on them. Reports can filter on them. The application can render them on screen. They simply are not the join key.
This is the design that lets business keys exist without ruining you. When SAP renumbers, the cross-reference row moves. The master ID stays where it was. Every downstream consumer keeps working because nothing they joined on actually changed.
If you have not seen the post on cross-reference tables, that is the structure that makes the hybrid work. It is the difference between an MDM that holds up through a decade of source-system changes and one that has to be partially rebuilt the first time SAP merges two instances.
In practice the data model looks something like this. Master record on the left, business keys on the right, joined through a cross-reference table that owns the mapping:
-- The master table: one row per real-world entity, surrogate key only
CREATE TABLE master_supplier (
supplier_id INT IDENTITY(1,1) PRIMARY KEY,
legal_name NVARCHAR(255) NOT NULL,
tax_id NVARCHAR(64),
created_utc DATETIME2 DEFAULT SYSUTCDATETIME()
);
-- The cross-reference: source system identifiers live here, not on the master
CREATE TABLE supplier_xref (
xref_id INT IDENTITY(1,1) PRIMARY KEY,
supplier_id INT NOT NULL REFERENCES master_supplier(supplier_id),
source_system NVARCHAR(64) NOT NULL, -- e.g. SAP_UK, SAP_US, NETSUITE
source_id NVARCHAR(128) NOT NULL, -- the business key in that system
active BIT NOT NULL DEFAULT 1,
deactivated_utc DATETIME2 NULL,
UNIQUE (source_system, source_id, active)
);What MDS got right, and what most MDS migrations get wrong
MDS actually had a sensible split. Every entity carried both an integer Id and a Code field. The Id was the primary key. The Code was the business identifier, intended for humans to recognize.
The mistake a lot of teams made was treating Code as the real identifier anyway. Subscriber views were keyed on Code. Downstream integrations joined on Code. Audit reports referenced Code. When the business changed and Code had to change with it, every one of those consumers broke at the same time.
Migrating off MDS is the moment to undo this. The new MDM is a fresh chance to put the surrogate back where it belongs as the primary key, and to demote Code to a queryable attribute that downstream consumers can read but not depend on. If you carry the same join pattern into the next platform, you are buying yourself the same problem in a different color.
A small number of mistakes that show up over and over
A few specific patterns to avoid, from projects that ran for years and then had to walk things back:
Storing the master ID as a string when it is really an integer, to keep your options open. The flexibility is theoretical. The cost of string comparison on every join is real, every day, forever.
Reusing integer IDs after a hard delete. The next consumer that loads the old ID for a historical lookup gets the wrong record back, silently. Soft-delete or skip the number forever.
Letting the application format the ID for display in ways that make it look meaningful. SUP-00004827 looks tidier than 4827 until someone in finance assumes the leading zeros are significant and writes a report that filters on the wrong string.
Exposing the surrogate publicly in URLs or APIs without thinking about what it leaks. An auto-increment ID tells the world how many records you have and the order they were created in. For some domains that is fine. For others it is information you did not want to give away.
What this looks like in Primentra
Primentra uses an integer master identifier by default and keeps business keys as attributes on the record, where downstream consumers can read them without joining on them. Source system IDs go into the cross-reference table rather than onto the master itself, so renumbering a source does not touch the master ID.
If you are migrating from MDS, the import wizard preserves your old Code values as attributes while the new master IDs are assigned fresh. That gives you a clean break from the pattern of joining on Code, without losing the values stewards have spent years recognizing.
The data lives in your own SQL Server. The IDs are real integers in a real table. You can paste them into a query, embed them in a foreign key, or hand them to a consumer system, and they will mean the same thing tomorrow.
Related reading
The identifier choice is half the story. The other half is the table that maps every source system ID to one master, which is what the cross-reference post covers. If you are still working out how to load the first version of the master data, the post on the initial data load walks through what to do with the legacy keys you inherit. And if you have not picked your first domain yet, how to pick your first MDM domain is the place to start before any of this matters.
Common questions
Should I use a business key as the primary identifier?
No. Business keys look correct for the first six months, then break the first acquisition, the first renumber, the first reuse, or the first country code retirement. Use a surrogate INT or a UUID as the master ID and keep business keys as queryable attributes.
Surrogate INT or UUID?
INT if records are only ever created in one MDM instance, on one database. UUID if records can be created in multiple environments or systems, or if you need test-to-production promotion to preserve identifiers. For most on-premise mid-market MDMs, an INT is fine.
Why are identifiers so hard to change later?
Because every downstream system embeds the ID. Once it is out the door to twelve consumers, changing the format means coordinating a rewrite across all twelve. The cost of getting it right on day one is hours. The cost of changing it later is a year of cleanup.
Can I migrate the master ID after go-live?
You can run dual IDs in parallel and translate at every integration boundary, but most projects that start this never finish it. One downstream consumer always slips through and the dual system becomes permanent.
Get the identifier decision right on day one
Primentra ships with a surrogate master ID and the cross-reference table already wired in, so business keys live where they belong and the join key holds steady through the things that change. Runs on your own SQL Server, where the IDs are integers in a table you can query directly. The 60-day trial runs on your real source systems, which is long enough to find out whether the model holds up against the keys you actually have.