Most MDM projects I have walked into treat the cross-reference table as plumbing. The data model is the real work. The validation rules are the real work. The cross-reference is something to figure out closer to go-live.
Then go-live happens. SAP wants to know which master record corresponds to its supplier 4001, and the answer is a column on the master. The month after that Finance imports an old AP system, and that system has its own IDs, so now there is a second column. A year later the master row has SapId, OracleId, AccessDbId, LegacyId, NewLegacyId, and a stored procedure with thirty CASE branches for system-to-system lookup.
None of that has to happen. A cross-reference table designed properly takes about an hour. Skipped or done in a hurry, it eats the rest of the project.
What it actually is
A cross-reference table is one row per source identifier per master record. The shape is unremarkable:
CREATE TABLE dbo.SupplierXref (
SupplierXrefId INT IDENTITY PRIMARY KEY,
SupplierId INT NOT NULL, -- the master ID
SourceSystem NVARCHAR(50) NOT NULL,
SourceId NVARCHAR(100) NOT NULL,
FirstSeenAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
LastSeenAt DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
DeactivatedAt DATETIME2 NULL,
CONSTRAINT FK_SupplierXref_Supplier
FOREIGN KEY (SupplierId) REFERENCES dbo.Supplier(SupplierId),
CONSTRAINT UQ_SupplierXref_Source
UNIQUE (SourceSystem, SourceId, DeactivatedAt)
);Nothing clever. What matters is what is in the table that you would not write in a hurry: the source system name, the timestamps, and the deactivation column. Each one prevents a specific failure mode I have watched happen.
The three queries that hit it constantly
If the table is doing its job, three lookups run all day every day:
-- 1. Source to master. The ERP hands you 4001, you hand back the golden record. SELECT s.* FROM dbo.Supplier s JOIN dbo.SupplierXref x ON x.SupplierId = s.SupplierId WHERE x.SourceSystem = 'SAP' AND x.SourceId = '4001' AND x.DeactivatedAt IS NULL; -- 2. Master to source. You are exporting back to SAP and need its ID. SELECT SourceId FROM dbo.SupplierXref WHERE SupplierId = 73 AND SourceSystem = 'SAP' AND DeactivatedAt IS NULL; -- 3. Coverage. What systems even know this supplier exists? SELECT SourceSystem, SourceId FROM dbo.SupplierXref WHERE SupplierId = 73 AND DeactivatedAt IS NULL;
All three should be indexed point lookups. The third is what feeds your outbound integrations, so a slow query there is felt in every downstream system at once.
The cardinality decision you cannot defer
Before the first row goes in, you have to decide whether one master record can have more than one source ID per system. The answer is almost always yes. People default to no because it looks tidier on a whiteboard, and almost everyone regrets it.
| Cardinality | How it looks on day one | What breaks it |
|---|---|---|
| One master, one source ID | Clean. Easy to query. | The first acquisition. Two SAP instances, same supplier, different IDs. |
| One master, many source IDs | Slightly more boilerplate. | Almost nothing. Default to this even when you think you do not need it. |
| Many masters, one source ID | Unusual but legal. | Only useful when one source row legitimately maps to several governed entities. Rare. |
The one-to-one model survives until somebody points out that the same supplier has a different number in the UK instance of SAP than in the US one. That happens at every organization that has ever made an acquisition, which is almost all of them. The cheaper move is to assume many-to-one from the start and use it sparingly when you do not need it.
The five mistakes that turn into incidents
| The mistake | What it costs you |
|---|---|
| Source IDs as columns on the master | Every new system is a schema migration. Nullable columns nobody can describe. |
| No deactivation marker | A reused source ID silently links transactions to the wrong master. Finance finds out months later. |
| No history | You cannot answer what SAP was calling this record last year. Audit replies become guesses. |
| Single instance per source assumed | Falls over the first time the same SAP supplier has a different number in two regional instances. |
| No reconciliation job | The table rots quietly. You find out it is wrong when a downstream consumer notices an unfamiliar record. |
The case that hurts the most: a reused source ID
SAP retires customer 4001. Six months later, somebody in finance reuses 4001 for a new customer. If your cross-reference still maps 4001 to the original master, three months of postings now sit against the wrong golden record. By the time anybody notices, you have an audit problem the size of those three months, and reversing the postings is an audit on its own.
The defense is the deactivation timestamp plus a reconciliation job that runs nightly. When 4001 disappears from SAP, the row gets a DeactivatedAt and stops being matched. When 4001 reappears, you do not silently revive the old mapping. You create a new row pointing to whatever master the new entity actually belongs to. The few minutes of bookkeeping each time pay for themselves the first time a reused ID does not become a finance incident.
The reconciliation job almost nobody writes
Cross-references rot quietly. Source systems get cleaned up, IDs get merged, records get archived. If you set the table up once and never check it, you find out it is wrong when a downstream consumer asks about a record nobody recognizes.
A reconciliation job, run nightly or weekly, has two jobs. Find cross-reference rows that no longer match anything in the source. Find source IDs that have no cross-reference row at all. The first list is your orphans, and they need deactivation. The second is your unmapped records, and they need a steward to decide whether they are a new master or a missing mapping to an existing one.
Both lists belong in a queue the steward actually sees, not in a log file nobody opens. The whole point of doing this in MDM is that the exceptions become work for a human who is paid to resolve them, not silent drift in a database.
What this looks like in Primentra
Primentra treats cross-references as a first-class concept rather than columns you bolt onto the master. Every entity supports source mappings out of the box, with the history and deactivation already in the model. Unmapped source records and orphan mappings surface as exceptions in the steward queue during import, so they do not disappear into a stored procedure nobody maintains.
The data still lives in your own SQL Server, so the same join you would write against a homegrown xref table works against the Primentra one. The difference is that you did not have to build it, and nobody on your team has to keep building it as new source systems show up.
Related reading
The cross-reference is half the integration problem. The other half is what happens when two of those source systems disagree about a field on the master. That decision is what survivorship rules are for. If you are still deciding how to wire the MDM into the rest of the stack, the post on four integration patterns covers when batch wins and when API wins. The matching post covers how records find each other before any of this is possible.
Common questions
What is a cross-reference table in MDM?
It is the table that maps every source system identifier for an entity to the one master identifier. One master can have many source IDs. The table is what lets downstream integrations look up the master from a source ID, look up source IDs from a master, and list every system that knows about a master record.
Why not just store source IDs as columns on the master?
Because it stops scaling the third time you add a source system. Each new system becomes an ALTER TABLE on the master, the columns are mostly null, and you cannot represent the same source system having two instances. A separate cross-reference table makes adding a source a single insert.
What happens when a source system reuses an ID?
Without a deactivation marker, transactions silently link to the wrong master. With one, you mark the old mapping inactive when the source ID disappears, and treat the next appearance as a new mapping. The few minutes of bookkeeping prevent a finance incident.
How do you handle multiple instances of the same source?
Identify the source by instance, not by platform. SAP_UK and SAP_US are two sources. One master can have one mapping in each. A one-to-one design across all of SAP breaks the first time you encounter this, and you will encounter it after the first acquisition.
Cross-references that do not break at the third source system
Primentra ships with source mappings, deactivation, history, and a steward queue for unmapped records already wired in. Runs on your own SQL Server, so the data is in a database you can still query directly. The 60-day trial runs on your real source IDs, which is long enough to find out whether the model holds up against the systems you actually have.