Back to blog
PrimentraPrimentra
·May 25, 2026·9 min read

Your MDM is only as good as its cross-reference table

Home/Blog/Your MDM is only as good as its cross-reference table

Three source IDs, one master record

SAP
4001
Salesforce
0H5L00
NetSuite
12345
MASTER RECORD
SupplierId: 73

The cross-reference table is the bit in the middle. Get it wrong and the diagram does not work.

Most MDM projects I have walked into treat the cross-reference table as plumbing. The data model is the real work. The validation rules are the real work. The cross-reference is something to figure out closer to go-live.

Then go-live happens. SAP wants to know which master record corresponds to its supplier 4001, and the answer is a column on the master. The month after that Finance imports an old AP system, and that system has its own IDs, so now there is a second column. A year later the master row has SapId, OracleId, AccessDbId, LegacyId, NewLegacyId, and a stored procedure with thirty CASE branches for system-to-system lookup.

None of that has to happen. A cross-reference table designed properly takes about an hour. Skipped or done in a hurry, it eats the rest of the project.

What it actually is

A cross-reference table is one row per source identifier per master record. The shape is unremarkable:

CREATE TABLE dbo.SupplierXref (
    SupplierXrefId INT IDENTITY PRIMARY KEY,
    SupplierId     INT NOT NULL,         -- the master ID
    SourceSystem   NVARCHAR(50) NOT NULL,
    SourceId       NVARCHAR(100) NOT NULL,
    FirstSeenAt    DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    LastSeenAt     DATETIME2 NOT NULL DEFAULT SYSUTCDATETIME(),
    DeactivatedAt  DATETIME2 NULL,
    CONSTRAINT FK_SupplierXref_Supplier
        FOREIGN KEY (SupplierId) REFERENCES dbo.Supplier(SupplierId),
    CONSTRAINT UQ_SupplierXref_Source
        UNIQUE (SourceSystem, SourceId, DeactivatedAt)
);

Nothing clever. What matters is what is in the table that you would not write in a hurry: the source system name, the timestamps, and the deactivation column. Each one prevents a specific failure mode I have watched happen.

The three queries that hit it constantly

If the table is doing its job, three lookups run all day every day:

-- 1. Source to master. The ERP hands you 4001, you hand back the golden record.
SELECT s.*
FROM dbo.Supplier s
JOIN dbo.SupplierXref x ON x.SupplierId = s.SupplierId
WHERE x.SourceSystem = 'SAP'
  AND x.SourceId = '4001'
  AND x.DeactivatedAt IS NULL;

-- 2. Master to source. You are exporting back to SAP and need its ID.
SELECT SourceId
FROM dbo.SupplierXref
WHERE SupplierId = 73
  AND SourceSystem = 'SAP'
  AND DeactivatedAt IS NULL;

-- 3. Coverage. What systems even know this supplier exists?
SELECT SourceSystem, SourceId
FROM dbo.SupplierXref
WHERE SupplierId = 73
  AND DeactivatedAt IS NULL;

All three should be indexed point lookups. The third is what feeds your outbound integrations, so a slow query there is felt in every downstream system at once.

The cardinality decision you cannot defer

Before the first row goes in, you have to decide whether one master record can have more than one source ID per system. The answer is almost always yes. People default to no because it looks tidier on a whiteboard, and almost everyone regrets it.

CardinalityHow it looks on day oneWhat breaks it
One master, one source IDClean. Easy to query.The first acquisition. Two SAP instances, same supplier, different IDs.
One master, many source IDsSlightly more boilerplate.Almost nothing. Default to this even when you think you do not need it.
Many masters, one source IDUnusual but legal.Only useful when one source row legitimately maps to several governed entities. Rare.

The one-to-one model survives until somebody points out that the same supplier has a different number in the UK instance of SAP than in the US one. That happens at every organization that has ever made an acquisition, which is almost all of them. The cheaper move is to assume many-to-one from the start and use it sparingly when you do not need it.

The five mistakes that turn into incidents

The mistakeWhat it costs you
Source IDs as columns on the masterEvery new system is a schema migration. Nullable columns nobody can describe.
No deactivation markerA reused source ID silently links transactions to the wrong master. Finance finds out months later.
No historyYou cannot answer what SAP was calling this record last year. Audit replies become guesses.
Single instance per source assumedFalls over the first time the same SAP supplier has a different number in two regional instances.
No reconciliation jobThe table rots quietly. You find out it is wrong when a downstream consumer notices an unfamiliar record.

The case that hurts the most: a reused source ID

SAP retires customer 4001. Six months later, somebody in finance reuses 4001 for a new customer. If your cross-reference still maps 4001 to the original master, three months of postings now sit against the wrong golden record. By the time anybody notices, you have an audit problem the size of those three months, and reversing the postings is an audit on its own.

The defense is the deactivation timestamp plus a reconciliation job that runs nightly. When 4001 disappears from SAP, the row gets a DeactivatedAt and stops being matched. When 4001 reappears, you do not silently revive the old mapping. You create a new row pointing to whatever master the new entity actually belongs to. The few minutes of bookkeeping each time pay for themselves the first time a reused ID does not become a finance incident.

The reconciliation job almost nobody writes

Cross-references rot quietly. Source systems get cleaned up, IDs get merged, records get archived. If you set the table up once and never check it, you find out it is wrong when a downstream consumer asks about a record nobody recognizes.

A reconciliation job, run nightly or weekly, has two jobs. Find cross-reference rows that no longer match anything in the source. Find source IDs that have no cross-reference row at all. The first list is your orphans, and they need deactivation. The second is your unmapped records, and they need a steward to decide whether they are a new master or a missing mapping to an existing one.

Both lists belong in a queue the steward actually sees, not in a log file nobody opens. The whole point of doing this in MDM is that the exceptions become work for a human who is paid to resolve them, not silent drift in a database.

What this looks like in Primentra

Primentra treats cross-references as a first-class concept rather than columns you bolt onto the master. Every entity supports source mappings out of the box, with the history and deactivation already in the model. Unmapped source records and orphan mappings surface as exceptions in the steward queue during import, so they do not disappear into a stored procedure nobody maintains.

The data still lives in your own SQL Server, so the same join you would write against a homegrown xref table works against the Primentra one. The difference is that you did not have to build it, and nobody on your team has to keep building it as new source systems show up.

Related reading

The cross-reference is half the integration problem. The other half is what happens when two of those source systems disagree about a field on the master. That decision is what survivorship rules are for. If you are still deciding how to wire the MDM into the rest of the stack, the post on four integration patterns covers when batch wins and when API wins. The matching post covers how records find each other before any of this is possible.

Common questions

What is a cross-reference table in MDM?

It is the table that maps every source system identifier for an entity to the one master identifier. One master can have many source IDs. The table is what lets downstream integrations look up the master from a source ID, look up source IDs from a master, and list every system that knows about a master record.

Why not just store source IDs as columns on the master?

Because it stops scaling the third time you add a source system. Each new system becomes an ALTER TABLE on the master, the columns are mostly null, and you cannot represent the same source system having two instances. A separate cross-reference table makes adding a source a single insert.

What happens when a source system reuses an ID?

Without a deactivation marker, transactions silently link to the wrong master. With one, you mark the old mapping inactive when the source ID disappears, and treat the next appearance as a new mapping. The few minutes of bookkeeping prevent a finance incident.

How do you handle multiple instances of the same source?

Identify the source by instance, not by platform. SAP_UK and SAP_US are two sources. One master can have one mapping in each. A one-to-one design across all of SAP breaks the first time you encounter this, and you will encounter it after the first acquisition.

Cross-references that do not break at the third source system

Primentra ships with source mappings, deactivation, history, and a steward queue for unmapped records already wired in. Runs on your own SQL Server, so the data is in a database you can still query directly. The 60-day trial runs on your real source IDs, which is long enough to find out whether the model holds up against the systems you actually have.

Start free trial →Try the demo →

More from the blog

Build vs buy for master data management: rolling your own is how you become the next abandoned MDS10 min readHow to measure master data quality: the six numbers that tell you the truth10 min readSlowly changing dimensions in master data: not every field needs a time machine9 min read

Ready to migrate from Microsoft MDS?

Join the waitlist and be the first to try Primentra. All features included.

Download Free TrialTry DemoCompare MDM tools