What is the difference between a ragged and a balanced hierarchy?

A balanced hierarchy has the same number of levels on every branch: every product is Category then Subcategory then SKU, no exceptions. A ragged hierarchy lets branches end at different depths, so one SKU sits directly under a category with no subcategory in between. Most real structures are ragged. Forcing a fixed-level model onto a ragged reality means the report either drops the short branches or invents empty levels, and the category total stops matching the sum of its children.

How do you govern a master data hierarchy?

Govern the edges, not just the nodes. Enforce one parent per node unless you have deliberately modelled a many-to-many structure. Reject a reparenting that would create a loop (A under B under A) at the moment of save, not in the query that walks the tree. Forbid orphans, so every node either has a parent or is an explicit root. Put an approval step in front of moving a node, because a reparenting silently rewrites every rollup above it. Effective-date the moves so history still totals correctly.

Master data hierarchies: the rollup that counts the same revenue twice

The VP of sales pulled the regional rollup the morning of the board meeting, and the three regions added up to €112 million. The company had booked €100 million. Nobody had invented €12 million of revenue. A handful of national accounts sat under their geographic region and under a separate National Accounts branch at the same time, and the rollup walked both paths, so their sales landed in the total twice. Every individual number was right. The shape they hung on was wrong.

Hierarchies are the part of master data everyone forgets is data. The records get governed: the customer, the cost center, the product all have owners and validation rules. The tree those records hang on gets drawn once in a workshop, loaded, and then quietly edited by whoever needs a node moved, with no rule in front of the edit. It works, because every report resolves and every number looks plausible. It only breaks when something has to add up, and by then the broken edge is six months old and nobody remembers placing it.

A hierarchy is structure laid over master data, and it is data too

A hierarchy has two parts. The nodes are master records you already know: the region, the cost center, the product category. The edges are the parent-child links between them, and they are where the meaning lives. Which division a cost center rolls into decides whose budget it spends. Which category a SKU sits under decides which manager owns its margin. Move an edge and you change an answer, even though no record changed at all.

The nodes get treated as master data; the edges get treated as a drawing. That is the whole problem in one sentence. A new cost center goes through a request and an owner signs off, but attaching it to the wrong parent, or to no parent, is a two-second drag that no one reviews. The structure carries the rollup, and the structure is the least governed thing in the system.

Where the tree breaks

None of these throw an error. The report runs, the tree resolves, and the damage shows up as a total that will not reconcile.

–

The node with two parents

A handful of accounts sit under their region and under a National Accounts branch at the same time. Both links were placed on purpose. Every rollup that starts at the top reaches those accounts twice and books their revenue on each pass, so the regions sum to more than the company ever sold. Nothing is wrong at the record level, which is exactly why nobody can find the extra twelve million.

–

The orphan that drops out

A new cost center is created and never attached to a division. It belongs to no parent, so a top-down rollup walks past it and the company total comes in short. It reads as a soft month, not a missing edge, and the cost center keeps spending against a budget that no report rolls up. The opposite of the double-count, from the same missing rule.

–

The loop that never resolves

Someone reparents node A under B in March, and in October someone else reparents B under A. The recursive query that walks the tree now either runs until it hits a depth limit or returns a number that depends entirely on where it gave up. Two valid edits, each sensible alone, combine into a cycle the structure was never checked against.

–

The move nobody reparented

A plant is transferred from one division to another, but its node still hangs under the old division in the hierarchy. Two quarters of results land in the wrong P&L, and both divisions sign off because each number looks roughly like last year. The org changed in a memo. The tree did not, and the tree is what the report reads.

–

The ragged branch in a leveled model

The model assumes every product rolls Category to Subcategory to SKU, but some SKUs sit straight under a category with no subcategory. The fixed-depth report either drops them or invents an empty subcategory to fill the slot. Either way the category total no longer equals the sum of its subcategories, and the variance is the branch that did not fit the shape.

The double-count and the orphan are the same bug seen from two sides. One says a node belongs to too many parents, the other says it belongs to none, and both come from the same missing rule about what an edge is allowed to be. Get that rule right and the rest narrows to two questions: can a node have more than one parent, and what happens when one moves.
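A minimal Python sketch of both failures, assuming the hierarchy is stored as a plain list of (parent, child) edges. The account names, the helper functions and the figures are hypothetical, chosen only so the totals match the €100 million versus €112 million story above.

```python
from collections import defaultdict

def build_children(edges):
    """Index (parent, child) edges as parent -> list of children."""
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
    return children

def naive_rollup(node, children, amounts):
    """Top-down rollup that follows every edge it finds, once per path."""
    own = amounts.get(node, 0)
    return own + sum(naive_rollup(c, children, amounts) for c in children.get(node, []))

def edge_problems(edges, nodes, roots):
    """Return (nodes with several parents, non-root nodes with no parent)."""
    parent_count = defaultdict(int)
    for _parent, child in edges:
        parent_count[child] += 1
    multi_parent = sorted(n for n in nodes if parent_count[n] > 1)
    orphans = sorted(n for n in nodes if parent_count[n] == 0 and n not in roots)
    return multi_parent, orphans

# Hypothetical figures in EUR millions.
amounts = {"acct_a": 40, "acct_b": 48, "acct_national": 12}
edges = [
    ("company", "north"), ("company", "south"), ("company", "national_accounts"),
    ("north", "acct_a"), ("north", "acct_national"),
    ("south", "acct_b"),
    ("national_accounts", "acct_national"),  # second parent, placed on purpose
]
nodes = {n for edge in edges for n in edge}

booked = sum(amounts.values())
rolled_up = naive_rollup("company", build_children(edges), amounts)
print(booked, rolled_up)  # 100 112
print(edge_problems(edges, nodes, roots={"company"}))  # (['acct_national'], [])

# The other side: drop the second parent, then add an account nobody attached.
single_parent_edges = edges[:-1]
amounts_with_new = {**amounts, "acct_new": 7}
booked_with_new = sum(amounts_with_new.values())
rolled_up_short = naive_rollup("company", build_children(single_parent_edges), amounts_with_new)
print(booked_with_new, rolled_up_short)  # 107 100
print(edge_problems(single_parent_edges, nodes | {"acct_new"}, roots={"company"}))
# ([], ['acct_new'])
```

Neither run raises anything. The only signal is that the rolled-up figure and the booked figure disagree, which is why a check on the edges themselves finds the cause faster than staring at the totals.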

Ragged, balanced, recursive: pick the shape on purpose

A balanced hierarchy has the same depth on every branch. Every product is Category, Subcategory, SKU, with no exceptions. It is the easiest to query and the rarest to find in the wild, because real structures grow unevenly. The moment one SKU needs to sit directly under a category, the model is no longer balanced, and pretending it still is forces the report to drop the short branch or pad it with an empty level.
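To make the variance concrete, here is a small Python sketch of a fixed-depth report meeting one short branch. The rows, names and sales figures are invented for illustration, and `None` stands in for the missing subcategory level.

```python
from collections import defaultdict

# Hypothetical rows: (category, subcategory, sku, sales).
rows = [
    ("Tools", "Hand tools", "SKU-1", 50),
    ("Tools", "Power tools", "SKU-2", 30),
    ("Tools", None, "SKU-3", 20),  # ragged: sits directly under the category
]

category_total = defaultdict(int)
subcategory_total = defaultdict(int)
for category, subcategory, _sku, sales in rows:
    category_total[category] += sales
    if subcategory is not None:  # a fixed-depth report has no slot for this row
        subcategory_total[(category, subcategory)] += sales

sum_of_subcategories = sum(
    total for (category, _sub), total in subcategory_total.items() if category == "Tools"
)
variance = category_total["Tools"] - sum_of_subcategories
print(category_total["Tools"], sum_of_subcategories, variance)  # 100 80 20
```

The 20 is not an error in any row. It is the one SKU that the shape of the report had no place for.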

A ragged hierarchy lets branches end where they actually end. It matches the data, but the query has to walk a variable number of levels, which is why people reach for a recursive parent-child model: each node stores its parent, and a recursive query climbs from any node to the root. That model handles ragged depth and reorganizations without a schema change. It is also the one that lets a loop form, so it only holds together if a save-time check refuses any edge that would point a node back at its own ancestor.
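As a rough sketch of that parent-child model in Python, assume each node stores exactly one parent in a dictionary and `None` marks an explicit root. The node names and the `path_to_root` helper are hypothetical. In a database the same climb would typically be a recursive query; the dictionary version only shows the idea.

```python
# Hypothetical child -> parent map; None marks an explicit root.
parent_of = {
    "Tools": None,
    "Hand tools": "Tools",
    "SKU-1": "Hand tools",  # three levels deep
    "SKU-3": "Tools",       # ragged: two levels deep
}

def path_to_root(node, parent_of):
    """Climb from a node to its root, however many levels that takes."""
    path = [node]
    while parent_of[path[-1]] is not None:
        parent = parent_of[path[-1]]
        if parent in path:  # defensive only; a loop should be refused at save time
            raise ValueError(f"cycle detected at {parent!r}")
        path.append(parent)
    return path

print(path_to_root("SKU-1", parent_of))  # ['SKU-1', 'Hand tools', 'Tools']
print(path_to_root("SKU-3", parent_of))  # ['SKU-3', 'Tools']
```

The two SKUs come back with paths of different lengths and nothing in the schema had to change to allow it.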

Pick the shape because it fits the data, not because it is convenient to report on. A balanced model imposed on ragged reality does not make the reality balanced. It just moves the mismatch into a variance nobody can explain.

Govern the edges, not just the nodes

Four rules carry almost all of the weight, and all four are about the edge rather than the node.

One parent per node, unless you have deliberately decided on a many-to-many structure and your reports know to deduplicate. The National Accounts problem disappears the moment the structure refuses to let an account hang in two branches by accident. If an account genuinely needs to roll up two ways, that is a second hierarchy on the same nodes, not a second parent smuggled into the first.
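One way to picture "a second hierarchy on the same nodes" is one child-to-parent map per hierarchy. The sketch below is illustrative only: the hierarchy names, accounts and figures are made up, and it assumes each map is free of loops.

```python
from collections import defaultdict

amounts = {"acct_a": 40, "acct_b": 48, "acct_national": 12}  # hypothetical, EUR millions

# A dict key holds a single value, so within one hierarchy a node
# cannot pick up a second parent by accident.
hierarchies = {
    "geography": {
        "north": "company", "south": "company",
        "acct_a": "north", "acct_national": "north", "acct_b": "south",
    },
    "account_type": {
        "national_accounts": "company", "regional_accounts": "company",
        "acct_national": "national_accounts",
        "acct_a": "regional_accounts", "acct_b": "regional_accounts",
    },
}

def rollup(parent_of, amounts):
    """Add each amount once to the node itself and to every ancestor above it."""
    totals = defaultdict(int)
    for node, amount in amounts.items():
        current = node
        while current is not None:
            totals[current] += amount
            current = parent_of.get(current)  # the root has no entry, so the climb ends
    return totals

by_geography = rollup(hierarchies["geography"], amounts)
by_account_type = rollup(hierarchies["account_type"], amounts)
print(by_geography["company"], by_account_type["company"])  # 100 100
print(by_geography["north"], by_account_type["national_accounts"])  # 52 12
```

The national account shows up in both views, but each view has exactly one path to it, so each view totals to what was booked.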

No orphans. Every node either has a parent or is an explicit, named root. A node attached to nothing is not a small mistake to clean up later. It is a number that silently leaves every total until someone notices the company is quietly short.

No loops, checked when the edge is saved. Walking the tree is the wrong place to discover a cycle, because by then the bad edge is already live and the only symptom is a query that hangs or a depth limit that returns a partial answer. The check belongs at the point of edit: reject any reparenting that would point a node at one of its own descendants.
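A save-time check can be as small as climbing from the proposed parent and refusing if the climb reaches the node being moved. This is a sketch, not a reference implementation: the function names are hypothetical, and it assumes the structure already stored is free of loops.

```python
def would_create_cycle(parent_of, node, new_parent):
    """True if new_parent is the node itself or one of its descendants."""
    current = new_parent
    while current is not None:
        if current == node:
            return True
        current = parent_of.get(current)
    return False

def reparent(parent_of, node, new_parent):
    """Apply the move only if it cannot close a loop."""
    if would_create_cycle(parent_of, node, new_parent):
        raise ValueError(f"refused: {new_parent!r} is {node!r} or one of its descendants")
    parent_of[node] = new_parent

parent_of = {"root": None, "A": "root", "B": "root"}
reparent(parent_of, "A", "B")      # March: fine, A now hangs under B
try:
    reparent(parent_of, "B", "A")  # October: would close the loop
except ValueError as err:
    print(err)
```

The October edit is rejected before it is stored, so B stays under the root and no query ever has to discover the cycle.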

An approval step in front of moving a node, because reparenting is the most consequential edit in the whole structure and the least visible. Dragging a plant from one division to another rewrites every rollup above it, retroactively, and the only trace is a number that shifted. Put the move through the same approval workflow a new record gets, and effective-date it so the months before the move still total to the old structure. That last part is the slowly-changing-dimension problem again: the tree has a history, and last quarter's report has to read last quarter's tree.
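Effective-dating can be sketched by giving every edge a validity window and asking for the parent as of a date. The plant, the divisions, the dates and the half-open interval convention below are all assumptions made for the example.

```python
from datetime import date

# Hypothetical edge history: (child, parent, valid_from, valid_to).
# valid_to is exclusive; None means the edge is still current.
edge_history = [
    ("plant_7", "division_a", date(2023, 1, 1), date(2024, 7, 1)),
    ("plant_7", "division_b", date(2024, 7, 1), None),
]

def parent_as_of(node, as_of, history):
    """Return the parent the node had on the given date, or None if it had none."""
    for child, parent, valid_from, valid_to in history:
        if child == node and valid_from <= as_of and (valid_to is None or as_of < valid_to):
            return parent
    return None

print(parent_as_of("plant_7", date(2024, 3, 31), edge_history))  # division_a
print(parent_as_of("plant_7", date(2024, 9, 30), edge_history))  # division_b
```

A report for the first quarter reads the first row and a report for the third quarter reads the second, so the move changes the future without rewriting the past.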

If you are coming off MDS

MDS leaned hard on hierarchies, and a lot of reporting was built on top of them, which is exactly why they are worth getting right in whatever replaces it. MDS gave you derived hierarchies, generated from a domain attribute so the tree updated when the data did, and explicit hierarchies, where you placed members by hand and consolidated members let a node appear under more than one parent. The explicit kind is where the double-count lived: the structure allowed multiple parents, and nothing downstream was told to expect them.

None of that migrates on its own, and the migration is the moment to decide the shape deliberately rather than carry the old one across by reflex. The detail on what each MDS hierarchy type was and how to rebuild it is in the post on migrating MDS hierarchies. The point here is narrower: the records are the easy part of that migration. The edges are where the meaning is, and the edges are what nobody validated for years.

One more thing makes reorganizations survivable. Join the hierarchy on a surrogate identifier, not on a name or a code that carries position. A cost center keyed to its place in the tree breaks every report the day it moves. A cost center with a stable surrogate can be reparented as often as the org demands, and the edge is the only thing that changes.
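A small illustration of the surrogate-key point, with invented identifiers and amounts: the postings and the edge both reference a meaningless integer id, and the business code is just an attribute of the record.

```python
# Hypothetical records. The surrogate id carries no meaning and never changes.
cost_centers = {1001: {"code": "CC-0042", "name": "Plant maintenance"}}
divisions = {10: "Division A", 20: "Division B"}
parent_of = {1001: 10}                 # edge: cost center id -> division id
postings = [(1001, 250), (1001, 125)]  # (cost_center_id, amount)

def total_by_division(postings, parent_of):
    """Sum postings under whichever division each cost center currently hangs on."""
    totals = {}
    for cost_center_id, amount in postings:
        division_id = parent_of[cost_center_id]
        totals[division_id] = totals.get(division_id, 0) + amount
    return totals

before = total_by_division(postings, parent_of)
parent_of[1001] = 20  # the reorganization: one edge changes, nothing else
after = total_by_division(postings, parent_of)
print(before, after)  # {10: 375} {20: 375}
```

No posting and no cost center record was touched by the move. Had the postings been keyed to a code with the division baked in, every one of them would have needed rewriting.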

Common questions

What is a master data hierarchy?

The parent-child structure laid over a set of master records: cost centers rolling up to divisions, products to categories, regions to a total. The nodes are the master records and usually get governed. The edges between them carry just as much meaning and almost never get governed. A hierarchy is data, not decoration.

Why does a rollup count the same revenue twice?

Because a node has more than one parent. An account under both its region and a National Accounts branch is reached through two paths from the top, and its revenue is added once on each. The record is correct, both links are intentional, nothing errors. The total is just inflated by whatever hangs in two places.

Ragged or balanced: which should I model?

Most real structures are ragged, meaning branches end at different depths. A balanced model forces the same number of levels everywhere, which is cleaner to query but wrong for the data. Pick ragged unless every branch genuinely has the same depth, and make the report handle short branches instead of dropping them.

How do you govern a hierarchy?

Govern the edges. One parent per node unless you deliberately modelled many-to-many. Reject loops at save, not in the query. Forbid orphans. Put approval in front of moving a node, because a reparenting rewrites every rollup above it. Effective-date the moves so history still totals.

Make your hierarchies total correctly

Primentra governs the tree, not just the records: one parent per node by default, orphans and loops refused at save, and an approval step in front of every reparenting so a move that rewrites a rollup gets reviewed before it lands. Nodes join on a stable surrogate ID, so a reorganization changes an edge instead of breaking every report. It runs on your own SQL Server, deploys in a day, and costs €7,500 per year flat. The 60-day trial runs against your real structure, which is long enough to find the double-counted branches your rollups have been carrying.
