Star Schema vs Snowflake in Power BI: Performance Comparison

Co-authored by

Moray Taylor — Analytics & Development Manager

Why Does Star Schema vs Snowflake Schema Matter in Power BI?

Quick answer: Star schemas deliver better compression, faster queries, and more predictable DAX behaviour in Power BI because the VertiPaq engine and DAX language are explicitly optimised for dimensional models with separated fact and dimension tables.

Every Power BI report visual generates a query against the semantic model. That query filters, groups, and summarises data. The shape of your data model determines how efficiently the VertiPaq storage engine answers those queries. Choose the wrong schema and you get bloated models, slow slicers, and subtle calculation errors that only surface when a finance director questions why the numbers do not add up.

Microsoft's official guidance is unambiguous: star schema design is "highly relevant to developing Power BI models that are optimized for performance and usability." SQLBI's research goes further, stating that "a star schema is always the best choice" based on benchmarks against a four-billion-row fact table.

This guide breaks down exactly why — grounded in VertiPaq engine internals — and covers the scenarios where snowflake schemas might still be acceptable.

What Is a Star Schema in Power BI?

Quick answer: A star schema places a central fact table (transactions, events, measurements) surrounded by denormalised dimension tables (products, customers, dates, geographies) connected via one-to-many relationships.

Dimension tables describe business entities (products, customers, dates), while fact tables record events (sales, shipments, inventory snapshots). Each dimension connects directly to the fact table through a single relationship, creating the star shape.

Key characteristics for Power BI:

Dimension tables are wide (many descriptive columns) but short (thousands to low millions of rows). They enable filtering and grouping.
Fact tables are narrow (mostly foreign keys and numeric measures) but tall (millions to billions of rows). They enable summarisation.
Relationships are one-to-many, flowing from dimension (one side) to fact (many side). Filter propagation moves in one direction by default.

This separation directly aligns with how the VertiPaq columnar engine compresses and scans data. A Category column with 12 distinct values compresses dramatically better in a small dimension table than when repeated across billions of fact rows.

For a deeper look at how relationships work in Power BI, see our relationships and semantic model guide.

What Is a Snowflake Schema and How Does It Differ?

Quick answer: A snowflake schema normalises dimension tables into multiple related sub-tables — for example, splitting a single Product dimension into separate Product, Subcategory, and Category tables — creating longer relationship chains between the fact table and the outermost dimension attributes.

In a classic snowflake design, a product dimension becomes three tables: DimProduct links to DimProductSubcategory, which links to DimProductCategory. The fact table connects only to DimProduct, and filters must propagate through two additional hops to reach category-level attributes.

This normalisation reduces data redundancy within the dimension layer. In a disk-based relational warehouse, that reduction can matter. In Power BI's in-memory VertiPaq engine, the calculus is different.

Microsoft Learn states: "Generally, the benefits of a single model table outweigh the benefits of multiple model tables." The documentation lists specific drawbacks:

Power BI loads more tables, which is less efficient from storage and performance perspectives
Tables require additional columns for model relationships, increasing overall model size
Longer relationship filter propagation chains must be traversed
Hierarchies cannot span multiple tables, making drill-down paths harder to implement
The Data pane becomes cluttered with tables that may contain only one or two columns

How Does VertiPaq Handle Star Schema vs Snowflake Schema Differently?

Quick answer: VertiPaq's columnar compression (dictionary encoding, value encoding, and run-length encoding) works most efficiently when dimension attributes live in small, separate dimension tables rather than repeated across fact table rows or fragmented across chains of normalised sub-tables.

Dictionary Encoding and Value Encoding

VertiPaq compresses each column independently, choosing between two primary encoding strategies:

Value encoding applies to integer columns where a mathematical transformation (such as subtracting a base value) can reduce storage. A column containing values 10,000 and 10,001 becomes 0 and 1 internally.
Dictionary encoding (also called hash encoding) builds a lookup table of distinct values and replaces each cell with a compact index. A Country column with 195 distinct values stores a dictionary of 195 entries and replaces every row's text with a small integer index.

In a star schema, the Country column lives in a Geography dimension with perhaps 195 rows. The dictionary has 195 entries, and the column is tiny. In a flat or denormalised model, that same Country column sits in the fact table across millions or billions of rows — the dictionary is the same size, but the index column storing row-level references is vastly larger.

Run-Length Encoding

After dictionary or value encoding, VertiPaq applies run-length encoding (RLE) to columns where consecutive rows share the same value. Sorted columns compress spectacularly under RLE — a million consecutive rows with the value "Electronics" become a single entry: "Electronics, count: 1,000,000."

Dimension tables benefit disproportionately from RLE because they are small, often sorted, and contain columns with low cardinality. Fact tables benefit too, but only for their narrow foreign key columns — not for denormalised text attributes that would otherwise live in dimensions.

The Compression Impact

SQLBI's benchmark quantified this on a four-billion-row dataset:

Metric	Star Schema ("Slim")	Single Table ("Fat")
Model size in RAM	16.80 GB	44.51 GB
Size ratio	1x	2.65x larger

The star schema model consumed less than 40% of the memory required by the denormalised equivalent. Snowflake schemas fall between these two extremes — they avoid fact-table bloat but introduce additional relationship overhead and redundant key columns across the normalised dimension chain.

How Do Star and Snowflake Schemas Compare on Key Performance Metrics?

Quick answer: Star schemas outperform snowflake schemas in Power BI across nearly every metric that matters for real-world reporting: slicer speed, DAX calculation time, filter propagation efficiency, and model size.

Criteria	Star Schema	Snowflake Schema
VertiPaq compression	Optimal — dimension attributes in small tables compress well via RLE and dictionary encoding	Moderate — more tables and key columns add overhead, though better than flat models
Filter propagation	Single hop from dimension to fact	Multi-hop chains (e.g., Category to Subcategory to Product to Fact)
Slicer performance	Scans dimension table (thousands of rows)	Scans outermost table then propagates through chain — additional latency
DAX complexity	Standard patterns work as expected	Longer chains may require explicit `CROSSFILTER` or `TREATAS` patterns
Model size	Smallest for equivalent data	Larger due to additional tables, key columns, and relationship metadata
Hierarchy support	Single-table hierarchies (Category > Subcategory > Product)	Cannot create cross-table hierarchies — must denormalise anyway
Data pane usability	Fewer tables, cleaner navigation	More tables, potentially confusing for report authors
Auto-exist behaviour	Predictable — dimensions are separate tables	Can produce unexpected filter interactions in certain configurations

Slicer Performance: The Most Dramatic Difference

SQLBI's testing revealed the starkest contrast in slicer queries. To populate a Year slicer, the star schema retrieves ten distinct values from a Date dimension table containing a few thousand rows — execution time is approximately one millisecond. A flat model must scan the entire four-billion-row fact table to find those same ten distinct values, taking roughly five seconds. Snowflake schemas perform closer to star schemas here (the outermost dimension table is still small), but the additional relationship traversal adds measurable overhead when multiple slicers interact.

Why Do Flat Tables Cause Incorrect Results in Power BI?

Quick answer: Flat (denormalised) tables trigger DAX auto-exist behaviour that silently filters out valid dimension combinations, producing incorrect totals and misleading reports — a problem that star schemas avoid entirely.

SQLBI's research on auto-exist documents a subtle but critical issue. When multiple columns from the same table are used as filters (which happens constantly in flat models), DAX applies an optimisation called auto-exist that intersects filter values based on combinations that actually exist in the data. If a product has never been sold in a particular region, that combination is silently excluded — even when business logic requires it to appear.

In a star schema, filters on Product[Category] and Geography[Region] operate on different tables. Auto-exist does not apply across tables, so cross-joins produce the complete set of combinations and the result is correct.

The Missing Data Problem

A related issue occurs when fact table rows are absent for certain dimension members. In a flat model, if no transactions exist for a particular customer, that customer disappears entirely from slicers and visuals — there is no dimension row to anchor on. In a star schema, the Customer dimension table independently contains all customers regardless of whether they have transactions, ensuring complete visibility.

Refactoring a Flat Table to Star Schema in Power Query

If you have inherited a flat table, Power Query provides the tools to restructure it:

Identify dimension candidates — columns with low cardinality (categories, regions, customer segments, date attributes).
Create dimension queries using Reference — right-click the source query and select "Reference" to create a new query from the same source.
Isolate dimension columns — use "Remove Other Columns" to keep only relevant attributes, then "Remove Duplicates" to produce a distinct list.
Add a surrogate key — add an Index column as the primary key.
Merge keys back to the fact query — merge each dimension into the fact query, replace text columns with key references.
Validate relationships — verify one-to-many relationships between each dimension key and the corresponding fact foreign key.

For organisations migrating from spreadsheet-based reporting, our Excel to Power BI migration guide covers this flat-to-star transformation in the context of replacing workbook-based analytics.

When Is a Snowflake Schema Acceptable in Power BI?

Quick answer: Snowflake schemas are acceptable when a dimension is extremely large (millions of rows), when row-level security requires a separate security mapping table, or when the source system's normalised structure provides a compelling maintenance benefit that outweighs the performance cost.

Microsoft does not say "never use snowflake schemas" — it says the benefits of denormalising "generally" outweigh keeping the normalised structure. Here are the scenarios where a snowflake dimension can be justified:

Very Large Dimensions

If a dimension table itself contains millions of rows (for example, a customer dimension in a telecom company with 50 million subscribers), normalising stable attributes like Region > City > District into a separate geography sub-table can reduce the overall dimension size. The storage savings may offset the filter propagation cost.

Row-Level Security Mapping Tables

When implementing dynamic row-level security, you often need a mapping table that connects user principals to dimension values (for example, which sales regions each user can see). This table naturally forms a snowflake extension off the main dimension. SQLBI has documented how moving security filters to a smaller table can reduce the performance cost of RLS evaluation.

Source System Already Normalised

If data arrives from a third-normal-form relational database and the dimension is small enough that performance impact is negligible, the maintenance cost of writing Power Query denormalisation logic may not justify the marginal gain. This is a pragmatic trade-off, not a best practice.

In all cases, snowflake schemas cannot support cross-table hierarchies. Microsoft recommends creating a denormalised view (using Power Query merge) for hierarchy support, even if you keep the snowflake structure for filtering.

How Should You Choose Between Star and Snowflake for a New Power BI Model?

Quick answer: Default to star schema unless you have a documented, specific reason to normalise a dimension — the performance, correctness, and usability benefits are too significant to give up without justification.

For every new semantic model, begin with a star schema. Create denormalised dimension tables for each business entity (Date, Product, Customer, Geography) connected to fact tables. Consider snowflake extensions only if a dimension exceeds one million rows, RLS requires a separate mapping table, or source system constraints make denormalisation impractical.

After building the model, use VertiPaq Analyzer (available in DAX Studio and Tabular Editor) to measure actual compression ratios and column sizes. Use Performance Analyzer in Power BI Desktop to measure query execution times — pay special attention to slicer population queries and visuals that filter across dimension chains. Our DAX best practices guide covers performance profiling workflows with DAX Studio and VertiPaq Analyzer in detail.

Organisations migrating from Qlik Sense, which uses an associative engine with different modelling assumptions, should pay particular attention to schema design during the transition. Our Qlik to Power BI migration guide covers how to translate associative models into proper star schemas.

Frequently Asked Questions

Q: Should I always use a star schema in Power BI?

Yes, for the vast majority of cases. Microsoft's guidance, SQLBI's benchmarks, and VertiPaq's internal optimisations all align on this. Star schemas deliver better compression, faster queries, correct DAX behaviour, and a cleaner authoring experience. The only exceptions involve very large dimensions, RLS mapping tables, or source-system constraints — and even then, snowflake portions should be limited to specific dimensions, not the overall architecture.

Q: Does a snowflake schema save storage in Power BI?

Not typically. VertiPaq already handles redundancy efficiently through dictionary encoding — repeated category names compress well because the engine stores distinct values once and uses compact integer indices for each row. The additional tables, key columns, and relationship metadata that snowflake schemas introduce often consume more memory than the redundancy they eliminate.

Q: How does filter propagation differ between star and snowflake schemas?

In a star schema, a filter propagates through a single one-to-many relationship to reach the fact table — one hop. In a snowflake schema, a filter on Category must propagate through Subcategory, then Product, then to the fact table — three hops. Each hop adds processing overhead. The difference is negligible for simple queries but compounds in complex reports with multiple slicers and cross-filtering.

Q: Can I create hierarchies across snowflake dimension tables in Power BI?

No. Power BI hierarchies require all columns to belong to a single table. If your Product dimension is split across three tables, you cannot create a Category > Subcategory > Product drill-down. The workaround is merging the snowflake tables into a single dimension in Power Query for hierarchy support — which effectively recreates a star schema for that dimension.

Q: What is auto-exist and why does schema choice affect it?

Auto-exist is a DAX optimisation that filters dimension combinations based on what actually exists in the data when multiple columns from the same table are used as filters. In flat models, auto-exist can silently exclude valid business combinations (like a new product in a new region with no sales yet), producing incorrect totals. In a star schema, filters on different dimensions operate on separate tables, so auto-exist does not cross-filter between them — the result includes all expected combinations correctly.

Q: How do I convert an existing flat model to a star schema?

Use Power Query's Reference feature to create dimension queries from the existing flat table. For each dimension, reference the source query, use "Remove Other Columns" to keep only relevant attributes, apply "Remove Duplicates," and add an Index column as a surrogate key. Merge each dimension back into the fact query to replace text columns with key references. After loading, verify that one-to-many relationships are correctly established.

Q: Does Microsoft Fabric change the star schema vs snowflake recommendation?

No. Fabric semantic models use the same VertiPaq engine as Power BI Desktop and the Power BI service. The Fabric dimensional modelling documentation explicitly recommends star schema design for Fabric warehouses and lakehouses. Direct Lake mode — which reads Parquet files directly from OneLake into VertiPaq — benefits from the same columnar compression advantages. The guidance remains consistent: denormalise dimensions, separate facts from dimensions, keep relationship chains short.