Snowflake and Power BI: DirectQuery, Import & the Fabric Alternative

Co-authored by

Moray Taylor — Analytics & Development Manager

Why Pair Snowflake with Power BI in the First Place?

Quick answer: Snowflake is the storage and compute layer enterprise warehouses need; Power BI is the semantic model and reporting layer Snowflake does not provide. GCC enterprises that already run Snowflake on AWS usually pair it with Power BI because the rest of the business already lives in Microsoft 365 and Microsoft Entra ID.

Snowflake is now generally available on the AWS Middle East (UAE) Region, which joins the older AWS Middle East (Bahrain) Region for in-region storage. On the Microsoft side, Azure UAE North and Qatar Central are live, with Saudi Arabia East scheduled for cloud workloads. Most GCC data teams now run a cross-cloud stack by default: warehouse data on AWS, reporting on Azure, identity in Microsoft Entra ID.

The connector choice (DirectQuery, Import, composite, or Fabric mirroring) is not academic. It drives Snowflake credit consumption, dashboard latency, and how much data leaves Snowflake's residency boundary. This post covers the four practical patterns and when each one is the right call. If you are still picking a license tier, the Power BI Pro vs Premium vs Fabric licensing decision tree sits upstream of this decision.

What Does the Snowflake Power BI Connector Actually Support?

Quick answer: The Power BI Snowflake connector supports Import, DirectQuery, custom SQL, role specification, and four authentication modes (Microsoft Entra ID, Snowflake username and password, key pair via ADBC, and service principal). Microsoft Entra ID single sign-on only works with DirectQuery.

The connector is in general availability across Power BI semantic models, Power BI dataflows, Fabric Dataflow Gen2, and Power Apps dataflows. Microsoft documents the current state on the Power Query Snowflake connector page.

Authentication Modes That Trip Most Teams Up

Authentication	Refresh in Service	SSO to Snowflake	Notes
Microsoft Entra ID (recommended)	Yes (with workspace identity)	DirectQuery only	Required for end-user identity to flow into Snowflake row access policies
Snowflake username and password	Yes	No	Being deprecated by Snowflake; do not build new connections on this
Key pair authentication (ADBC)	Yes	No	Service-level identity, certificate-backed
Service principal (SPN)	Yes (depends on Snowflake config)	No	Application-level identity for automation

The most common production pattern in GCC enterprises is Microsoft Entra ID for the user-facing semantic model plus a service principal or key pair for any nightly Import refresh. Snowflake's username and password mode is being phased out per Snowflake's own announcement, so new projects should not depend on it.

Connector Implementation 2.0 Is Worth the Upgrade

Since July 2025 the connector uses the Apache Arrow Database Connectivity (ADBC) driver by default for new connections. It pulls large result sets faster and uses less serialization overhead. Two real-world consequences:

Memory consumption can be higher than the old ODBC driver. On Premium capacities, some semantic models that previously refreshed cleanly now hit Resource Governing cancellations. The fix is either raising capacity memory or trimming the imported dataset.
The COUNT DISTINCT pattern has a documented incorrect-result bug in the 2.0 driver, which Microsoft tracks as a known issue. If a measure depends on COUNT DISTINCT at the SQL layer, validate against the source before trusting the report.

For existing connections built before March 2025, opt in by adding Implementation="2.0" to Snowflake.Databases in the M query and validate row counts before promoting.

When DirectQuery on Snowflake Earns the Credit Spend

Quick answer: Use DirectQuery for Snowflake when data freshness matters more than dashboard latency, when the dataset is too large for VertiPaq, or when row-level security must be enforced at the Snowflake layer. Otherwise, Import is usually cheaper and faster.

DirectQuery sends every visual interaction back to Snowflake as a SQL query against a virtual warehouse. Filter changes, slicer clicks, and report loads each wake a warehouse and burn credits. The economics only work when DirectQuery is doing something Import cannot.

Warehouse Sizing for DirectQuery Workloads

Snowflake virtual warehouses scale by t-shirt size (XS, S, M, L, XL, and up). Larger warehouses are not faster for simple aggregates; they are faster for complex joins and high concurrency. For Power BI DirectQuery, the practical rules:

Start at XS or S for typical BI aggregations. Most Power BI visuals fire SUM, COUNT, or GROUP BY queries that complete in seconds on the smallest warehouses.
Dedicate a separate warehouse to Power BI. Mixing BI queries with ETL on the same warehouse makes credit attribution impossible, and lets one bad query starve the dashboards.
Set auto-suspend to 60 seconds. Snowflake bills per-second after the first minute, so a warehouse that suspends quickly between report sessions can cost a fraction of one that stays warm.
Raise the size only if concurrency or join complexity demands it. A multi-cluster warehouse handles concurrent dashboard users better than a single very large warehouse.

Snowflake publishes detailed guidance in its own end-to-end analytics with Power BI guide, which covers query tag patterns useful for cost attribution.

Query Pushdown: What Folds and What Does Not

DirectQuery works because Power Query translates the semantic model's operations into native Snowflake SQL. Most operations fold cleanly: filters, joins between modeled tables, visual-level aggregations, and basic transformations. What does not fold well:

Complex M transformations applied after the source query
DAX measures that force per-row evaluation across the whole table (uncached iterators on large fact tables)
Mixed-source joins between Snowflake and a non-foldable source

Each non-folded operation forces Power BI to pull more rows than necessary, which inflates Snowflake compute and the network egress bill. The first thing to do after building a DirectQuery model is open Performance Analyzer in Power BI Desktop, capture the SQL Snowflake actually receives, and confirm the query is doing what you expect.

DirectQuery Decision Criteria

Use DirectQuery when at least one is true:

Data freshness is measured in minutes, not hours, and Snowflake is the source of truth.
Dataset size exceeds Premium capacity memory after compression (typically beyond several hundred GB).
Row-level security needs to enforce Snowflake row access policies based on the authenticated user's identity, which requires Microsoft Entra ID SSO and DirectQuery.
Regulatory rules prohibit copying personal data into the Power BI service. For UAE PDPL and Saudi PDPL contexts, the data residency guide for Power BI in the GCC walks through the boundary analysis.

Import Mode: Faster Dashboards, Lower Snowflake Bills

Quick answer: Import mode pulls Snowflake data into Power BI's VertiPaq columnar engine, where queries run from memory. Dashboards run much faster, Snowflake compute is hit only at refresh time, and total cost is usually lower for any dataset that fits in capacity memory.

The trade-off is freshness. Imported data is only as current as the last refresh: by default once or twice a day on Pro, up to 48 times a day on Premium and Fabric capacities. For most enterprise dashboards, daily freshness is sufficient and the cost savings are substantial.

When Import Is the Right Default

Import wins in the common cases:

Executive dashboards refreshing nightly against curated marts
Operational reports where same-day data is enough
Self-service workspaces where dozens of users hit the same visuals concurrently
Any model where the warehouse query cost would exceed the storage cost in Power BI capacity

The pattern: build dimensional models in Snowflake (star schema, narrow fact tables, conformed dimensions), pull those marts into Power BI on a schedule, and serve the cached model to users. The star schema vs snowflake schema comparison for Power BI covers the modeling decisions that make Import refreshes cheap.

Refresh Cost Modeling

A practical estimate for an Import semantic model:

Refresh cost = (Snowflake warehouse credits per refresh)
             + (Power BI capacity time consumed by refresh)
             + (Egress, if any, from AWS to Azure region)

The first term is bounded: a nightly refresh runs once, the warehouse suspends after, and the bill stops. With DirectQuery, that first term becomes a function of dashboard interaction volume, which is much harder to predict and budget.

Incremental Refresh Is the Standard Production Pattern

For fact tables larger than a few million rows, configure incremental refresh in Power BI: partition by date, refresh only the most recent N days, and keep historical partitions cached. The result is a refresh that completes in minutes instead of hours and queries Snowflake for only the new rows. Microsoft documents the pattern in the incremental refresh guide for Power BI.

Composite Models: DirectQuery Plus Import in One Semantic Model

Quick answer: Composite models let one semantic model combine DirectQuery to Snowflake with Import tables and DAX calculations layered on top. Use the pattern for hot-cold splits: keep recent transactional data live via DirectQuery while caching historical aggregates in VertiPaq.

The composite model documentation on Microsoft Learn describes the storage modes Power BI supports per table:

Import: cached in VertiPaq memory
DirectQuery: queried live from source
Dual: Power BI decides per query whether to use the cached copy or the live source

A common Snowflake pattern: model dimensions as Dual (cached, but available for join pushdown when the fact table is queried live), the historical fact as Import, and the current-period fact as DirectQuery. The model reports against live current-day numbers while the year-to-date aggregates serve instantly from the cache.

Composite models add complexity. Relationships that cross storage modes are limited, security boundaries get harder to reason about, and DAX behavior can differ from a pure-Import model. Reach for composite models when the data volume genuinely requires it, not as a default.

The Fabric Alternative: Mirroring Snowflake into OneLake

Quick answer: Microsoft Fabric mirroring continuously replicates Snowflake tables into OneLake as Delta tables, with no ETL pipeline. From there, Power BI semantic models can query the mirrored copy at OneLake speed without any Snowflake compute on the read path. The Fabric compute used for replication is free; you still pay Snowflake for the change-data-capture queries.

Snowflake mirroring in Microsoft Fabric is generally available and works by reading Snowflake streams, replicating the changes into OneLake in Delta Parquet format, and exposing the result through a SQL analytics endpoint. If the workload pattern is event-driven rather than tabular replication, the Microsoft Fabric Real-Time Intelligence stack (Eventstream, Eventhouse, Real-Time hub) is the better fit. The pattern shifts the workload boundary:

Snowflake remains the system of record and the place where transactional data lands
Fabric becomes the read-optimized analytics layer for Power BI
DirectQuery against the mirrored copy avoids waking Snowflake warehouses

When Mirroring Beats DirectQuery

Mirroring earns its place when these are true:

The organization already pays for a Fabric capacity (F-SKU or P-SKU equivalent)
The dashboard read volume against Snowflake is high enough that DirectQuery credit consumption is material
Near-real-time freshness (seconds to minutes) is acceptable — mirroring is continuous but not synchronous
The Snowflake schemas are stable enough that schema-change handling does not become operational drag

The cost model is genuinely different. Microsoft does not charge for the Fabric compute that performs the replication or for OneLake ingress. Snowflake still charges for the virtual warehouse compute that produces the change rows and for the cloud services compute behind metadata queries. For high-traffic dashboards, the net total bill is usually lower than equivalent DirectQuery, but only above a certain read volume threshold. For a single dashboard refreshed daily, mirroring is overkill.

Mirroring Permissions and Network Setup

Mirroring requires the following Snowflake permissions on the source database: CREATE STREAM, SELECT TABLE, SHOW TABLES, and DESCRIBE TABLES. If the Snowflake instance is behind a private endpoint or firewall, the Fabric workspace needs either a virtual network data gateway or an on-premises data gateway with network reachability to Snowflake.

Granular Snowflake security (masking policies, row access policies, dynamic data masking) does not flow into the mirrored copy. Any access controls have to be rebuilt in Fabric using SQL granular permissions or Power BI row-level security. For regulated workloads, that re-implementation cost is the main reason to stay on direct DirectQuery instead of mirroring.

OneLake Shortcuts vs Mirroring: Different Tools

Quick answer: OneLake shortcuts are pointers to existing storage (S3, ADLS Gen2, Google Cloud Storage, Azure Blob, Dataverse). They do not currently target Snowflake directly. Mirroring is the right Snowflake-into-Fabric path; shortcuts are useful when the Snowflake data lands first in S3 or ADLS as Iceberg or Parquet.

One hybrid pattern that does work: Snowflake unloads daily aggregates into an S3 bucket as Parquet, and a OneLake shortcut points at that bucket. Power BI semantic models in Direct Lake mode then read the data from OneLake at memory speed without ever touching Snowflake compute on the read path. The OneLake shortcuts documentation lists the supported external sources and the per-item limits (up to 100,000 shortcuts per item, 10 shortcuts per path).

The trade-off is operational: now there is a Snowflake unload job to maintain. For most teams, mirroring is the cleaner option because it removes the unload pipeline entirely.

Cost Comparison: Direct Snowflake Compute vs Fabric Capacity

Quick answer: Snowflake compute scales with query volume and warehouse size, while a Fabric capacity is a fixed monthly commitment for compute and storage. The break-even point is dashboard read volume: below a threshold, DirectQuery on Snowflake is cheaper; above it, a Fabric capacity with mirroring usually wins.

The honest version: cost modeling for this depends on metrics nobody publishes externally. Snowflake credit consumption is driven by warehouse size, query complexity, and concurrency. Fabric capacity pricing depends on region and reserved-instance terms. The general shape:

Scenario	DirectQuery cost behavior	Fabric mirroring cost behavior
Single executive dashboard, refreshed daily	Lowest	Capacity overhead exceeds the workload
Departmental reporting, dozens of users, hourly refresh	Linear with usage	Capacity may be partially utilized
Enterprise platform, hundreds of users, real-time slicing	Highest	Capacity utilization improves
Mixed BI plus data science plus engineering on same platform	Multiple warehouses needed	Single Fabric capacity consolidates compute

For the GCC market specifically, the operational dimension matters as much as the credit math. A Fabric capacity is a Microsoft-managed environment that fits the Microsoft 365 and Microsoft Entra ID stack most enterprises already standardize on. A separate Snowflake bill, managed by a different team, often introduces procurement and SSO friction that is hard to see in a unit cost comparison. The Power BI licensing cost optimization guide walks through the per-user and capacity math on the Microsoft side.

Recommended Pattern by Use Case

Quick answer: Default to Import for most dashboards. Use DirectQuery only when freshness, scale, or row-level security in Snowflake makes it unavoidable. Use composite models for hot-cold splits. Use Fabric mirroring when total dashboard read volume exceeds what direct DirectQuery can absorb economically and the organization already runs Fabric.

Use case	Recommended pattern
Daily executive KPIs	Import with scheduled refresh
Operational dashboards with hourly freshness	Import with incremental refresh
Real-time fraud or operations monitoring	DirectQuery on a dedicated warehouse
Mixed historical plus current-day reporting	Composite model (Import plus DQ)
High-volume self-service across many business units	Fabric mirroring plus Direct Lake
Regulated workload using Snowflake row access policies	DirectQuery with Microsoft Entra ID

For DAX-heavy semantic models on top of any of these patterns, the DAX best practices guide covers the measure patterns that perform well on both Import and DirectQuery storage modes.

Frequently Asked Questions

Q: Does Microsoft Entra ID SSO work with Import mode for Snowflake?

No. Microsoft Entra ID single sign-on for the Snowflake connector only supports DirectQuery. For Import, refresh runs under a service account: a Snowflake username and password (being deprecated), key pair authentication via ADBC, or a service principal. If end-user identity needs to flow into Snowflake row access policies, the semantic model has to use DirectQuery.

Q: Can I switch a semantic model from Import to DirectQuery without rebuilding?

Not directly. Power BI Desktop does not support converting an existing Import model to DirectQuery in place. The supported path is to rebuild the model with DirectQuery as the storage mode and copy over the measures, relationships, and visual definitions. For composite models, you can add DirectQuery tables alongside existing Import tables, which is the closer-to-supported migration approach.

Q: How much will DirectQuery to Snowflake actually cost?

It depends on dashboard interaction volume, warehouse size, and how well your queries fold. A typical pattern: dedicate an XS or S warehouse to Power BI, set auto-suspend to 60 seconds, and instrument with Snowflake query tags so you can attribute credits to specific reports. Run for a month, measure, and adjust. There is no useful generic answer below the unit-cost level, because usage variance dominates.

Q: Is Microsoft Fabric mirroring of Snowflake generally available?

Yes. Mirroring Snowflake databases into Microsoft Fabric is in general availability. The Fabric compute that performs the replication is free; Snowflake virtual warehouse compute is still charged for the change-data-capture queries that produce the replicated rows. Granular Snowflake permissions like row access policies do not flow through into the mirrored Delta tables, so those have to be rebuilt in Fabric.

Q: Does Snowflake on AWS UAE satisfy UAE PDPL data residency for Power BI?

Snowflake's AWS Middle East (UAE) Region keeps Snowflake-stored data in the UAE. Power BI Premium and Fabric capacities can be pinned to Azure UAE North so report processing also stays in-region. The combined architecture is a defensible posture under the UAE PDPL, but every cross-border data flow (Microsoft Entra ID tenant region, support telemetry, Power BI metadata) needs to be validated against the specific obligations of the organization. See the UAE PDPL section of the GCC residency guide linked earlier in this post.

Q: What is the COUNT DISTINCT bug in the Snowflake connector 2.0?

Microsoft has documented an issue in the Snowflake connector implementation 2.0 (ADBC driver) where queries using COUNT DISTINCT logic can return incorrect results. The fix is in progress per the connector documentation. Until it lands, validate any COUNT DISTINCT measure against the source: either run the same query directly in Snowflake, or temporarily revert that connection to the ODBC implementation by removing Implementation="2.0" from the M query.

Q: Should I use the Snowflake native connector or ODBC for Power BI?

Use the native Snowflake connector. Microsoft's Power Query Snowflake connector is the supported path. The connector now defaults to the ADBC driver (Implementation 2.0) for new connections since July 2025. The older ODBC implementation remains available as a fallback for connections that hit a 2.0 bug, but new development should not target it.