See your company like never before
Power BI deep-dives, migration playbooks, and data strategy for enterprise teams.


Join dozens of organizations who have moved to Beyond The Analytics. Book your personalized demo today
How to materialize point-in-time snapshots from a Power BI semantic model into a Fabric lakehouse using notebooks, semantic-link, and scheduled pipelines.
Quick answer: A snapshot table is a Delta table in a Fabric lakehouse that captures the state of a semantic model's data at a fixed point in time. You build one when you need a defensible historical record (slowly changing dimensions, end-of-month KPI values, audit trails) because Power BI semantic models and Direct Lake on OneLake only keep the latest version of the data.
Most GCC finance and operations teams hit the same wall about six months into a Fabric rollout. The Power BI dashboards work, the semantic model refreshes nightly, and the numbers look right today. Then a CFO asks a reasonable question: "What did our DSO look like on the 28th of every month for the last two years?" The dashboard cannot answer. The model holds the current state, not the history.
The reflex is to enable OneLake integration on the semantic model and call it done. That is not a snapshot. As of 2026-05, Microsoft Learn is explicit about the limit: "only a single version of the delta tables is exported and stored on OneLake, with old versions being deleted after a successful export." OneLake integration gives you a live mirror, not a time series.
To get a real snapshot table, you have to write one. The supported pattern in Fabric is a notebook that reads from the semantic model using semantic-link (sempy), stamps each row with a snapshot date, and appends or merges into a Delta table inside a lakehouse. A scheduled pipeline triggers the notebook on whatever cadence the business actually needs (daily, month-end, quarter-end). That is the rest of this post.
If your team is still deciding whether Fabric capacity is the right home for this kind of workload, the Power BI Pro vs Premium vs Fabric licensing guide covers the SKU decisions that have to land first. If you are running real-time workloads alongside batch snapshots, the Microsoft Fabric real-time analytics walkthrough shows where Eventhouse fits next to a lakehouse.
Quick answer: You need a snapshot table when the business question is "what did this look like on date X" and the answer cannot be reconstructed from the live source. The four common cases are slowly changing dimensions, period-end KPIs, regulatory audit trails, and reporting on figures that have since been restated.
Four use cases cover almost every snapshot a BTA client has ever asked for.
When a customer changes credit limit, a product moves between categories, or an employee changes department, the source system usually overwrites the previous value. Reports that join old fact rows to the new dimension produce wrong answers for old periods. A snapshot table holds the dimension as it existed on each business date, with valid_from and valid_to columns. The Fabric community pattern using Delta MERGE handles this cleanly inside a notebook (Fabric community thread on SCD Type 2 with MERGE).
Treasury teams in UAE and Saudi groups usually need DSO, working capital, AR aging buckets, and cash position captured at every month-end and quarter-end. These figures must match the financial statements filed with the parent group, not the live numbers that have since drifted. A snapshot job that runs on the last business day of each month writes one row per entity per KPI and never gets overwritten.
UAE PDPL Article 6 and DIFC DP Law 2020 both require demonstrable records of data state at the time a decision was made. If a customer disputes a transaction or a regulator asks for the customer record as it stood on a specific date, "the dashboard updates nightly" is not an acceptable answer. A snapshot table dated to the decision moment is.
In group reporting, prior-period figures often get restated (a JV adjustment, a reclassification, a corrected exchange rate). Boards and regulators expect to see both the originally reported number and the restated number. The originally reported number only exists if you captured it before the restatement landed.
If none of those four cases applies, do not build snapshot infrastructure. Direct Lake on OneLake and a refreshable semantic model are simpler and cheaper.
Quick answer: The supported pattern is to read tables or DAX measures from the semantic model in a Fabric notebook using sempy, add a snapshot_date column, and write the result to a Delta table in a lakehouse. A Fabric data pipeline orchestrates the run on a schedule.
The flow has five components and one cardinal rule.
| Component | Role |
|---|---|
| Semantic model | Source of truth for the business logic. Holds the dimensions, facts, and measures you want to snap. |
| Fabric notebook | Reads from the model with sempy.fabric, stamps rows, writes to Delta. |
| Lakehouse | Holds the snapshot Delta tables. Tables are partitioned by snapshot_date. |
| Data pipeline | Triggers the notebook on a schedule (daily, end-of-month, end-of-quarter). |
| Reporting layer | A second semantic model in Direct Lake mode, built on the snapshot tables. |
The cardinal rule: the snapshot table is a separate artifact from the source semantic model. Do not try to add a "history" table to the original model. Keep snapshots in a lakehouse, build a second Direct Lake semantic model on top of the snapshot Delta tables, and let report authors query that second model. Mixing the two breaks both refresh patterns and security.
OneLake integration on a semantic model exports the latest version of each table to OneLake as a Delta table. It is useful for downstream Spark workloads that want the current state without writing a separate ETL job. But the export overwrites the previous version every refresh. Quoting Microsoft Learn, as of 2026-05: measures, DirectQuery tables, hybrid tables, and aggregation tables cannot be exported, and "only a single version of the delta tables is exported and stored on OneLake." It is a mirror, not a history. Snapshots need an explicit, dated write.
Delta Lake time travel lets you query a table as of a previous version or timestamp. It works on the raw Delta tables that back a lakehouse, and it is genuinely useful for short-window debugging. It is not a snapshot mechanism for two reasons. First, version history is bounded by retention policy: once VACUUM runs (default seven days), older versions are gone. Second, time travel queries every old version individually. It cannot answer "show me the value on the 28th of every month for two years" in a single query without a separately materialized table. Use time travel when something broke yesterday and you want to see the table before the bad load. Use a snapshot table for anything you need to keep beyond the retention window.
Quick answer: Create a Fabric notebook in the workspace that hosts the lakehouse, attach the lakehouse as default, use sempy.fabric to read tables or run DAX against the source semantic model, add a snapshot_date column with pyspark.sql.functions.current_date(), and append to a Delta table.
The minimum viable snapshot is about thirty lines of PySpark. The pattern that follows assumes you have a semantic model called fin_core and a lakehouse called lh_snapshots in the same workspace.
sempy is pre-installed in every Fabric notebook runtime (Microsoft Learn — Read data from semantic models). Attach lh_snapshots as the notebook's default lakehouse, then in the first cell:
import sempy.fabric as fabric
from pyspark.sql import functions as F
from delta.tables import DeltaTableTwo functions cover almost every snapshot use case. read_table returns a model table as a pandas DataFrame. evaluate_dax runs an arbitrary DAX query and returns a FabricDataFrame. Use the second one when the business logic lives in measures rather than raw tables.
dim_customer = fabric.read_table(
dataset="fin_core",
table="DimCustomer",
)
dax_query = """
EVALUATE
SUMMARIZECOLUMNS(
DimEntity[EntityCode],
DimDate[YearMonth],
"DSO", [DSO Days],
"WorkingCapitalAED", [Working Capital AED]
)
"""
kpi_snapshot = fabric.evaluate_dax(
dataset="fin_core",
dax_string=dax_query,
)Reading measures via evaluate_dax is usually the right move for period-end KPI snapshots. The DAX engine applies the same business logic the dashboard uses, so your snapshot value matches the dashboard value on the day the snapshot was taken. Reading raw tables is the right move for SCD snapshots, where you want the dimension exactly as the source model holds it.
Convert to a Spark DataFrame and add snapshot_date. Use current_date() for daily snapshots; for end-of-month or end-of-quarter jobs, pass the run date in as a notebook parameter so reruns are deterministic.
sdf = spark.createDataFrame(kpi_snapshot).withColumn(
"snapshot_date", F.current_date()
)First run writes the table with snapshot_date as the partition column. Subsequent runs append.
target = "Tables/snap_kpi_monthly"
if not spark.catalog.tableExists("snap_kpi_monthly"):
sdf.write.format("delta").partitionBy("snapshot_date").save(target)
else:
sdf.write.format("delta").mode("append").save(target)Partitioning by snapshot_date is non-negotiable for tables that will hold years of history. It keeps point-in-time queries cheap and lets VACUUM and OPTIMIZE work on individual partitions.
A pure append works for KPI history because every period gets exactly one row. For SCD Type 2, where you only want to write a new dimension row when an attribute actually changed, use Delta MERGE:
target_table = DeltaTable.forPath(spark, target)
target_table.alias("t").merge(
sdf.alias("s"),
"t.customer_key = s.customer_key AND t.is_current = true",
).whenMatchedUpdate(
condition="t.credit_limit_aed <> s.credit_limit_aed OR t.segment <> s.segment",
set={"is_current": "false", "valid_to": "s.snapshot_date"},
).whenNotMatchedInsert(
values={
"customer_key": "s.customer_key",
"credit_limit_aed": "s.credit_limit_aed",
"segment": "s.segment",
"valid_from": "s.snapshot_date",
"valid_to": "null",
"is_current": "true",
},
).execute()That is the pattern the Fabric community settled on for SCD Type 2 inside a lakehouse. It works because Delta MERGE is supported in lakehouses; it is not yet supported in Fabric warehouses, which is one of the reasons snapshot infrastructure belongs in a lakehouse.
Quick answer: Wrap the notebook in a Fabric data pipeline, schedule the pipeline on the cadence the business needs, and parameterize the snapshot date so reruns and back-fills are deterministic.
Notebooks should not schedule themselves. In Fabric, the supported approach is a data pipeline that includes a Notebook activity (Microsoft Learn — Data ingestion options for a lakehouse). Pipelines support cron-style schedules, run history, retry policies, and email-on-failure. Stand-alone notebook schedules have none of that.
A typical setup for a finance KPI snapshot:
| Cadence | Trigger | Parameters passed in |
|---|---|---|
| Daily | Pipeline scheduled at 02:00 GST | snapshot_date = today |
| Month-end | Pipeline scheduled at 23:30 GST on last weekday | snapshot_date = today |
| Quarter-end | Same pipeline, quarter-end calendar trigger | snapshot_date = today |
| Manual reload | Pipeline triggered ad-hoc with explicit date param | snapshot_date = 2026-04-30 |
Passing the date in as a parameter means a failed month-end run can be re-executed on day +1 with the correct historical date, not the rerun date. This is the single most common cause of "the dashboard says the wrong thing for March" tickets.
For dimensions with under a million rows, snapshot the whole table on every run. Storage is cheap and the logic is simple. For larger fact tables or transaction-level snapshots, switch to incremental: only insert rows where last_modified_timestamp > last_snapshot_run_timestamp. Maintain a tiny _snapshot_runs Delta table that records the watermark per source table.
The OneLake bill is driven by storage, not the number of snapshots, so the typical question is "how long do we keep this." A defensible rule:
Implement retention with a separate maintenance notebook that runs OPTIMIZE and VACUUM on each snapshot partition, and DELETE on partitions older than the retention window. Schedule it weekly. Microsoft Learn — Delta Table Maintenance covers the exact commands.
Quick answer: Build a second semantic model in Direct Lake mode on top of the snapshot lakehouse, expose snapshot_date as a date dimension, and create measures that filter to the snapshot the user asked for (month-end, quarter-end, or "as of" a parameter).
This is the step most teams get wrong. They build the snapshot tables, then add them as new tables to the original fin_core model. The original model is now doing two jobs (live reporting and historical reporting) and the refresh logic, security, and Direct Lake framing all get tangled.
Build a second semantic model. Call it something obvious like fin_history. Point it at the snapshot Delta tables in lh_snapshots. Use Direct Lake mode so the snapshot tables stay queryable without copying data again (Microsoft Learn — Direct Lake overview).
A minimal measure set looks like this:
KPI Value as of Snapshot =
CALCULATE(
SUM ( 'snap_kpi_monthly'[Value] ),
'snap_kpi_monthly'[snapshot_date] = SELECTEDVALUE ( 'DimDate'[Date] )
)
Latest Snapshot Date =
MAX ( 'snap_kpi_monthly'[snapshot_date] )
KPI Value as of Latest =
CALCULATE(
SUM ( 'snap_kpi_monthly'[Value] ),
'snap_kpi_monthly'[snapshot_date] = [Latest Snapshot Date]
)The pattern that produces the most-used reports: a Power BI page with a single slicer for snapshot_date, KPI cards using the "as of" measures, and a line chart that plots the measure across every snapshot date in the selected range. The same model can serve "as of yesterday," "as of last month-end," and "as of the same date last year" without any additional infrastructure.
Quick answer: Snapshot storage in OneLake is cheap (around USD 0.023 per GB per month in Azure UAE North as of 2026-05), but the compute to run the snapshot notebook is the real cost driver. Keep notebook sessions short, use serverless Spark pools, and aggregate granular snapshots to coarser ones on a retention schedule.
Three cost levers, in rough order of impact.
A snapshot notebook session that runs for 4 minutes on a small Spark pool is cheap. The same notebook left attached to a session because someone forgot to detach is not. Set a session timeout of 20 minutes on the workspace. Use serverless Spark pools (now the default in Fabric) rather than pinned starter pools, because serverless pools shut down between runs.
Granular snapshots become low-value over time. A daily customer-credit-limit snapshot is useful for 90 days, then the monthly snapshot covers anything you actually need to ask. Compact the daily snapshots into a monthly aggregate after the 90-day window. OneLake storage is cheap, but it is not free, and the table-maintenance notebooks have to scan everything you keep.
The reporting model on top of the snapshot tables loads columns into memory on demand. Wide snapshot tables with 200 columns will fall back to slow paths or hit capacity limits. Build the reporting model on a narrow projection of the snapshot table (only the columns the reports actually use), even if the underlying snapshot is wide.
Quick answer: The four mistakes that cost BTA clients the most rework are: trying to snapshot from the same model that reports run on, partitioning by an irrelevant column, snapshotting measures with date filters baked in, and forgetting that semantic model security context follows the notebook's identity.
Adding history tables to the source semantic model makes refresh times explode and forces report authors to navigate a model that is doing two unrelated jobs. Keep snapshots in a lakehouse. Build a second model on top.
customer_id Instead of snapshot_dateSnapshot tables are almost always queried by date range. Partitioning by entity ID makes the queries that the business actually runs slow. Partition by snapshot_date.
A measure defined as CALCULATE([Sales], FILTER('DimDate', 'DimDate'[Date] = TODAY())) will hard-code today's date into the snapshot row, defeating the point. When writing snapshot DAX, strip out built-in date filters and let the snapshot's snapshot_date column hold the temporal context.
The notebook runs as either the workspace identity, a service principal, or the user who triggered it. If the source semantic model has row-level security, the data the snapshot captures depends on whose identity executed the DAX. Use a dedicated service principal for snapshot jobs, document the security context, and confirm that the principal has read access to all rows the business expects to be snapshotted.
No, not for historical snapshots. OneLake integration exports the current state of a model's tables to a single Delta table in OneLake on every refresh, overwriting the previous version. As of 2026-05, Microsoft Learn confirms that "only a single version of the delta tables is exported and stored on OneLake, with old versions being deleted after a successful export." Use OneLake integration when you want downstream Spark jobs to read the current state of the model. Use a snapshot notebook when you need history.
No. Time travel is bounded by the table's VACUUM retention (seven days by default), and it queries one historical version at a time. It is fine for short-window debugging like "show me this table as it was yesterday before the bad load ran." It cannot serve queries like "show me the value at the end of every month for the last two years" without a separately materialized table that holds those points in time.
Match the business question. Daily for operational dashboards that need rolling history. End-of-month for financial KPIs that have to tie to filed statements. End-of-quarter for board reporting. Real-time snapshotting is almost never what the business actually needs. If it is, you are looking at a real-time analytics workload rather than a snapshot workload, and the Fabric Real-Time Intelligence walkthrough is the better reference.
Yes. sempy.evaluate_dax runs the DAX query against the model the same way Power BI does. If the model is in DirectQuery mode, the query hits the underlying source at snapshot time. The trade-off is that the snapshot's accuracy depends on the source being available and consistent at the moment the notebook runs. For end-of-month KPIs, schedule the snapshot for a quiet window on the source system, typically after the daily ETL has settled.
Yes, provided the Fabric capacity is deployed in a compliant region. Azure UAE North is the supported region for UAE data residency requirements, and Saudi Central for in-country Saudi workloads. Snapshot tables stored in OneLake inherit the region of the capacity that hosts the workspace. Do not assume cross-region access. A workspace hosted in West Europe will hold snapshot data outside the GCC, which is a problem for both PDPL Article 22 and NDMO data classification.
You cannot back-fill the past from the current state of the source. The data has already moved on. The only honest options are: (1) reconstruct from any audit logs or change-data-capture history the source system already keeps, or (2) start capturing snapshots from today forward and accept the gap. This is the strongest argument for setting up snapshot infrastructure on day one of a Fabric deployment, not after the first "what did this look like in February" question.
Microsoft Partner · Dubai
Your business intelligence partner for the GCC
Have a data challenge or a project in mind? Reach out and let's explore how we can help.
Clients we've worked with






