data-platform

A personal data platform for ingesting, transforming and analysing data from external sources. Built with Dagster, dbt and PostgreSQL, managed with uv and deployed via Docker Compose.

Stack

Layer            Tool
Orchestration    Dagster (webserver + daemon)
Transformation   dbt-core + dbt-postgres
Storage          PostgreSQL 16
Observability    Elementary (report served via nginx)
CI               GitHub Actions (Ruff, SQLFluff, Prettier, pytest)
Package / venv   uv
Infrastructure   Docker Compose

Data pipeline

Ingestion

Python-based Dagster assets fetch data from external APIs and load it into a raw PostgreSQL schema using upsert semantics. Assets are prefixed with raw_ to clearly separate unmodelled data from transformed outputs. They are chained via explicit dependencies and run on a recurring cron schedule.
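Upsert loading of this kind typically compiles down to a PostgreSQL INSERT ... ON CONFLICT statement. A minimal sketch of how such a statement might be rendered (the raw_prices table and its columns are made-up examples, not names from this project):

```python
# Illustrative only: render a PostgreSQL upsert (INSERT ... ON CONFLICT)
# statement of the kind a raw ingestion asset might execute per batch.
# Table and column names here are hypothetical.

def build_upsert(table: str, columns: list[str], key_columns: list[str]) -> str:
    """Render an INSERT ... ON CONFLICT DO UPDATE statement for a raw table."""
    cols = ", ".join(columns)
    placeholders = ", ".join(f"%({c})s" for c in columns)
    conflict = ", ".join(key_columns)
    # Non-key columns are overwritten with the incoming row's values.
    updates = ", ".join(
        f"{c} = EXCLUDED.{c}" for c in columns if c not in key_columns
    )
    return (
        f"INSERT INTO raw.{table} ({cols}) VALUES ({placeholders}) "
        f"ON CONFLICT ({conflict}) DO UPDATE SET {updates}"
    )

sql = build_upsert("raw_prices", ["id", "price", "ingested_at"], ["id"])
```

Re-running the same ingestion window is then idempotent: existing keys are updated in place rather than duplicated.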

Transformation

dbt models follow the staging → intermediate → marts layering pattern:

Layer          Materialisation   Purpose
Staging        view              1:1 cleaning of each raw table with enforced contracts
Intermediate   view              Business-logic joins and enrichment across staging models
Marts          incremental       Analysis-ready tables with derived metrics, loaded incrementally

Mart models use dbt's incremental materialisation: on each run, only rows with a newer ingestion timestamp are merged into the target table. On a full refresh the entire table is rebuilt.
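In dbt this pattern is usually expressed with an is_incremental() filter on the ingestion timestamp. A sketch of what such a mart model could look like (model and column names here are illustrative, not taken from this project):

```sql
-- Hypothetical mart model; names are illustrative.
{{ config(materialized='incremental', unique_key='id') }}

select id, metric_value, ingested_at
from {{ ref('int_enriched') }}

{% if is_incremental() %}
  -- On incremental runs, only merge rows newer than what is already loaded.
  where ingested_at > (select max(ingested_at) from {{ this }})
{% endif %}
```

With a unique_key set, dbt merges matching rows instead of appending duplicates; `dbt build --full-refresh` drops the filter and rebuilds the table from scratch.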

All staging and intermediate models have dbt contracts enforced, locking column names and data types to prevent silent schema drift.

Data quality

  • dbt tests: uniqueness, not-null, accepted values, referential integrity and expression-based tests run as part of every dbt build.
  • Source freshness: a scheduled job verifies raw tables haven't gone stale.
  • Elementary: collects test results and generates an HTML observability report served via nginx.
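Contracts and tests of the kinds listed above are declared together in a model's schema YAML. A hedged sketch (the stg_prices model, its columns and accepted values are made-up examples):

```yaml
# Hypothetical schema file for a staging model; names are illustrative.
models:
  - name: stg_prices
    config:
      contract:
        enforced: true        # lock column names and data types
    columns:
      - name: id
        data_type: bigint
        tests:
          - unique
          - not_null
      - name: status
        data_type: text
        tests:
          - accepted_values:
              values: ["active", "inactive"]
```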

Scheduling & automation

Ingestion assets run on cron schedules managed by the Dagster daemon. Downstream dbt models use eager auto-materialisation: whenever an upstream raw asset completes, Dagster automatically triggers the dbt build for all dependent staging, intermediate and mart models.

Assets tagged "manual" are excluded from auto-materialisation and only run when triggered explicitly. A guard condition (~any_deps_in_progress) prevents duplicate runs while upstream assets are still materialising.
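The combined trigger logic reduces to simple boolean conditions over upstream state. A pure-Python sketch of the semantics (a stand-in for illustration, not Dagster API code; in recent Dagster versions this kind of policy is composed from automation conditions such as an eager condition combined with ~any_deps_in_progress):

```python
# Illustrative semantics only: when should a downstream dbt asset materialise?
# Mirrors eager auto-materialisation with a "no upstream in progress" guard.

def should_materialize(
    upstream_updated: bool,
    any_deps_in_progress: bool,
    tagged_manual: bool,
) -> bool:
    if tagged_manual:           # "manual" assets only run when triggered explicitly
        return False
    if any_deps_in_progress:    # guard: never start while upstreams still run
        return False
    return upstream_updated     # eager: trigger as soon as an upstream completes
```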

Project layout

data_platform/              # Dagster Python package
  assets/
    dbt.py                  # @dbt_assets definition
    ingestion/              # Raw ingestion assets + SQL templates
  helpers/                  # Shared utilities (SQL rendering, formatting, automation)
  jobs/                     # Job definitions
  schedules/                # Schedule definitions
  resources/                # Dagster resources (API clients, Postgres)
  definitions.py            # Main Definitions entry point
dbt/                        # dbt project
  models/
    staging/                # 1:1 views on raw tables + source definitions
    intermediate/           # Enrichment joins
    marts/                  # Incremental analysis-ready tables
  macros/                   # Custom schema generation, Elementary compat
  profiles.yml              # Reads credentials from env vars
dagster_home/               # dagster.yaml + workspace.yaml
tests/                      # pytest test suite
nginx/                      # Elementary report nginx config
docker-compose.yaml         # All services
Dockerfile                  # Multi-stage: usercode + dagster-infra
Makefile                    # Developer shortcuts

Getting started

# 1. Install uv (if not already)
curl -Lsf https://astral.sh/uv/install.sh | sh

# 2. Clone and enter the project
cd ~/git/data-platform

# 3. Create your credentials file
cp .env.example .env        # edit .env with your passwords

# 4. Install dependencies into a local venv
uv sync

# 5. Generate the dbt manifest (needed before first run)
uv run dbt deps  --project-dir dbt --profiles-dir dbt
uv run dbt parse --project-dir dbt --profiles-dir dbt

# 6. Start all services
docker compose up -d --build

# 7. Open the Dagster UI
#    http://localhost:3000

Local development

uv sync
source .venv/bin/activate

# Run the Dagster UI locally
DAGSTER_HOME=$PWD/dagster_home dagster dev

# Useful Make targets
make validate          # Check Dagster definitions load
make lint              # Ruff + SQLFluff + Prettier
make lint-fix          # Auto-fix all linters
make test              # pytest
make reload-code       # Rebuild + restart user-code container

Services

Service             URL
Dagster UI          http://localhost:3000
pgAdmin             http://localhost:5050
Elementary report   http://localhost:8080