dbt Transformation Patterns

Build production-ready data pipelines with dbt best practices


Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

Tags: dbt, data-engineering, analytics-engineering, sql, data-transformation, data-modeling, etl, medallion-architecture


Example Interaction

User: Help me structure a new dbt project for e-commerce analytics with customer and order data

Agent: Complete project structure with proper folder organization, naming conventions, and layer separation

Quick Start (3 Steps)

Get up and running in minutes

1. Install

   claude-code skill install dbt-transformation-patterns

2. Config

3. First Trigger

   @dbt-transformation-patterns help

Commands

| Command | Description | Required Args |
| --- | --- | --- |
| @dbt-transformation-patterns setting-up-dbt-project-structure | Organize models into staging, intermediate, and marts layers following medallion architecture | None |
| @dbt-transformation-patterns creating-incremental-models | Build efficient incremental models for processing large datasets with proper merge strategies | None |
| @dbt-transformation-patterns implementing-data-quality-tests | Set up comprehensive testing for data models including custom business logic tests | None |

Typical Use Cases

Setting up dbt project structure

Organize models into staging, intermediate, and marts layers following medallion architecture

Creating incremental models

Build efficient incremental models for processing large datasets with proper merge strategies

Implementing data quality tests

Set up comprehensive testing for data models including custom business logic tests

Overview

dbt Transformation Patterns

Production-ready patterns for dbt (data build tool) including model organization, testing strategies, documentation, and incremental processing.

When to Use This Skill

  • Building data transformation pipelines with dbt
  • Organizing models into staging, intermediate, and marts layers
  • Implementing data quality tests
  • Creating incremental models for large datasets
  • Documenting data models and lineage
  • Setting up dbt project structure

Core Concepts

1. Model Layers (Medallion Architecture)

sources/          Raw data definitions
    ↓
staging/          1:1 with source, light cleaning
    ↓
intermediate/     Business logic, joins, aggregations
    ↓
marts/            Final analytics tables
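The layers connect through dbt's `ref()` and `source()` functions, which is how dbt infers the dependency graph. A minimal sketch with hypothetical model names (each select would live in its own file):

```sql
-- models/staging/stg_app__orders.sql (hypothetical)
-- Staging: 1:1 with the source, light renaming only
select id as order_id, customer_id, total_amount
from {{ source('app', 'orders') }}

-- models/intermediate/int_orders__aggregated.sql (hypothetical)
-- Intermediate: business logic on top of staging
select customer_id, sum(total_amount) as lifetime_spend
from {{ ref('stg_app__orders') }}
group by customer_id

-- models/marts/dim_customer_spend.sql (hypothetical)
-- Mart: final analytics table built from intermediate models
select * from {{ ref('int_orders__aggregated') }}
```

Because each layer only references the layer below it via `ref()`, dbt can build everything in the correct order with a single `dbt run`.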

2. Naming Conventions

| Layer | Prefix | Example |
| --- | --- | --- |
| Staging | stg_ | stg_stripe__payments |
| Intermediate | int_ | int_payments_pivoted |
| Marts | dim_, fct_ | dim_customers, fct_orders |

Quick Start

# dbt_project.yml
name: "analytics"
version: "1.0.0"
profile: "analytics"

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]

vars:
  start_date: "2020-01-01"

models:
  analytics:
    staging:
      +materialized: view
      +schema: staging
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      +schema: analytics

# Project structure
models/
├── staging/
│   ├── stripe/
│   │   ├── _stripe__sources.yml
│   │   ├── _stripe__models.yml
│   │   ├── stg_stripe__customers.sql
│   │   └── stg_stripe__payments.sql
│   └── shopify/
│       ├── _shopify__sources.yml
│       └── stg_shopify__orders.sql
├── intermediate/
│   └── finance/
│       └── int_payments_pivoted.sql
└── marts/
    ├── core/
    │   ├── _core__models.yml
    │   ├── dim_customers.sql
    │   └── fct_orders.sql
    └── finance/
        └── fct_revenue.sql

Patterns

Pattern 1: Source Definitions

# models/staging/stripe/_stripe__sources.yml
version: 2

sources:
  - name: stripe
    description: Raw Stripe data loaded via Fivetran
    database: raw
    schema: stripe
    loader: fivetran
    loaded_at_field: _fivetran_synced
    freshness:
      warn_after: { count: 12, period: hour }
      error_after: { count: 24, period: hour }
    tables:
      - name: customers
        description: Stripe customer records
        columns:
          - name: id
            description: Primary key
            tests:
              - unique
              - not_null
          - name: email
            description: Customer email
          - name: created
            description: Account creation timestamp

      - name: payments
        description: Stripe payment transactions
        columns:
          - name: id
            tests:
              - unique
              - not_null
          - name: customer_id
            tests:
              - not_null
              - relationships:
                  to: source('stripe', 'customers')
                  field: id

Pattern 2: Staging Models

-- models/staging/stripe/stg_stripe__customers.sql
with source as (
    select * from {{ source('stripe', 'customers') }}
),

renamed as (
    select
        -- ids
        id as customer_id,

        -- strings
        lower(email) as email,
        name as customer_name,

        -- timestamps
        created as created_at,

        -- metadata
        _fivetran_synced as _loaded_at

    from source
)

select * from renamed

-- models/staging/stripe/stg_stripe__payments.sql
{{
    config(
        materialized='incremental',
        unique_key='payment_id',
        on_schema_change='append_new_columns'
    )
}}

with source as (
    select * from {{ source('stripe', 'payments') }}

    {% if is_incremental() %}
    where _fivetran_synced > (select max(_loaded_at) from {{ this }})
    {% endif %}
),

renamed as (
    select
        -- ids
        id as payment_id,
        customer_id,
        invoice_id,

        -- amounts (convert cents to dollars)
        amount / 100.0 as amount,
        amount_refunded / 100.0 as amount_refunded,

        -- status
        status as payment_status,

        -- timestamps
        created as created_at,

        -- metadata
        _fivetran_synced as _loaded_at

    from source
)

select * from renamed

Pattern 3: Intermediate Models

-- models/intermediate/finance/int_payments_pivoted_to_customer.sql
with payments as (
    select * from {{ ref('stg_stripe__payments') }}
),

customers as (
    select * from {{ ref('stg_stripe__customers') }}
),

payment_summary as (
    select
        customer_id,
        count(*) as total_payments,
        count(case when payment_status = 'succeeded' then 1 end) as successful_payments,
        sum(case when payment_status = 'succeeded' then amount else 0 end) as total_amount_paid,
        min(created_at) as first_payment_at,
        max(created_at) as last_payment_at
    from payments
    group by customer_id
)

select
    customers.customer_id,
    customers.email,
    customers.created_at as customer_created_at,
    coalesce(payment_summary.total_payments, 0) as total_payments,
    coalesce(payment_summary.successful_payments, 0) as successful_payments,
    coalesce(payment_summary.total_amount_paid, 0) as lifetime_value,
    payment_summary.first_payment_at,
    payment_summary.last_payment_at

from customers
left join payment_summary using (customer_id)

Pattern 4: Mart Models (Dimensions and Facts)

-- models/marts/core/dim_customers.sql
{{
    config(
        materialized='table',
        unique_key='customer_id'
    )
}}

with customers as (
    select * from {{ ref('int_payments_pivoted_to_customer') }}
),

orders as (
    select * from {{ ref('stg_shopify__orders') }}
),

order_summary as (
    select
        customer_id,
        count(*) as total_orders,
        sum(total_price) as total_order_value,
        min(created_at) as first_order_at,
        max(created_at) as last_order_at
    from orders
    group by customer_id
),

final as (
    select
        -- surrogate key
        {{ dbt_utils.generate_surrogate_key(['customers.customer_id']) }} as customer_key,

        -- natural key
        customers.customer_id,

        -- attributes
        customers.email,
        customers.customer_created_at,

        -- payment metrics
        customers.total_payments,
        customers.successful_payments,
        customers.lifetime_value,
        customers.first_payment_at,
        customers.last_payment_at,

        -- order metrics
        coalesce(order_summary.total_orders, 0) as total_orders,
        coalesce(order_summary.total_order_value, 0) as total_order_value,
        order_summary.first_order_at,
        order_summary.last_order_at,

        -- calculated fields
        case
            when customers.lifetime_value >= 1000 then 'high'
            when customers.lifetime_value >= 100 then 'medium'
            else 'low'
        end as customer_tier,

        -- timestamps
        current_timestamp as _loaded_at

    from customers
    left join order_summary using (customer_id)
)

select * from final

-- models/marts/core/fct_orders.sql
{{
    config(
        materialized='incremental',
        unique_key='order_id',
        incremental_strategy='merge'
    )
}}

with orders as (
    select * from {{ ref('stg_shopify__orders') }}

    {% if is_incremental() %}
    where updated_at > (select max(updated_at) from {{ this }})
    {% endif %}
),

customers as (
    select * from {{ ref('dim_customers') }}
),

final as (
    select
        -- keys
        orders.order_id,
        customers.customer_key,
        orders.customer_id,

        -- dimensions
        orders.order_status,
        orders.fulfillment_status,
        orders.payment_status,

        -- measures
        orders.subtotal,
        orders.tax,
        orders.shipping,
        orders.total_price,
        orders.total_discount,
        orders.item_count,

        -- timestamps
        orders.created_at,
        orders.updated_at,
        orders.fulfilled_at,

        -- metadata
        current_timestamp as _loaded_at

    from orders
    left join customers on orders.customer_id = customers.customer_id
)

select * from final

Pattern 5: Testing and Documentation

# models/marts/core/_core__models.yml
version: 2

models:
  - name: dim_customers
    description: Customer dimension with payment and order metrics
    columns:
      - name: customer_key
        description: Surrogate key for the customer dimension
        tests:
          - unique
          - not_null

      - name: customer_id
        description: Natural key from source system
        tests:
          - unique
          - not_null

      - name: email
        description: Customer email address
        tests:
          - not_null

      - name: customer_tier
        description: Customer value tier based on lifetime value
        tests:
          - accepted_values:
              values: ["high", "medium", "low"]

      - name: lifetime_value
        description: Total amount paid by customer
        tests:
          - dbt_utils.expression_is_true:
              expression: ">= 0"

  - name: fct_orders
    description: Order fact table with all order transactions
    tests:
      - dbt_utils.recency:
          datepart: day
          field: created_at
          interval: 1
    columns:
      - name: order_id
        tests:
          - unique
          - not_null
      - name: customer_key
        tests:
          - not_null
          - relationships:
              to: ref('dim_customers')
              field: customer_key
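Generic tests like `unique` and `accepted_values` cover structural checks; for business rules, dbt also supports singular tests: a SQL file under tests/ that selects the rows violating the rule, and passes when it returns zero rows. A sketch against the `fct_orders` columns above (the reconciliation rule itself is an assumption about how the measures relate):

```sql
-- tests/assert_order_totals_reconcile.sql (hypothetical singular test)
-- Fails if any order's components don't sum to total_price
select
    order_id,
    subtotal + tax + shipping - total_discount as computed_total,
    total_price
from {{ ref('fct_orders') }}
where abs((subtotal + tax + shipping - total_discount) - total_price) > 0.01
```

Singular tests run alongside generic ones with plain `dbt test`, so business-logic checks stay in the same CI gate as schema checks.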

Pattern 6: Macros and DRY Code

-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, precision=2) %}
    round({{ column_name }} / 100.0, {{ precision }})
{% endmacro %}

-- macros/generate_schema_name.sql
{% macro generate_schema_name(custom_schema_name, node) %}
    {%- set default_schema = target.schema -%}
    {%- if custom_schema_name is none -%}
        {{ default_schema }}
    {%- else -%}
        {{ default_schema }}_{{ custom_schema_name }}
    {%- endif -%}
{% endmacro %}

-- macros/limit_data_in_dev.sql
{% macro limit_data_in_dev(column_name, days=3) %}
    {% if target.name == 'dev' %}
        where {{ column_name }} >= dateadd(day, -{{ days }}, current_date)
    {% endif %}
{% endmacro %}

-- Usage in model
select * from {{ ref('stg_orders') }}
{{ limit_data_in_dev('created_at') }}

Pattern 7: Incremental Strategies

-- Delete+insert: delete rows matching the batch's unique keys, then insert the batch
{{
    config(
        materialized='incremental',
        unique_key='id',
        incremental_strategy='delete+insert'
    )
}}

-- Merge: update matched rows in place (best for late-arriving data)
{{
    config(
        materialized='incremental',
        unique_key='id',
        incremental_strategy='merge',
        merge_update_columns=['status', 'amount', 'updated_at']
    )
}}

-- Insert overwrite: replace whole partitions (partition-based, no unique_key needed)
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            "field": "created_date",
            "data_type": "date",
            "granularity": "day"
        }
    )
}}

select
    *,
    date(created_at) as created_date
from {{ ref('stg_events') }}

{% if is_incremental() %}
where created_date >= dateadd(day, -3, current_date)
{% endif %}

dbt Commands

# Development
dbt run                          # Run all models
dbt run --select staging         # Run staging models only
dbt run --select +fct_orders     # Run fct_orders and its upstream
dbt run --select fct_orders+     # Run fct_orders and its downstream
dbt run --full-refresh           # Rebuild incremental models

# Testing
dbt test                         # Run all tests
dbt test --select stg_stripe     # Test specific models
dbt build                        # Run + test in DAG order

# Documentation
dbt docs generate                # Generate docs
dbt docs serve                   # Serve docs locally

# Debugging
dbt compile                      # Compile SQL without running
dbt debug                        # Test connection
dbt ls --select tag:critical     # List models by tag

Best Practices

Do’s

  • Use staging layer - Clean data once, use everywhere
  • Test aggressively - Not null, unique, relationships
  • Document everything - Column descriptions, model descriptions
  • Use incremental - For tables > 1M rows
  • Version control - dbt project in Git

Don’ts

  • Don’t skip staging - Raw → mart is tech debt
  • Don’t hardcode dates - Use {{ var('start_date') }}
  • Don’t repeat logic - Extract to macros
  • Don’t test in prod - Use dev target
  • Don’t ignore freshness - Monitor source data
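As an example of the "don't hardcode dates" rule, the `start_date` variable declared in dbt_project.yml above can replace a date literal in any model (hypothetical filter model):

```sql
-- Filter with the project var instead of a hardcoded date;
-- override per environment with: dbt run --vars '{start_date: "2023-01-01"}'
select *
from {{ ref('stg_stripe__payments') }}
where created_at >= '{{ var("start_date") }}'
```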


Environment Matrix

Dependencies

  • dbt Core 1.0+
  • Data warehouse connection (Snowflake, BigQuery, Redshift, etc.)
  • dbt-utils package (recommended)

Framework Support

  • dbt Core ✓ (recommended)
  • dbt Cloud ✓
  • Snowflake ✓
  • BigQuery ✓
  • Redshift ✓
  • PostgreSQL ✓

Context Window

Token usage: ~5K-15K tokens for complex transformation patterns

Information

Author: wshobson
Updated: 2026-01-30
Category: productivity-tools