DBT Tutorial

What is DBT?

DBT (Data Build Tool) is an open-source analytics engineering tool that enables data analysts and engineers to transform data in the warehouse more effectively. With DBT, you can write modular SQL, version control it with Git, test it, and deploy it like code.

It handles the T in ELT (Extract, Load, Transform): you extract and load data with tools like Fivetran or Airbyte, then transform it inside your warehouse using DBT.

Why Use DBT?

| Feature | Benefit |
| --- | --- |
| SQL-Based | No need to learn Python or Scala |
| Modular | Split queries into reusable models |
| Git Integration | Version control your transformations |
| Testing | Validate data quality with built-in assertions |
| Documentation | Auto-generates model lineage & docs |
| Deployment | CI/CD, scheduler, and production monitoring |

DBT supports major warehouses like Snowflake, BigQuery, Redshift, Databricks, and Postgres.

DBT Architecture Overview

DBT operates in the data transformation layer. Here’s how a typical ELT flow works with DBT:

```plaintext
1. Data Source → 2. ETL Tool (e.g., Fivetran) → 3. Warehouse → 4. DBT → 5. BI Tool (e.g., Looker, Tableau)
```

DBT sits between your warehouse and BI tools. It helps to:

  • Clean and transform raw data

  • Apply business logic

  • Create unified models for analytics

DBT Core vs DBT Cloud

| Feature | DBT Core | DBT Cloud |
| --- | --- | --- |
| Free | Yes (open source) | Limited free tier |
| Local Execution | Yes | No (hosted) |
| Web UI | No | Yes |
| Job Scheduling | No (use cron, Airflow, etc.) | Built-in |
| CI/CD | Manual | Built-in |
| IDE | Terminal | In-browser SQL editor |

Use DBT Core for local/dev environments, and DBT Cloud for team collaboration and production deployment.

Installing DBT Core

You can install DBT Core using pip:

```bash
pip install dbt-core dbt-postgres  # or dbt-snowflake, dbt-bigquery
```

Then initialize a project:

```bash
dbt init my_project
cd my_project
```

This creates:

  • /models: where your SQL models live

  • /dbt_project.yml: your project settings

  • profiles.yml: connection settings (kept in ~/.dbt/, outside the project)

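
For reference, a minimal profiles.yml for a Postgres connection might look like the sketch below (host, credentials, and schema are placeholders to replace with your own):

```yaml
# ~/.dbt/profiles.yml (placeholder Postgres connection)
my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost
      user: dbt_user        # placeholder credentials
      password: dbt_password
      port: 5432
      dbname: analytics
      schema: dev
      threads: 4
```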

DBT Project Structure

```plaintext
my_project/
├── dbt_project.yml
├── models/
│   ├── staging/
│   ├── marts/
│   └── core/
├── macros/
├── snapshots/
├── tests/
├── seeds/
└── analysis/
```

  • models/: Core of your transformation logic

  • snapshots/: For slowly changing dimensions

  • seeds/: CSV files loaded into the warehouse

  • macros/: Jinja functions and reusable logic

DBT Model Basics

What is a Model?

A model is a SQL file (e.g., stg_orders.sql) containing a single SELECT statement that transforms data. Example:

```sql
-- models/stg_orders.sql
SELECT
    id AS order_id,
    customer_id,
    order_date
FROM raw.orders
```

Run it with:

```bash
dbt run --select stg_orders
```

The resulting view or table is created in your warehouse.

Ref and DAG

Use ref() to define dependencies between models:

```sql
SELECT * FROM {{ ref('stg_orders') }}
```

This tells DBT to build the DAG (Directed Acyclic Graph) automatically.
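
For example, a downstream model can combine several refs, and DBT will build both upstream models first. A sketch (stg_customers is a hypothetical second staging model used for illustration):

```sql
-- models/marts/fct_orders.sql (illustrative; stg_customers is hypothetical)
SELECT
    o.order_id,
    o.order_date,
    c.customer_id
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_customers') }} AS c
    ON o.customer_id = c.customer_id
```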

Materializations

| Type | Description |
| --- | --- |
| view | Default, lightweight |
| table | Materialized as a full table |
| incremental | Adds only new records |
| ephemeral | Temporary CTE, never materialized |

In dbt_project.yml:

```yaml
models:
  my_project:
    staging:
      +materialized: view
```

DBT Testing

You can test your models using:

Schema Tests (YAML)

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
```
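
DBT also ships other generic tests such as accepted_values and relationships. For example (the status column and its values here are hypothetical):

```yaml
version: 2

models:
  - name: stg_orders
    columns:
      - name: status  # hypothetical column for illustration
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'completed', 'returned']
```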

Custom Tests

Create a SQL file in /tests:

```sql
-- tests/no_future_orders.sql
-- The test fails if this query returns any rows
SELECT *
FROM {{ ref('stg_orders') }}
WHERE order_date > current_date
```

Run tests with:

```bash
dbt test
```

Jinja and Macros in DBT

DBT uses Jinja templating in SQL. Common uses:

  • {{ ref('model') }} – model dependency

  • {{ config(...) }} – model-level configs

  • {{ var('env') }} – use of variables

  • {{ custom_macro(...) }} – custom logic (macros are called directly by name)
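
As a small illustration, a project variable (a hypothetical env variable here, with 'dev' as its default) can switch logic between environments:

```sql
-- Hypothetical use of a project variable named 'env'
SELECT *
FROM {{ ref('stg_orders') }}
{% if var('env', 'dev') == 'dev' %}
LIMIT 100  -- sample rows in dev to keep runs fast
{% endif %}
```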

Macro Example

```sql
-- macros/calculate_days.sql
{% macro calculate_days(start_col, end_col) %}
    DATEDIFF(day, {{ start_col }}, {{ end_col }})
{% endmacro %}
```

Use in model:

```sql
SELECT {{ calculate_days('created_at', 'updated_at') }} AS duration
```

This compiles to DATEDIFF(day, created_at, updated_at) AS duration.

DBT Documentation

Generate documentation:

```bash
dbt docs generate
dbt docs serve
```

This gives a UI with:

  • Model lineage (DAG)

  • Descriptions from YAML

  • Test coverage

  • Column-level metadata

DBT Snapshots

Snapshots track slowly changing dimensions (SCDs): they record how rows in a source table change over time.

Example:

```sql
-- snapshots/snap_customers.sql
{% snapshot snap_customers %}

{{
    config(
        target_schema='snapshots',
        unique_key='customer_id',
        strategy='timestamp',
        updated_at='updated_at'
    )
}}

SELECT * FROM {{ source('raw', 'customers') }}

{% endsnapshot %}
```

Run:

```bash
dbt snapshot
```

DBT Sources

Define raw data as sources:

```yaml
version: 2

sources:
  - name: raw
    tables:
      - name: orders
        description: "Raw order data"
```

Use in model:

```sql
SELECT * FROM {{ source('raw', 'orders') }}
```

DBT Seeds

Seeds are static CSVs loaded into your warehouse.

Example:

seeds/countries.csv:

```csv
id,name
1,USA
2,India
```

Load with:

```bash
dbt seed
```

Access in models:

```sql
SELECT * FROM {{ ref('countries') }}
```

DBT Incremental Models

Use is_incremental() in SQL:

```sql
-- models/fact_orders.sql
{{ config(materialized='incremental') }}

SELECT *
FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
    -- Only pull rows newer than the latest row already in this table
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```

This saves time and compute on large datasets, because only new rows are processed on each run.
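
If existing rows can be updated upstream, you can also set a unique_key so DBT merges changed rows instead of appending duplicates. A sketch under the same table layout:

```sql
-- Incremental with a merge key: updated rows replace their old versions
{{ config(materialized='incremental', unique_key='order_id') }}

SELECT * FROM {{ ref('stg_orders') }}

{% if is_incremental() %}
    WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
```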

DBT Deployment & CI/CD

In DBT Cloud, set up:

  • Jobs: scheduled runs

  • Environments: dev, staging, prod

  • CI/CD: auto-test on pull request

Use GitHub/GitLab integration for CI (a minimal workflow sketch follows this list):

  • Run dbt run on merge

  • Validate with dbt test
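
A minimal GitHub Actions workflow might look like the sketch below; the file name, the ci target, and the adapter are assumptions, and it presumes a CI-specific profiles.yml committed to the repo:

```yaml
# .github/workflows/dbt_ci.yml (placeholder names, target, and setup)
name: dbt CI
on:
  pull_request:

jobs:
  dbt:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install dbt-core dbt-postgres
      - run: dbt deps
      # 'dbt build' runs models, tests, seeds, and snapshots in DAG order
      - run: dbt build --target ci
        env:
          DBT_PROFILES_DIR: .  # assumes a CI profiles.yml checked into the repo
```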

Popular DBT Packages

  • dbt_utils: helper macros (must-have)

  • dbt_expectations: test library

  • audit_helper: audit trail macros

  • codegen: auto-generate model skeletons

Install in packages.yml:

```yaml
packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0
```

Then run dbt deps to install.
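
Package macros are then called like any other macro. For example, dbt_utils.generate_surrogate_key builds a hashed key from several columns (the model and columns below are from the earlier examples):

```sql
-- Build a surrogate key from two columns using dbt_utils
SELECT
    {{ dbt_utils.generate_surrogate_key(['customer_id', 'order_date']) }} AS order_key,
    customer_id,
    order_date
FROM {{ ref('stg_orders') }}
```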

DBT Real-World Project Flow

  1. Load data from Fivetran into Snowflake

  2. Use DBT to create:

    • Staging models (stg_)

    • Fact/dim models (dim_, fact_)

  3. Add tests, snapshots, docs

  4. Schedule in DBT Cloud

  5. Connect output to Looker or Tableau

DBT Best Practices

  • Use clear naming (stg_, fct_, dim_)

  • Always use ref()

  • Document everything in schema.yml

  • Break logic into layers: staging → intermediate → marts

  • Add tests early

  • Use is_incremental() for large datasets

  • Avoid SELECT * in production

DBT Job Roles and Use Cases

| Role | How They Use DBT |
| --- | --- |
| Data Engineer | Build source and staging models |
| Analytics Engineer | Build business logic and marts |
| Analyst | Query clean models |
| Product Manager | View data lineage and documentation |

DBT Interview Questions Preview

  1. What is the role of DBT in the modern data stack?

  2. What is the difference between ref() and source()?

  3. Explain incremental models in DBT.

  4. How do you implement tests in DBT?

  5. What is the use of the is_incremental() macro?

For more, visit our DBT Interview Questions blog.

Final Thoughts

DBT has transformed how data teams build and manage transformations, bringing the best of software engineering into data. From modular SQL to automated testing and documentation, DBT is a must-learn tool for modern data professionals.
