DBT (Data Build Tool) is an open-source analytics engineering tool that enables data analysts and engineers to transform data in the warehouse more effectively. With DBT, you can write modular SQL, version control it with Git, test it, and deploy it like code.
It handles the T in ELT (Extract, Load, Transform): you extract and load data using tools like Fivetran or Airbyte, then transform it in your warehouse using DBT.
Feature | Benefit |
---|---|
SQL-Based | No need to learn Python or Scala |
Modular | Split queries into reusable models |
Git Integration | Version control your transformations |
Testing | Validate data quality with built-in assertions |
Documentation | Auto-generates model lineage & docs |
Deployment | CI/CD, scheduler, and production monitoring |
DBT supports major warehouses such as Snowflake, BigQuery, Redshift, Databricks, and Postgres.
DBT operates in the data transformation layer. Here’s how a typical ELT flow works with DBT:
1. Data Source → 2. Ingestion Tool (e.g., Fivetran) → 3. Warehouse → 4. DBT → 5. BI Tool (e.g., Looker, Tableau)
DBT sits between your warehouse and BI tools. It helps you:
- Clean and transform raw data
- Apply business logic
- Create unified models for analytics
Feature | DBT Core | DBT Cloud |
---|---|---|
Free | ✅ | Limited free tier |
Local Execution | ✅ | ❌ |
Web UI | ❌ | ✅ |
Job Scheduling | ❌ | ✅ |
CI/CD | Manual | Built-in |
IDE | Terminal | In-browser SQL editor |
Use DBT Core for local/dev environments, and DBT Cloud for team collaboration and production deployment.
You can install DBT Core using pip:
pip install dbt-core dbt-postgres # or dbt-snowflake, dbt-bigquery
Then initialize a project:
dbt init my_project
cd my_project
This creates:
- models/: where your SQL models live
- dbt_project.yml: your project settings
- profiles.yml: connection settings (lives in ~/.dbt/)
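profiles.yml holds your warehouse connection details. A minimal sketch for Postgres (all values below are placeholders):

my_project:
  target: dev
  outputs:
    dev:
      type: postgres
      host: localhost      # placeholder
      user: dbt_user       # placeholder
      pass: dbt_password   # placeholder
      port: 5432
      dbname: analytics    # placeholder
      schema: dev
      threads: 4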
my_project/
│
├── dbt_project.yml
├── models/
│ ├── staging/
│ ├── marts/
│ └── core/
├── macros/
├── snapshots/
├── tests/
├── seeds/
└── analysis/
- models/: core of your transformation logic
- snapshots/: for slowly changing dimensions
- seeds/: CSV files loaded into the warehouse
- macros/: Jinja functions and reusable logic
A model is a SQL file (e.g., stg_orders.sql) that transforms data. Example:
-- models/stg_orders.sql
SELECT
id AS order_id,
customer_id,
order_date
FROM raw.orders
Run it with:
dbt run --select stg_orders
The output table will be created in your warehouse.
Use ref() to define dependencies between models:
SELECT * FROM {{ ref('stg_orders') }}
This tells DBT to build the DAG (Directed Acyclic Graph) automatically.
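For example, a mart model can join staged models. A sketch, assuming a stg_customers model with customer_id and customer_name columns exists:

-- models/marts/fct_orders.sql (illustrative)
SELECT
    o.order_id,
    o.order_date,
    c.customer_name
FROM {{ ref('stg_orders') }} AS o
JOIN {{ ref('stg_customers') }} AS c
    ON o.customer_id = c.customer_id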
Type | Description |
---|---|
view | Default, lightweight |
table | Materialized as a full table |
incremental | Adds only new records |
ephemeral | Temporary CTE, never materialized |
In dbt_project.yml:
models:
  my_project:
    staging:
      +materialized: view
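You can also set the materialization per model with config() at the top of the SQL file:

{{ config(materialized='table') }}
SELECT order_id, customer_id, order_date
FROM {{ ref('stg_orders') }}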
You can test your models by declaring tests in a schema.yml file:
version: 2
models:
  - name: stg_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
For custom tests, create a SQL file in /tests:
-- tests/no_future_orders.sql
SELECT * FROM {{ ref('stg_orders') }}
WHERE order_date > current_date
A custom test fails if its query returns any rows. Run all tests with:
dbt test
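To test a single model, use the --select flag:
dbt test --select stg_orders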
DBT uses Jinja templating in SQL. Common uses:
- {{ ref('model') }} – model dependency
- {{ config(...) }} – model-level configs
- {{ var('env') }} – use of variables
- {{ custom_macro(...) }} – custom logic (macros are called directly, without a macros. prefix)
-- macros/calculate_days.sql
{% macro calculate_days(start_col, end_col) %}
DATEDIFF(day, {{ start_col }}, {{ end_col }})
{% endmacro %}
Use in model:
SELECT {{ calculate_days('created_at', 'updated_at') }} AS duration
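The var() function works the same way. A sketch that filters by a run-time variable (start_date is a hypothetical variable with a default):

-- models/stg_recent_orders.sql (illustrative)
SELECT * FROM {{ ref('stg_orders') }}
WHERE order_date >= '{{ var("start_date", "2024-01-01") }}'

Pass a value at run time:
dbt run --vars '{"start_date": "2024-06-01"}'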
Generate documentation:
dbt docs generate
dbt docs serve
This gives a UI with:
- Model lineage (DAG)
- Descriptions from YAML
- Test coverage
- Column-level metadata
Snapshots are used for slowly changing dimensions (SCDs).
-- snapshots/snap_customers.sql
{% snapshot snap_customers %}
{{
    config(
      target_schema='snapshots',
      unique_key='customer_id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}
SELECT * FROM {{ source('raw', 'customers') }}
{% endsnapshot %}
Run:
dbt snapshot
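Snapshots add metadata columns such as dbt_valid_from and dbt_valid_to, so you can query the current version of each record:

-- current rows only (dbt_valid_to is NULL for the latest version)
SELECT * FROM snapshots.snap_customers
WHERE dbt_valid_to IS NULL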
Define raw data as sources:
version: 2
sources:
  - name: raw
    tables:
      - name: orders
        description: "Raw order data"
Use in model:
SELECT * FROM {{ source('raw', 'orders') }}
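Sources can also declare freshness checks. A sketch, assuming the raw table carries a _loaded_at ingestion timestamp (an assumed column name):

sources:
  - name: raw
    tables:
      - name: orders
        loaded_at_field: _loaded_at   # assumed ingestion timestamp column
        freshness:
          warn_after: {count: 12, period: hour}
          error_after: {count: 24, period: hour}

Check freshness with:
dbt source freshness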
Seeds are static CSVs loaded into your warehouse.
seeds/countries.csv:
id,name
1,USA
2,India
Load with:
dbt seed
Access in models:
SELECT * FROM {{ ref('countries') }}
Use is_incremental() in your model SQL:
-- models/fact_orders.sql
{{ config(materialized='incremental') }}
SELECT * FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE updated_at > (SELECT MAX(updated_at) FROM {{ this }})
{% endif %}
Saves time on large datasets.
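If incoming rows can update existing records, add a unique_key so DBT replaces old versions instead of appending duplicates (this assumes order_id uniquely identifies a row):

{{ config(materialized='incremental', unique_key='order_id') }}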
In DBT Cloud, set up:
- Jobs: scheduled runs
- Environments: dev, staging, prod
- CI/CD: auto-test on pull request
Use GitHub/GitLab integration for CI:
- Run dbt run on merge
- Validate with dbt test
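A minimal CI sequence might look like this (an illustrative sketch; exact steps depend on your pipeline):

# e.g., in a GitHub Actions or GitLab CI job
dbt deps   # install packages
dbt seed   # load static CSVs
dbt run    # build models
dbt test   # run tests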
- dbt_utils: helper macros (must-have)
- dbt_expectations: test library
- audit_helper: audit trail macros
- codegen: auto-generate model skeletons
Declare them in packages.yml, then run dbt deps to install:
packages:
  - package: dbt-labs/dbt_utils
    version: 1.0.0
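After running dbt deps, package macros are available in your models. For example, dbt_utils ships a generate_surrogate_key macro:

-- illustrative use of a dbt_utils macro
SELECT
    {{ dbt_utils.generate_surrogate_key(['order_id', 'customer_id']) }} AS order_key,
    order_id,
    customer_id
FROM {{ ref('stg_orders') }}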
1. Load data from Fivetran into Snowflake
2. Use DBT to create:
   - Staging models (stg_)
   - Fact/dim models (dim_, fact_)
3. Add tests, snapshots, docs
4. Schedule in DBT Cloud
5. Connect output to Looker or Tableau
- Use clear naming (stg_, fct_, dim_)
- Always use ref()
- Document everything in schema.yml
- Break logic into layers: staging → intermediate → marts
- Add tests early
- Use is_incremental() for large datasets
- Avoid SELECT * in production
Role | How They Use DBT |
---|---|
Data Engineer | Build source and staging models |
Analytics Engineer | Build business logic and marts |
Analyst | Query clean models |
Product Manager | View data lineage and documentation |
- What is the role of DBT in the modern data stack?
- Difference between ref() and source()?
- Explain incremental models in DBT.
- How do you implement tests in DBT?
- What is the use of the is_incremental() macro?
For more, visit our DBT Interview Questions blog.
DBT has transformed how data teams build and manage transformations, bringing the best of software engineering into data work. From modular SQL to automated testing and documentation, DBT is a must-learn tool for modern data professionals.