
## 1. Model the project as a journey from source-conformed to business-conformed data

**Rule:** Your dbt DAG should intentionally move from raw, source-shaped tables to clean business concepts. The core structure should be `staging → intermediate → marts`, not a flat collection of SQL files.

**Why it matters:** If the project does not encode this progression, every downstream BI dashboard, metric, and LLM-generated query has to guess whether a model is raw-ish, cleaned, reusable, or consumer-ready. That creates inconsistent joins, duplicated logic, and ambiguous definitions. dbt structure should be deliberately framed as the movement from many narrow source-conformed models toward fewer, wider business-conformed models.

**Positive example:**

```
models/
  staging/
    stripe/
      stg_stripe__payments.sql
  intermediate/
    finance/
      int_payments_pivoted_to_orders.sql
  marts/
    finance/
      orders.sql
```

**Negative example:**

```
models/
  orders_cleaned.sql
  orders_final.sql
  orders_dashboard.sql
  stripe_logic.sql
```

**Warning:** The folder structure is not cosmetic. It becomes a selection interface, a governance boundary, and a mental model for contributors.

---

## 2. Group staging models by source system, not by business team

**Rule:** In `models/staging`, subdirectories should represent source systems such as `stripe`, `salesforce`, `jaffle_shop`, or `google_analytics`.

**Why it matters:** Staging is where source data is standardized. Grouping staging by business function creates duplicate “atoms” before the project has established a single source of truth. Avoid `marketing` or `finance` folders in staging because that invites competing definitions of basic objects like orders or customers.

**Positive example:**

```
models/staging/stripe/stg_stripe__payments.sql
models/staging/jaffle_shop/stg_jaffle_shop__orders.sql
```

**Negative example:**

```
models/staging/finance/orders.sql
models/staging/marketing/orders.sql
```

**Warning:** This is one of the fastest ways to break semantic consistency. Once two teams stage “orders” differently, every metric built on top becomes politically and technically fragile.

---

## 3. Keep staging models boring: rename, recast, clean, categorize — but do not join or aggregate

**Rule:** A staging model should normally select from one source table and perform only reusable transformations: renaming, type casting, simple calculations, and basic categorizations.

**Why it matters:** Staging models are the reusable atomic layer. If you join or aggregate here, you silently change grain too early and remove optionality from downstream models. The recommended pattern is to select from one source, rename fields, recast types, and then build all later models on top of those standardized models.

**Positive example:**

```
select
    id as payment_id,
    orderid as order_id,
    amount / 100.0 as amount,
    created::timestamp as created_at
from {{ source('stripe', 'payment') }}
```

**Negative example:**

```
select
    orders.customer_id,
    sum(payments.amount) as lifetime_value
from {{ source('stripe', 'payment') }} as payments
join {{ source('app', 'orders') }} as orders
    on payments.order_id = orders.id
group by 1
```

**Warning:** Aggregation in staging is a grain violation. Joins in staging are usually a fanout risk. Use a `base` sublayer only when staging genuinely requires a source-level cleanup pattern, such as delete tables or unioning identical source structures.

---

## 4. Use `source()` only at the raw boundary and `ref()` everywhere else

**Rule:** Raw data should be declared as dbt sources and referenced through `source()`. Models should reference other models only through `ref()`.

**Why it matters:** `ref()` lets dbt infer dependency order and keeps development and production environments isolated correctly. Direct database references bypass lineage, break environment portability, and make model dependencies invisible. Use `ref()` instead of direct relation references and limit raw references to one place.

**Positive example:**

```
from {{ ref('stg_stripe__payments') }}
```
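The `ref()` side of this boundary is shown above; the `source()` side needs a YAML declaration. A minimal sketch, assuming raw Stripe data lands in a schema named `stripe`:

```
# models/staging/stripe/_stripe__sources.yml
version: 2

sources:
  - name: stripe
    schema: stripe  # assumption: name of the raw landing schema
    tables:
      - name: payment
```

Staging models then reference it with `{{ source('stripe', 'payment') }}`, and everything downstream uses `ref()`.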

**Negative example:**

```
from analytics_prod.staging.stg_stripe__payments
```

**Warning:** Hard-coded relations are especially dangerous in CI, dev schemas, dbt Mesh, and downstream package consumption because the DAG no longer reflects the actual dependency.

---

## 5. Break complex transformations into intermediate models when they change grain, repeat logic, or hide fanout

**Rule:** Intermediate models should isolate meaningful transformation steps, especially joins, pivots, fanouts, aggregations, and reusable CTEs.

**Why it matters:** Complex models should be broken apart when CTEs are duplicated or when a CTE changes the grain of the data. Grain-changing logic deserves its own model because it should be testable and inspectable before it gets mixed into a mart.

**Positive example:**

```
int_payments_pivoted_to_orders.sql
```

This name tells you the input concept, the operation, and the output grain.

**Negative example:**

```
int_finance_logic.sql
```

This hides the transformation and gives no clue whether the model aggregates, joins, deduplicates, or fans out rows.

**Warning:** LLM-generated analytics queries are very sensitive to ambiguous grain. A model named `customer_metrics` could mean one row per customer, customer-day, customer-month, or customer-product unless the dbt layer makes grain explicit.

---

## 6. Name intermediate models with verbs, not vague nouns

**Rule:** Intermediate models should describe the transformation they perform: `pivoted`, `aggregated_to`, `joined`, `fanned_out_by`, `deduplicated`, `ranked`, `sessionized`.

**Why it matters:** Intermediate models should be named around verbs because this layer exists to perform purpose-built transformations on the way to marts. A good intermediate model name tells reviewers what changed without making them read the SQL first.

**Positive examples:**

```
int_payments_pivoted_to_orders.sql
int_orders_aggregated_to_customers.sql
int_events_sessionized.sql
```

**Negative examples:**

```
int_orders.sql
int_model_2.sql
finance_intermediate.sql
```

**Warning:** If an intermediate model cannot be named with a clear verb, it probably has too many responsibilities.

---

## 7. Build marts as business entities with one clear grain

**Rule:** A mart should represent a business entity or concept, such as `orders`, `customers`, `payments`, or `accounts`, with one row representing one instance of that entity.

**Why it matters:** Marts should be treated as business-defined entities; time-based rollups like `orders_per_day` are usually metrics, not pure marts. Avoid departmental duplicates like `finance_orders` and `marketing_orders` unless they are genuinely separate business concepts.

**Positive example:**

```
models/marts/finance/orders.sql
models/marts/marketing/customers.sql
```

**Negative example:**

```
finance_orders.sql
marketing_orders.sql
sales_orders.sql
```

**Warning:** Department-prefixed marts often encode political ownership rather than semantic difference. If finance revenue and product revenue differ, name the concepts precisely, such as `tax_revenue` versus `recognized_revenue`.

---

## 8. Denormalize marts only when you are not relying on the dbt Semantic Layer

**Rule:** Without the dbt Semantic Layer, wide denormalized marts are useful because BI users can query fewer joins. With the Semantic Layer, keep marts more normalized so MetricFlow has flexible dimensions and joins.

**Why it matters:** dbt projects often build highly denormalized datasets, but this limits the dimensionality available to MetricFlow. A more normalized approach is recommended when using the Semantic Layer.

**Positive without Semantic Layer:**

```
customers
  customer_id
  first_order_date
  lifetime_value
  number_of_orders
```

**Positive with Semantic Layer:**

```
customers
orders
payments
```

Then define semantic models and metrics on top.

**Negative with Semantic Layer:**

```
customer_metrics_obt
```

A single big rollup may be convenient for one dashboard but weak for reusable metrics.

**Warning:** This is a real architectural fork. Denormalized marts optimize human SQL and dashboard speed. Normalized semantic models optimize metric reuse and query generation flexibility.

---

## 9. Choose materializations by access pattern, not habit

**Rule:** Materialization should follow how the model is used:

```
staging      → view
intermediate → ephemeral or restricted view
marts        → table or incremental
```

**Why it matters:** Staging models should default to views because they are rarely queried directly and should stay fresh; intermediate models to ephemeral because they are implementation details; and marts to tables or incremental models because they are directly consumed and need performance.

**Positive example:**

```
models:
  jaffle_shop:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
```

**Negative example:**

```
models:
  jaffle_shop:
    staging:
      +materialized: table
    marts:
      +materialized: view
```

**Warning:** Ephemeral models keep the warehouse clean but make debugging harder because they are compiled into downstream SQL. For complex intermediate logic, restricted views are often the better operational choice.

---

## 10. Configure defaults at the folder level and override only exceptions

**Rule:** Put default materializations, schemas, and other common configs in `dbt_project.yml` by directory. Use model-level config only for exceptions.

**Why it matters:** Project-level configuration is preferred because configs cascade, with more specific scopes overriding broader ones. This keeps the project DRY and avoids copy-pasted config blocks.

**Positive example:**

```
models:
  jaffle_shop:
    staging:
      +materialized: view
    intermediate:
      +materialized: ephemeral
    marts:
      +materialized: table
      finance:
        +schema: finance
```

**Negative example:**

```
{{ config(materialized='table') }}
```

copied into every mart model.

**Warning:** Overusing tags to compensate for weak folder structure creates maintenance debt. Folders should be the primary grouping and selection mechanism; tags should represent exceptions.

---

## 11. Treat names as semantic contracts

**Rule:** Naming conventions should encode source, layer, entity, grain, and field meaning.

**Recommended conventions:**

```
stg_[source]__[entity]s.sql
int_[entity]s_[verb]s.sql
marts: [business_entity].sql
primary key: [object]_id
boolean: is_ / has_
timestamp: [event]_at
date: [event]_date
```

Also recommended: plural model names, snake_case, no abbreviations, no reserved words, and consistent key names across models.

**Positive example:**

```
customer_id,
is_completed_payment,
created_at,
created_date
```

**Negative example:**

```
id,
completed,
created,
cust_id
```

**Warning:** Bad names are not just a readability issue. They degrade BI self-service and LLM-based analytics because query generators infer meaning from names, joins, and grain.

---

## 12. Use YAML per folder, not one giant project file

**Rule:** Keep model and source YAML close to the models they describe. Use `_[directory]__models.yml` and, for staging folders, `_[directory]__sources.yml`.

**Why it matters:** A single monolithic YAML file centralizes config but makes specific tests, descriptions, and source definitions hard to find. One YAML file per model is searchable but creates too much file churn. Folder-level YAML is the best default balance.

**Positive example:**

```
models/staging/stripe/_stripe__sources.yml
models/staging/stripe/_stripe__models.yml
models/marts/finance/_finance__models.yml
```

**Negative example:**

```
models/schema.yml
```

containing every model, source, test, and description.

**Warning:** YAML structure affects test discoverability. If contributors cannot quickly find where a model is documented and tested, they will skip both.

---

## 13. Put tests where the modeling risk is, especially around keys and grain changes

**Rule:** Every important model should have a primary key, and transformations that change grain should be tested aggressively.

**Why it matters:** Each model should have a primary key, and CTEs that change grain should be broken out because those transformations are useful to test. Singular tests are valuable when testing interactions across specific models.

**Positive examples:**

```
models:
  - name: orders
    columns:
      - name: order_id
        data_tests:
          - unique
          - not_null
```

A model that changes grain, such as a fanout or pivot, should also test uniqueness or expected row counts at the new grain.
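When a transformation produces a compound grain, a package test can assert uniqueness at that grain directly. A sketch using `dbt_utils` (assuming the package is installed and a hypothetical `order_items` model with one row per order and product):

```
models:
  - name: order_items
    data_tests:
      - dbt_utils.unique_combination_of_columns:
          combination_of_columns:
            - order_id
            - product_id
```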

**Negative example:**

```
select *
from orders
join order_items using (order_id)
```

with no test proving whether the output is one row per order or one row per order item.

**Warning:** Most metric bugs are not caused by obviously wrong SQL. They come from unnoticed grain changes, duplicate joins, and null-handling assumptions.

---

## 14. Use seeds only for small static lookup data, not ingestion

**Rule:** Seeds are for small CSV-backed lookup tables that do not exist in source systems. They are not a replacement for an EL tool.

**Why it matters:** Seeds are intended for lookup tables, not for loading source data into the warehouse.

**Positive example:**

```
seeds/utm_campaign_mappings.csv
seeds/zip_code_to_state.csv
```
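Seed column types can be pinned so the CSV loads predictably. A sketch in `dbt_project.yml`, assuming a project named `jaffle_shop` and the `zip_code_to_state` seed above:

```
seeds:
  jaffle_shop:
    zip_code_to_state:
      +column_types:
        zip_code: varchar(5)  # preserves leading zeros
        state: varchar(2)
```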

**Negative example:**

```
seeds/orders.csv
seeds/payments.csv
```

used as recurring source ingestion.

**Warning:** Using seeds as ingestion hides lineage and operational freshness. dbt should transform data already in the warehouse, not act as your data loader.

---

## 15. Use Mesh only when ownership boundaries are real

**Rule:** Split dbt projects by durable domain ownership, governance needs, or scale — not merely by use case like “BI versus ML.”

**Why it matters:** dbt Mesh is useful for managing complexity across multiple interconnected projects, but interfaces should be defined based on teams, jobs, lineage, and selectors before splitting. Use groups and access controls first, with the most private access that preserves functionality.

**Positive example:**

```
customer_domain_project
finance_domain_project
foundational_sources_project
```

with public/protected interfaces.

**Negative example:**

```
bi_project
ml_project
executive_dashboard_project
```

all rebuilding the same customer and revenue concepts differently.

**Warning:** Mesh introduces intentional friction. That is good for governed interfaces, but wasteful if your team is just trying to organize a small project.

---

## 16. Use model contracts, access, and versions for stable cross-team interfaces

**Rule:** Once other teams or projects depend on a model, treat that model as an interface: define ownership, restrict access, use contracts, and version breaking changes.

**Why it matters:** Model contracts enforce structure and types, model versions manage breaking changes and migration windows, and access modifiers control whether models are private, protected, or public.

**Positive example:**

```
models:
  - name: orders
    config:
      access: public
      contract:
        enforced: true
    columns:  # an enforced contract requires every column declared with a data_type
      - name: order_id
        data_type: integer
```

**Negative example:**

```
models:
  - name: int_orders_joined_to_payments
    config:
      access: public
```

exposing implementation logic as a cross-team dependency.

**Warning:** Public models should be boring, stable, and documented. Do not publish intermediate implementation details just because another team asked for them once.

---

## 17. Keep semantic models and metrics in an intentional file structure

**Rule:** For the dbt Semantic Layer, either co-locate semantic YAML with each mart or create a dedicated `models/semantic_models/` folder. For existing large projects, default to the dedicated folder.

**Why it matters:** Both options are valid; a dedicated subfolder is preferable when migrating existing projects because it makes it easier to see which marts have been codified into the Semantic Layer. Semantic models and metrics must live under `model-paths`, or the semantic manifest will be empty.

**Positive example:**

```
models/semantic_models/sem_orders.yml
models/semantic_models/sem_customers.yml
```
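A file like `sem_orders.yml` pairs a mart with its entities, dimensions, and measures. A minimal sketch (column names are illustrative):

```
semantic_models:
  - name: orders
    model: ref('orders')
    defaults:
      agg_time_dimension: ordered_at
    entities:
      - name: order_id
        type: primary
    dimensions:
      - name: ordered_at
        type: time
        type_params:
          time_granularity: day
    measures:
      - name: order_total
        agg: sum
```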

**Negative example:**

```
semantic_layer/orders.yml
```

outside `model-paths`.

**Warning:** This is not just organization. A misplaced semantic YAML file can mean your metrics are not recognized at all.

---

## 18. Use SQL style rules to protect review quality, not aesthetics

**Rule:** Adopt a style guide and automate it with SQLFluff or a formatter. The point is not prettiness; it is reducing review noise and making semantic mistakes visible.

**Why it matters:** A codified style guide, enforced with SQLFluff or a formatter, removes formatting debates from code review. Avoiding abbreviations, using explicit aliases, prefixing columns in joins, and writing explicit join types all make semantic mistakes visible instead of burying them in noise.

**Positive example:**

```
select
    orders.order_id,
    customers.customer_id

from orders

left join customers
    on orders.customer_id = customers.customer_id
```

**Negative example:**

```
select
    o.id,
    c.id
from orders o
join customers c
    on o.customer = c.id
```

**Warning:** Ambiguous aliases and implicit joins make code review weaker. They also make it harder to spot fanout, wrong join direction, or accidental inner joins.

---

## 19. Separate development and production environments

**Rule:** Developers should run against a development target; production jobs should run against a production target.

**Why it matters:** Separate targets keep experimental runs out of production relations, and version control with pull requests gates what reaches the production branch. Without this separation, a developer's half-finished model can silently replace a trusted production table.

**Positive example:**

```
local development → dev target
scheduled job     → prod target
```
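In dbt Core this separation lives in `profiles.yml`. A sketch with hypothetical schema names and credentials omitted:

```
jaffle_shop:
  target: dev  # developers default to dev
  outputs:
    dev:
      type: snowflake  # assumption: warehouse type
      schema: dbt_alice  # per-developer sandbox schema
      # account, user, role, warehouse omitted
    prod:
      type: snowflake
      schema: analytics  # written only by the scheduled job
      # credentials supplied by the orchestrator
```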

**Negative example:**

```
analyst laptop writes directly to production schemas
```

**Warning:** Environment discipline is part of semantic reliability. A trusted metric is not trusted if anyone can overwrite its production relation while experimenting.

---

## 20. Document deviations from the default conventions

**Rule:** Consistency matters more than the exact convention, but deviations must be explicit and documented.

**Why it matters:** These conventions are strong defaults, not universal law. What matters is that teams understand why a structure exists and when it is safe to deviate, and that every deviation leaves a written rationale behind.

**Positive example:**

```
We materialize selected intermediate models as views because they are used for
debugging complex grain-changing joins. They live in a restricted schema and
are not available to end users.
```

**Negative example:**

```
Some intermediate models are tables because the first developer preferred that.
```

**Warning:** Undocumented exceptions become fake patterns. New contributors will copy them, and the project will drift into inconsistency.

## 21. Design the mart first, then refactor upstream

**Rule:** Do not build dbt models strictly from left to right through the DAG. Start with the business output you need, prove the shape, then refactor logic back into staging and intermediate layers.

**Why this matters:** dbt project structure follows DAG order, but development usually should not. The recommended workflow is to mock the target output first, write the SQL needed to produce it, identify dependencies, then stage missing source tables and extract reusable logic upstream.

**Good pattern:**

```
1. Mock the target dashboard table or metric in a spreadsheet.
2. Write ugly-but-correct SQL to prove the output.
3. Identify source tables and grain.
4. Create missing staging models.
5. Move repeated or grain-changing logic into intermediate models.
6. Leave the mart clean and entity-grained.
```

**Bad pattern:**

```
1. Stage every source table.
2. Build many intermediate models speculatively.
3. Hope the final mart satisfies the stakeholder.
```

**Warning:** Premature DAG design creates beautiful unused models. The mart should justify the upstream abstractions.

---

## 22. Treat CTEs as miniature models

**Rule:** Inside a model, CTEs should tell the same story that the DAG tells across models: import, transform, join, aggregate, output.

**Why this matters:** Descriptive CTE names matter. A CTE should usually perform one logical unit of work, especially when the transformation affects grain or join behavior.

**Good example:**

```
with

orders as (

    select * from {{ ref('stg_jaffle_shop__orders') }}

),

payments_pivoted_to_order_grain as (

    select
        order_id,
        sum(amount) as total_amount

    from {{ ref('stg_stripe__payments') }}

    group by 1

),

orders_joined_to_payments as (

    select
        orders.order_id,
        orders.customer_id,
        payments_pivoted_to_order_grain.total_amount

    from orders

    left join payments_pivoted_to_order_grain
        on orders.order_id = payments_pivoted_to_order_grain.order_id

)

select * from orders_joined_to_payments
```

**Bad example:**

```
with x as (...),
y as (...),
final as (...)
select * from final
```

**Warning:** If a CTE name cannot explain what changed, the model probably hides business logic.

---

## 23. Pull duplicated CTEs into intermediate models

**Rule:** If the same CTE appears in multiple models, it should usually become an intermediate model.

**Why this matters:** Repeated CTEs are semantic drift waiting to happen. One analyst changes the logic in one mart but not another, and suddenly “completed payments” or “active customers” means two different things.

**Good pattern:**

```
int_payments_filtered_to_completed.sql
```

used by:

```
orders.sql
customers.sql
revenue.sql
```

**Bad pattern:**

```
where status in ('success', 'completed')
```

copy-pasted into five marts with small variations.

**Warning:** Copy-pasted CTEs are worse than copy-pasted SQL style violations. They duplicate definitions.

---

## 24. Aggregate before joining when possible

**Rule:** When joining fact-like tables, aggregate to the intended join grain before the join.

**Why this matters:** Aggregate on the smallest useful dataset before joining. This improves performance and reduces fanout risk.

**Good example:**

```
payments_aggregated_to_order_grain as (

    select
        order_id,
        sum(amount) as total_amount

    from payments

    group by 1

)

select
    orders.order_id,
    payments_aggregated_to_order_grain.total_amount

from orders

left join payments_aggregated_to_order_grain
    on orders.order_id = payments_aggregated_to_order_grain.order_id
```

**Bad example:**

```
select
    orders.order_id,
    sum(payments.amount) as total_amount

from orders

left join payments
    on orders.order_id = payments.order_id

left join order_items
    on orders.order_id = order_items.order_id

group by 1
```

**Warning:** Joining two one-to-many relationships before aggregation is the classic fanout trap. It produces inflated metrics that look plausible.

---

## 25. Always make join direction obvious

**Rule:** Prefer left-to-right join logic. Start from the grain-defining table, then join supporting data onto it.

**Why this matters:** Avoid `right join` and use explicit join types. The table in the `from` clause should usually define the output grain.

**Good example:**

```
from orders

left join customers
    on orders.customer_id = customers.customer_id
```

This says: “one row per order, with customer attributes added.”

**Bad example:**

```
from customers

right join orders
    on customers.customer_id = orders.customer_id
```

The same result may be possible, but the model is harder to reason about.

**Warning:** LLM query generation and human reviewers both infer grain from the `from` table. Make that inference easy.

---

## 26. Prefix columns whenever a model joins tables

**Rule:** If a model joins two or more relations, every selected column should be prefixed with its source relation or CTE name.

**Why this matters:** Prefix columns in joined models. This prevents ambiguous references and makes lineage visible inside the SQL.

**Good example:**

```
select
    orders.order_id,
    orders.customer_id,
    customers.customer_name

from orders

left join customers
    on orders.customer_id = customers.customer_id
```

**Bad example:**

```
select
    order_id,
    customer_id,
    customer_name
```

after joining several CTEs.

**Warning:** Unprefixed columns make it harder to detect accidental column shadowing, especially when source systems reuse generic names like `id`, `status`, `created_at`, or `type`.

---

## 27. Avoid table aliases unless the alias is genuinely clearer

**Rule:** Do not use cryptic aliases like `o`, `c`, `p`, or `fct`.

**Why this matters:** Cryptic aliases reduce readability and should be avoided. In dbt, clarity usually beats short SQL.

**Good example:**

```
from orders

left join customers
    on orders.customer_id = customers.customer_id
```

**Bad example:**

```
from orders o

left join customers c
    on o.customer_id = c.customer_id
```

**Trade-off:** Short aliases may be acceptable in very long SQL expressions, but they should be meaningful. `current_orders` is better than `co`; `customers` is better than `c`.

---

## 28. Use `union all` by default

**Rule:** Use `union all` unless you explicitly need deduplication.

**Why this matters:** Prefer `union all` over `union`. `union` performs duplicate removal, which adds compute and can hide upstream data-quality issues.

**Good example:**

```
select * from shopify_us_orders

union all

select * from shopify_eu_orders
```

**Bad example:**

```
select * from shopify_us_orders

union

select * from shopify_eu_orders
```

**Warning:** If duplicate rows appear, fix or test the duplicate condition explicitly. Do not let `union` silently erase evidence.

---

## 29. Use import CTEs to limit scan size

**Rule:** Top-level import CTEs should not blindly `select *` from large upstream models once the model is mature. Select only the columns and rows needed.

**Why this matters:** Place import CTEs at the top of the file and limit scanned data where possible.

**Good example:**

```
orders as (

    select
        order_id,
        customer_id,
        order_date,
        order_total

    from {{ ref('orders') }}

    where order_date >= '2024-01-01'

)
```

**Bad example:**

```
orders as (

    select * from {{ ref('orders') }}

)
```

in a heavy mart that needs only four columns.

**Trade-off:** `select *` is acceptable during early development or in simple staging models. In expensive downstream transformations, it becomes wasteful.

---

## 30. Use the final `select * from final_cte` pattern while developing

**Rule:** End each model with a simple `select *` from the final output CTE.

**Why this matters:** This pattern makes debugging easier. You can temporarily change the last line to inspect an earlier CTE without rewriting the model.

**Good example:**

```
select * from orders_joined_to_payments
```

**Debugging move:**

```
select * from payments_aggregated_to_order_grain
```

**Bad example:**

```
select
    order_id,
    customer_id,
    total_amount
from (
    ...
)
```

**Warning:** Deeply nested SQL makes dbt models harder to audit. Prefer named CTEs that expose each transformation step.

---

## 31. Materialize intermediate models as views when debugging complexity

**Rule:** Ephemeral intermediate models are a good default, but complex or suspicious intermediate logic should be materialized as a restricted view during development or permanently if debugging value outweighs clutter.

**Why this matters:** Ephemerals keep the warehouse clean but make troubleshooting harder because they are interpolated into downstream SQL.

**Good use of ephemeral:**

```
Small reusable transformation
No need to inspect output
Low complexity
```

**Good use of restricted view:**

```
Changes grain
Contains window functions
Performs deduplication
Feeds several marts
Often breaks during development
```
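Overriding the folder-level ephemeral default for one model is a single config line. A sketch, assuming a hypothetical `int_events_sessionized` model and a custom `intermediate` schema kept out of end-user view:

```
-- models/intermediate/int_events_sessionized.sql
{{ config(materialized='view', schema='intermediate') }}
```

The rest of the model's SQL is unchanged; only where and how it materializes differs.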

**Warning:** A tidy warehouse is not worth opaque logic. If a model is critical to metric correctness, being able to inspect it may matter more than hiding it.

---

## 32. Temporarily build a failing model chain as tables to locate warehouse errors

**Rule:** When an error appears in a downstream mart but likely originates upstream, temporarily materialize the dependency chain as tables.

**Why this matters:** Troubleshoot by building specific model chains as tables so the warehouse throws the error where it actually occurs.

**Good debugging move:**

```
stg_orders       → table temporarily
int_order_items  → table temporarily
orders           → table temporarily
```
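One way to do this without touching each model file is a temporary folder-level override on a debug branch. A sketch in `dbt_project.yml`, assuming a project named `jaffle_shop`:

```
models:
  jaffle_shop:
    intermediate:
      +materialized: table  # temporary: revert after debugging
```

Then `dbt run --select +orders` rebuilds the failing mart and everything upstream of it, and the warehouse raises the error on the model that actually produces it.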

**Bad debugging move:**

```
Only inspect the final mart SQL and assume the error originates there.
```

**Warning:** View stacks and ephemerals can make database errors misleading. The visible failure point is not always the logical failure point.

---

## 33. Keep warehouse UX clean

**Rule:** The warehouse output is part of the user experience. Do not expose implementation models in schemas used by analysts, BI tools, or LLM agents.

**Why this matters:** The warehouse is one of the interfaces to the knowledge graph encoded in dbt. Intermediate models should not be exposed in the main production schema.

**Good pattern:**

```
analytics.finance.orders
analytics.marketing.customers
analytics_intermediate.finance.int_payments_pivoted_to_orders
```
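Routing layers into separate schemas is folder-level config. A sketch, assuming dbt's default `generate_schema_name` behavior, which prefixes custom schemas with the target schema:

```
models:
  jaffle_shop:
    marts:
      finance:
        +schema: finance
    intermediate:
      +schema: intermediate
```

With the default macro, production relations land in schemas like `analytics_finance` and `analytics_intermediate`; override `generate_schema_name` if you need exact names.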

**Bad pattern:**

```
analytics.orders
analytics.customers
analytics.int_payments_pivoted_to_orders
analytics.stg_stripe__payments
analytics.tmp_joined_orders_v2
```

**Warning:** BI users and LLM agents do not reliably know which tables are safe. If you expose staging and intermediate models in the same schema as marts, people will query the wrong thing.

---

## 34. Use `analyses` for audit and migration SQL, not production logic

**Rule:** Store version-controlled, Jinja-enabled audit queries in `analyses`, especially during migrations or reconciliations.

**Why this matters:** Use `analyses` for queries that should be version-controlled but not built into the warehouse. Audit-helper style workflows are a typical fit.

**Good use:**

```
analyses/compare_legacy_orders_to_dbt_orders.sql
analyses/audit_revenue_migration.sql
```

**Bad use:**

```
analyses/customer_lifetime_value.sql
```

where the query is actually needed by dashboards.

**Warning:** If an analysis becomes a recurring dependency, promote it to a model and test it.

---

## 35. Document macros once they become shared infrastructure

**Rule:** Any macro used by multiple models should have documented purpose and arguments.

**Why this matters:** Macros hide transformation logic behind a name. A `_macros.yml` entry describing purpose and arguments is how consumers learn a macro's behavior without reading its Jinja.

**Good example:**

```
macros:
  - name: cents_to_dollars
    description: Converts integer cents into decimal currency.
    arguments:
      - name: column_name
        type: column
        description: Column containing the amount in cents.
```

**Bad example:**

```
{{ cents_to_dollars('amount') }}
```

with no explanation of rounding, null behavior, currency assumptions, or type returned.
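The macro body itself stays short; the YAML is what makes its behavior legible. A sketch of a possible definition, assuming amounts are stored as integer cents:

```
-- macros/cents_to_dollars.sql
{% macro cents_to_dollars(column_name, scale=2) %}
    ({{ column_name }} / 100)::numeric(16, {{ scale }})
{% endmacro %}
```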

**Warning:** Macros hide logic. Hidden logic without documentation becomes a semantic black box.

---

## 36. Prefer packages for common tests and utilities before writing custom logic

**Rule:** Before writing a singular test or custom macro, check whether a mature dbt package already covers the use case.

**Why this matters:** Packages such as `dbt-utils`, `dbt-expectations`, `dbt-codegen`, and `dbt-audit-helper` cover common project needs.

**Good pattern:**

```
Use dbt_utils.unique_combination_of_columns
instead of writing a one-off uniqueness test for composite grain.
```
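Packages are declared in `packages.yml` at the project root and installed with `dbt deps`. A sketch (version ranges are illustrative; pin to what you have tested):

```
packages:
  - package: dbt-labs/dbt_utils
    version: [">=1.1.0", "<2.0.0"]
  - package: calogica/dbt_expectations
    version: [">=0.10.0", "<0.11.0"]
```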

**Bad pattern:**

```
Every team writes its own slightly different accepted-values, recency, or relationship test.
```

**Warning:** Custom tests are useful, but they create maintenance obligations. Use them when the business rule is genuinely specific.

---

## 37. Use Codegen after you understand staging manually

**Rule:** Learn to write staging models by hand, then automate boilerplate staging and source YAML with Codegen.

**Why this matters:** Codegen removes boilerplate, but it can only reproduce conventions that developers already understand. Generating staging models before the pattern is internalized produces code that no one can meaningfully review.

**Good pattern:**

```
First few source tables: write manually.
After pattern is clear: generate staging boilerplate.
Then customize exceptions.
```
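Once the pattern is clear, the `dbt-codegen` package generates the boilerplate from the command line. The schema and table names below are placeholders.

```
# Generate source YAML for a schema
dbt run-operation generate_source --args '{"schema_name": "stripe"}'

# Generate a staging model skeleton for one table
dbt run-operation generate_base_model --args '{"source_name": "stripe", "table_name": "payments"}'
```

The output is a starting point: paste it into the project, then apply your renaming and casting conventions by hand.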

**Bad pattern:**

```
Generate 200 staging models no one understands.
```

**Warning:** Codegen accelerates conventions. It does not design them for you.

---

## 38. Use snapshots only for true Type 2 history

**Rule:** Use snapshots when source data is destructively updated and you need historical versions.

**Why this matters:** Snapshots are the tool for creating Type 2 slowly changing dimension records from Type 1 source data.

**Good use:**

```
Track customer tier changes over time.
Track account owner changes over time.
Track subscription status history when the source overwrites status.
```
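A minimal snapshot for the customer-tier case might look like the following, using dbt's SQL snapshot block with the `timestamp` strategy; the source and column names are assumptions for illustration.

```
{% snapshot customers_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at'
    )
}}

select * from {{ source('crm', 'customers') }}

{% endsnapshot %}
```

Each run compares the source against the snapshot table and appends a new row with validity timestamps whenever `updated_at` advances, producing Type 2 history from a Type 1 source.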

**Bad use:**

```
Snapshot every table just in case.
```

**Warning:** Snapshots add storage, complexity, and interpretation burden. If the source already provides historical events, model those events instead.

---

## 39. Use groups and access modifiers to encode ownership

**Rule:** As a project grows, define groups and set model access intentionally: private for implementation details, protected for cross-team reuse inside a project boundary, public for stable interfaces.

**Why this matters:** Groups and access modifiers turn implicit ownership into enforced boundaries. They matter most in dbt Mesh or multi-team projects, where unrestricted cross-team references create accidental dependencies on implementation details.

**Good pattern:**

```
models:
  - name: int_payments_pivoted_to_orders
    config:
      group: finance
      access: private

  - name: orders
    config:
      group: finance
      access: public
```
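For the config above to work, the group itself must be declared with an owner; dbt requires at least a `name` or `email` under `owner`. The team details below are placeholders.

```
groups:
  - name: finance
    owner:
      name: Finance Data Team
      email: finance-data@example.com
```

With this in place, a `ref` to `int_payments_pivoted_to_orders` from outside the `finance` group fails at parse time, while `orders` remains a supported interface.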

**Bad pattern:**

```
Everything is public because someone might need it someday.
```

**Warning:** Public access is a promise. Do not make implementation details public unless you are ready to support them.

---

## 40. Version models only for breaking interface changes

**Rule:** Use model versions when a model’s structure, meaning, or contract changes in a way that would break consumers.

**Why this matters:** Versioning lets consumers migrate on their own schedule: the old contract stays queryable while the new one stabilizes. Combined with deprecation dates, it turns a breaking change from an outage into a managed transition.

**Good use of versioning:**

```
orders_v1: existing revenue logic
orders_v2: revised revenue recognition logic
```
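In YAML, the two versions can coexist under one model entry; dbt resolves unpinned `ref('orders')` calls to `latest_version`. The date below is illustrative.

```
models:
  - name: orders
    latest_version: 2
    versions:
      - v: 2
      - v: 1
        deprecation_date: 2024-06-01
```

Consumers pin explicitly with `ref('orders', v=1)` until they have migrated, and dbt warns as the deprecation date approaches.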

**Bad use of versioning:**

```
orders_v2 because the SQL was refactored internally but output is unchanged
```

**Warning:** Versioning is not a substitute for clean migration planning. Add deprecation dates and communicate which version consumers should move to.

---

## 41. Do not split projects by consumption mode

**Rule:** Avoid splitting dbt projects into “BI project,” “ML project,” and “executive reporting project” if they reuse the same core concepts.

**Why this matters:** Splitting by consumption mode forces core concepts like orders and customers to be defined once per project, which undermines the single source of truth. Split by domain and ownership instead, and let every consumer read from the same governed marts.

**Good split:**

```
finance-owned project
customer-domain project
platform/foundational project
```

**Bad split:**

```
dashboard_project
ml_features_project
ad_hoc_analysis_project
```

**Warning:** ML features and BI metrics should often come from the same governed marts or semantic definitions. Otherwise the business will get two versions of “customer,” “revenue,” and “active user.”

---

## 42. Keep rollups out of marts when they are really metrics

**Rule:** Do not create marts like `orders_per_day`, `revenue_by_month`, or `customers_by_region` unless they are deliberate aggregate tables. Prefer entity-grained marts plus metrics.

**Why this matters:** A mart should preserve a clear entity grain. Once output is grouped by time or dimension, it has moved past a mart into metric territory, which is better served by a semantic layer that aggregates on demand.

**Good pattern:**

```
orders            → one row per order
customers         → one row per customer
metric: revenue   → aggregated by requested dimensions
```
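With dbt's Semantic Layer (MetricFlow), the rollup lives in a metric definition rather than a table; the measure name below is an assumption about what the underlying semantic model exposes.

```
metrics:
  - name: revenue
    label: Revenue
    type: simple
    type_params:
      measure: order_total
```

One definition can then be queried by day, week, region, or channel without materializing a separate mart for each slice.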

**Bad pattern:**

```
orders_per_day
orders_per_week
orders_by_region
orders_by_channel
```

**Warning:** Pre-aggregated marts multiply definitions. They also restrict future slicing because the grain has already been collapsed.

---

## 43. Make null handling explicit in marts

**Rule:** When joining optional downstream data into marts, use explicit `coalesce` only where the business meaning is clear.

**Why this matters:** Typical mart examples coalesce missing payment amounts and order counts to zero. That is correct when “no related rows” means zero, not unknown.

**Good example:**

```
coalesce(order_payments.total_amount, 0) as total_amount
```

when no payment rows means no successful payments.

**Risky example:**

```
coalesce(customer_orders.lifetime_value, 0) as lifetime_value
```

if missing orders could mean ingestion failure, privacy filtering, or an unresolved join key.
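One way to keep the convenience of zero while preserving the signal is to carry an explicit flag alongside the coalesced value. A sketch, with illustrative table and column names:

```
select
    orders.order_id,
    coalesce(order_payments.total_amount, 0) as total_amount,
    order_payments.order_id is null as is_missing_payment_rows

from orders
left join order_payments
    on orders.order_id = order_payments.order_id
```

Downstream consumers can sum `total_amount` safely while still being able to count and investigate rows where `is_missing_payment_rows` is true.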

**Warning:** Nulls are semantic information. Replacing null with zero can make data quality problems invisible.

---

## 44. Use business terminology at the semantic boundary

**Rule:** Rename source terms into business terms as early as staging when the mapping is stable.

**Why this matters:** Use business terminology rather than source terminology, for example `customer_id` instead of `user_id` if the business calls them customers.

**Good example:**

```
id as customer_id
```

**Bad example:**

```
user_id
```

in some models and:

```
customer_id
```

in others for the same entity.

**Warning:** Inconsistent terminology breaks metric discoverability. It also confuses LLM-based querying because the same concept appears under multiple names.

---

## 45. Optimize for boring consistency, not cleverness

**Rule:** Prefer repetitive, predictable patterns over clever SQL structures.

**Why this matters:** Consistency is a functional requirement. dbt projects are collaborative semantic systems, not personal SQL notebooks.

**Good pattern:**

```
source CTE
renamed/transformed CTE
final select
```

used consistently across staging models.
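In full, the boring staging skeleton looks like this; the source and column names are placeholders, but the shape is what should repeat across every staging model.

```
with source as (

    select * from {{ source('jaffle_shop', 'orders') }}

),

renamed as (

    select
        id as order_id,
        user_id as customer_id,
        status as order_status,
        order_date

    from source

)

select * from renamed
```

Because every staging model follows the same shape, reviewers read only the `renamed` block, and diffs stay small and legible.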

**Bad pattern:**

```
Every model has a different style, CTE order, aliasing convention, and config location.
```

**Warning:** Cleverness compounds. A clever staging model becomes a confusing intermediate model, then an unreliable mart, then a disputed metric.
