# Recipes vs. flows
This is a common point of confusion for new users. Both transform data, but they are different abstractions and serve different purposes.
## Recipe = one transformation
A recipe takes one or more inputs (datasets) and produces one output (a derived dataset). It's a single step.
Examples:
- A SQL recipe: `SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id`
- A visual recipe: filter rows where `status='active'`, then add a computed column
- A Python recipe: load a model, score each row, return the predictions
Every recipe is independently runnable — you can right-click a recipe and "Run this step" without running anything else.
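To make the last bullet concrete, here is a minimal sketch of a Python recipe: one input dataset in, one derived dataset out. The model loader and field names are illustrative, not any specific product's API; a real recipe would typically unpickle a trained model instead of returning a lambda.

```python
# Sketch of the "Python recipe" pattern: load a model, score each row,
# return the predictions as a new derived dataset.

def load_model():
    # Stand-in for e.g. pickle.load(open("model.pkl", "rb"))
    return lambda row: 1 if row["amount"] > 100 else 0

def score_recipe(rows):
    """One transformation: attach a prediction to every input row."""
    model = load_model()
    return [{**row, "prediction": model(row)} for row in rows]

orders = [
    {"customer_id": 1, "amount": 250},
    {"customer_id": 2, "amount": 40},
]
scored = score_recipe(orders)
```

Because the recipe is a single step with explicit inputs and outputs, it stays independently runnable, which is exactly the property the next paragraph describes.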
## Flow = orchestration of recipes
A flow is a directed graph of recipes (and connectors and outputs) that defines how data moves end-to-end through your project.
A typical flow:
```
[Postgres connector] → [SQL recipe: clean_customers] → [SQL recipe: enrich_with_geo]
                                                                     ↓
[CSV upload] → [Visual recipe: parse_dates] ────────────→ [Python recipe: dedupe]
                                                                     ↓
                                                         [Dashboard tile dataset]
```
The flow knows the dependencies. When you run the flow, it:
- Topologically sorts the recipes
- Runs them in order (or in parallel where independent)
- Skips recipes whose inputs haven't changed (unless you force re-run)
- Stops at the first failure (configurable)
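The four steps above can be sketched in a few lines. This is an illustrative toy runner, not any product's engine: recipes are zero-argument callables, dependencies are a name-to-upstreams mapping, and the standard library's `graphlib` provides the topological sort. (Real runners also parallelize independent recipes, e.g. via `TopologicalSorter.get_ready()`; this sketch runs serially.)

```python
from graphlib import TopologicalSorter

def run_flow(recipes, deps, changed=None, force=False):
    """Run recipes in dependency order.

    recipes: name -> zero-arg callable
    deps:    name -> set of upstream recipe names
    changed: names whose inputs changed; None means "run everything"
    """
    # Topologically sort: upstream recipes come before downstream ones.
    order = list(TopologicalSorter(deps).static_order())

    # A recipe must rerun if its own inputs changed, or if any
    # upstream recipe is going to rerun; everything else is skipped.
    if force or changed is None:
        to_run = set(order)
    else:
        to_run = set()
        for name in order:
            if name in changed or deps.get(name, set()) & to_run:
                to_run.add(name)

    ran = []
    for name in order:
        if name not in to_run:
            continue  # inputs unchanged: skip
        try:
            recipes[name]()
        except Exception:
            break  # stop at the first failure (real tools make this configurable)
        ran.append(name)
    return ran
```

For example, with `deps = {"enrich": {"clean"}, "dedupe": {"enrich"}}`, calling `run_flow(recipes, deps, changed={"enrich"})` skips `clean` but reruns `enrich` and everything downstream of it.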
## When to use which
| Situation | Use |
|---|---|
| One-off transformation for a single dashboard tile | A recipe (no flow needed) |
| Multiple tiles built on the same intermediate calculation | A flow with that calculation as a shared recipe |
| Scheduled overnight pipeline | A flow with a schedule |
| Ad-hoc exploration in a notebook | Neither — use a code notebook directly |
| Reusable transformation across projects | A recipe published to the project library |
A useful heuristic: if you find yourself running the same sequence of recipes by hand more than once, that's a flow trying to be born.
## Recipes that aren't in a flow
A recipe can exist outside any flow — it just sits in the project, runnable on demand. These are useful for:
- Exploration before committing to a pipeline shape
- One-off data cleaning that doesn't need to repeat
- Materializing a view from one dataset into another for downstream use
Standalone recipes don't get scheduled or auto-run. You'd run them manually or wire them into a flow later.
## Flows that aren't scheduled
A flow doesn't have to run on a schedule. Many orgs have flows that run only on demand, such as a "rebuild the customer 360" flow triggered by a click in the dashboard or via an API call.
Schedule a flow only when there's a real cadence ("every morning at 6 AM the dashboards need fresh data"). Otherwise leave it manual; you'll have less to debug.
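An on-demand trigger usually amounts to one HTTP call. The sketch below builds such a request with the standard library; the host, endpoint path, and payload shape are invented for illustration, so substitute your platform's actual API.

```python
import json
from urllib import request

def build_trigger_request(base_url, flow_id, params=None):
    """Build (but don't send) a POST that would start a flow run.

    The /api/flows/<id>/run path is a hypothetical example, not a
    real endpoint.
    """
    body = json.dumps({"flow": flow_id, "params": params or {}}).encode()
    return request.Request(
        f"{base_url}/api/flows/{flow_id}/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_trigger_request("https://example.internal", "customer_360")
# request.urlopen(req) would actually kick off the run
```

Wiring this into a dashboard button or a CI job gives you on-demand rebuilds without committing to a schedule.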
## Common mistakes
- One giant recipe doing five things — split it. Each recipe should do one transformation. Errors are easier to localize and intermediate outputs are inspectable.
- A flow with one recipe in it — flows are orchestration. A flow with one step is just a recipe with extra ceremony. Either add steps or use the recipe directly.
- Embedding business logic in the connector — connectors should bring data in raw. Transformations belong in recipes. Mixing them makes both harder to test.
## See also
- Reduce flow runtime — performance tuning concentrated at the flow layer
- Incremental vs full refresh — orthogonal but interacts with how flows are structured