Recipes vs. flows

A common point of confusion for new users: both transform data, but they're different abstractions and serve different purposes.

Recipe = one transformation

A recipe takes one or more inputs (datasets) and produces one output (a derived dataset). It's a single step.

Examples:

  • A SQL recipe: SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id
  • A visual recipe: filter rows where status='active', then add a computed column
  • A Python recipe: load a model, score each row, return the predictions

Every recipe is independently runnable — you can right-click a recipe and "Run this step" without running anything else.
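
To make that concrete, here's a minimal sketch of the recipe shape in Python. The read_dataset/write_dataset helpers are hypothetical stand-ins (this article doesn't show the platform's real recipe API); the transformation mirrors the SQL example above.

    import pandas as pd

    # Hypothetical stand-ins for the platform's dataset I/O; here they just
    # keep everything in memory so the sketch runs on its own.
    _DATASETS: dict[str, pd.DataFrame] = {
        "orders": pd.DataFrame({"customer_id": [1, 1, 2], "amount": [10.0, 5.0, 7.5]}),
    }

    def read_dataset(name: str) -> pd.DataFrame:
        return _DATASETS[name]

    def write_dataset(name: str, df: pd.DataFrame) -> None:
        _DATASETS[name] = df

    def run_recipe() -> None:
        # One or more inputs (datasets)...
        orders = read_dataset("orders")
        # ...a single transformation (the SQL example above, in pandas terms)...
        totals = orders.groupby("customer_id", as_index=False)["amount"].sum()
        # ...exactly one output (a derived dataset).
        write_dataset("customer_totals", totals)

    run_recipe()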

Flow = orchestration of recipes

A flow is a directed graph of recipes (and connectors and outputs) that defines how data moves end-to-end through your project.

A typical flow:

[Postgres connector] → [SQL recipe: clean_customers] → [SQL recipe: enrich_with_geo] ──┐
                                                                                       ├─→ [Dashboard tile dataset]
[CSV upload] → [Visual recipe: parse_dates] → [Python recipe: dedupe] ─────────────────┘
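
In code terms, that graph is just a dependency mapping. A sketch using the names from the diagram (the dict-of-predecessors shape is an assumption for illustration, not the platform's storage format):

    # Each entry maps a step to the steps it depends on. Connectors and
    # uploads are sources, so the first step in each branch has no predecessors.
    flow = {
        "clean_customers": [],                    # reads from the Postgres connector
        "enrich_with_geo": ["clean_customers"],
        "parse_dates":     [],                    # reads from the CSV upload
        "dedupe":          ["parse_dates"],
        "dashboard_tile":  ["enrich_with_geo", "dedupe"],
    }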

The flow knows the dependencies. When you run the flow, it (see the code sketch after this list):

  1. Topologically sorts the recipes
  2. Runs them in order (or in parallel where independent)
  3. Skips recipes whose inputs haven't changed (unless you force re-run)
  4. Stops at the first failure (configurable)
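
A minimal sketch of that run loop in plain Python, using the standard library's graphlib for the topological sort. Recipe, the inputs_changed flag, and the flow dict are hypothetical stand-ins, and independent steps run sequentially here rather than in parallel:

    from dataclasses import dataclass
    from graphlib import TopologicalSorter  # stdlib, Python 3.9+
    from typing import Callable

    @dataclass
    class Recipe:
        name: str
        inputs: list[str]            # upstream recipe names
        run: Callable[[], None]      # the transformation itself
        inputs_changed: bool = True  # stand-in for a real freshness check

    def run_flow(recipes: dict[str, Recipe], force: bool = False) -> None:
        # 1. Topologically sort the recipes from their declared dependencies.
        order = TopologicalSorter({r.name: r.inputs for r in recipes.values()})
        for name in order.static_order():
            recipe = recipes[name]
            # 3. Skip recipes whose inputs haven't changed (unless forced).
            if not force and not recipe.inputs_changed:
                print(f"skip {name} (inputs unchanged)")
                continue
            try:
                # 2. Run in dependency order (a real engine could run
                #    independent branches in parallel; this sketch doesn't).
                print(f"run  {name}")
                recipe.run()
            except Exception as exc:
                # 4. Stop at the first failure.
                raise RuntimeError(f"flow stopped at {name}: {exc}") from exc

    run_flow({
        "clean_customers": Recipe("clean_customers", [], lambda: None),
        "enrich_with_geo": Recipe("enrich_with_geo", ["clean_customers"], lambda: None),
    })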

When to use which

Situation                                                   Use
One-off transformation for a single dashboard tile          A recipe (no flow needed)
Multiple tiles built on the same intermediate calculation   A flow with that calculation as a shared recipe
Scheduled overnight pipeline                                A flow with a schedule
Ad-hoc exploration in a notebook                            Neither — use a code notebook directly
Reusable transformation across projects                     A recipe published to the project library

A useful heuristic: if you find yourself running the same sequence of recipes by hand more than once, that's a flow trying to be born.

Recipes that aren't in a flow

A recipe can exist outside any flow — it just sits in the project, runnable on demand. These are useful for:

  • Exploration before committing to a pipeline shape
  • One-off data cleaning that doesn't need to repeat
  • Materializing a view from one dataset into another for downstream use

Standalone recipes don't get scheduled or auto-run. You'd run them manually or wire them into a flow later.

Flows that aren't scheduled

A flow doesn't have to run on a schedule. Many orgs have flows that run only on demand: for example, a "rebuild the customer 360" flow that's triggered by a click in the dashboard, or via API.
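
For the API-triggered case, a hypothetical sketch. The article names no endpoint, so the URL, path, and auth scheme below are illustrative only:

    import requests

    # Illustrative only: substitute your platform's real flow-run endpoint
    # and credentials.
    resp = requests.post(
        "https://platform.example.com/api/flows/customer-360/run",
        headers={"Authorization": "Bearer <token>"},
        timeout=30,
    )
    resp.raise_for_status()
    print("flow run started:", resp.json())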

Schedule a flow only when there's a real cadence ("every morning at 6 AM the dashboards need fresh data"). Otherwise leave it manual; you'll have less to debug.

Common mistakes

  • One giant recipe doing five things — split it. Each recipe should do one transformation. Errors are easier to localize and intermediate outputs are inspectable.
  • A flow with one recipe in it — flows are orchestration. A flow with one step is just a recipe with extra ceremony. Either add steps or use the recipe directly.
  • Embedding business logic in the connector — connectors should bring data in raw. Transformations belong in recipes. Mixing them makes both harder to test.