Reduce flow runtime
If a flow runs longer than you'd like, the cause is usually one of four things: too much data, redundant work, sequential steps that could parallelize, or a slow upstream connection. The fixes below address these roughly in that order; those higher in the list tend to have larger impact and are easier to apply.
1. Filter early
Push every WHERE clause as far upstream as possible. A flow that reads 10M rows from a connector and then filters down to 50K is doing 200× the work it needs to.
-- ✗ Bad: pull everything, filter in a later recipe
SELECT * FROM raw.events;
-- ✓ Good: predicate pushdown into the connector
SELECT * FROM raw.events
WHERE created_at > now() - interval '7 days';
-- if the connector supports it, push this further into the source query
For partitioned datasets, name the partition column in the filter so the engine prunes partitions instead of scanning all of them.
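For example, if raw.events is partitioned by an event_date column (an illustrative name) rather than by created_at, filtering on created_at alone still touches every partition; adding the partition column lets the engine skip the rest:
-- scans all partitions: created_at is not the partition column
SELECT * FROM raw.events
WHERE created_at > now() - interval '7 days';
-- prunes: the engine skips partitions outside the date window
SELECT * FROM raw.events
WHERE event_date >= current_date - 7
  AND created_at > now() - interval '7 days';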
2. Cache intermediate datasets
If two recipes both read from the same upstream and apply the same first transformation, that transformation runs twice on every flow run. Materialize it once.
In the flow editor: right-click the recipe output → Make persistent. Subsequent runs read from the cached table instead of recomputing.
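If you'd rather express the cache in SQL, the equivalent pattern is a one-time CREATE TABLE AS that both downstream recipes read from. A sketch with illustrative table and column names; this is not necessarily how Make persistent works internally:
-- run the shared first transformation once and store the result
CREATE TABLE staging.events_cleaned AS
SELECT event_id, user_id, lower(event_type) AS event_type
FROM raw.events
WHERE event_type IS NOT NULL;
-- point both downstream recipes at staging.events_cleaned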
3. Run independent steps in parallel
By default, the flow scheduler runs siblings concurrently up to a per-flow worker limit. Check the flow's Run settings → Max parallel workers. The default of 2 is conservative; raise it to 4 or 8 if you have CPU headroom and the steps don't fight over the same connection.
If two siblings read from the same database, parallelism may not help — the bottleneck is now the connection's query throughput, not your worker count.
4. Pick the right recipe type
Some recipes are 10× faster than others on the same data:
| Operation | Faster | Slower |
|---|---|---|
| Aggregation | SQL group-by recipe | Python df.groupby |
| Join | SQL join recipe | Python merge |
| Filter | Visual recipe (predicate pushdown) | Python row-level filter |
| Multi-step transformation | dbt model | Chained Python recipes |
Reach for Python only when the SQL/visual recipe genuinely cannot express the operation (regex munging, custom ML inference, calling an external API).
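To make the aggregation row concrete: a SQL group-by runs inside the engine, so only the summary rows travel back, whereas a pandas groupby only starts after every raw row has been pulled into the recipe's memory. A sketch with illustrative names:
-- aggregates in place; returns one row per user
SELECT user_id, count(*) AS event_count
FROM raw.events
GROUP BY user_id;
-- the pandas equivalent, df.groupby('user_id').size(), does the same
-- math but only after transferring the full table into a DataFrame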
5. Reduce the working set
If the flow operates on a full historical dataset every run, consider:
- Incremental processing — only process rows added since the last run (a minimal sketch follows this list). See Incremental vs full refresh for when this is safe.
- Sampling — during development, run on a 1% sample. The flow's Run on sample toggle does this without code changes.
- Time-bounded windows — most analytics flows only need the last N days. Hardcode that window unless your dashboards explicitly need full history.
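A minimal incremental sketch, assuming raw.events is append-only with a monotonically increasing created_at, and analytics.daily_events (an illustrative name) is the flow's output:
-- load only rows newer than the latest row already processed
INSERT INTO analytics.daily_events
SELECT *
FROM raw.events
WHERE created_at > (SELECT coalesce(max(created_at), timestamp '1970-01-01')
                    FROM analytics.daily_events);
If runs can overlap, or late rows can arrive with older timestamps, read Incremental vs full refresh before adopting this pattern.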
6. Profile, don't guess
Every flow run produces a per-step timing report under Activity → Run details. Open it. The slowest step is rarely the one you'd guess; spend optimization effort there, not on the steps that already run in seconds.
See also
- Flow stuck running — different problem, similar diagnostic surface
- Incremental vs full refresh — when to flip from "rebuild everything" to "process only new rows"
- Recipes vs Flows — clarifies which abstraction is the right place for a given operation