Flow

The Flow is the canvas where Honeyframe shows the dependency graph of all datasets and recipes in a project. It's the same conceptual surface as Dataiku's Flow or Airflow's DAG view: nodes are datasets and recipes, edges are read/write relationships.

The Flow is the project's home page. Most operator workflows start there: pick the dataset you want to work on, see what feeds it, see what depends on it.

Node types

| Node | Visual | Meaning |
| --- | --- | --- |
| Dataset | Square | A queryable table or file. Sources, intermediate results, and final outputs all share the same node shape. |
| Visual recipe | Round, blue | prepare, join, group_by, stack — compiled to dbt. |
| SQL recipe | Round, green | A free-form dbt SQL model. |
| Python recipe | Round, yellow | Standalone Python — opaque to dbt. |
| CDC / Sync | Round, gray | Standalone ingestion node. |
| Agent recipe | Round, purple | embed, llm_enrich, agent, rag_search. |
| Zone | Group container | Logical grouping of nodes — purely visual, no semantic effect. |

Edges are drawn input → recipe → output. A recipe with multiple inputs (e.g. a join) has one edge per input.
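The input → recipe → output rule can be sketched as a small edge-model function. The node names below (orders, customers, etc.) are hypothetical, and this is an illustration of the drawing rule, not Honeyframe's actual data structures:

```python
# Sketch of the Flow's edge rule: edges run input -> recipe -> output,
# so a recipe with N inputs gets N inbound edges plus one outbound edge.
# Node names here are hypothetical.

def edges_for_recipe(recipe, inputs, output):
    """Return one (src, dst) edge per input, plus the recipe -> output edge."""
    return [(src, recipe) for src in inputs] + [(recipe, output)]

# A join with two inputs: two inbound edges, one outbound.
edges = edges_for_recipe("join_orders_customers",
                         ["orders", "customers"],
                         "orders_enriched")
```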

Building a Flow

There are three ways nodes appear on the canvas:

  1. Connect a connector and import datasets. Each imported dataset shows up as a leaf node with no upstream.
  2. Add a recipe via the + New Recipe button on a dataset. The recipe and its output dataset both appear.
  3. Drag-and-drop a node from the Add Item panel. Useful for standalone Python / CDC / Sync nodes that aren't tied to a single upstream.

Drag a node onto a Zone to assign it. Zones don't constrain execution; they're purely organizational.

Running the Flow

Three execution surfaces:

  • Single-node run — Right-click a recipe → Run. Runs only that recipe.
  • Subgraph run — Right-click a dataset → Build → choose upstream, downstream, or full lineage. Runs all the recipes needed to bring that dataset up to date.
  • Scheduled run — On the Schedules page, attach a cron expression to a build target. The scheduler picks it up and runs it.
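Subgraph and scheduled runs ultimately go through the same build-trigger endpoint documented under API access (POST /api/pipeline/run). A minimal sketch of triggering one programmatically — note the payload field names ("target", "scope") are assumptions for illustration, not a documented schema:

```python
# Hedged sketch: trigger a build via POST /api/pipeline/run.
# The endpoint path and Bearer auth come from the docs; the
# "target"/"scope" payload fields are assumptions.
import json
from urllib import request

def build_run_payload(target: str, scope: str = "upstream") -> dict:
    """Assemble a run request; scope mirrors the Build menu choices."""
    assert scope in {"upstream", "downstream", "full"}
    return {"target": target, "scope": scope}

def trigger_run(base_url: str, token: str, payload: dict) -> bytes:
    req = request.Request(
        f"{base_url}/api/pipeline/run",
        data=json.dumps(payload).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req) as resp:  # network call; needs a live server
        return resp.read()

payload = build_run_payload("sales_summary", scope="upstream")
```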

Run progress streams to the Flow canvas as colored borders on the running nodes (in-progress, success, failure). Click any node mid-run to see the live log.

Zones

Zones are purely visual. Use them to group:

  • All raw input datasets (e.g. "Sources").
  • A vertical pipeline (e.g. "Sales Reporting").
  • An experimental sandbox (e.g. "DRAFT — Q4 model").

A node can belong to one zone or none. Drag-and-drop assigns it; drag back to the canvas to unassign. Zones do not affect lineage, scheduling, or permissions.

The platform does not auto-create zones — they're added by users when the canvas grows.

Lineage and impact analysis

The Flow shows direct edges. For deeper analysis:

  • Lineage Explorer (Operations → Lineage) shows the full upstream/downstream subgraph for any node, with column-level granularity.
  • Asset References panel (sidebar on every dataset / dashboard / recipe edit page) lists all downstream consumers of the asset — answers "what breaks if I change this?" without leaving the editor.
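The "what breaks if I change this?" question is a downstream-closure traversal over the lineage subgraph. A sketch of that traversal — the edge-list shape and node names are assumptions for illustration; the real GET /api/lineage/{dataset_id} payload may differ:

```python
# Hedged sketch: downstream impact analysis as a BFS over lineage edges.
# Edge-list shape and dataset names are hypothetical.
from collections import defaultdict, deque

def downstream_closure(edges, start):
    """Return every node reachable by following edges away from start."""
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
    seen, queue = set(), deque([start])
    while queue:
        for nxt in children[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

edges = [("raw_orders", "clean_orders"),
         ("clean_orders", "sales_summary"),
         ("clean_orders", "churn_features")]
impacted = downstream_closure(edges, "raw_orders")
```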

Editing recipes from the Flow

Click any recipe node to open the recipe editor inline (the canvas slides aside). Save, run, and view results without leaving the Flow.

For SQL recipes, the editor opens with the compiled SQL preselected — Ctrl+S saves, Ctrl+Enter compiles. Visual recipes open the structured editor with all four Prepare tabs (or the join / group-by / stack equivalents).

API access

Programmatic Flow access is available via:

| Endpoint | Description |
| --- | --- |
| GET /api/projects/{id}/flow | Returns the current node + edge state for a project. |
| POST /api/pipeline/run | Triggers a build target — single recipe, subgraph, or full project. |
| GET /api/pipeline/runs | Lists past runs with status, duration, and log refs. |
| GET /api/lineage/{dataset_id} | Returns the full lineage subgraph for a dataset, with column-level edges. |

Authentication uses the same JWT token format as the rest of the API — see Authentication under the Developer property.
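Putting the endpoint and auth scheme together, a minimal client sketch for fetching a project's Flow graph. Only the endpoint path and Bearer-token header come from the docs above; the response shape ({"nodes": [{"type": ...}]}) is an assumption for illustration:

```python
# Hedged sketch: authenticated fetch of GET /api/projects/{id}/flow.
# The {"nodes": [{"type": ...}]} response shape is an assumption.
import json
from collections import Counter
from urllib import request

def fetch_flow(base_url: str, project_id: str, token: str) -> dict:
    req = request.Request(
        f"{base_url}/api/projects/{project_id}/flow",
        headers={"Authorization": f"Bearer {token}"},
    )
    with request.urlopen(req) as resp:  # network call; needs a live server
        return json.load(resp)

def node_type_counts(flow: dict) -> Counter:
    """Summarize a Flow graph: how many nodes of each type."""
    return Counter(node["type"] for node in flow.get("nodes", []))

sample = {"nodes": [{"type": "dataset"}, {"type": "sql"}, {"type": "dataset"}]}
counts = node_type_counts(sample)
```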

Performance characteristics

The Flow renders client-side using a fixed-coordinate layout (nodes have stable positions stored on each row in the DB, not auto-laid-out on each load). For projects with < 200 nodes, the canvas is interactive at 60 fps. Above ~500 nodes, layout becomes the bottleneck — collapse zones, or use the Search sidebar to jump directly to a node by name.

Common patterns

  • Bronze / Silver / Gold zones — three zones aligned with the Lakehouse data quality tiers. Raw ingest into Bronze, cleaned into Silver, modeled into Gold.
  • One vertical per Zone — for project-of-projects layouts, a Zone per business domain.
  • Sandbox zone — experimental nodes you don't want in the main lineage. Move them out of the sandbox when promoted.