Recipes

A recipe transforms one or more input datasets into an output dataset. Honeyframe's recipe model splits transforms into two layers:

  • Visual recipes — a guided UI that captures the operator's intent (filter rows, join two tables, group and aggregate). The visual recipe compiles to dbt SQL behind the scenes.
  • Code recipes — direct SQL or Python that the user writes. The platform doesn't try to interpret the code; it just runs it.

Both produce dbt models, so the lineage graph and dbt's incremental-build machinery work identically across recipe types.

Recipe types

The platform recognizes the following types. The first four are visual, sql and python are code, and the rest are standalone or agent recipes.

| Type | Layer | What it does |
| --- | --- | --- |
| prepare | Visual | Column-level operations: rename, drop, cast, derive (formula), filter, window. The most common recipe — typically the first step on a raw dataset. |
| join | Visual | Join two datasets on one or more key pairs. Supports inner, left, right, outer. |
| group_by | Visual | Group by columns and aggregate (count, sum, avg, min, max, median). |
| stack | Visual | UNION two or more datasets with the same shape. |
| sql | Code | Free-form dbt SQL. The user writes a model body; the platform wraps it with the appropriate dbt config. |
| python | Code | Python script with access to a Dataiku-style API for reading inputs and writing outputs. Standalone — does not become a dbt model. |
| cdc | Standalone | Change-data-capture from a source connector. Watermarks tracked in pipeline_runs. |
| sync | Standalone | One-way sync from one connector to another. Used for ingestion to the Lakehouse. |
| extract_document | Standalone | OCR / extraction from PDF or image inputs into structured rows. |
| embed | Agent | Embed text from a dataset into a vector store. |
| llm_enrich | Agent | Run an LLM prompt over each row of a dataset and write the response back. |
| agent | Agent | Multi-step agent run with tool calls. |
| rag_search | Agent | Retrieve from a knowledge base and answer a query. |

The agent recipe types (embed, llm_enrich, agent, rag_search) form the AI surface. They share the same plumbing as visual/code recipes — outputs become datasets — but their inputs include a knowledge base or model selector instead of just upstream datasets.

Anatomy of a visual recipe

The Prepare recipe is the canonical visual example. It exposes four tabs:

  • Columns — select columns to keep, rename, cast types, drop.
  • Filter — boolean expressions on the result rows. Compiled to a WHERE clause.
  • Formula — derive new columns from existing ones. Honeyframe supports a small expression language (numeric ops, coalesce, case, date functions).
  • Window — windowed aggregates (ROW_NUMBER, LAG, LEAD, RANK) partitioned by columns.

When the user saves a Prepare recipe, the platform compiles each tab's intent to a dbt SQL model. The compiled SQL is visible in the recipe editor's SQL tab — useful for debugging or for promoting a visual recipe to a code recipe.
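To make the compilation step concrete, here is a minimal sketch of how the Columns, Formula, and Filter tab intents might be lowered to a SQL model body. All names here (`compile_prepare` and its parameters) are hypothetical illustrations, not Honeyframe's actual compiler:

```python
# Hypothetical sketch: lowering Prepare-tab intents to a SQL model body.
# These names are illustrative; the real compiler is internal to Honeyframe.

def compile_prepare(table, keep, renames, formulas, filters):
    """Build a SELECT statement from Columns/Formula/Filter tab intents."""
    select_parts = []
    # Columns tab: kept columns, with optional renames.
    for col in keep:
        alias = renames.get(col)
        select_parts.append(f"{col} AS {alias}" if alias else col)
    # Formula tab: derived columns become extra SELECT expressions.
    for name, expr in formulas.items():
        select_parts.append(f"{expr} AS {name}")
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    # Filter tab: boolean expressions compile to a WHERE clause.
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql

print(compile_prepare(
    "{{ ref('raw_orders') }}",
    keep=["id", "amount"],
    renames={"id": "order_id"},
    formulas={"amount_usd": "amount / 100.0"},
    filters=["amount > 0"],
))
# SELECT id AS order_id, amount, amount / 100.0 AS amount_usd
#   FROM {{ ref('raw_orders') }} WHERE amount > 0
```

The Window tab would extend the same pattern with OVER (PARTITION BY …) expressions.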

Code recipes

A SQL recipe lets the user write a dbt model body directly:

SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS revenue,
MAX(created_at) AS last_order_at
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

The {{ ref('...') }} macro resolves to the upstream dataset's compiled name. The recipe editor's input panel shows which datasets are referenced, so the dependency graph stays accurate.
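The dependency scan behind the input panel can be pictured as a pattern match over the model body. The sketch below is illustrative only, not Honeyframe's (or dbt's) actual parser:

```python
import re

# Illustrative only: extract {{ ref('...') }} targets from a model body.
REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

def referenced_datasets(model_sql: str) -> list[str]:
    """Return upstream dataset names referenced via the ref() macro."""
    return REF_PATTERN.findall(model_sql)

sql = """
SELECT o.customer_id, c.region
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c USING (customer_id)
"""
print(referenced_datasets(sql))  # ['stg_orders', 'stg_customers']
```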

A Python recipe runs as a standalone script under the platform's tenant workspace. The platform exposes a small recipe API (honeyframe.recipe) for reading inputs and writing the output dataset:

from honeyframe.recipe import inputs, outputs

df = inputs.read("upstream_dataset")  # load the input as a pandas DataFrame
df["enriched"] = df["raw"].apply(my_func)  # my_func: any row-wise transform defined in the script
outputs.write("downstream_dataset", df)  # persist the result as the output dataset

Python recipes cannot be combined with dbt's lineage graph automatically — they're opaque to dbt — so they appear as standalone nodes in the Flow.

Running a recipe

Recipes don't run on save. They run when:

  • The user clicks Run in the recipe editor, OR
  • A scheduled job triggers them as part of a Flow run, OR
  • A downstream dataset is queried, Honeyframe detects a stale upstream, and the recipe runs to refresh it.
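The staleness check in the last trigger amounts to a timestamp comparison: a dataset is stale if any upstream dataset was built after it. A minimal sketch under that assumption (Honeyframe's actual check may track more state, e.g. watermarks in pipeline_runs):

```python
from datetime import datetime

# Assumed logic: a dataset is stale when any upstream build is newer than it.
def is_stale(built_at: datetime, upstream_built_at: list[datetime]) -> bool:
    return any(u > built_at for u in upstream_built_at)

orders_built = datetime(2024, 5, 1, 8, 0)
upstreams = [datetime(2024, 5, 1, 7, 0), datetime(2024, 5, 1, 9, 30)]
print(is_stale(orders_built, upstreams))  # True: the second upstream is newer
```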

Run progress appears live in the recipe editor (pipeline_runs table polled every second). Run history is preserved indefinitely; click any past run to see the compiled SQL, parameters, and stdout.

Lineage

Every recipe registers an edge in the platform's column-level lineage graph. The graph is queryable via the Lineage Explorer (Flow → Lineage) and via /api/lineage. A column-level edge tracks which upstream column each downstream column derives from — invaluable for impact analysis when changing a source.
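Impact analysis over a column-level graph reduces to a downstream reachability query. The sketch below uses an illustrative edge list, not the actual /api/lineage response format:

```python
from collections import deque

# Illustrative edge list: (upstream_column, downstream_column).
EDGES = [
    ("raw_orders.amount", "stg_orders.amount"),
    ("stg_orders.amount", "rev_by_customer.revenue"),
    ("stg_orders.customer_id", "rev_by_customer.customer_id"),
]

def impacted_columns(source: str) -> set[str]:
    """All downstream columns that derive (transitively) from source."""
    children: dict[str, list[str]] = {}
    for up, down in EDGES:
        children.setdefault(up, []).append(down)
    seen: set[str] = set()
    queue = deque([source])
    while queue:
        col = queue.popleft()
        for nxt in children.get(col, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(impacted_columns("raw_orders.amount")))
# ['rev_by_customer.revenue', 'stg_orders.amount']
```

Changing raw_orders.amount would therefore flag both the staging column and the downstream revenue aggregate.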

Migrating from require_role for recipes

Recipe-level access control follows the dataset model: anyone who can read the input datasets can read the recipe; modifying the recipe requires dataset.readwrite (or, in the legacy layer, the editor role) on the recipe's output dataset.
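That rule can be restated as a pair of small predicates. The permission names below (dataset.read in particular) and the dict-of-sets shape are assumptions for illustration; the real check lives in the platform's authorization layer:

```python
# Hypothetical sketch of the recipe access rule described above.
# "dataset.read" is an assumed permission name; "dataset.readwrite" is from the docs.

def can_read_recipe(user_perms: dict, input_datasets: list[str]) -> bool:
    """Read access: the user can read every input dataset."""
    return all("dataset.read" in user_perms.get(ds, set()) for ds in input_datasets)

def can_modify_recipe(user_perms: dict, output_dataset: str) -> bool:
    """Write access: dataset.readwrite on the recipe's output dataset."""
    return "dataset.readwrite" in user_perms.get(output_dataset, set())

perms = {
    "stg_orders": {"dataset.read"},
    "rev_by_customer": {"dataset.read", "dataset.readwrite"},
}
print(can_read_recipe(perms, ["stg_orders"]))       # True
print(can_modify_recipe(perms, "rev_by_customer"))  # True
```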

When the dataset permission migration completes (see Permissions Reference), recipes will gain object-level permissions matching their output dataset.