Recipes

A recipe transforms one or more input datasets into an output dataset. Honeyframe's recipe model splits transforms into two layers:

  • Visual recipes — a guided UI that captures the operator's intent (filter rows, join two tables, group and aggregate). The visual recipe compiles to dbt SQL behind the scenes.
  • Code recipes — direct SQL or Python that the user writes. The platform doesn't try to interpret the code; it just runs it.

Both produce dbt models, so the lineage graph and dbt's incremental-build machinery work identically across recipe types.

Recipe types

The platform recognizes the following types. The first four are visual, sql and python are code, and the rest are standalone or agent recipes.

| Type | Layer | What it does |
| --- | --- | --- |
| prepare | Visual | Column-level operations: rename, drop, cast, derive (formula), filter, window. The most common recipe — typically the first step on a raw dataset. |
| join | Visual | Join two datasets on one or more key pairs. Supports inner, left, right, outer. |
| group_by | Visual | Group by columns and aggregate (count, sum, avg, min, max, median). |
| stack | Visual | UNION two or more datasets with the same shape. |
| sql | Code | Free-form dbt SQL. The user writes a model body; the platform wraps it with the appropriate dbt config. |
| python | Code | Python script with access to a Dataiku-style API for reading inputs and writing outputs. Standalone — does not become a dbt model. |
| cdc | Standalone | Change-data-capture from a source connector. Watermarks tracked in pipeline_runs. |
| sync | Standalone | One-way sync from one connector to another. Used for ingestion to the Lakehouse. |
| extract_document | Standalone | OCR / extraction from PDF or image inputs into structured rows. |
| embed | Agent | Embed text from a dataset into a vector store. |
| llm_enrich | Agent | Run an LLM prompt over each row of a dataset and write the response back. |
| agent | Agent | Multi-step agent run with tool calls. |
| rag_search | Agent | Retrieve from a knowledge base and answer a query. |

The agent recipe types (embed, llm_enrich, agent, rag_search) form the AI surface. They share the same plumbing as visual/code recipes — outputs become datasets — but their inputs include a knowledge base or model selector instead of just upstream datasets.

Anatomy of a visual recipe

The Prepare recipe is the canonical visual example. It exposes four tabs:

  • Columns — select columns to keep, rename, cast types, drop.
  • Filter — boolean expressions on the result rows. Compiled to a WHERE clause.
  • Formula — derive new columns from existing ones. Honeyframe supports a small expression language (numeric ops, coalesce, case, date functions).
  • Window — windowed aggregates (ROW_NUMBER, LAG, LEAD, RANK) partitioned by columns.

When the user saves a Prepare recipe, the platform compiles each tab's intent to a dbt SQL model. The compiled SQL is visible in the recipe editor's SQL tab — useful for debugging or for promoting a visual recipe to a code recipe.
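To make the compilation step concrete, here is a minimal sketch of how the Columns, Formula, and Filter tab intents might be lowered to a SQL model body. All names here (`compile_prepare` and its parameters) are hypothetical illustrations, not Honeyframe's actual compiler:

```python
# Hypothetical sketch: lowering Prepare-tab intents to a SQL model body.
# These names are illustrative; the real compiler is internal to Honeyframe.

def compile_prepare(table, keep, renames, formulas, filters):
    """Build a SELECT statement from Columns/Formula/Filter tab intents."""
    select_parts = []
    # Columns tab: kept columns, with optional renames.
    for col in keep:
        alias = renames.get(col)
        select_parts.append(f"{col} AS {alias}" if alias else col)
    # Formula tab: derived columns become extra SELECT expressions.
    for name, expr in formulas.items():
        select_parts.append(f"{expr} AS {name}")
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    # Filter tab: boolean expressions compile to a WHERE clause.
    if filters:
        sql += " WHERE " + " AND ".join(filters)
    return sql

print(compile_prepare(
    "{{ ref('raw_orders') }}",
    keep=["id", "amount"],
    renames={"id": "order_id"},
    formulas={"amount_usd": "amount / 100.0"},
    filters=["amount > 0"],
))
# SELECT id AS order_id, amount, amount / 100.0 AS amount_usd
#   FROM {{ ref('raw_orders') }} WHERE amount > 0
```

The Window tab would extend the same pattern with OVER (PARTITION BY …) expressions.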

Code recipes

A SQL recipe lets the user write a dbt model body directly:

SELECT
customer_id,
COUNT(*) AS order_count,
SUM(amount) AS revenue,
MAX(created_at) AS last_order_at
FROM {{ ref('stg_orders') }}
GROUP BY customer_id

The {{ ref('...') }} macro resolves to the upstream dataset's compiled name. The recipe editor's input panel shows which datasets are referenced, so the dependency graph stays accurate.
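The dependency scan behind the input panel can be pictured as a pattern match over the model body. The sketch below is illustrative only, not Honeyframe's (or dbt's) actual parser:

```python
import re

# Illustrative only: extract {{ ref('...') }} targets from a model body.
REF_PATTERN = re.compile(r"\{\{\s*ref\('([^']+)'\)\s*\}\}")

def referenced_datasets(model_sql: str) -> list[str]:
    """Return upstream dataset names referenced via the ref() macro."""
    return REF_PATTERN.findall(model_sql)

sql = """
SELECT o.customer_id, c.region
FROM {{ ref('stg_orders') }} o
JOIN {{ ref('stg_customers') }} c USING (customer_id)
"""
print(referenced_datasets(sql))  # ['stg_orders', 'stg_customers']
```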

A Python recipe runs as a standalone script under the platform's tenant workspace. The platform exposes a small recipe API (honeyframe.recipe) for reading inputs and writing the output dataset:

from honeyframe.recipe import inputs, outputs

df = inputs.read("upstream_dataset")  # load the input as a pandas DataFrame
df["enriched"] = df["raw"].apply(my_func)  # my_func: any row-wise transform defined in the script
outputs.write("downstream_dataset", df)  # persist the result as the output dataset

Python recipes cannot be combined with dbt's lineage graph automatically — they're opaque to dbt — so they appear as standalone nodes in the Flow.

Running a recipe

Recipes don't run on save. They run when:

  • The user clicks Run in the recipe editor, OR
  • A scheduled job triggers them as part of a Flow run, OR
  • A downstream dataset is queried, Honeyframe detects a stale upstream, and the recipe runs to refresh it.
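The staleness check in the last trigger amounts to a timestamp comparison: a dataset is stale if any upstream dataset was built after it. A minimal sketch under that assumption (Honeyframe's actual check may track more state, e.g. watermarks in pipeline_runs):

```python
from datetime import datetime

# Assumed logic: a dataset is stale when any upstream build is newer than it.
def is_stale(built_at: datetime, upstream_built_at: list[datetime]) -> bool:
    return any(u > built_at for u in upstream_built_at)

orders_built = datetime(2024, 5, 1, 8, 0)
upstreams = [datetime(2024, 5, 1, 7, 0), datetime(2024, 5, 1, 9, 30)]
print(is_stale(orders_built, upstreams))  # True: the second upstream is newer
```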

Run progress appears live in the recipe editor (pipeline_runs table polled every second). Run history is preserved indefinitely; click any past run to see the compiled SQL, parameters, and stdout.

Lineage

Every recipe registers an edge in the platform's column-level lineage graph. The graph is queryable via the Lineage Explorer (Flow → Lineage) and via /api/lineage. A column-level edge tracks which upstream column each downstream column derives from — invaluable for impact analysis when changing a source.
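Impact analysis over a column-level graph reduces to a downstream reachability query. The sketch below uses an illustrative edge list, not the actual /api/lineage response format:

```python
from collections import deque

# Illustrative edge list: (upstream_column, downstream_column).
EDGES = [
    ("raw_orders.amount", "stg_orders.amount"),
    ("stg_orders.amount", "rev_by_customer.revenue"),
    ("stg_orders.customer_id", "rev_by_customer.customer_id"),
]

def impacted_columns(source: str) -> set[str]:
    """All downstream columns that derive (transitively) from source."""
    children: dict[str, list[str]] = {}
    for up, down in EDGES:
        children.setdefault(up, []).append(down)
    seen: set[str] = set()
    queue = deque([source])
    while queue:
        col = queue.popleft()
        for nxt in children.get(col, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

print(sorted(impacted_columns("raw_orders.amount")))
# ['rev_by_customer.revenue', 'stg_orders.amount']
```

Changing raw_orders.amount would therefore flag both the staging column and the downstream revenue aggregate.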

Migrating from require_role for recipes

Recipe-level access control follows the dataset model: anyone who can read the input datasets can read the recipe; modifying the recipe requires dataset.readwrite (or, in the legacy layer, the editor role) on the recipe's output dataset.
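That rule can be restated as a pair of small predicates. The permission names below (dataset.read in particular) and the dict-of-sets shape are assumptions for illustration; the real check lives in the platform's authorization layer:

```python
# Hypothetical sketch of the recipe access rule described above.
# "dataset.read" is an assumed permission name; "dataset.readwrite" is from the docs.

def can_read_recipe(user_perms: dict, input_datasets: list[str]) -> bool:
    """Read access: the user can read every input dataset."""
    return all("dataset.read" in user_perms.get(ds, set()) for ds in input_datasets)

def can_modify_recipe(user_perms: dict, output_dataset: str) -> bool:
    """Write access: dataset.readwrite on the recipe's output dataset."""
    return "dataset.readwrite" in user_perms.get(output_dataset, set())

perms = {
    "stg_orders": {"dataset.read"},
    "rev_by_customer": {"dataset.read", "dataset.readwrite"},
}
print(can_read_recipe(perms, ["stg_orders"]))       # True
print(can_modify_recipe(perms, "rev_by_customer"))  # True
```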

When the dataset permission migration completes (see Permissions Reference), recipes will gain object-level permissions matching their output dataset.