Connectors

A connector tells Honeyframe how to reach an external system — a database, an object store, a vector index, an LLM API, or a webhook target. Connectors are configured once at the organization level and reused across projects, datasets, and recipes.

The connector implementations live under paas/backend/connectors/. Each connector inherits from a base class that defines the interface the platform calls (test, read_schema, read_rows, write_rows, etc.). Adding a new connector is a matter of subclassing the right base and registering it with the connector registry.

Connector families

Honeyframe ships with five connector families, each anchored on a base class.

Relational databases (sql_base.py)

Read and write to SQL databases. Used by datasets, recipes, and dbt integrations.

  • PostgreSQL (postgresql.py) — also the platform's own metadata store.
  • MySQL (mysql.py)
  • Microsoft SQL Server (mssql.py) — driver: pymssql.
  • Oracle (oracle.py) — driver: oracledb. Supports schema discovery and Explore against on-prem Oracle.
  • Snowflake (snowflake.py)
  • BigQuery (bigquery.py)
  • MongoDB (mongodb.py) — exposed through the SQL surface for read-only queries; full document operations require the dedicated MongoDB API.

Object storage (storage_base.py)

Read and write files (CSV, Parquet, JSON) on cloud or self-hosted object storage. Used by the Lakehouse layer and managed folders.

  • Amazon S3 (s3_storage.py)
  • Google Cloud Storage (gcs_storage.py)
  • Alibaba Cloud OSS (oss_storage.py) — must use virtual-hosted style; path-style is not honored. Access Points are VPC-only and do not work for cross-region clients.
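
The virtual-hosted requirement means the bucket name must appear in the hostname, not in the URL path. A small sketch of the difference, following Alibaba Cloud OSS endpoint conventions (the helper function itself is hypothetical, not part of the connector):

```python
def oss_object_url(bucket: str, region: str, key: str,
                   path_style: bool = False) -> str:
    """Build an OSS object URL.

    The OSS connector requires the virtual-hosted form (bucket in the
    hostname); the path-style form is shown only for contrast and is
    not honored by the connector.
    """
    endpoint = f"oss-{region}.aliyuncs.com"
    if path_style:
        return f"https://{endpoint}/{bucket}/{key}"   # path-style: rejected
    return f"https://{bucket}.{endpoint}/{key}"       # virtual-hosted: required

oss_object_url("my-bucket", "cn-hangzhou", "data/file.parquet")
# -> https://my-bucket.oss-cn-hangzhou.aliyuncs.com/data/file.parquet
```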

Vector stores (vector_store.py)

Persistent vector indices for the Knowledge Base and RAG features.

  • ChromaDB (chroma_store.py) — embedded by default at $DATA_DIR/chroma/. Optional dependency installed via --install-plugins.
  • FAISS (faiss_store.py) — flat-file index. Faster for in-memory workloads, no server. Optional dependency.
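
A flat index is exact, brute-force nearest-neighbor search over an in-memory matrix. A dependency-free sketch of the same idea (pure Python for illustration, not the faiss API, which is far faster):

```python
import math


def flat_search(index: list[list[float]], query: list[float],
                k: int = 2) -> list[int]:
    """Exact nearest-neighbor search over an in-memory list of vectors,
    ranked by L2 distance -- the behavior a flat index provides, minus
    the optimized linear algebra. Returns indices of the k closest."""
    def l2(a: list[float], b: list[float]) -> float:
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    ranked = sorted(range(len(index)), key=lambda i: l2(index[i], query))
    return ranked[:k]


vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 5.0]]
flat_search(vectors, [0.9, 0.1], k=2)  # -> [1, 0]
```

Because every query scans every vector, a flat index trades query latency for exact results and zero build time, which is why it suits in-memory workloads.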

LLM providers (llm_base.py)

Language model APIs. Used by Agent Builder, the SQL chat surface, and Knowledge Base retrieval.

  • OpenAI (openai_llm.py) — openai SDK; supports GPT-4, GPT-4o, GPT-4-turbo, embeddings.
  • Anthropic (anthropic_llm.py) — Claude models.
  • Ollama (ollama_llm.py) — self-hosted models. Configure with the URL of your Ollama server and a model name.
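
Since an Ollama connector only needs a server URL and a model name, its config payload stays small. A hypothetical example, with field names modeled on the Postgres example later in this page (the actual keys may differ; check the connector form in the UI):

```json
{
  "name": "Local Llama",
  "type": "ollama",
  "config": {
    "base_url": "http://ollama.internal:11434",
    "model": "llama3"
  }
}
```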

Other

  • REST API (rest_api.py) — generic HTTP connector with header / query / body templating. Useful when no first-class connector exists.
  • CSV upload (csv_upload.py) — accepts user-uploaded CSVs and writes them as managed datasets. Subject to client_max_body_size (default 200 MB).
  • Elasticsearch (elasticsearch.py) — full-text search and aggregations.
  • n8n webhook (n8n_webhook.py) — fires events to an n8n workflow.
  • Twilio messaging (twilio_messaging.py) — sends WhatsApp messages via the Twilio API. Used by the send_whatsapp agent tool.

Configuring a connector

In the platform UI:

  1. Open Connectors under the org switcher (org admins only).
  2. Click + New Connector and pick the family.
  3. Fill in the connection parameters. Sensitive fields (passwords, API keys) are encrypted at rest using the org's licensed sign key.
  4. Test verifies the connection without saving. Successful test results land in the test history pane.
  5. Save writes the connector to data_connectors. It is now selectable when creating datasets, recipes, or agent tools.

Programmatically:

curl -X POST https://platform.your-domain.com/api/connectors \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Production Postgres",
    "type": "postgresql",
    "config": {
      "host": "db.example.com",
      "port": 5432,
      "database": "analytics",
      "username": "honeyframe_ro",
      "password": "<redacted>",
      "sslmode": "require"
    }
  }'

Permission model

Connectors are organization-level resources with no per-connector ACL. Anyone in the organization can see the list of connectors and reference them when building datasets. Access to the data itself is governed by datasets — see Users & Groups — not by connectors.

This intentionally mirrors the Dataiku model: the connector is a credential, the dataset is the unit of access control.

Adding a new connector type

  1. Create paas/backend/connectors/<name>.py subclassing the appropriate base (SQLBase, StorageBase, LLMBase, VectorStoreBase).
  2. Implement the required methods (test, read_schema, read_rows, etc. — see the existing connectors for examples).
  3. Register the class in paas/backend/connectors/registry.py.
  4. Add a frontend definition in paas/frontend/src/pages/ConnectorsPage.tsx, which defines the form fields shown when creating a connector.
  5. Add a unit test covering test() and one read or write path.

The platform's connector framework does not require a server restart for new types — registry registration happens at import time.
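
The no-restart property falls out of registering at import time. A self-contained sketch of that pattern (the registry name, decorator, and example connector are all illustrative; see registry.py for the real hook):

```python
# Minimal import-time registry pattern, as used by connector frameworks.
# Names here are illustrative, not the platform's actual API.
CONNECTOR_REGISTRY: dict[str, type] = {}


def register(type_name: str):
    """Class decorator: runs when the module is imported, so simply
    importing a connector module makes its type selectable."""
    def wrap(cls: type) -> type:
        CONNECTOR_REGISTRY[type_name] = cls
        return cls
    return wrap


@register("redis_cache")  # hypothetical new connector type
class RedisCacheConnector:
    def __init__(self, config: dict):
        self.config = config

    def test(self) -> bool:
        return True  # stub: a real connector would ping the server


# The type is selectable the moment the module is imported:
CONNECTOR_REGISTRY["redis_cache"]  # -> RedisCacheConnector
```

Because the decorator executes as a side effect of the import, loading the module is all it takes to make the new type available.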

Connection pooling

SQL connectors maintain per-process connection pools. The defaults are conservative (5 idle, 20 max) and tuned for the platform's mostly-read workload. Tune on a per-connector basis via the connector config:

{
  "pool_size": 10,
  "max_overflow": 30,
  "pool_recycle": 1800
}
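
The override presumably merges key-by-key over the platform defaults. A sketch of that merge, assuming the stated defaults ("5 idle, 20 max") mean pool_size=5 with max_overflow=15, and a pool_recycle default of 3600 seconds (both interpretations are assumptions):

```python
# Default values inferred from the text above ("5 idle, 20 max"),
# read as pool_size=5 plus max_overflow=15; pool_recycle=3600 is
# an assumed placeholder, not a documented default.
POOL_DEFAULTS = {"pool_size": 5, "max_overflow": 15, "pool_recycle": 3600}


def pool_settings(connector_config: dict) -> dict:
    """Per-connector config wins over the defaults, key by key;
    unrecognized keys are ignored."""
    overrides = {k: v for k, v in connector_config.items()
                 if k in POOL_DEFAULTS}
    return {**POOL_DEFAULTS, **overrides}


pool_settings({"pool_size": 10, "max_overflow": 30, "pool_recycle": 1800})
# -> {'pool_size': 10, 'max_overflow': 30, 'pool_recycle': 1800}
```

The key names match SQLAlchemy's pool parameters, which suggests (but does not confirm) that the SQL connectors pass them straight through to the engine.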

Object-storage and HTTP connectors use the underlying SDK's pooling — typically a per-thread session.