Motivation
MCP is an open spec built on JSON-RPC 2.0 that gives any LLM a universal “USB-C port” for tools, data, and actions: one schema, no custom glue. Anthropic open-sourced it in Nov 2024, and the rest of the stack followed quickly: OpenAI baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into Gemini; Microsoft highlighted it during Build; dev-tool players like Replit, Zed, Sourcegraph, and Block already ship MCP servers.
Why REST doesn’t cut it for agents
- Auth done wrong – Users don’t want to, and shouldn’t, share their API keys with an LLM
- Too many calls – REST is designed for apps, so it’s extremely verbose and “atomic”
- IDs vs. context – agents don’t want owner_id=42, they want {id:42, username:"jdoe", email:"…"} right away.
Superset + MCP → headless-BI 2.0
Superset is already headless; MCP lets any agent pick up the steering wheel.
The LLM can fetch context, build charts and dashboards, and generate a link to send the user straight into SQL Lab or Explore. That’s AI-augmented, headless BI.
How MCP Relates to the RAG-Focused SIPs
Different directions, different problems — both useful.
Dimension | RAG-centric SIPs | MCP SIP (this doc) |
---|---|---|
Call direction | Superset ➜ LLM: Superset queries an external model for extra context or explanations. | LLM ➜ Superset: External agents call Superset to fetch assets or trigger actions. |
Primary benefit | Enriches the user’s experience inside Superset (semantic search, chart “explainers,” etc.). | Lets agents outside Superset automate everything users can do through the UI. |
Auth model | Superset authenticates to the model. | Agent authenticates to Superset, fully RBAC-aware. |
Granularity | Model returns unstructured answers. | Superset returns deterministic, typed objects and links. |
Dependency | Needs vector stores / LangChain wrappers inside Superset. | Needs an MCP blueprint exposed by Superset. |
These tracks don’t depend on each other, and neither blocks the other:
- Ship RAG features for smarter querying and insight within Superset.
- Ship MCP so any LLM agent can treat Superset as just another tool in a multi-app workflow.
Separate SIPs, separate code paths, complementary value. Feel free to pursue and ship either (or both) in any order.
Proposed Change
Aspect | Detail |
---|---|
Blueprint | Opt-in Flask blueprint surfaced by helper pkg fastmcp (MIT, ≤300 LOC). |
Toggle | ENABLE_MCP_SERVICE = False (default). |
CLI | superset mcp run --port 5008 |
Namespace | /api/mcp/v1/* (tag kept but less rigid than REST; see Versioning). |
Runtime | WSGI-Flask by default; ASGI wrapping possible via asgiref.wsgi.WsgiToAsgi. |
Hooks | auth_hook, impersonate, audit_log, rate_limit: no-ops in OSS, pluggable in Preset & enterprise. |
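To make the Toggle and Hooks rows concrete, here is a minimal sketch of how no-op defaults and pluggable overrides could fit together. The hook signatures, the McpHooks container, the build_hooks helper, and the MCP_HOOKS config key are all assumptions made for illustration; the proposal names the hooks and the ENABLE_MCP_SERVICE flag but does not define any of these details.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


# Hypothetical signatures: the SIP names the hooks but not their arguments.
def _noop_auth(request: Dict[str, Any]) -> Optional[str]:
    """OSS default: perform no extra authentication and report no user."""
    return None


def _noop_audit(event: str, payload: Dict[str, Any]) -> None:
    """OSS default: silently drop audit events."""
    return None


@dataclass
class McpHooks:
    """Container for two of the four hooks; the others would follow the same pattern."""

    auth_hook: Callable[[Dict[str, Any]], Optional[str]] = _noop_auth
    audit_log: Callable[[str, Dict[str, Any]], None] = _noop_audit


def build_hooks(config: Dict[str, Any]) -> Optional[McpHooks]:
    """Return hooks only when the flag is on; None means the service stays unmounted."""
    if not config.get("ENABLE_MCP_SERVICE", False):
        return None
    # MCP_HOOKS is an assumed config key used to override individual hooks.
    return McpHooks(**config.get("MCP_HOOKS", {}))
```

With an empty config, build_hooks returns None, which matches the "disabled by default" behaviour; a deployment that sets the flag and supplies its own audit_log callable gets that callable while the remaining hooks stay no-ops.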
Code Reuse / DRY Strategy
- Single source of truth: Commands + DAOs encapsulate business rules.
- Equivalent REST and MCP endpoints compose the same commands + DAOs; no logic duplication.
- Shared Marshmallow schemas reused directly or shallow-wrapped to add denormalised fields.
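The strategy above can be sketched as one command object called by two thin adapters. Everything here is a stand-in: the Dashboard dataclass, ListDashboardsCommand, the ownership rule, and both handler functions are hypothetical and only illustrate the shape, not Superset's actual classes or response formats.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Dashboard:
    id: int
    title: str
    owner_ids: List[int]


class ListDashboardsCommand:
    """Stand-in for a command: the only place the business rule lives."""

    def __init__(self, dashboards: List[Dashboard], user_id: int) -> None:
        self._dashboards = dashboards
        self._user_id = user_id

    def run(self) -> List[Dashboard]:
        # Illustrative rule: a user only sees dashboards they own.
        return [d for d in self._dashboards if self._user_id in d.owner_ids]


def rest_list_dashboards(store: List[Dashboard], user_id: int) -> Dict[str, Any]:
    """REST-style adapter: wraps the command result in a paginated-looking body."""
    rows = ListDashboardsCommand(store, user_id).run()
    return {"count": len(rows), "result": [{"id": d.id, "dashboard_title": d.title} for d in rows]}


def mcp_list_dashboards(store: List[Dashboard], user_id: int) -> List[Dict[str, Any]]:
    """MCP-style adapter: same command, flatter payload for the agent."""
    rows = ListDashboardsCommand(store, user_id).run()
    return [{"id": d.id, "title": d.title} for d in rows]
```

Because both adapters delegate to the same run() method, a change to the visibility rule is made once and both surfaces pick it up.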
High-Level vs. Atomic
MCP tools are chunkier: one call, one meaningful action, denormalised payloads (e.g. owners:[{id, username, email}] as opposed to owner_ids=[1,2,3]) to spare agents extra look-ups.
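As a rough illustration of the denormalisation described above, the sketch below swaps an owner_ids list for embedded owner objects. The denormalise_owners helper, the in-memory users mapping, and the sample records are hypothetical; in practice the lookup would go through Superset's existing DAOs and schemas.

```python
from typing import Any, Dict


def denormalise_owners(
    record: Dict[str, Any], users: Dict[int, Dict[str, str]]
) -> Dict[str, Any]:
    """Replace `owner_ids` with an `owners` list of embedded user objects."""
    out = {key: value for key, value in record.items() if key != "owner_ids"}
    out["owners"] = [
        {"id": uid, **users[uid]}
        for uid in record.get("owner_ids", [])
        if uid in users  # skip ids that no longer resolve to a user
    ]
    return out


# Hypothetical sample data.
users = {
    1: {"username": "jdoe", "email": "jdoe@example.com"},
    2: {"username": "asmith", "email": "asmith@example.com"},
}
chart = {"id": 10, "slice_name": "ARR by region", "owner_ids": [1, 2]}
payload = denormalise_owners(chart, users)
```

The agent receives usernames and emails in the same response and does not need a follow-up call per owner id.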
As a general rule of thumb, we'll try to design tools around "agent stories", the agent counterpart of "user stories". The CRUD interface will be simplified with simpler, more intuitive schemas, following some of the principles highlighted in https://www.jlowin.dev/blog/as-an-agent-the-new-user-story
Versioning Philosophy
LLMs parse tool schemas in-session. Non-destructive breaking tweaks (rename owner_ids → owners) don’t require heavy semver ceremony. We bump /v{n} only for removals or semantic flips.
Initial Action Set (Phase 1)
- Discovery → list_*
- Navigation → generate_explore_link, open_sql_lab_with_context
- Mutations → generate_chart, generate_dashboard, add_chart_to_existing_dashboard
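For orientation, this is roughly what an agent-side request for one of these tools looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests using the tools/call method with name and arguments params; the tool_call_request helper and the argument names passed to generate_chart below are assumptions, since the proposal does not define tool schemas yet.

```python
import itertools
import json
from typing import Any, Dict

# JSON-RPC requires an id per request; a simple counter is enough for a sketch.
_request_ids = itertools.count(1)


def tool_call_request(name: str, arguments: Dict[str, Any]) -> str:
    """Serialise a JSON-RPC 2.0 `tools/call` request for the named MCP tool."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


# Hypothetical arguments: the real generate_chart schema is not specified in this doc.
request_body = tool_call_request(
    "generate_chart",
    {"dataset": "raw_events", "viz_type": "bar", "metric": "count"},
)
```

An agent discovers the actual argument schema from the server's tool listing at session start, which is why the names above should be read as placeholders.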
Deliverables
- Blueprint + flag + CLI
- 3-5 tools with unit, integration, and perf smoke tests
- Minimal OpenAPI spec + auto-generated TS/Python client
- Error envelope { "error": { "code": "...", "message": "..." } }
- Demo notebook/script
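The error envelope listed in the deliverables can be sketched as a small wrapper around tool execution. The ToolError exception, the call_tool wrapper, the "result" key on success, and the NOT_FOUND code are hypothetical; only the shape of the "error" object comes from this document.

```python
from typing import Any, Callable, Dict


class ToolError(Exception):
    """Hypothetical exception a tool raises for expected, reportable failures."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(message)
        self.code = code
        self.message = message


def error_envelope(code: str, message: str) -> Dict[str, Dict[str, str]]:
    """Build the envelope shape given in the deliverables list."""
    return {"error": {"code": code, "message": message}}


def call_tool(tool: Callable[..., Any], **kwargs: Any) -> Dict[str, Any]:
    """Run a tool and convert ToolError into the envelope instead of a traceback."""
    try:
        return {"result": tool(**kwargs)}
    except ToolError as exc:
        return error_envelope(exc.code, exc.message)


def get_dashboard(dashboard_id: int) -> Dict[str, Any]:
    """Toy tool used only to exercise both paths."""
    if dashboard_id != 1:
        raise ToolError("NOT_FOUND", f"Dashboard {dashboard_id} does not exist")
    return {"id": 1, "title": "Q3 Exec Dashboard"}
```

Unexpected exceptions are deliberately not caught here, so programming errors still surface during development instead of being hidden behind an envelope.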
New / Changed Public Interfaces
Interface | Addition |
---|---|
MCP | /api/mcp/* |
Config | ENABLE_MCP_SERVICE |
CLI | superset mcp run |
Python | Optional import fastmcp |
Phasing / Roll-out Plan
Phase | Goal | Outcome |
---|---|---|
1 – Proof of Concept | Skeleton + 3-5 tools | Live agent demo: list → chart → SQL Lab |
2 – Coverage Expansion | Broader tool library | > 80 % of daily actions scriptable |
3 – Production Hardening | Extract superset-core; add robust auth/impersonation/logging | GA under OIDC / Okta / Preset Cloud |
Longer-Term Package Topology
flowchart LR
core[superset-core]
superset-app --> core
superset-rest --> core
superset-ext --> core
superset-mcp --> core
Industry Context: Auth & Impersonation
Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out. Phase 1 ships hooks + tests; adapters drop when a clear winner emerges.
New Dependencies
- fastmcp – internal helper, MIT, no external deps. FastMCP is new, but it is already well validated: it is widely adopted, backed by Anthropic, and serves as a reference for MCP server implementations in other languages.
- asgiref – optional (Apache-2) for ASGI wrapping
Migration Plan & Compatibility
- Disabled by default → zero impact
- No DB migrations
- Future breaking changes gated behind /v{n} and announced on dev@
Rejected Alternatives
Alternative | Why Not |
---|---|
External REST bridge (superset-mcp PoC) | Extra hop, latency, duplicated RBAC/validation, schema drift |
Immediate full superset-core extraction | Multi-month refactor; slows PoC. Scheduled for Phase 3 |
Embedded MCP provides speed now and maintainability later, complementing RAG efforts and keeping Superset at the center of AI-driven analytics.
Why Model Context Protocol (MCP) and Why Now?
The Model Context Protocol (MCP) is an open standard that lets large-language-model agents call tools (high-level, domain-specific actions) over a simple, schema-declared interface. Think of it as USB-C for AI: one plug that works across copilots (Claude, GitHub Copilot, Cursor, etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The spec was open-sourced by Anthropic in late 2024 and has since been adopted or trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and others.
Why REST Isn’t Enough
REST was built for machine-to-machine plumbing, not for autonomous agents:
- API keys & secrets: Models shouldn’t see them. MCP sessions carry scoped credentials or use local sockets, so no keys leak.
- Over-atomic verbs: Listing a dashboard, grabbing its metadata, then building a chart is 3-5 calls in REST but one tool call in MCP.
- Hyper-verbose schemas: REST spreads context across many endpoints; LLMs lose the thread. MCP bundles denormalised payloads that match the agent’s mental model.
Superset + MCP ⇒ Headless, AI-Ready BI
Superset already exposes a rich REST API, but agents must screen-scrape or choreograph dozens of endpoints. Baking MCP into Superset means any copilot can treat Superset as a first-class tool in AI-driven workflows, which fits our headless BI push.
What an LLM can do once MCP is live:
Action | Example prompt the agent can satisfy |
---|---|
Search assets | “List dashboards tagged revenue created in the last 30 days.” |
Spin up viz | “Create a bar chart showing ARR by region and add it to ‘Q3 Exec Dashboard’.” |
Jump to context | “Open SQL Lab on the raw_events dataset with a sample query.” |
Chat-in-context | “Why did MRR drop in EMEA last month? Show a quick breakdown.” |
Multi-app chains | Combine Superset → dbt → MotherDuck in one agent plan for root-cause analysis. |
In short, MCP lets any LLM do (almost) everything a human can in the UI, then hand the wheel back—unlocking AI-augmented analytics inside and around Superset without brittle glue code.