Motivation
MCP is an open JSON-RPC 2.0 spec that gives any LLM a universal “USB-C port” for tools, data, and actions — one schema, no custom glue. Anthropic open-sourced it in Nov 2024, and the rest of the stack stampeded in: OpenAI baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into Gemini; Microsoft highlighted it during Build; dev-tool players like Replit, Zed, Sourcegraph, and Block already ship MCP servers.
Why REST doesn’t cut it for agents
- Auth done wrong – users don’t want to (and shouldn’t) share their API keys with an LLM
- Too many calls – REST is designed for apps, so it’s extremely verbose and “atomic”
- IDs vs. context – agents don’t want `owner_id=42`; they want `{id: 42, username: "jdoe", email: "…"}` right away.
Superset + MCP → headless-BI 2.0
Superset is already headless; MCP lets any agent pick up the steering wheel.
The LLM can fetch context, build charts and dashboards, and generate a link to send the user straight into SQL Lab or Explore. That’s AI-augmented, headless BI.
How MCP Relates to the RAG-Focused SIPs
Different directions, different problems — both useful.
| Dimension | RAG-centric SIPs | MCP SIP (this doc) |
|---|---|---|
| Call direction | Superset ➜ LLM – Superset queries an external model for extra context or explanations. | LLM ➜ Superset – external agents call Superset to fetch assets or trigger actions. |
| Primary benefit | Enriches the user’s experience inside Superset (semantic search, chart “explainers,” etc.). | Lets agents outside Superset automate everything users can do through the UI. |
| Auth model | Superset authenticates to the model. | Agent authenticates to Superset, fully RBAC-aware. |
| Granularity | Model returns unstructured answers. | Superset returns deterministic, typed objects and links. |
| Dependency | Needs vector stores / LangChain wrappers inside Superset. | Needs an ASGI-compatible MCP service exposed by Superset. |
These tracks don’t depend on each other, and neither blocks the other:
- Ship RAG features for smarter querying and insight within Superset.
- Ship MCP so any LLM agent can treat Superset as just another tool in a multi-app workflow.
Separate SIPs, separate code paths, complementary value. Feel free to pursue and ship either (or both) in any order.
Proposed Change
| Aspect | Detail |
|---|---|
| New ASGI service | ASGI-compatible web server to serve `fastmcp`, likely `uvicorn`. |
| Toggle | `ENABLE_MCP_SERVICE = False` (default). |
| CLI | `superset mcp run --port 5008`. |
| Namespace | `/api/mcp/v1/*` (tag kept but less rigid than REST; see Versioning). |
| Runtime | WSGI Flask by default; ASGI wrapping possible via `asgiref.wsgi.WsgiToAsgi`. |
| Hooks | `auth_hook`, `impersonate`, `audit_log`, `rate_limit` — no-ops in OSS, pluggable in Preset & enterprise. |
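To make the moving parts concrete, here is a minimal sketch, assuming FastMCP’s decorator API and uvicorn as the ASGI server; the `superset mcp run` CLI would essentially wrap this, and the tool shown is a placeholder:

```python
# Illustrative sketch only: stand up a FastMCP app and serve it over ASGI.
# The tool body is a stub; real tools would delegate to Superset's
# command/DAO layer (see "Code Reuse / DRY Strategy" below).
from typing import Optional

import uvicorn
from fastmcp import FastMCP

mcp = FastMCP("superset")

@mcp.tool()
def list_dashboards(tag: Optional[str] = None) -> list:
    """Return dashboards visible to the authenticated user (stub)."""
    return []

if __name__ == "__main__":
    # Recent FastMCP versions expose an ASGI app factory (http_app());
    # uvicorn serves it on the port the CLI would accept via --port.
    uvicorn.run(mcp.http_app(), host="127.0.0.1", port=5008)
```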
Code Reuse / DRY Strategy
- Single source of truth: Commands + DAOs encapsulate business rules; similar REST/MCP endpoints compose the same set of Commands + DAOs, as sketched below.
- MCP and REST compose those objects; no logic duplication.
- Shared Marshmallow schemas are reused directly or shallow-wrapped to add denormalised fields.
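As an illustration of that composition, a hedged sketch (module paths follow Superset’s current layout, but the payload and return shape are assumptions, not a final API; `mcp` is the FastMCP instance from the earlier sketch):

```python
# Illustrative only: the MCP tool is a thin wrapper around the same Command
# the REST endpoint composes, so validation, RBAC, and business rules stay
# in one place.
from superset.commands.chart.create import CreateChartCommand

@mcp.tool()
def generate_chart(datasource_id: int, viz_type: str, slice_name: str) -> dict:
    """Create a chart and return a denormalised summary the agent can use."""
    chart = CreateChartCommand({
        "datasource_id": datasource_id,
        "datasource_type": "table",
        "viz_type": viz_type,
        "slice_name": slice_name,
    }).run()
    return {"id": chart.id, "slice_name": chart.slice_name, "url": chart.url}
```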
High-Level vs. Atomic
MCP tools are chunkier: one call, one meaningful action, denormalised payloads (e.g. `owners: [{id, username, email}]` as opposed to `owner_ids=[1, 2, 3]`) to spare agents extra look-ups.
As a general rule of thumb, we'll design tools around "agent stories", the agent counterpart of "user stories". CRUD interfaces will be streamlined with simpler, intuitive schemas, following some of the principles highlighted in https://www.jlowin.dev/blog/as-an-agent-the-new-user-story (see the sketch below).
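For instance, a chunky discovery payload could be modelled as a typed schema rather than bare foreign keys (field names here are assumptions following the `owners` example above):

```python
# Hypothetical response shape for a discovery tool: one call returns the
# denormalised context an agent needs, instead of bare foreign keys.
from pydantic import BaseModel

class Owner(BaseModel):
    id: int
    username: str
    email: str

class DashboardSummary(BaseModel):
    id: int
    title: str
    url: str
    owners: list[Owner]  # embedded objects, not owner_ids=[1, 2, 3]
```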
Versioning Philosophy
LLMs parse tool schemas in-session. Non-destructive breaking tweaks (renaming `owner_ids` → `owners`) don’t require heavy semver ceremony. We bump `/v{n}` only for removals or semantic flips.
Initial Action Set (Phase 1)
- Discovery → `list_*`
- Navigation → `generate_explore_link` (sketched below), `open_sql_lab_with_context`
- Mutations → `generate_chart`, `generate_dashboard`, `add_chart_to_existing_dashboard`
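As an example of the navigation bucket, a sketch of `generate_explore_link`, assuming Superset’s standard `/explore/?slice_id=` URL format (the base-URL handling is illustrative):

```python
# Illustrative sketch: navigation tools return deep links rather than data,
# so the agent can hand the user straight back into the Superset UI.
from urllib.parse import urlencode

@mcp.tool()
def generate_explore_link(slice_id: int) -> dict:
    """Return a link that opens Explore on an existing chart."""
    base_url = "http://localhost:8088"  # would come from Superset config
    return {"url": f"{base_url}/explore/?{urlencode({'slice_id': slice_id})}"}
```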
Deliverables
- CLI subcommand to serve FastMCP
- 3-5 tools with unit, integration, and perf smoke tests
- Minimal OpenAPI spec + auto-generated TS/Python client
- Error envelope `{ "error": { "code": "...", "message": "..." } }` (see the sketch after this list)
- Demo notebook/script
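A minimal sketch of how tool failures could map onto that envelope (the decorator is hypothetical, not part of FastMCP):

```python
# Hypothetical helper: catch exceptions at the tool boundary and return the
# agreed error envelope instead of leaking a stack trace to the agent.
import functools

def with_error_envelope(fn):
    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        try:
            return fn(*args, **kwargs)
        except Exception as exc:  # real code would narrow this
            return {"error": {"code": type(exc).__name__, "message": str(exc)}}
    return wrapper
```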
New / Changed Public Interfaces
| Interface | Addition |
|---|---|
| MCP | `/api/mcp/*` |
| Config | `ENABLE_MCP_SERVICE` |
| CLI | `superset mcp run` |
| Python | Optional `import fastmcp` |
Phasing / Roll-out Plan
| Phase | Goal | Outcome |
|---|---|---|
| 1 – Proof of Concept | Skeleton + 3-5 tools | Live agent demo: list → chart → SQL Lab |
| 2 – Coverage Expansion | Broader tool library | > 80% of daily actions scriptable |
| 3 – Production Hardening | Extract `superset-core`; add robust auth/impersonation/logging | GA under OIDC / Okta / Preset Cloud |
Longer-Term Package Topology
```mermaid
flowchart LR
    core[superset-core]
    superset-app --> core
    superset-rest --> core
    superset-ext --> core
    superset-mcp --> core
```
Industry Context: Auth & Impersonation
Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out. Phase 1 ships hooks + tests; adapters drop when a clear winner emerges.
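Until then, the hooks listed under Proposed Change can ship as no-ops; plausible signatures might look like this (the shapes below are assumptions, not a settled interface):

```python
# Hypothetical Phase-1 hook signatures: no-ops in OSS, overridable in
# enterprise deployments once the auth landscape settles.
from typing import Any, Optional

def auth_hook(headers: dict[str, str]) -> Optional[Any]:
    """Resolve the calling agent to a Superset user; None denies the call."""
    return None

def audit_log(user: Any, tool_name: str, payload: dict[str, Any]) -> None:
    """Record the tool invocation; a no-op by default."""
```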
New Dependencies
- fastmcp – used as an internal helper; MIT-licensed, no external deps. FastMCP is brand new but well validated: it has become widely adopted, is backed by Anthropic, and is the reference implementation for MCP servers in other languages.
- uvicorn – or a similar server that can serve FastMCP/ASGI
Migration Plan & Compatibility
- Disabled by default → zero impact
- No DB migrations
- Future breaking changes gated behind `/v{n}` and announced on dev@
Rejected Alternatives
| Alternative | Why Not |
|---|---|
| External REST bridge (superset-mcp PoC) | Extra hop, latency, duplicated RBAC/validation, schema drift |
| Immediate full `superset-core` extraction | Multi-month refactor; slows the PoC. Scheduled for Phase 3 |
| Flask Blueprint | It would have been nice to serve `/mcp` out of the same Flask/Gunicorn server, but FastMCP is ASGI, not WSGI, and will require its own process/service, likely served by uvicorn |
Embedded MCP provides speed now and maintainability later, complementing RAG efforts and keeping Superset at the center of AI-driven analytics.
Why Model Context Protocol (MCP) and Why Now?
The Model Context Protocol (MCP) is an open standard that lets large-language-model agents call tools—high-level, domain-specific actions—over a simple, schema-declared interface. Think of it as USB-C for AI: one plug that works across copilots (Claude, GitHub Copilot, Cursor, etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The spec was open-sourced by Anthropic in late 2024 and has since been adopted or trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and others.
Why REST Isn’t Enough
REST was built for machine-to-machine plumbing, not for autonomous agents:
- API keys & secrets – models shouldn’t see them. MCP sessions carry scoped credentials or use local sockets; no key leakage.
- Over-atomic verbs – listing a dashboard, grabbing its metadata, then building a chart is 3-5 calls in REST but one tool call in MCP.
- Hyper-verbose schemas – REST spreads context across many endpoints; LLMs lose the thread. MCP bundles denormalised payloads that match the agent’s mental model.
Superset + MCP ⇒ Headless, AI-Ready BI
Superset already exposes a rich REST API, but agents must screenscrape or choreograph dozens of endpoints. Baking MCP into Superset means any copilot can treat Superset as a first-class tool in AI-driven workflows—perfect fit for our headless BI push.
What an LLM can do once MCP is live:
| Action | Example prompt the agent can satisfy |
|---|---|
| Search assets | “List dashboards tagged revenue created in the last 30 days.” |
| Spin up viz | “Create a bar chart showing ARR by region and add it to ‘Q3 Exec Dashboard’.” |
| Jump to context | “Open SQL Lab on the `raw_events` dataset with a sample query.” |
| Chat-in-context | “Why did MRR drop in EMEA last month? Show a quick breakdown.” |
| Multi-app chains | Combine Superset → dbt → MotherDuck in one agent plan for root-cause analysis. |
In short, MCP lets any LLM do (almost) everything a human can in the UI, then hand the wheel back—unlocking AI-augmented analytics inside and around Superset without brittle glue code.
Comment From: betodealmeida
Love this!
One thing that I would like to see in Superset in general, not just for MCP, is the ability to run blueprints either in-app or in a separate app. For example, if SQL Lab was written as a blueprint we could mount it in the main Superset app:
```python
# superset/app.py
from typing import Optional

from flask import Flask

from superset.sqllab import blueprint as sqllab_bp


def create_app(
    superset_config_module: Optional[str] = None,
    superset_app_root: Optional[str] = None,
) -> Flask:
    app = SupersetApp(__name__)
    app.register_blueprint(sqllab_bp)
    return app
```
But if someone wants to scale horizontally they could run the SQL Lab blueprint app in a separate Flask app in a separate container. We could imagine something like this:
```python
# superset_config.py
SQLLAB_ENDPOINT: str | None = "http://10.0.0.1:9000"
```

```python
# superset/app.py
from typing import Optional

from flask import Flask, url_for

from superset.sqllab import blueprint as sqllab_bp


def create_app(
    superset_config_module: Optional[str] = None,
    superset_app_root: Optional[str] = None,
) -> Flask:
    app = SupersetApp(__name__)
    if not SQLLAB_ENDPOINT:  # from superset_config
        app.register_blueprint(sqllab_bp)
        SQLLAB_ENDPOINT = url_for("sqllab_bp.index", _external=True)
    return app
```
Then we'd expose `SQLLAB_ENDPOINT` to the frontend, for all SQL Lab API calls.
If we did this for MCP, users would have the option of running a single app, with the MCP blueprint mounted directly in the Superset app; or running a separate app for the MCP server via `superset mcp run --port 5008`. The former is easier for beginners and for testing, while the latter is more robust and scalable.
Comment From: geido
Hey @mistercrunch this is great!
I think my only concern is about the Blueprint implementation. FastMCP is meant to be ASGI, and I don't think using an adapter can be optional; if we do use an adapter, that might introduce some complexity (that's what the official documentation suggests).
Comment From: mistercrunch
Ok, yeah, I did some research just now and I think the right thing to do is to edit the SIP to remove the blueprint requirement. More isolation for distinct workloads isn't a bad thing, though it would have been nice to be able to federate the service. I'm kind of jealous of ASGI's awesomeness and the lack of support for it in Flask... Eventually it could be nice to serve the API through FastAPI/ASGI, but for now I'm stripping the blueprint out of the SIP.
Comment From: mistercrunch
Updated, added Flask Blueprint in the "rejected alternatives" section with the related reason.
Comment From: vedantprajapati
Huge fan of this! As you said before, it would be amazing if this could work out of the box with the flag enabled.
Comment From: mistercrunch
Yes, unfortunately it appears we won't be able to run as a blueprint. We could wrap things in a subprocess for convenience, but this comes at a cost and is sub-optimal in many ways: mixed log output from the two processes, not being able to stop/restart one without stopping the other, zombie processes, ...
Comment From: mistercrunch
Something we spoke about with committers is the idea of migrating from Flask to a Flask-compatible ASGI server (Quart), but it's unclear how much lift that represents, especially given our foundation built on Flask-AppBuilder (FAB) and the fact that we pretty much hook into the entire Flask ecosystem. I hear some Flask extensions are not supported by Quart. This is a side project and outside the scope of this SIP, but it would be great if someone could do a "feasibility study" on a Quart migration. My guess is that it'd be a massive undertaking, and the gains/value wouldn't match. It might become easier over time if the Flask ecosystem moves in that direction.
Comment From: vedantprajapati
A separate MCP server seems like the best option to work with the existing codebase, so long as authentication is taken care of correctly and the agent is only able to retrieve data the user has permission to access. A whole migration to Quart might come with a slurry of unintended issues that pop up over time.

> Something we spoke about with committers is the idea of migrating from Flask to a Flask-compatible ASGI server (Quart), but it's unclear how much lift that represents, especially given our foundation built on Flask-AppBuilder (FAB) and the fact that we pretty much hook into the entire Flask ecosystem. I hear some Flask extensions are not supported by Quart. This is a side project and outside the scope of this SIP, but it would be great if someone could do a "feasibility study" on a Quart migration. My guess is that it'd be a massive undertaking, and the gains/value wouldn't match. It might become easier over time if the Flask ecosystem moves in that direction.
Comment From: aminghadersohi
Since we are running this as its own service, does having a feature flag make sense?
Comment From: mistercrunch
A feature flag isn't necessary since people will opt in by firing up the CLI; let me scrub it from the SIP.
Comment From: michael-s-molina
@mistercrunch Thanks for the SIP. It would be interesting to sync about this given that we have active efforts at Airbnb related to MCP in Superset. I'll tag @jamra and @zuzana-vej who are working on this.
This is also related to the Extensions work. MCP services will be used by Chat extensions in SQL Lab and other areas of the application.
VSCode has great guides on how they extend their product to tailor AI experiences that meet organization-specific needs. Given that they already solved many of the problems of integrating these systems, and the fact that this becomes more complex when you consider multiple organizations and their unique needs, I highly recommend that we reuse their experience / work as inspiration for our proposal.
@villebro and I discussed many times how beneficial it was to draw inspiration from VSCode for the Extensions work and how many hours of discussions and definitions were saved with the reuse of concepts and patterns.
Comment From: rusackas
@mistercrunch Time for a DISCUSS thread here?
Comment From: michael-s-molina
> @mistercrunch Time for a DISCUSS thread here?

@rusackas @mistercrunch Let's schedule a meeting to sync about this before submitting to discussion.
Comment From: rusackas
We've had some syncs around this, so I'm curious what people's thoughts are about touching this up (if needed) and putting it up for a DISCUSS thread.