Apache Superset [SIP-171] Proposal for Model Context Protocol (MCP) Service

Motivation

MCP is an open JSON-RPC 2.0 spec that gives any LLM a universal “USB-C port” for tools, data, and actions: one schema, no custom glue. Anthropic open-sourced it in Nov 2024, and the rest of the stack followed quickly: OpenAI baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into Gemini; Microsoft highlighted it during Build; dev-tool players like Replit, Zed, Sourcegraph, and Block already ship MCP servers.

Why REST doesn’t cut it for agents

Auth done wrong – Users don’t want to, and shouldn’t, share their API keys with an LLM
Too many calls – REST is designed for apps, so it’s extremely verbose and “atomic”
IDs vs. context – agents don’t want owner_id=42, they want {id:42, username:"jdoe", email:"…"} right away.

Superset + MCP → headless-BI 2.0

Superset is already headless; MCP lets any agent pick up the steering wheel.

The LLM can fetch context, build charts and dashboards, and generate a link to send the user straight into SQL Lab or Explore. That’s AI-augmented, headless BI.

How MCP Relates to the RAG-Focused SIPs

Different directions, different problems — both useful.

Dimension	RAG-centric SIPs	MCP SIP (this doc)
Call direction	Superset ➜ LLM: Superset queries an external model for extra context or explanations.	LLM ➜ Superset: external agents call Superset to fetch assets or trigger actions.
Primary benefit	Enriches the user’s experience inside Superset (semantic search, chart “explainers,” etc.).	Lets agents outside Superset automate everything users can do through the UI.
Auth model	Superset authenticates to the model.	Agent authenticates to Superset, fully RBAC-aware.
Granularity	Model returns unstructured answers.	Superset returns deterministic, typed objects and links.
Dependency	Needs vector stores / LangChain wrappers inside Superset.	Needs an MCP, ASGI-compatible service exposed by Superset.

These tracks don’t depend on each other, and neither blocks the other:

Ship RAG features for smarter querying and insight within Superset.
Ship MCP so any LLM agent can treat Superset as just another tool in a multi-app workflow.

Separate SIPs, separate code paths, complementary value. Feel free to pursue and ship either (or both) in any order.

Proposed Change

Aspect	Detail
New ASGI service	ASGI-compatible web server to serve `fastmcp`, likely `uvicorn`.
Toggle	`ENABLE_MCP_SERVICE = False` (default).
CLI	`superset mcp run --port 5008`.
Namespace	`/api/mcp/v1/` (version tag kept but less rigid than REST; see Versioning Philosophy).
Runtime	WSGI-Flask by default; ASGI wrapping possible via `asgiref.wsgi.WsgiToAsgi`.
Hooks	`auth_hook`, `impersonate`, `audit_log`, `rate_limit` — no-ops in OSS, pluggable in Preset & enterprise.
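
As a rough illustration of the Hooks row, the sketch below shows one way no-op defaults could be swapped for deployment-specific callables. The `McpHooks` container, the hook signatures, and the `run_tool` wrapper are hypothetical names invented for this example; the SIP only names the four hooks and does not define their signatures.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


def _noop_auth(token: Optional[str]) -> Optional[str]:
    """Default auth hook: accept every caller and resolve no username."""
    return None


def _noop_impersonate(username: Optional[str]) -> Optional[str]:
    """Default impersonation hook: keep the identity unchanged."""
    return username


def _noop_audit(event: str, payload: dict[str, Any]) -> None:
    """Default audit hook: record nothing."""


def _noop_rate_limit(username: Optional[str], tool: str) -> bool:
    """Default rate-limit hook: always allow the call."""
    return True


@dataclass
class McpHooks:
    # Hypothetical container; each field can be overridden per deployment.
    auth_hook: Callable[[Optional[str]], Optional[str]] = _noop_auth
    impersonate: Callable[[Optional[str]], Optional[str]] = _noop_impersonate
    audit_log: Callable[[str, dict[str, Any]], None] = _noop_audit
    rate_limit: Callable[[Optional[str], str], bool] = _noop_rate_limit


def run_tool(hooks: McpHooks, tool_name: str, tool: Callable[..., Any],
             token: Optional[str] = None, **kwargs: Any) -> Any:
    """Run a tool with the hooks applied in order: auth, impersonate, limit, audit."""
    user = hooks.impersonate(hooks.auth_hook(token))
    if not hooks.rate_limit(user, tool_name):
        raise RuntimeError(f"rate limit exceeded for {tool_name}")
    result = tool(**kwargs)
    hooks.audit_log(tool_name, {"user": user, "args": kwargs})
    return result
```

With the defaults, `run_tool(McpHooks(), "list_charts", some_tool)` behaves like a plain call; an enterprise build would pass its own callables for the fields it cares about.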

Code Reuse / DRY Strategy

Single source of truth: Commands + DAOs encapsulate business rules.
Equivalent REST and MCP endpoints compose the same commands + DAOs; no logic duplication.
Shared Marshmallow schemas reused directly or shallow-wrapped to add denormalised fields.
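
The reuse strategy above might look roughly like the following. This is a minimal sketch using in-memory stand-ins: `DashboardDAO`, `ListDashboardsCommand`, and both adapter functions are hypothetical and do not mirror Superset's actual classes or payloads.

```python
from dataclasses import dataclass


@dataclass
class Dashboard:
    id: int
    title: str


class DashboardDAO:
    """Hypothetical in-memory stand-in for a Superset DAO."""

    _rows = [Dashboard(1, "Revenue"), Dashboard(2, "Churn")]

    @classmethod
    def find_all(cls) -> list[Dashboard]:
        return list(cls._rows)


class ListDashboardsCommand:
    """The business rule lives here once: filter by an optional title substring."""

    def __init__(self, search: str = "") -> None:
        self.search = search.lower()

    def run(self) -> list[Dashboard]:
        return [d for d in DashboardDAO.find_all() if self.search in d.title.lower()]


def rest_list_dashboards(query: dict[str, str]) -> dict:
    """REST-style adapter (assumed shape): wraps rows in a count/result envelope."""
    rows = ListDashboardsCommand(query.get("q", "")).run()
    return {
        "count": len(rows),
        "result": [{"id": d.id, "dashboard_title": d.title} for d in rows],
    }


def mcp_list_dashboards(search: str = "") -> list[dict]:
    """MCP-style tool (assumed shape): same command, flatter payload."""
    return [{"id": d.id, "title": d.title} for d in ListDashboardsCommand(search).run()]
```

Both adapters only translate inputs and outputs; the filtering rule is defined once in the command.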

High-Level vs. Atomic

MCP tools are chunkier: one call, one meaningful action, denormalised payloads (e.g. owners:[{id, username, email}] as opposed to owner_ids=[1,2,3]) to spare agents extra look-ups.
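
To make the denormalisation concrete, here is a small sketch that expands `owner_ids` into an `owners` list. The `denormalise_owners` helper and the sample user records are assumptions made for illustration, not part of the proposal.

```python
from typing import Any


def denormalise_owners(chart: dict[str, Any], users: dict[int, dict[str, str]]) -> dict[str, Any]:
    """Replace `owner_ids` with an `owners` list of full user records.

    Unknown ids are skipped rather than raising, so a stale id does not
    break the whole payload (a design choice of this sketch).
    """
    payload = {key: value for key, value in chart.items() if key != "owner_ids"}
    payload["owners"] = [
        {"id": user_id, **users[user_id]}
        for user_id in chart.get("owner_ids", [])
        if user_id in users
    ]
    return payload


# Sample data, invented for the example.
USERS = {
    1: {"username": "jdoe", "email": "jdoe@example.com"},
    2: {"username": "asmith", "email": "asmith@example.com"},
}
chart = {"id": 10, "name": "ARR by region", "owner_ids": [1, 2, 3]}
agent_payload = denormalise_owners(chart, USERS)
```

The agent receives usernames and emails in the same response and does not need a follow-up user look-up per id.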

As a general rule of thumb, we'll try to design tools around "agent stories", the agent counterpart of "user stories". CRUD interfaces will use simpler, more intuitive schemas, following some of the principles highlighted in https://www.jlowin.dev/blog/as-an-agent-the-new-user-story

Versioning Philosophy

LLMs parse tool schemas in-session, so agents pick up schema changes on their next connection. Technically breaking but non-destructive tweaks (e.g. renaming owner_ids→owners) therefore don’t require heavy semver ceremony. We bump /v{n} only for removals or semantic flips.

Initial Action Set (Phase 1)

Discovery → list_* • Navigation → generate_explore_link, open_sql_lab_with_context • Mutations → generate_chart, generate_dashboard, add_chart_to_existing_dashboard
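
As one possible shape for a navigation tool such as `generate_explore_link`, the sketch below packs chart parameters into a URL. The `/explore/?form_data=` URL layout, the `<id>__table` datasource string, and the function signature are assumptions for illustration; the real tool would rely on whatever link format Superset supports at implementation time.

```python
import json
from urllib.parse import parse_qs, urlencode, urlparse


def generate_explore_link(base_url: str, datasource_id: int, viz_type: str,
                          metrics: list[str], groupby: list[str]) -> str:
    """Build an Explore URL carrying the chart definition as JSON (assumed format)."""
    form_data = {
        "datasource": f"{datasource_id}__table",
        "viz_type": viz_type,
        "metrics": metrics,
        "groupby": groupby,
    }
    query = urlencode({"form_data": json.dumps(form_data, separators=(",", ":"))})
    return f"{base_url.rstrip('/')}/explore/?{query}"


def decode_explore_link(url: str) -> dict:
    """Inverse helper, handy in tests: recover the form_data dict from a link."""
    return json.loads(parse_qs(urlparse(url).query)["form_data"][0])


link = generate_explore_link("http://localhost:8088/", 7, "bar", ["count"], ["region"])
```

The agent returns `link` to the user, who lands in Explore with the chart pre-configured instead of receiving raw data in the chat.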

Deliverables

CLI subcommand to serve FastMCP
3-5 tools with unit, integration, and perf smoke tests
Minimal OpenAPI spec + auto-generated TS/Python client
Error envelope { "error": { "code": "...", "message": "..." } }
Demo notebook/script
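
The error envelope listed above could be produced by a thin wrapper around each tool. In this sketch the error codes (`FORBIDDEN`, `INVALID_ARGUMENT`), the `result` key for successful calls, and the helper names are hypothetical; only the `{"error": {"code", "message"}}` shape comes from the SIP.

```python
from typing import Any, Callable


def error_envelope(code: str, message: str) -> dict[str, Any]:
    """Build the error envelope described in the deliverables."""
    return {"error": {"code": code, "message": message}}


def call_tool_safely(tool: Callable[..., Any], **kwargs: Any) -> dict[str, Any]:
    """Run a tool and translate common exceptions into error envelopes."""
    try:
        return {"result": tool(**kwargs)}
    except PermissionError as exc:
        return error_envelope("FORBIDDEN", str(exc))
    except ValueError as exc:
        return error_envelope("INVALID_ARGUMENT", str(exc))


def sample_tool(limit: int) -> list[int]:
    """Toy tool used only to exercise the wrapper."""
    if limit < 0:
        raise ValueError("limit must be non-negative")
    return list(range(limit))


ok = call_tool_safely(sample_tool, limit=2)
bad = call_tool_safely(sample_tool, limit=-1)
```

A uniform envelope lets an agent branch on `error.code` without parsing free-form messages.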

New / Changed Public Interfaces

Interface	Addition
MCP	`/api/mcp/*`
Config	`ENABLE_MCP_SERVICE`
CLI	`superset mcp run`
Python	Optional `import fastmcp`

Phasing / Roll-out Plan

Phase	Goal	Outcome
1 – Proof of Concept	Skeleton + 3-5 tools	Live agent demo: list → chart → SQL Lab
2 – Coverage Expansion	Broader tool library	> 80 % of daily actions scriptable
3 – Production Hardening	Extract `superset-core`; add robust auth/impersonation/logging	GA under OIDC / Okta / Preset Cloud

Longer-Term Package Topology

flowchart LR
    core[superset-core]

    superset-app --> core
    superset-rest --> core
    superset-ext --> core
    superset-mcp --> core

Industry Context: Auth & Impersonation

Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out. Phase 1 ships hooks + tests; adapters drop when a clear winner emerges.

New Dependencies

fastmcp – internal helper, MIT, no external deps. FastMCP is brand new, but well validated: it has become widely adopted, is backed by Anthropic, and serves as the reference implementation for MCP servers in other languages.
Uvicorn – or similar server that can serve FastMCP/ASGI

Migration Plan & Compatibility

Disabled by default → zero impact
No DB migrations
Future breaking changes gated behind /v{n} and announced on dev@

Rejected Alternatives

Alternative	Why Not
External REST bridge (`superset-mcp` PoC)	Extra hop, latency, duplicated RBAC/validation, schema drift
Immediate full `superset-core` extraction	Multi-month refactor; slows PoC. Scheduled for Phase 3
Flask Blueprint	It would have been nice to serve `/mcp` out of the same Flask/Gunicorn server, but `FastMCP` is ASGI, not WSGI, and will require its own process/service, likely served by `uvicorn`

Embedded MCP provides speed now and maintainability later, complementing RAG efforts and keeping Superset at the center of AI-driven analytics.

Why Model Context Protocol (MCP) and Why Now?

The Model Context Protocol (MCP) is an open standard that lets large-language-model agents call tools (high-level, domain-specific actions) over a simple, schema-declared interface. Think of it as USB-C for AI: one plug that works across copilots (Claude, GitHub Copilot, Cursor, etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The spec was open-sourced by Anthropic in late 2024 and has since been adopted or trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and others.
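
For readers unfamiliar with the wire format, a tool invocation is a JSON-RPC 2.0 request using the `tools/call` method. The sketch below builds such a request; the tool name `list_dashboards` and its `search` argument are hypothetical examples, not tools defined by this SIP.

```python
import json
from itertools import count
from typing import Any

# JSON-RPC requires each request to carry an id so responses can be matched.
_request_ids = count(1)


def tools_call_request(name: str, arguments: dict[str, Any]) -> str:
    """Serialise an MCP tool call as a JSON-RPC 2.0 request string."""
    return json.dumps(
        {
            "jsonrpc": "2.0",
            "id": next(_request_ids),
            "method": "tools/call",
            "params": {"name": name, "arguments": arguments},
        }
    )


request = tools_call_request("list_dashboards", {"search": "revenue"})
```

In practice an MCP client library handles this framing; the point is that every tool shares one envelope and differs only in `name` and `arguments`.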

Why REST Isn’t Enough

REST was built for machine-to-machine plumbing, not for autonomous agents:

API keys & secrets: Models shouldn’t see them. MCP sessions carry scoped credentials or use local sockets, so keys don’t leak.
Over-atomic verbs: Listing a dashboard, grabbing its metadata, then building a chart is 3-5 calls in REST but one tool call in MCP.
Hyper-verbose schemas: REST spreads context across many endpoints; LLMs lose the thread. MCP bundles denormalised payloads that match the agent’s mental model.

Superset + MCP ⇒ Headless, AI-Ready BI

Superset already exposes a rich REST API, but agents must screen-scrape or choreograph dozens of endpoints. Baking MCP into Superset means any copilot can treat Superset as a first-class tool in AI-driven workflows, which is a perfect fit for our headless BI push.

What an LLM can do once MCP is live:

Action	Example prompt the agent can satisfy
Search assets	“List dashboards tagged revenue created in the last 30 days.”
Spin up viz	“Create a bar chart showing ARR by region and add it to ‘Q3 Exec Dashboard’.”
Jump to context	“Open SQL Lab on the `raw_events` dataset with a sample query.”
Chat-in-context	“Why did MRR drop in EMEA last month? Show a quick breakdown.”
Multi-app chains	Combine Superset → dbt → MotherDuck in one agent plan for root-cause analysis.

In short, MCP lets any LLM do (almost) everything a human can in the UI, then hand the wheel back—unlocking AI-augmented analytics inside and around Superset without brittle glue code.

Comment From: betodealmeida

Love this!

One thing that I would like to see in Superset in general, not just for MCP, is the ability to run blueprints either in-app or in a separate app. For example, if SQL Lab was written as a blueprint we could mount it in the main Superset app:

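# superset/app.py
from typing import Optional

from flask import Flask

from superset.sqllab import blueprint as sqllab_bp  # hypothetical SQL Lab blueprint


def create_app(
    superset_config_module: Optional[str] = None,
    superset_app_root: Optional[str] = None,
) -> Flask:
    app = SupersetApp(__name__)

    # Mount SQL Lab directly in the main Superset app.
    app.register_blueprint(sqllab_bp)
    return app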

But if someone wants to scale horizontally they could run the SQL Lab blueprint app in a separate Flask app in a separate container. We could imagine something like this:

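# superset_config.py
SQLLAB_ENDPOINT: str | None = "http://10.0.0.1:9000"  # None means "mount SQL Lab in-app"

# superset/app.py
from typing import Optional

from flask import Flask, url_for

from superset.sqllab import blueprint as sqllab_bp  # hypothetical SQL Lab blueprint


def create_app(
    superset_config_module: Optional[str] = None,
    superset_app_root: Optional[str] = None,
) -> Flask:
    app = SupersetApp(__name__)

    # No external endpoint configured: mount the blueprint and point at ourselves.
    # Going through app.config avoids rebinding a module-level name inside the function.
    if not app.config.get("SQLLAB_ENDPOINT"):
        app.register_blueprint(sqllab_bp)
        # url_for(..., _external=True) needs a request context, or an app context with SERVER_NAME set.
        app.config["SQLLAB_ENDPOINT"] = url_for("sqllab_bp.index", _external=True)
    return app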

Then we'd expose SQLLAB_ENDPOINT to the frontend, for all SQL Lab API calls.

If we did this for MCP, users would have the option of running a single app, with the MCP blueprint mounted directly in the Superset app; or run a separate app for the MCP server via superset mcp run --port 5008. The former is easier for beginners and for testing, while the latter is more robust and scalable.

Comment From: geido

Hey @mistercrunch this is great!

I think my only concern is about the Blueprint implementation. FastMCP is meant to be ASGI, so I don't think an adapter can be optional, and using one might introduce some complexity (that's what the official documentation suggests).

Comment From: mistercrunch

Ok, yeah I did some research just now and I think the right thing to do is to edit the SIP to remove the blueprint requirement. More isolation for distinct workloads isn't a bad thing, though it would have been nice to be able to federate the service. I'm kind of jealous of ASGI's awesomeness, given the lack of support for it in Flask ... Eventually it could be nice to serve the API through FastAPI/ASGI, but for now I'm stripping the blueprint out of the SIP.

Comment From: mistercrunch

Updated, added Flask Blueprint in the "rejected alternatives" section with the related reason.

Comment From: vedantprajapati

Huge fan of this! As you said before, it would be amazing if this could work out of the box with the flag enabled.

Comment From: mistercrunch

Yes, unfortunately it appears we won't be able to run as a blueprint. We could wrap things in a subprocess for convenience, but this comes at a cost and is sub-optimal in many ways. Things like mixed logs output out of the process, can't stop/restart one without stopping the other, zombie processes, ...

Comment From: mistercrunch

Something we spoke about with committers is the idea of migrating from Flask to a Flask-compatible ASGI framework (Quart), but it's unclear how much lift that represents, especially given our foundation built on Flask-AppBuilder (FAB) and the fact that we pretty much hook up to the entire Flask ecosystem. I hear some Flask extensions are not supported by Quart. This is a side project and outside the scope of this SIP, but it would be great if someone could do a "feasibility study" on a Quart migration. My guess is that it'd be a massive undertaking, and the gains/value wouldn't match. It might become easier over time if the Flask ecosystem moves in that direction.

Comment From: vedantprajapati

A separate MCP server seems like the best option to work with the existing codebase, so long as authentication is handled correctly and the agent can only retrieve data the user has permission to see. A whole migration to Quart might bring a slew of unintended issues that pop up over time.


Comment From: aminghadersohi

Since we are running this as its own service, does having a feature flag make sense?

Comment From: mistercrunch

A feature flag is not necessary, as people will opt in by firing up the CLI; let me scrub it out of the SIP.

Comment From: michael-s-molina

@mistercrunch Thanks for the SIP. It would be interesting to sync about this given that we have active efforts at Airbnb related to MCP in Superset. I'll tag @jamra and @zuzana-vej who are working on this.

This is also related to the Extensions work. MCP services will be used by Chat extensions in SQL Lab and other areas of the application.

VSCode has great guides on how they extend their product to tailor AI experiences that meet organization specific needs. Given that they already solved many of the problems on how to integrate these systems, and the fact that this becomes more complex when you consider multiple organizations and their unique needs, I highly recommend that we reuse their experience / work as inspiration to create our proposal.

@villebro and I discussed many times how beneficial it was to draw inspiration from VSCode for the Extensions work and how many hours of discussions and definitions were saved with the reuse of concepts and patterns.

Comment From: rusackas

@mistercrunch Time for a DISCUSS thread here?

Comment From: michael-s-molina


@rusackas @mistercrunch Let's schedule a meeting to sync about this before submitting to discussion.

Comment From: rusackas

We've had some syncs around this, so I'm curious what people's thoughts are about touching this up (if needed) and putting it up for a DISCUSS thread.