Motivation

MCP is an open protocol built on JSON-RPC 2.0 that gives any LLM a universal “USB-C port” for tools, data, and actions: one schema, no custom glue. Anthropic open-sourced it in Nov 2024, and the rest of the stack stampeded in: OpenAI baked MCP into ChatGPT & the Agents SDK; Google DeepMind is wiring it into Gemini; Microsoft highlighted it during Build; dev-tool players like Replit, Zed, Sourcegraph, and Block already ship MCP servers.

Why REST doesn’t cut it for agents

  • Auth done wrong – Users don’t want to, and shouldn’t, share their API keys with an LLM.
  • Too many calls – REST is designed for apps, so its endpoints are verbose and “atomic”; an agent has to chain many calls to finish one task.
  • IDs vs. context – Agents don’t want owner_id=42; they want {id:42, username:"jdoe", email:"…"} right away.

Superset + MCP → headless-BI 2.0

Superset is already headless; MCP lets any agent pick up the steering wheel.

The LLM can fetch context, build charts and dashboards, and generate a link that sends the user straight into SQL Lab or Explore. That’s AI-augmented, headless BI.

How MCP Relates to the RAG-Focused SIPs

Different directions, different problems — both useful.

| Dimension | RAG-centric SIPs | MCP SIP (this doc) |
| --- | --- | --- |
| Call direction | Superset ➜ LLM: Superset queries an external model for extra context or explanations. | LLM ➜ Superset: External agents call Superset to fetch assets or trigger actions. |
| Primary benefit | Enriches the user’s experience inside Superset (semantic search, chart “explainers,” etc.). | Lets agents outside Superset automate everything users can do through the UI. |
| Auth model | Superset authenticates to the model. | Agent authenticates to Superset, fully RBAC-aware. |
| Granularity | Model returns unstructured answers. | Superset returns deterministic, typed objects and links. |
| Dependency | Needs vector stores / LangChain wrappers inside Superset. | Needs an MCP blueprint exposed by Superset. |

These tracks don’t depend on each other, and neither blocks the other:

  • Ship RAG features for smarter querying and insight within Superset.
  • Ship MCP so any LLM agent can treat Superset as just another tool in a multi-app workflow.

Separate SIPs, separate code paths, complementary value. Feel free to pursue and ship either (or both) in any order.

Proposed Change

| Aspect | Detail |
| --- | --- |
| Blueprint | Opt-in Flask blueprint surfaced by helper pkg fastmcp (MIT, ≤300 LOC). |
| Toggle | ENABLE_MCP_SERVICE = False (default). |
| CLI | superset mcp run --port 5008. |
| Namespace | /api/mcp/v1/* (tag kept but less rigid than REST; see Versioning). |
| Runtime | WSGI-Flask by default; ASGI wrapping possible via asgiref.wsgi.WsgiToAsgi. |
| Hooks | auth_hook, impersonate, audit_log, rate_limit: no-ops in OSS, pluggable in Preset & enterprise. |
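
To make the hook idea concrete, here is a minimal sketch of how the four hooks could default to no-ops and be swapped out by a downstream distribution. The SIP only names the hooks; the signatures, the MCPHooks container, and the dispatch function are assumptions made for illustration, not a settled interface.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional

# Hypothetical sketch: the SIP names the four hooks but not their signatures.


def allow_all(request: Dict[str, Any]) -> Optional[str]:
    """No-op auth hook: accept every request and attach no user."""
    return None


def keep_user(user: Optional[str], request: Dict[str, Any]) -> Optional[str]:
    """No-op impersonation hook: act as the authenticated user."""
    return user


def skip_audit(user: Optional[str], tool: str, arguments: Dict[str, Any]) -> None:
    """No-op audit hook: record nothing."""


def never_limit(user: Optional[str], tool: str) -> bool:
    """No-op rate limiter: always allow the call."""
    return True


@dataclass
class MCPHooks:
    """Bundle of pluggable hooks; every default is a no-op."""

    auth_hook: Callable[..., Optional[str]] = allow_all
    impersonate: Callable[..., Optional[str]] = keep_user
    audit_log: Callable[..., None] = skip_audit
    rate_limit: Callable[..., bool] = never_limit


def dispatch(
    hooks: MCPHooks,
    request: Dict[str, Any],
    tools: Dict[str, Callable[..., Any]],
) -> Dict[str, Any]:
    """Run one tool call through auth, impersonation, rate limiting, and audit."""
    user = hooks.impersonate(hooks.auth_hook(request), request)
    tool_name = request["tool"]
    arguments = request.get("arguments", {})
    if not hooks.rate_limit(user, tool_name):
        return {"error": {"code": "rate_limited", "message": "Too many requests"}}
    result = tools[tool_name](**arguments)
    hooks.audit_log(user, tool_name, arguments)
    return {"result": result}
```

With this shape, the OSS build would pass MCPHooks() unchanged, while an enterprise build could pass, for example, MCPHooks(auth_hook=my_oidc_check) without touching the tool code.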

Code Reuse / DRY Strategy

  • Single source of truth: Commands + DAOs encapsulate business rules.
  • MCP and REST endpoints compose the same commands + DAOs; no logic duplication.
  • Shared Marshmallow schemas are reused directly or shallow-wrapped to add denormalised fields.
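
The reuse strategy above can be sketched as two thin adapters over one command. Everything here is a simplified stand-in: InMemoryChartDAO, CreateChartCommand, both adapter functions, and the explore_url format are hypothetical names chosen for illustration, not Superset's actual classes.

```python
from typing import Any, Dict


class InMemoryChartDAO:
    """Hypothetical stand-in for a DAO; stores charts in a dict."""

    def __init__(self) -> None:
        self._rows: Dict[int, Dict[str, Any]] = {}
        self._next_id = 1

    def create(self, name: str, viz_type: str) -> Dict[str, Any]:
        row = {"id": self._next_id, "slice_name": name, "viz_type": viz_type}
        self._rows[row["id"]] = row
        self._next_id += 1
        return row


class CreateChartCommand:
    """Hypothetical command: the single place where business rules live."""

    def __init__(self, dao: InMemoryChartDAO, name: str, viz_type: str) -> None:
        self.dao = dao
        self.name = name
        self.viz_type = viz_type

    def validate(self) -> None:
        if not self.name.strip():
            raise ValueError("chart name must not be empty")

    def run(self) -> Dict[str, Any]:
        self.validate()
        return self.dao.create(self.name, self.viz_type)


def rest_post_chart(dao: InMemoryChartDAO, body: Dict[str, Any]) -> Dict[str, Any]:
    """REST-style adapter: maps a request body onto the command."""
    return CreateChartCommand(dao, body["slice_name"], body["viz_type"]).run()


def mcp_generate_chart(
    dao: InMemoryChartDAO, name: str, viz_type: str = "bar"
) -> Dict[str, Any]:
    """MCP-style adapter: same command, plus an illustrative link for the agent."""
    chart = CreateChartCommand(dao, name, viz_type).run()
    return {**chart, "explore_url": f"/explore/?slice_id={chart['id']}"}
```

Both adapters reject an empty name through the same validate() call, which is the point: a rule added to the command applies to REST and MCP at once.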

High-Level vs. Atomic

MCP tools are chunkier: one call, one meaningful action, denormalised payloads (e.g. owners:[{id, username, email}] as opposed to owner_ids=[1,2,3]) to spare agents extra look-ups.
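
As a rough illustration of the denormalisation described above, the following sketch expands owner_ids into an owners list before the payload is returned. The function name, the record shape, and the user lookup table are assumptions; a real implementation would presumably resolve users through the existing DAOs.

```python
from typing import Any, Dict


def denormalise_owners(
    record: Dict[str, Any], users: Dict[int, Dict[str, str]]
) -> Dict[str, Any]:
    """Return a copy of record with owner_ids replaced by embedded owner objects.

    IDs missing from the users table are skipped rather than raising.
    """
    out = {key: value for key, value in record.items() if key != "owner_ids"}
    out["owners"] = [
        {
            "id": user_id,
            "username": users[user_id]["username"],
            "email": users[user_id]["email"],
        }
        for user_id in record.get("owner_ids", [])
        if user_id in users
    ]
    return out
```

The agent then receives owners:[{id, username, email}] in the same response and does not need a follow-up call per owner.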

As a general rule of thumb, we'll try to design tools around "agent stories", the agent counterpart of "user stories". The CRUD interface will be simplified with simpler, more intuitive schemas, following some of the principles highlighted in https://www.jlowin.dev/blog/as-an-agent-the-new-user-story

Versioning Philosophy

LLMs parse tool schemas in-session. Non-destructive breaking tweaks (rename owner_ids → owners) don’t require heavy semver ceremony. We bump /v{n} only for removals or semantic flips.

Initial Action Set (Phase 1)

  • Discovery: list_*
  • Navigation: generate_explore_link, open_sql_lab_with_context
  • Mutations: generate_chart, generate_dashboard, add_chart_to_existing_dashboard
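
For orientation, this is roughly what an agent-side call to one of these tools looks like on the wire. MCP invokes tools with a JSON-RPC 2.0 request whose method is tools/call; the helper names and the argument names passed to generate_explore_link below are hypothetical, since the tool schemas are not defined yet.

```python
import json
from itertools import count
from typing import Any, Dict

# Monotonic request IDs, so each response can be matched to its request.
_request_ids = count(1)


def tools_call_request(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request that invokes an MCP tool by name."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def encode(request: Dict[str, Any]) -> str:
    """Serialise the request for transport."""
    return json.dumps(request)


if __name__ == "__main__":
    # Argument names are illustrative only; the real schema is still to be designed.
    request = tools_call_request(
        "generate_explore_link", {"dataset_id": 7, "metrics": ["count"]}
    )
    print(encode(request))
```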

Deliverables

  • Blueprint + flag + CLI
  • 3-5 tools with unit, integration, and perf smoke tests
  • Minimal OpenAPI spec + auto-generated TS/Python client
  • Error envelope { "error": { "code": "...", "message": "..." } }
  • Demo notebook/script
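
One possible way to produce the error envelope listed above is a small wrapper around each tool call. The envelope shape comes from this document; the error codes, the "result" wrapper for successes, and the exception-to-code mapping are assumptions made for the sketch.

```python
from typing import Any, Callable, Dict


def error_envelope(code: str, message: str) -> Dict[str, Dict[str, str]]:
    """Build the { "error": { "code", "message" } } envelope."""
    return {"error": {"code": code, "message": message}}


def safe_call(tool: Callable[..., Any], **arguments: Any) -> Dict[str, Any]:
    """Run a tool and translate common failures into the error envelope."""
    try:
        return {"result": tool(**arguments)}
    except KeyError as exc:
        # Hypothetical code for lookups that miss, e.g. an unknown dashboard ID.
        return error_envelope("not_found", f"Unknown key: {exc.args[0]}")
    except ValueError as exc:
        # Hypothetical code for arguments that fail validation.
        return error_envelope("invalid_request", str(exc))
```

Keeping the mapping in one wrapper means every tool reports failures in the same shape, which is easier for an agent to handle than per-tool conventions.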

New / Changed Public Interfaces

| Interface | Addition |
| --- | --- |
| MCP | /api/mcp/* |
| Config | ENABLE_MCP_SERVICE |
| CLI | superset mcp run |
| Python | Optional import fastmcp |

Phasing / Roll-out Plan

| Phase | Goal | Outcome |
| --- | --- | --- |
| 1 – Proof of Concept | Skeleton + 3-5 tools | Live agent demo: list → chart → SQL Lab |
| 2 – Coverage Expansion | Broader tool library | > 80 % of daily actions scriptable |
| 3 – Production Hardening | Extract superset-core; add robust auth/impersonation/logging | GA under OIDC / Okta / Preset Cloud |

Longer-Term Package Topology

flowchart LR
    core[superset-core]

    superset-app --> core
    superset-rest --> core
    superset-ext --> core
    superset-mcp --> core

Industry Context: Auth & Impersonation

Token exchange vs. signed-JWT vs. OAuth device-flow is still shaking out. Phase 1 ships hooks + tests; adapters drop when a clear winner emerges.

New Dependencies

  • fastmcp – internal helper, MIT, no external deps. FastMCP is brand new but well validated: it is widely adopted, backed by Anthropic, and serves as the reference implementation for MCP servers in other languages.
  • asgiref – optional (BSD-3-Clause) for ASGI wrapping

Migration Plan & Compatibility

  • Disabled by default → zero impact
  • No DB migrations
  • Future breaking changes gated behind /v{n} and announced on dev@

Rejected Alternatives

| Alternative | Why Not |
| --- | --- |
| External REST bridge (superset-mcp PoC) | Extra hop, latency, duplicated RBAC/validation, schema drift |
| Immediate full superset-core extraction | Multi-month refactor; slows PoC. Scheduled for Phase 3 |

Embedded MCP provides speed now and maintainability later, complementing RAG efforts and keeping Superset at the center of AI-driven analytics.

Why Model Context Protocol (MCP) and Why Now?

The Model Context Protocol (MCP) is an open standard that lets large-language-model agents call tools (high-level, domain-specific actions) over a simple, schema-declared interface. Think of it as USB-C for AI: one plug that works across copilots (Claude, GitHub Copilot, Cursor, etc.) and related services (Postgres, GitHub, Slack, MotherDuck, Superset). The spec was open-sourced by Anthropic in late 2024 and has since been adopted or trialed by Microsoft, Hex, MotherDuck, Zed, Replit, Sourcegraph, Block, and others.

Why REST Isn’t Enough

REST was built for machine-to-machine plumbing, not for autonomous agents:

  • API keys & secrets – Models shouldn’t see them. MCP sessions carry scoped credentials or use local sockets, so no keys leak.
  • Over-atomic verbs – Listing a dashboard, grabbing its metadata, then building a chart is 3-5 calls in REST but one tool call in MCP.
  • Hyper-verbose schemas – REST spreads context across many endpoints; LLMs lose the thread. MCP bundles denormalised payloads that match the agent’s mental model.

Superset + MCP ⇒ Headless, AI-Ready BI

Superset already exposes a rich REST API, but agents must screen-scrape or choreograph dozens of endpoints. Baking MCP into Superset means any copilot can treat Superset as a first-class tool in AI-driven workflows, a perfect fit for our headless BI push.

What an LLM can do once MCP is live:

| Action | Example prompt the agent can satisfy |
| --- | --- |
| Search assets | “List dashboards tagged revenue created in the last 30 days.” |
| Spin up viz | “Create a bar chart showing ARR by region and add it to ‘Q3 Exec Dashboard’.” |
| Jump to context | “Open SQL Lab on the raw_events dataset with a sample query.” |
| Chat-in-context | “Why did MRR drop in EMEA last month? Show a quick breakdown.” |
| Multi-app chains | Combine Superset → dbt → MotherDuck in one agent plan for root-cause analysis. |

In short, MCP lets any LLM do (almost) everything a human can in the UI, then hand the wheel back—unlocking AI-augmented analytics inside and around Superset without brittle glue code.