[SIP-166] Proposal for AI Assistant

Motivation

An accurate text-to-SQL translator (AI Assistant) can greatly enhance the SQLLab user experience by increasing productivity, supporting users with limited SQL knowledge, and making it easier to discover and access data in SQLLab.

Proposed Change

We propose implementing a text-to-SQL translator that is intentionally simple: it avoids RAG, vector databases, and agentic LLM frameworks. This approach is designed to maximize compatibility across diverse database types and sizes, provide flexible configuration options, and leverage user-supplied context filtering when available. The system is built to degrade gracefully, so it keeps operating even when some functionality is unavailable for a given database.

We believe that by intentionally keeping this solution simple and avoiding complex dependencies, it will be easier for the community to reach consensus and approve its inclusion. This practical and accessible first implementation of the AI Assistant is designed to accelerate its adoption and help it materialize sooner as an official Superset OSS release.

The AI Assistant was developed in alignment with the guiding principles outlined above, within a dedicated fork of the Superset repository, based on the 5.0.0rc3 tag. For a comprehensive overview of its features and configuration, refer to the AI Assistant documentation.
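As an illustration of the approach described in this section, the sketch below builds a single prompt from database metadata, with an optional user-supplied schema filter and no retrieval layer. It is not taken from the fork: the `Column` and `Table` classes, the `build_prompt` function, and the prompt wording are all hypothetical names chosen for this example.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    type: str
    comment: str = ""  # optional SQL comment attached to the column


@dataclass
class Table:
    schema: str
    name: str
    columns: list = field(default_factory=list)  # list of Column


def build_prompt(tables, question, selected_schemas=None):
    """Render table metadata and a user question into one plain-text prompt."""
    # Optional user-supplied filter: keep only tables in the chosen schemas.
    if selected_schemas is not None:
        tables = [t for t in tables if t.schema in selected_schemas]
    # Serialize the remaining metadata as JSON (hypothetical layout).
    context = [
        {
            "table": f"{t.schema}.{t.name}",
            "columns": [
                {"name": c.name, "type": c.type, "comment": c.comment}
                for c in t.columns
            ],
        }
        for t in tables
    ]
    return (
        "You translate questions into SQL.\n"
        f"Database metadata (JSON):\n{json.dumps(context, indent=2)}\n"
        f"Question: {question}\n"
        "SQL:"
    )
```

Because the whole context travels in the prompt, filtering by schema is the main lever for keeping large databases within an LLM's context window in a design like this.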

New or Changed Public Interfaces

  • React Components:

    • AI Assistant Editor: Introduced in SQLLab as a text input bar for interacting with the AI Assistant.
    • Table Selector: Enhanced to allow multi-selection of schemas.
    • SQL Editor: Updated to support schema multi-selection.
    • AI Assistant Options: Added as a tab in the Database modal for configuring AI Assistant settings per database.
    • Table View: Added a SQL comment icon next to each column name.
  • REST Endpoints:

    • sqllab/generate_db_context: Initiates a rebuild of the database metadata LLM context.
    • sqllab/generate_sql: Sends user prompts to the LLM provider to generate SQL queries.
    • sqllab/db_context_status: Retrieves the status of the database metadata context and the context builder worker.
    • database/{db_id}/schema_tables: Returns all schemas and tables for a specified database.
  • Dashboards or Visualizations:
    No changes.

  • Superset CLI:
    No changes.

  • Deployment:
    No changes.
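To make the REST endpoints listed above more concrete, here is a minimal sketch of how a client might build requests for two of them. The SIP does not specify HTTP methods, request bodies, or the URL prefix, so several things here are assumptions: the `/api/v1/` prefix, the use of POST, bearer-token authentication, and the payload field names `database_id`, `prompt`, and `schemas`. The helper names are hypothetical as well.

```python
import json
import urllib.request

API_PREFIX = "/api/v1/"  # assumption: the SIP does not state the prefix


def generate_sql_request(base_url, db_id, prompt, schemas, token):
    """Build (but do not send) a request for sqllab/generate_sql."""
    # Hypothetical payload: the field names are not specified in the SIP.
    body = {"database_id": db_id, "prompt": prompt, "schemas": schemas}
    return urllib.request.Request(
        url=base_url.rstrip("/") + API_PREFIX + "sqllab/generate_sql",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",  # assumed auth scheme
        },
        method="POST",  # assumed HTTP method
    )


def schema_tables_url(base_url, db_id):
    """Return the URL for database/{db_id}/schema_tables."""
    return f"{base_url.rstrip('/')}{API_PREFIX}database/{db_id}/schema_tables"
```

A caller would pass the returned object to `urllib.request.urlopen` to send it. The actual response format should be taken from the AI Assistant documentation.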

To simplify the setup of a custom Docker Compose deployment (e.g. deploying this fork), we have provided a shell script and configuration files. Detailed instructions and resources can be found here.

New dependencies

The new dependencies introduced are primarily related to integration with supported LLM API providers and data structure validation for building the database metadata context JSON file:

  • google-genai: Python SDK for Google Generative AI.
  • openai: Python SDK for OpenAI models.
  • anthropic: Python SDK for Anthropic models.
  • pydantic: Used for robust data validation and serialization.

These dependencies are required to enable AI Assistant functionality and ensure reliable handling of LLM-related data.

Migration Plan and Compatibility

Since these are additive changes, migration should be straightforward.

Changes to metadata database tables:

  • llm_connection: New table.
  • llm_context_options: New table.
  • context_builder_task: New table.

No breaking changes are expected, and existing deployments can be upgraded without data loss. Standard database migration procedures apply.

Comment From: rusackas

I think this is fantastic, but I think this has a direct correlation with how we plan to build extensions as part of SIP-151. We plan to build something like VS Code does (see docs) so that any/all extensions can interact natively with the host app (Superset)'s configured LLM(s).

All of this is still being sorted out, so I wouldn't recommend voting on this, until it fits into that other plan.

Comment From: diegoscarabelli

@rusackas Thanks for the update.