Apache Superset [SIP-151] The vision for a new Superset Plugins Architecture

Motivation

Over the years, Superset has grown into a powerful and popular open-source data visualization platform, with a large and active community of users and contributors. As adoption grew, so did the requests for new features, and the desire to customize and extend the platform. As some examples, we have seen requests for new chart types, new dashboard components, new database connectors, new Explore controls, LLM integration, and more. However, adding new features to Superset has proven to be challenging, due to the monolithic nature of the codebase and the lack of a clear extensibility model. This has led to a situation where many users are unable to add the features they need, and the maintainers are struggling to keep up with the demand.

Superset currently has a chart plugin system that was introduced as part of SIP-6. While the framework introduced a generic chart data API endpoint and made it possible to create new chart plugins without any backend changes, adding a plugin still required forking the project and modifying files in the core frontend codebase. In addition, the functionality was largely undocumented, making it opaque and difficult for non-core contributors to understand. This led to limited adoption of the plugin framework, and only a handful of individuals/orgs were able to benefit from it.

Due to the monolithic design, a lot of implementation-specific code has been added to the core parts of the codebase, making it less DRY and reusable. This has made it increasingly difficult to evolve the core functionality required for improving existing features or adding new ones. We also perceive a need for experimentation in Superset, where developers can try out new types of components without affecting the core codebase, allowing new behaviors to be tested and validated with the user base.

This SIP is not intended to sell a refactor of the existing architecture merely for the sake of a cleaner Superset codebase. This project is intended to:

- Enable the expansion of Superset with a simpler developer experience, allowing us to expand capabilities and strengthen our competitive stance in the BI market.
- Reduce the responsibilities and maintenance burden of the core application, making it simpler for core maintainers.

Proposed Change

To unlock the next phase of Superset's evolution, we propose moving feature-specific logic out of the core codebase and introducing a new plugin architecture. This architecture will allow users to extend and customize Superset's functionality in a modular and maintainable way, without having to modify the core codebase. By adopting this new architecture, we aim to address many of the technical and operational challenges that have been holding Superset back, and pave the way for a plugin ecosystem where users can easily install and share plugins.

The inspiration comes from platforms like VS Code—simple at its core, yet endlessly extensible. This approach has led to a thriving ecosystem of plugins that extend far beyond its original scope, empowering users to shape their tools to fit their unique needs.

Note: For the purpose of this SIP, the naming conventions around “extensions/plugins/modules/etc.” have yet to be formalized, and will likely solidify as we build out POCs.

This new architecture will bring numerous benefits:

- Allow organizations to develop features specific to their needs, such as advanced SQL Lab capabilities, SQL explainers, query optimizers, automatic translations between database engines, improved metadata, natural language support, and more.
- Create many opportunities to develop new plugins, including but not limited to new chart visualization types, dashboard components, database connectors, Explore controls, etc.
- Support feature isolation with clearly defined boundaries and dependencies.
- Improve technical quality and provide clear development patterns.
- Enable easier contributions from the open-source community without compromising technical and release quality.

A key aspect of this project is the introduction of a developer portal with clear documentation about how to develop new plugins, what interfaces are available, how inter-plugin communication works, what restrictions are in place, how to deploy the plugins, etc. We think the lack of clear documentation is preventing many developers from contributing to Superset, and we believe that by providing clear and comprehensive documentation, we can significantly increase the number of contributions to the project. By making high-quality documentation a first-class citizen of the project, we intend to make it possible for developers to find the information they need without having to resort to opening issues/discussions on GitHub or Slack.

Another really important aspect is the offering of an SDK for plugin development to provide developers with clear and well-documented interfaces, robust tooling, and reusable components to streamline the creation of plugins. Whether building a custom chart type, integrating an AI-powered query optimizer, or introducing advanced dashboard components, the SDK will lower barriers to entry, enabling developers to focus on innovation rather than infrastructure. The SDK will also define consistent patterns for development, ensuring compatibility and maintainability across the ecosystem. Features such as standardized inter-plugin communication, built-in security protocols, and versioning will empower developers to build confidently while maintaining the technical quality and integrity of Superset.

When we think about what parts of Superset can be pluggable, it's clear that there are many possibilities with varying degrees of complexity. That's why it's essential to have a clear vision and roadmap for this project. We propose to start with a limited scope, focusing on the most requested features, and then gradually expand the scope as we gain experience and confidence with the new architecture. We also propose to have a clear governance model in place to review and approve new plugin types, to ensure that they meet the technical and quality standards of the project.

We have identified the following priorities, which will shape the architecture and pave the way for new plugin types:

Priority	Description
Developer portal	One or multiple SIPs that define the structure and content of our developer portal. It will contain clear documentation about how to develop new plugins, what interfaces are available, how inter-plugin communication works, what restrictions are in place, how to deploy the plugins, etc.
SQL Lab tools	One or multiple SIPs that define the interfaces for SQL Lab plugins. It will be a revised version of SIP-132: Proposal for SQL Lab add-on plugins
Chart visualizations	One or multiple SIPs that define the interfaces for new chart types, multiple visualization libraries, etc.
Dashboard components	One or multiple SIPs that define the interfaces for dashboard components. This includes supporting new types of components, multiple export options, multiple layout types, etc.
Explore controls	One or multiple SIPs that define the interfaces for Explore controls which includes custom tooltips, formatters, validations, etc.
Database connectors	One or multiple SIPs that define the interfaces for database connectors for our 50+ supported databases.
Security model	One or multiple SIPs that define the interfaces for extending our security model.

As we begin to design and work on the aforementioned variations of plugins, and drive toward standardizing their architecture, we will need to address any number of other issues, which will be proposed as additional SIPs, or as part of planned SIPs, including (but not limited to):

- Security: all plugins should have the lowest possible security risk, protecting against untrusted code.
- Versioning: documented/versioned interfaces should allow us to follow SemVer practices across versions of Superset and plugins in the ecosystem.
- Building, Deployment, and Installation: all plugins should follow standardized workflows for plugin publication and installation.
- Inter-plugin dependencies and communication.
- And more!
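
To make the versioning item concrete, here is a minimal sketch of how a host could check a plugin against a versioned interface under SemVer. Nothing here is defined by this SIP: the `Version` class, the `is_compatible` function, and the rule itself (same major version, host not older than the version the plugin targets) are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True, order=True)
class Version:
    """A plain MAJOR.MINOR.PATCH version (no pre-release or build tags)."""

    major: int
    minor: int
    patch: int

    @classmethod
    def parse(cls, text: str) -> "Version":
        major, minor, patch = (int(part) for part in text.split("."))
        return cls(major, minor, patch)


def is_compatible(host_api: str, plugin_target: str) -> bool:
    """Hypothetical rule: same major version, and the host API is at
    least as new as the version the plugin was built against."""
    host = Version.parse(host_api)
    target = Version.parse(plugin_target)
    return host.major == target.major and host >= target
```

Under this rule a plugin built against interface version 1.2.3 would load on a host exposing 1.4.0, but not on 1.1.0 (too old) or 2.0.0 (breaking change).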

Our plan is to work with subject matter experts from the community to define the contents of each SIP and, once a SIP is approved and implemented, to make sure the developer portal is updated with the new information.

To track the project, we'll use the existing SIPs project board to place all SIPs related to the plugins work. They will be labeled with the plugin-architecture label so that users can easily find the full scope of the project and check the status of each SIP.

The objective of this SIP is to get early buy-in from the community on the general feasibility of the project, and to get feedback on the proposed scope and timeline. We assume this SIP in itself should not be objectionable, as it will improve the Superset product, its maintainability, developer experience, and security, boost compatibility, and encourage wider adoption by end users. By embracing a modular plugin architecture, we are setting the stage for a future where innovation and collaboration can thrive, ultimately delivering greater value to all users. We look forward to working with the community to make this vision a reality.

✍ Evan, Michael and Ville

Comment From: amitmiran137

Guys, I adore your endless passion towards Superset. Keep rocking 💪💪

Comment From: suddjian

hell yes

Comment From: TechAuditBI

This is an awesome improvement!

Comment From: betodealmeida

Regarding "Database connectors", we already have a plugin architecture based on Python entry points, and today it's possible to install new DB engine specs from 3rd-party codebases. The only thing missing is a proper definition of the interface, which is documented but sometimes falls out of sync.

Comment From: villebro

> Regarding "Database connectors", we already have a plugin architecture based on Python entry points, and today it's possible to install new DB engine specs from 3rd-party codebases. The only thing missing is a proper definition of the interface, which is documented but sometimes falls out of sync.

The idea here is to support not only various SQLAlchemy-based connectors, but any queryable datasource. I know Superset used to have an abstraction for this, but it was really an intersection of Native Druid and SQLAlchemy, and didn't abstract well to other datasource types. With well-defined interfaces at the datasource level, the community could author their own connectors and query them via SQL Lab (probably Query Lab going forward) if they so wish. Examples could include PromQL or arbitrary NoSQL databases.

Comment From: michael-s-molina

Yep! We're aware of the current solution and intend to enhance it the same way we'll do for the visualization plugins. We want to have a generic way of installing/configuring backend plugins (database connectors, security models, etc). By the way, we are relying on your expertise @betodealmeida for co-authoring related SIPs 🤝

Comment From: jansule

Great proposal! I just want to share my two cents from the perspective of an external visualization plugin developer:

- Dependency management was quite cumbersome (as often happens on bigger Node.js projects), especially when sharing common dependencies between the core and (multiple) plugins. Providing a solution for that would be fantastic!
- Provision of common UI elements that are also extensible. While superset-ui-core already provides a great set of UI elements, the ones used in the Explore view were still heavily adjusted in the core. So as a plugin developer, I had to either make changes to the core (adding custom controls) or redo the UI components in the plugin in order to get the same look and feel for my custom controls. Devs would benefit from a solution here that lets us create custom components faster and integrate them better into the UI in the short run, while improving the components' maintainability in the long run.
- Changes that also affect the backend: some changes require adjustments to the backend. A very simple case might be adding configuration parameters that need to be passed to the plugins, e.g. API tokens, adjustable default values, etc. Currently, this requires devs to adjust the core repository. Being able to adjust backend functionality from a third-party plugin as well would allow me to provide more useful features in a plugin without having to touch the main repository.
- Connecting to datasources via HTTP could probably cover quite a large area beyond SQL-based datasources, especially when integrating third-party sources that do not want to expose their databases to the public, but rather provide access via HTTP APIs. States like Germany follow an open data approach, where governmental bodies have to provide access to public information. This often happens via HTTP APIs (especially in the geographic domain, where these APIs are standardized), which provide sensor data as well as other geospatial information. Being able to integrate these datasources as third-party plugins would be extremely beneficial for us.

I know these problems are not easy to solve and come with considerations regarding applicability as well as security, but I hope the future architecture can address some of them.

Comment From: michael-s-molina

Thanks for sharing all these great points @jansule! We are totally aligned here and plan to tackle these problems during this work. As you mentioned, these issues are not easily solvable and it will be really valuable to partner with the community for writing the SIPs and validating the architecture. We're already working on some of the topics you mentioned, inspired by VSCode Extensions and other projects that have successfully navigated similar challenges. By drawing on their experiences and best practices, we aim to implement solutions that not only address our current issues but also enhance the overall functionality and user experience of Superset.

Comment From: jansule

> As you mentioned, these issues are not easily solvable and it will be really valuable to partner with the community for writing the SIPs and validating the architecture

Sure, I'm happy to contribute, where possible.

Comment From: michael-s-molina

The vote has PASSED with 5 binding +1, 1 non-binding +1 and 0 -1 votes.

Result thread: https://lists.apache.org/thread/vh1qpd9jdf2lk1d82wk9m01x20b6d2j2

Comment From: jpchev

great, I can't wait to see the new plugin architecture in place, I am available to contribute

Comment From: mistercrunch

Currently working on two features that may land as Preset-only extensions, and had a bit of a thought to share about our code structure, wanted to surface here.

Mostly thinking about the backend currently, but this relates to the frontend as well. Say I'm building a "global search" extension: I'll need a set of building blocks, like models/views/daos/commands/ ...

Now in our codebase, sometimes things are implemented by type and scattered through things like:

- superset/models/..
- superset/view/...
- superset/command/...
- superset/daos/...

Now for my extension, it'd be kind of nice to structure it more like:

- superset/{feature}/models.py
- superset/{feature}/views.py
- superset/{feature}/commands.py
- superset/{feature}/daos.py

Not sure exactly what that means and whether:

- fundamentally the Superset codebase should be more structured in that way (?) OR
- we need some sort of registry for each object type, and extensions can inject into them, as in extension.register_views(), extension.register_commands(), extension.register_blueprints()

In any case, this raises the question of what I have access to from the extension SDK, probably access to BaseAPI, BaseCommand, ..., and some way to register my stuff somehow ... Not sure if the SIP or a related document formalizes that stuff already or if that's still up for definition. But really curious about the mechanics of things. My recent projects are interesting use cases to figure out:

- what objects I need a handle on for extension (base objects to extend)
- what objects I need to inject (commands, API endpoints/views, models, ...)
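
The registry option could look roughly like the sketch below. It is purely illustrative and assumes nothing about the eventual SDK: `ExtensionRegistry`, `load_extension`, `GlobalSearchExtension`, and the placeholder `SearchCommand` are hypothetical names, and only the `register_views` / `register_commands` / `register_blueprints` hook names come from the comment above.

```python
from collections import defaultdict


class ExtensionRegistry:
    """Hypothetical registry: one bucket per object type, filled by extensions."""

    def __init__(self) -> None:
        self._items: dict[str, dict[str, object]] = defaultdict(dict)

    def register(self, kind: str, name: str, obj: object) -> None:
        # Reject duplicates so two extensions cannot silently clash.
        if name in self._items[kind]:
            raise ValueError(f"{kind} '{name}' is already registered")
        self._items[kind][name] = obj

    def get(self, kind: str) -> dict[str, object]:
        # Return a copy so callers cannot mutate the registry directly.
        return dict(self._items[kind])


class SearchCommand:
    """Placeholder for a command the extension would contribute."""


class GlobalSearchExtension:
    """Hypothetical extension that injects one command and one view."""

    def register_commands(self, registry: ExtensionRegistry) -> None:
        registry.register("commands", "global_search.search", SearchCommand)

    def register_views(self, registry: ExtensionRegistry) -> None:
        registry.register("views", "global_search.api", lambda query: [])


def load_extension(registry: ExtensionRegistry, extension: object) -> None:
    """Call whichever register_* hooks the extension chooses to implement."""
    for hook in ("register_views", "register_commands", "register_blueprints"):
        method = getattr(extension, hook, None)
        if callable(method):
            method(registry)
```

With this shape, the host calls `load_extension(registry, GlobalSearchExtension())` once at startup and then reads `registry.get("commands")` or `registry.get("views")` wherever it wires those object types in. Hooks an extension does not define are simply skipped.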

Comment From: michael-s-molina

@mistercrunch I completely agree with organizing the code according to features instead of layers. That's what I proposed in SIP-61 and it's the current way we organize frontend files. @villebro and I discussed this many times and we share the same perspective.

We had SIP-92 about organizing backend files. I added a comment about this here but the SIP was approved anyway.

To me, a feature-based approach makes even more sense when you think about a modular system where specific parts might be present or not during installation.

It's also really helpful when setting code ownership.