[SIP] Proposal for Migrating from npm + Lerna to pnpm

Motivation

Apache Superset’s frontend currently relies on npm for dependency management and Lerna for monorepo publishing and workspace organization. While Lerna has served us well, it has experienced periods of limited maintenance and slower updates. Additionally, we still face challenges with large node_modules directories, long install times, and complex multi-package workflows.

pnpm offers a compelling alternative that consolidates both package management and monorepo workspace features in a single tool. Some key benefits include:

  • Smaller node_modules footprint: pnpm’s content-addressable store greatly reduces disk usage.
  • Faster installations: pnpm’s symlink-based architecture speeds up the dependency installation process.
  • Actively maintained: pnpm is an independent open-source project with frequent releases and an active contributor community, ensuring ongoing development and timely fixes.
  • Built-in workspace management: pnpm can replace Lerna’s monorepo features without additional overhead.

Description of the problem to be solved

  1. Large node_modules directories: Both local development environments and CI pipelines suffer from bloated install footprints.
  2. Long install times: As dependencies grow, npm install becomes slower and more resource-intensive.
  3. Maintenance overhead: Using two tools—npm and Lerna—for monorepo management can lead to redundant configuration and potential version mismatches.
  4. Future-proofing: Lerna’s maintenance status has fluctuated, creating uncertainty for the project’s long-term needs.

Proposed Change

  1. Adopt pnpm Workspaces
     • Migrate from Lerna’s monorepo setup (lerna.json and associated scripts) to a pnpm-workspace.yaml configuration.
     • Remove or deprecate Lerna-specific commands in favor of pnpm’s built-in workspace features.
  2. Replace npm with pnpm
     • Update all npm install and npm run scripts to use pnpm.
     • Validate and adjust any scripts or hooks to ensure they function under pnpm.
  3. Integrate Changesets for Versioning & Publishing (optional but recommended)
     • If we still want automated changelog generation and version bumping, incorporate Changesets.
     • Configure pnpm to run Changesets during CI to publish packages.
  4. Update CI/CD
     • Switch CI steps from npm install to pnpm install.
     • Evaluate caching strategies (e.g., caching the pnpm store) to maximize build performance.
  5. Documentation
     • Provide clear migration steps for developers (e.g., uninstall Lerna globally if used, install pnpm, and switch to pnpm commands).
     • Update any references to Lerna or npm in project documentation, READMEs, and onboarding guides.
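As a rough illustration of the lerna.json to pnpm-workspace.yaml step, the sketch below renders a workspace file from the `packages` globs of a Lerna config. The function name and the example globs are hypothetical, not Superset's actual layout, and the `packages/*` fallback is an assumption about configs that omit the key.

```python
import json


def lerna_to_pnpm_workspace(lerna_json_text: str) -> str:
    """Render pnpm-workspace.yaml content from the text of a lerna.json file."""
    config = json.loads(lerna_json_text)
    # Assumption: fall back to "packages/*" when lerna.json has no "packages" key.
    globs = config.get("packages", ["packages/*"])
    lines = ["packages:"]
    for pattern in globs:
        # Quote each glob so YAML does not misread characters such as "*".
        lines.append(f'  - "{pattern}"')
    return "\n".join(lines) + "\n"


# Hypothetical lerna.json content, for illustration only.
example = '{"version": "independent", "packages": ["packages/*", "plugins/*"]}'
print(lerna_to_pnpm_workspace(example))
```

Only the package globs carry over in this sketch; Lerna settings such as versioning mode have no direct counterpart in pnpm-workspace.yaml and would need to be handled separately (for example through Changesets).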

New or Changed Public Interfaces

  • No direct impact on Superset’s REST endpoints, dashboards, or CLI is anticipated.
  • Developer-facing scripts (e.g., lerna publish or npm run build) will be replaced by pnpm run publish or pnpm run build, necessitating documentation updates.
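To get a feel for how mechanical the script changes are, here is a minimal sketch that rewrites npm and Lerna invocations inside a package.json-style `scripts` mapping. The replacement table is an assumption for illustration and would need to be extended and reviewed against the real scripts; it is not an exhaustive or authoritative mapping.

```python
import re

# Assumed command mapping for illustration; order matters ("npm ci" first).
REPLACEMENTS = [
    (re.compile(r"\bnpm ci\b"), "pnpm install --frozen-lockfile"),
    (re.compile(r"\blerna bootstrap\b"), "pnpm install"),
    (re.compile(r"\bnpm (install|run|test)\b"), r"pnpm \1"),
]


def to_pnpm(command: str) -> str:
    """Rewrite npm/Lerna invocations in one shell command string."""
    for pattern, replacement in REPLACEMENTS:
        command = pattern.sub(replacement, command)
    return command


def migrate_scripts(scripts: dict[str, str]) -> dict[str, str]:
    """Apply to_pnpm to every entry of a package.json "scripts" mapping."""
    return {name: to_pnpm(cmd) for name, cmd in scripts.items()}


# Hypothetical scripts block, not taken from the Superset repository.
print(migrate_scripts({"build": "npm run build", "ci": "npm ci && npm run test"}))
```

The `\b` word boundary keeps the rewrite idempotent: an existing `pnpm run build` is left alone because `npm` is not at a word boundary inside `pnpm`.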

New Dependencies

  • pnpm
     • License: MIT
     • Actively maintained by an independent open-source community.
  • (Optional) Changesets
     • License: MIT
     • Actively maintained and widely adopted for multi-package versioning and changelog generation.

Migration Plan and Compatibility

  1. Local Environment
     • Developers will install pnpm globally (e.g., via corepack enable pnpm or another supported installation method).
     • Replace Lerna commands (e.g., lerna bootstrap, lerna publish) with pnpm equivalents (pnpm install, pnpm publish).
  2. CI Environments
     • Update pipelines to install pnpm and run pnpm install instead of npm install or lerna bootstrap.
     • Validate build scripts, tests, and publishing flows under pnpm.
  3. Backward Compatibility
     • Removing Lerna does not affect final build artifacts or runtime usage.
     • The main changes are internal to developer workflows and CI processes.
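For the CI side, one common way to cache the pnpm store is to key the cache on the contents of pnpm-lock.yaml, so the cache is reused until dependencies change. The sketch below shows that idea only; the key format, function name, and parameters are hypothetical and not tied to any particular CI provider.

```python
import hashlib


def pnpm_cache_key(lockfile_bytes: bytes, os_name: str, node_version: str) -> str:
    """Build a cache key that changes whenever the lockfile contents change."""
    # A short prefix of the SHA-256 digest is enough to tell lockfiles apart.
    digest = hashlib.sha256(lockfile_bytes).hexdigest()[:16]
    return f"pnpm-store-{os_name}-node{node_version}-{digest}"


# In CI this would read the real pnpm-lock.yaml; inline bytes keep the sketch
# self-contained. The OS name and Node version are illustrative values.
print(pnpm_cache_key(b"lockfileVersion: '9.0'\n", "linux", "20"))
```

Including the OS and Node version in the key avoids restoring a store built for a different platform.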

Rejected Alternatives

  1. Continue using npm + Lerna
     • Maintains the status quo but does not address large node_modules, slower install times, or the uncertain maintenance status of Lerna.
  2. Nx or Turborepo
     • Both offer monorepo solutions, but they either introduce additional layers of complexity or lack out-of-the-box publishing features.

By migrating from npm + Lerna to pnpm, Apache Superset can simplify its monorepo workflows, reduce disk usage, speed up installation times, and rely on a single, well-maintained solution for both dependency management and workspace organization. This proposal aims to streamline developer workflows and position Superset for future growth and maintenance.

32692

Comment From: rusackas

Thank you for kicking this off! I definitely support simplification and ridding ourselves of lerna!

One thing I've been wondering about (which we should consider as an alternative) would be using yarn (on v4 at the moment). It's also pretty fast/clean/easy, and works with changesets as well, and supports workspaces and publishing. We might want to add it to the "alternatives" section for good measure.

One thing I've been enjoying lately about yarn is the overrides capabilities it offers over npm, to help keep subdependencies free from CVE exposure. It sounds like pnpm has nearly identical capabilities, so that's cool. It even cleans up after itself when the top-level dependency no longer needs the override.

I was GPT-ing again, and weighing pros and cons:

[Image: pros-and-cons comparison grid]

I did a little digging on the "Cross-Project Linking" aspects, and it does sound like pnpm will be an excellent fit for working with our monorepo packages, as well as developing (and migrating) monorepo pieces around as we work toward our new extensions architecture.

I believe I'll be a +1 on this... let me know if you want help moving it forward with discussion/voting.

Comment From: rusackas

I think this is a bit stuck in the SIP process here, and needs to be brought up for a [DISCUSS] thread on the dev@superset.apache.org mailing list. Let me know here (or on Slack) if you want any help with that. Thanks!

Comment From: rusackas

Opening the discussion... I think this fits well with the Extensions Architecture and Documentation Portal SIPs, as we can (seemingly/hopefully) use this new pnpm setup to automate the versioning of various sections of documentation along with the packages themselves.

Comment From: rusackas

This only has one vote so far. Anyone for or against this? Someone at Town Hall brought up the idea of bun as an alternative, which might be faster, but it's also the newer/shiny thing. I'm open to it, but like tools that have a large community of support, when we're picking something that needs to last for years.

Comment From: rusackas

Then again... just looked at bun's repo, and it is quite popular/active/supported. I'll update my comparison grid and take that into consideration. The vote remains open in the meantime. I'll roll it back to [DISCUSS] if it changes my mind ;)

Comment From: rusackas

Been meaning to circle back to this for AGES now. Threw my friend Claude at the problem, and here's what it has to say about bun vs pnpm:

pnpm vs Bun Comparison for Apache Superset

| Aspect | pnpm | Bun |
| --- | --- | --- |
| Monorepo Support | ✅ Mature workspace protocol, workspace:* for internal deps | ✅ Built-in workspaces (since v1.0), less battle-tested |
| Installation Speed | 2-3x faster than npm | 10-100x faster than npm |
| Disk Usage | 50-70% space savings via hard links | Standard node_modules, but faster writes |
| Publishing Workflow | ✅ Excellent - changesets integration, recursive publish | ❌ Limited - no changeset equivalent, basic npm publish |
| Dependency Strictness | ✅ Prevents phantom deps, strict by default | ⚠️ More permissive, follows npm model |
| Package Overrides | ✅ .pnpmfile.cjs for patches/overrides | ⚠️ Basic overrides in package.json |
| Filtering/Scoping | ✅ Powerful --filter flag for selective ops | ✅ Workspace filtering available |
| Node.js Compatibility | 100% (is a package manager for Node) | ~90% Node API coverage |
| Production Maturity | ✅ Used by Vue, Microsoft, proven at scale | ⚠️ Newer, less proven in large monorepos |
| CI/CD Integration | ✅ Established patterns, wide support | ⚠️ Emerging patterns, less documentation |
| Peer Deps Handling | ✅ Auto-install with configurable strictness | ✅ Auto-installs peer dependencies |
| Windows Support | ✅ Full support (with some symlink caveats) | ⚠️ Experimental Windows support |
| Native Addons | ✅ Full support for Node native addons | ⚠️ Incomplete N-API coverage |
| Runtime Included | ❌ Package manager only, needs Node.js | ✅ All-in-one runtime + package manager |
| TypeScript | Via Node + ts-node/tsx | ✅ Native TypeScript execution |
| Debugging Tools | ✅ Mature ecosystem | ⚠️ Limited debugging tools |
| Migration Effort | Medium - import command, adjust for symlinks | Medium-High - runtime changes, compat fixes |
| Risk Level | Low - community governed, stable | Medium - single company, rapid changes |
| Best For | Production monorepos needing publishing | Greenfield projects, speed-critical builds |

Bottom Line for Apache Superset

Go with pnpm because:

  • Publishing workflows are critical for Superset's multi-package releases, and pnpm has mature tooling here
  • Proven stability with large monorepos (Vue.js uses it successfully)
  • Dependency strictness helps manage Superset's complex dependency tree
  • .pnpmfile.cjs enables patching problematic dependencies

Consider Bun later (12-18 months) when:

  • Publishing workflow tools mature
  • More monorepos prove it in production
  • Node.js compatibility reaches 95%+
  • You need dramatic CI speedups (could be 10x faster builds)

The main tradeoff: pnpm gives you production-ready monorepo management today, while Bun offers revolutionary speed but with ecosystem gaps that could block Superset's publishing needs.