Bug description

The 5.0.0 images have increased in size compared to 4.1.x: 892MB vs 292MB. This seems to stem from change #31385, which moved playwright, gecko, and chromium from the dev image into the base image, which is then used for everything downstream.

Screenshots/recordings

No response

Superset version

5.0.0

Python version

3.10

Node version

Not applicable

Browser

Not applicable

Additional context

No response

Checklist

  • [ ] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: agologan

@mistercrunch in the subsequent change #31386, INCLUDE_CHROMIUM was disabled for most dev workflows. Should this have been disabled for the tag-release#L95 workflow as well? Or am I misunderstanding the purpose of playwright?

Comment From: dosubot[bot]

Hi @agologan! I'm Dosu and I’m helping the superset team.

You’re not misunderstanding—Playwright is mainly for browser-based testing and automation, which does require Chromium or similar browsers. In the release workflow (tag-release.yml), neither Playwright nor Chromium is enabled or required by default. The relevant build args and environment variables (like INCLUDE_CHROMIUM) are set to false unless explicitly overridden, and the workflow itself doesn’t run browser-based tests or tasks that would need these dependencies. So, there’s no need to explicitly disable INCLUDE_CHROMIUM in the release workflow—it’s already off by default, and the release images shouldn’t include Chromium or Playwright unless you specifically opt in. The image size increase isn’t due to these components in the standard release workflow. If you’re seeing them included despite this, it might be worth double-checking for any custom overrides in your build environment.
Sources:
- tag-release.yml workflow does not enable Chromium/Playwright
- Default Docker env disables Playwright/Chromium
- docker-compose.yml defaults INCLUDE_CHROMIUM to false


Comment From: rusackas

One counterargument might be that we're steering toward using Playwright for thumbnail generation. Would it make sense to leave it there for when that becomes the default configuration?

Comment From: agologan

Afaik thumbnails are opt-in, and bundling a browser would add significant overhead in container image size and dependency management just to generate those. I can understand if the maintainers want to go in that direction, but the change was not called out in the release notes, and the few details I could dig up make it seem unintentional.

Comment From: sfirke

Good catch, and I appreciate you raising this for discussion. I wouldn't be opposed to bundling Playwright in the base image, but I expect you're right that introducing it was an accident. Good call to ping @mistercrunch, both as someone who touches these files and as someone who has been committed to keeping the lean image lean. @mistercrunch is this something you think should be reverted?

Comment From: rusackas

Note also that this MIGHT (but probably doesn't) intersect with @sadpandajoe's new SIP to use Playwright for E2E testing. Just noting it since Playwright is becoming a more established part of our little universe, and is growing roots!

Comment From: mistercrunch

Seems playwright should not be in the lean image. Let's fix this.

It appears that INCLUDE_CHROMIUM was set to "true" before my PR https://github.com/apache/superset/pull/31385, so I'm unclear what exactly changed here, maybe something downstream (?) On the PR I was assuming that downstream processes like the official version builds were using those ARGs properly, but maybe that wasn't the case (?)

In any case, it really feels like an image named lean shouldn't be packaging a headless browser, and that if/when people want a headless browser, they'd bake their own image, whether from our Dockerfile or in subsequent layers.
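To make the "bake their own image" option concrete, here is a rough sketch of a helper that assembles a docker build command with INCLUDE_CHROMIUM set explicitly. The function name, the default tag, and the assumption that lean is a valid build target in the Dockerfile are illustrative only; --target and --build-arg are standard docker build flags, but check the Dockerfile for the actual stage names and ARGs before relying on this.

```python
import os
import shlex


def build_command(target="lean", include_chromium=None, tag="superset:custom"):
    """Return a `docker build` argv list with INCLUDE_CHROMIUM set explicitly.

    Hypothetical helper: the target name and tag are assumptions, not
    values confirmed by the Superset build scripts.
    """
    if include_chromium is None:
        # Fall back to the environment; anything other than "true" means off,
        # so the image stays lean unless someone opts in.
        env_value = os.environ.get("INCLUDE_CHROMIUM", "false")
        include_chromium = env_value.lower() == "true"
    flag = "true" if include_chromium else "false"
    return [
        "docker", "build",
        "--target", target,
        "--build-arg", f"INCLUDE_CHROMIUM={flag}",
        "-t", tag,
        ".",
    ]


if __name__ == "__main__":
    # Print the opt-in variant as a copy-pasteable shell command.
    cmd = build_command(include_chromium=True, tag="superset:with-chromium")
    print(shlex.join(cmd))
```

Passing the flag explicitly on every build, rather than relying on whatever default the Dockerfile or workflow happens to carry, would also make it easier to spot when a release build picks up the browser unintentionally.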

One related question is whether docker compose up should use INCLUDE_CHROMIUM=true or not. It can be useful to have this in the dev environment if/when working with alerts and reports or thumbnails, though the price on builds is relatively steep, and we could say "turn it on if/when you work on certain features that require this". The question here is whether the philosophy for the dev environment is "fully loaded by default" or "optimized and lean for core workflows" ...