Bug description

Hi, I want to upgrade Superset from 4.0.2 to 4.1.1 using Helm. I ran:

helm upgrade --install superset superset/superset -f values.yaml

and got an error about psycopg2. Help, please.

Screenshots/recordings

Defaulted container "superset-init-db" out of: superset-init-db, wait-for-postgres (init)
Upgrading DB schema...
Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py]
2024-11-22 03:34:11,278:ERROR:superset.app:Failed to create app
Traceback (most recent call last):
  File "/app/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/app/superset/initialization/__init__.py", line 476, in init_app
    self.setup_db()
  File "/app/superset/initialization/__init__.py", line 667, in setup_db
    pessimistic_connection_handling(db.engine)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 998, in engine
    return self.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1017, in get_engine
    return connector.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 594, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1027, in create_engine
    return sqlalchemy.create_engine(sa_url, **engine_opts)
  File "<string>", line 2, in create_engine
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 375, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 544, in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 811, in dbapi
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'
Traceback (most recent call last):
  File "/usr/local/bin/superset", line 8, in <module>
    sys.exit(superset())
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1685, in invoke
    super().invoke(ctx)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/click/decorators.py", line 33, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 355, in decorator
    app = __ctx.ensure_object(ScriptInfo).load_app()
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 309, in load_app
    app = locate_app(import_name, name)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 238, in locate_app
    return find_app_by_string(module, app_name)
  File "/usr/local/lib/python3.10/site-packages/flask/cli.py", line 166, in find_app_by_string
    app = attr(*args, **kwargs)
  File "/app/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/app/superset/initialization/__init__.py", line 476, in init_app
    self.setup_db()
  File "/app/superset/initialization/__init__.py", line 667, in setup_db
    pessimistic_connection_handling(db.engine)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 998, in engine
    return self.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1017, in get_engine
    return connector.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 594, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1027, in create_engine
    return sqlalchemy.create_engine(sa_url, **engine_opts)
  File "<string>", line 2, in create_engine
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/util/deprecations.py", line 375, in warned
    return fn(*args, **kwargs)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/engine/create.py", line 544, in create_engine
    dbapi = dialect_cls.dbapi(**dbapi_args)
  File "/usr/local/lib/python3.10/site-packages/sqlalchemy/dialects/postgresql/psycopg2.py", line 811, in dbapi
    import psycopg2
ModuleNotFoundError: No module named 'psycopg2'

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Hello @kirmorozov1992! I'm here to assist you with any bugs, questions, or contributions you have. Let's work together to solve the issue you're facing.

To resolve the "No module named 'psycopg2'" error when upgrading Apache Superset using Helm, you need to ensure that the psycopg2 package is installed as part of the bootstrap script in your Helm chart configuration. You can do this by adding the installation command for psycopg2 in the bootstrapScript section of your values.yaml file. Here is an example configuration:

bootstrapScript: |
  #!/bin/bash
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

This script ensures that psycopg2 is installed when the Superset application starts, which should resolve the module not found error [1].

To continue talking to Dosu, mention @dosu.

Comment From: kirmorozov1992

@dosu Unfortunately, that does not help. Does anybody have other advice?

Comment From: dosubot[bot]

This is a tough nut to crack, could really use your help on this, @rusackas.


Comment From: DavidProdinger

The same issue occurs with the Docker Compose setup (docker-compose-image-tag.yml), so it is not limited to Helm.

Comment From: paulchoi

This worked for me. You need psycopg2-binary. You can ignore clickhouse-connect.

bootstrapScript: |
  #!/bin/bash
  pip install \
    "clickhouse-connect>=0.6.8" \
    "psycopg2-binary>=2.9.10" \
    &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

Comment From: kirmorozov1992

@paulchoi Hi! Did you check this with Docker Compose? I've tried changing the config and reloading many times, but it doesn't help. I use Minikube.

Comment From: DavidProdinger

I use it with Docker Compose.

Just add this content to your docker/requirements-local.txt file:

# database drivers
pymysql
psycopg2-binary

Sadly, neither the non-binary version of psycopg2 nor mysqlclient can be installed. Therefore I use pymysql with the URL mariadb+pymysql://...
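
For anyone scripting the Docker Compose workaround above, here is a minimal sketch of adding a driver to the requirements file without duplicating lines. The helper name add_requirement is hypothetical and not part of Superset; the file path is the one mentioned in this comment. It assumes the file, if it already exists, ends with a newline.

```shell
#!/bin/sh
# Append a package line to a requirements file unless that exact line exists.
# add_requirement is a hypothetical helper, not something Superset ships.
add_requirement() {
  req_file=$1
  pkg=$2
  # Create the directory and file on first use.
  mkdir -p "$(dirname "$req_file")"
  touch "$req_file"
  # -x matches whole lines, -F treats the package name as a fixed string.
  if ! grep -qxF -- "$pkg" "$req_file"; then
    echo "$pkg" >> "$req_file"
  fi
}

# Example usage (run from the Superset repository root):
#   add_requirement docker/requirements-local.txt psycopg2-binary
#   add_requirement docker/requirements-local.txt pymysql
```

Calling the helper twice with the same package leaves a single line in the file, so it is safe to rerun.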

Comment From: richard-fairthorne

Put this in your bootstrapScript, before pip install:

apt-get update && apt-get install -y build-essential

I am surprised this is not included in the documentation.

Comment From: Rusp0

bootstrapScript: |
  #!/bin/bash
  apt-get update && apt-get install -y build-essential
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi

Sadly, this didn't help me; I get the same error.

Comment From: sfirke

Starting with 4.1.0 the lean docker image no longer contains the drivers for MySQL or Postgres, as described in the release notes: https://github.com/apache/superset/blob/master/RELEASING/release-notes-4-1/README.md#change-to-docker-image-builds

I know that affects people deploying with docker compose and may be the issue here with the Helm chart too. The Helm chart is in a gray area where it doesn't have a dedicated manager and gets bumped/fixed by the community as needed -- so community fixes are especially welcome here.
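
To confirm whether a given image still ships a driver, a quick probe along these lines can be run inside the Superset container. This is only a sketch: has_module and PYTHON_BIN are hypothetical names chosen for illustration, and the check merely asks Python whether the module can be found. MySQLdb is the import name of the mysqlclient package.

```shell
#!/bin/sh
# Succeed if the given Python module can be located by the interpreter.
# PYTHON_BIN is a hypothetical override; it defaults to python3.
has_module() {
  "${PYTHON_BIN:-python3}" -c \
    'import importlib.util, sys; sys.exit(0 if importlib.util.find_spec(sys.argv[1]) else 1)' \
    "$1"
}

# Report the Postgres and MySQL drivers discussed in this thread.
for mod in psycopg2 MySQLdb; do
  if has_module "$mod"; then
    echo "$mod: available"
  else
    echo "$mod: missing"
  fi
done
```

If psycopg2 is reported as missing, the init job will fail with the ModuleNotFoundError from the bug description until a driver is installed.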

Comment From: merlos

In my case, I was trying to install sqlalchemy-drill as part of the Helm chart deployment, and it also failed because psycopg2 was not available.

This change in values.yaml fixed the issue:

bootstrapScript: |
  #!/bin/bash
  pip install sqlalchemy-drill psycopg2-binary

Comment From: martimors

Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.

Comment From: villebro

This is not directly related to the Helm chart, but rather to how the Docker image is built. Sadly, pulling in extra db drivers on the fly is not totally straightforward, for the following reasons:

  • User account: Only the root account is allowed to install new packages on the running pod/container. If you're ok with this, remember to keep the default runAsUser: 0 in your values.yaml
  • Internet access: Many environments may have blocked external access from the Superset pods to the external internet. However, if you happen to have an internal PyPI registry, you can use that in your bootstrap script: pip install psycopg2 --index-url https://pypi.mycorp.com/simple

If neither of these is possible in your environment, you will need to build a custom image with all necessary drivers preinstalled. This is also the recommended approach: it keeps startup times to a minimum (no need to install the drivers every time the pod starts up), doesn't require access to a PyPI registry, and doesn't require running as root.
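
The two constraints above can be sketched as a dry-run check that a bootstrap script might perform before installing anything. This is an illustration under stated assumptions, not the chart's actual logic: build_pip_cmd, is_root and the INTERNAL_INDEX_URL variable are hypothetical names, and the script only prints the pip command instead of running it.

```shell
#!/bin/sh
# Print the pip command a bootstrap script would run.
# INTERNAL_INDEX_URL is a hypothetical variable pointing at an internal
# PyPI registry; when unset, pip's default index is used.
build_pip_cmd() {
  if [ -n "${INTERNAL_INDEX_URL:-}" ]; then
    echo "pip install --index-url $INTERNAL_INDEX_URL $*"
  else
    echo "pip install $*"
  fi
}

# Succeed only for uid 0, since only root may install packages at runtime.
is_root() {
  [ "$1" -eq 0 ]
}

# Dry run: show what would happen for the current user.
if is_root "$(id -u)"; then
  echo "would run: $(build_pip_cmd psycopg2-binary)"
else
  echo "uid $(id -u) cannot install packages; a custom image is needed"
fi
```

With INTERNAL_INDEX_URL set to https://pypi.mycorp.com/simple, the printed command matches the --index-url example in the list above.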

Comment From: villebro

> Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.

@martimors sadly this is a bit of a slippery slope, as Superset currently supports some 40+ databases. As you will need to figure out a way to add drivers for your other databases anyway, prebaking psycopg2 into the image is not necessarily a good solution, for the following reasons:

  • It adds to the image size.
  • It introduces an unnecessary attack vector for envs that don't need psycopg2, if an exploit exists in it.
  • Users may run into dependency issues if the pre-baked version of psycopg2 has conflicting requirements with whichever db driver someone wants to install.

Comment From: villebro

> Put this in your bootstrapScript, before pip install:
>
> apt-get update && apt-get install -y build-essential
>
> I am surprised this is not included in the documentation.

@richard-fairthorne Pull requests improving the docs are always welcome!

Comment From: sfirke

Hm, I see both points here.

  • I agree with @villebro that if we included support for the backend DB, people will still likely need to build their own image to include drivers for their data warehouse as well as a browser to take screenshots for Alerts & Reports, pyxl for Excel import/export, etc. (EDIT: I was wrong, no need to install pyxl on top of lean in 5.0.0)
  • But for people who just want to spin up Superset the first time with example data to see what it feels like, including psycopg2 and mysqlclient might be beneficial to new users and ultimately the Superset project. We are seeing users getting stuck on docker build and installing pre-reqs for mysqlclient when they are just trying to test out Superset, that doesn't seem right.

At the very least we could document how to build an extended image with these drivers, I hope to do that in the coming months. I personally think it would be nice to offer a new docker image basics that has these drivers, pyxl, Pillow, and a headless browser installed. Then people who are extending can still extend from lean to avoid the bloat and security issues that Ville points out, but new users have a plug-and-play demo option.

Comment From: nfalco79

This also hit our Helm chart deployment. The suggestion in this thread resolved the issue. I'm not happy about overriding the bootstrapScript, because the override might not work in a future version, so I have to remember to remove it in the next update.

Comment From: JohnDietrich-Pepper

This is a really obnoxious change. The most common use case includes Superset running off of Postgres and it is a PITA to make the changes required to get an upgraded instance running again.

Comment From: sfirke

I have completed the docs addition on how to build a custom Superset image that extends lean with additional drivers: https://superset.apache.org/docs/installation/docker-builds/#building-your-own-production-docker-image

If anyone wants to open a PR that puts psycopg2 back into the lean Docker image, I will review. I don't think the slippery slope argument applies to that one single package and the benefit (many more people will be able to try out Superset without building a Docker image) outweighs the harm (psycopg2 adding size and a possible attack surface to deployments that don't use Postgres as their metadata DB).

It seems like there's also support for the basics image that I suggested above. I would also review a PR for that.

Comment From: 1yuv

It's sad that this was not documented clearly. The comments above are helpful and, as @richard-fairthorne suggested, adding build-essential before pip install is also required. I was able to upgrade from 4.0.1 to 4.1.2 with the following changes to my bootstrap script.

bootstrapScript: |
  #!/bin/bash
  apt-get update && apt-get install -y build-essential
  pip install \
    "authlib" \
    "psycopg2-binary>=2.9.10" \
    &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi