Bug description
Hi, I want to upgrade Superset from 4.0.2 to 4.1.1 using Helm. I ran helm upgrade --install superset superset/superset -f values.yaml and got an error related to psycopg2. Help, please.
Screenshots/recordings
Defaulted container "superset-init-db" out of: superset-init-db, wait-for-postgres (init)
Upgrading DB schema...
Loaded your LOCAL configuration at [/app/pythonpath/superset_config.py]
2024-11-22 03:34:11,278:ERROR:superset.app:Failed to create app
Traceback (most recent call last):
  File "/app/superset/app.py", line 40, in create_app
    app_initializer.init_app()
  File "/app/superset/initialization/__init__.py", line 476, in init_app
    self.setup_db()
  File "/app/superset/initialization/__init__.py", line 667, in setup_db
    pessimistic_connection_handling(db.engine)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 998, in engine
    return self.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1017, in get_engine
    return connector.get_engine()
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 594, in get_engine
    self._engine = rv = self._sa.create_engine(sa_url, options)
  File "/usr/local/lib/python3.10/site-packages/flask_sqlalchemy/__init__.py", line 1027, in create_engine
    return sqlalchemy.create_engine(sa_url, engine_opts)
File "
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hello @kirmorozov1992! I'm here to assist you with any bugs, questions, or contributions you have. Let's work together to solve the issue you're facing.
To resolve the "No module named 'psycopg2'" error when upgrading Apache Superset using Helm, you need to ensure that the psycopg2 package is installed as part of the bootstrap script in your Helm chart configuration. You can do this by adding the installation command for psycopg2 in the bootstrapScript section of your values.yaml file. Here is an example configuration:
bootstrapScript: |
  #!/bin/bash
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
This script ensures that psycopg2 is installed when the Superset application starts, which should resolve the module not found error [1].
To continue talking to Dosu, mention @dosu.
Comment From: kirmorozov1992
@dosu Unfortunately, that did not help. Does anybody have other advice?
Comment From: dosubot[bot]
This is a tough nut to crack, could really use your help on this, @rusackas.
Comment From: DavidProdinger
The same issue occurs with the Docker Compose version (docker-compose-image-tag.yml), so it is not limited to Helm.
Comment From: paulchoi
This worked for me. You need psycopg2-binary.
You can ignore clickhouse-connect.
bootstrapScript: |
  #!/bin/bash
  pip install \
    "clickhouse-connect>=0.6.8" \
    "psycopg2-binary>=2.9.10" \
  &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
Comment From: kirmorozov1992
@paulchoi Hi! Did you check this with Docker Compose? I've tried changing the config and reloading many times, but it did not help. I use Minikube.
Comment From: DavidProdinger
I use it with Docker Compose. Just add this content to your docker/requirements-local.txt file:
# database drivers
pymysql
psycopg2-binary
Sadly, the non-binary version of psycopg2 can't be installed, and neither can mysqlclient.
Therefore I use pymysql with the URL mariadb+pymysql://...
Comment From: richard-fairthorne
Put this in your bootstrapScript, before pip install:
apt-get update && apt-get install -y build-essential
I am surprised this is not included in the documentation.
Comment From: Rusp0
bootstrapScript: |
  #!/bin/bash
  apt-get update && apt-get install -y build-essential
  pip install psycopg2==2.9.6 \
    sqlalchemy-bigquery==1.6.1 \
    elasticsearch-dbapi==0.2.5 &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
Sadly, this didn't help me; I get the same error.
Comment From: sfirke
Starting with 4.1.0 the lean docker image no longer contains the drivers for MySQL or Postgres, as described in the release notes: https://github.com/apache/superset/blob/master/RELEASING/release-notes-4-1/README.md#change-to-docker-image-builds
I know that affects people deploying with docker compose and may be the issue here with the Helm chart too. The Helm chart is in a gray area where it doesn't have a dedicated manager and gets bumped/fixed by the community as needed -- so community fixes are especially welcome here.
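To confirm whether a missing driver is really what the running image is hitting, a quick import check inside the Superset container can help. The following is only a sketch: `check_driver` and the `PYTHON` override are hypothetical names introduced here, and it assumes a `python3` interpreter is on the container's PATH.

```shell
#!/bin/sh
# check_driver is a hypothetical helper, not part of Superset or the Helm chart.
# It tries to import the given Python module and reports whether that worked.
# PYTHON is an assumed override for the interpreter; it defaults to python3.
check_driver() {
  module="$1"
  if "${PYTHON:-python3}" -c 'import importlib, sys; importlib.import_module(sys.argv[1])' "$module" >/dev/null 2>&1; then
    echo "${module}: available"
  else
    echo "${module}: MISSING"
    return 1
  fi
}
```

Calling `check_driver psycopg2` in a shell inside the container should report MISSING on an image that ships without the Postgres driver, and available once psycopg2-binary has been installed.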
Comment From: merlos
In my case, I was trying to install sqlalchemy-drill as part of the Helm chart deployment, and it also failed because psycopg2 was not available.
This change in values.yaml fixed the issue:
bootstrapScript: |
  #!/bin/bash
  pip install sqlalchemy-drill psycopg2-binary
Comment From: martimors
Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but using a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.
Comment From: villebro
This is not directly related to the Helm chart, but rather to how the Docker image is built. Sadly, pulling in extra db drivers on the fly is not totally straightforward for the following reasons:
- User account: Only the root account is allowed to install new packages on the running pod/container. If you're ok with this, remember to keep the default runAsUser: 0 in your values.yaml.
- Internet access: Many environments may have blocked external access from the Superset pods to the external internet. However, if you happen to have an internal PyPI registry, you can use that in your bootstrap script:
pip install psycopg2 --index-url https://pypi.mycorp.com/simple
If neither of these is possible in your environment, you will need to build a custom image in which you preinstall all necessary drivers. This is also the recommended approach, as it keeps startup times to a minimum (no need to install the drivers every time the pod starts up), doesn't require access to a PyPI registry, and doesn't require running as root.
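For the bootstrap-script route, the two constraints above (root access and a reachable package index) could be handled roughly as follows. This is a sketch under assumptions, not the chart's own logic: `install_drivers` and the `DRIVER_INDEX_URL` variable are hypothetical names, while `--index-url` is the same pip option shown in the example above.

```shell
#!/bin/sh
# Hypothetical helper for a bootstrapScript; none of these names come from the chart.
# It installs the given packages only when running as root, and goes through
# an internal index when DRIVER_INDEX_URL (an assumed variable) is set.
install_drivers() {
  if [ "$(id -u)" -ne 0 ]; then
    echo "Not running as root; skipping driver install" >&2
    return 1
  fi
  if [ -n "${DRIVER_INDEX_URL:-}" ]; then
    # Prepend the index option to the package list.
    set -- --index-url "$DRIVER_INDEX_URL" "$@"
  fi
  pip install "$@"
}
```

A bootstrap script could then call `install_drivers psycopg2-binary`. The install still runs on every pod start, which is one of the reasons the custom image remains the recommended approach.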
Comment From: villebro
> Why can't psycopg2 just be bundled in the image? The docs suggest creating a derived image, but using a PostgreSQL backend is an extremely common use case for Superset. There isn't really any harm in bundling a few database libs with the image for convenience, especially ones needed for the typical backends for Superset itself.
@martimors sadly this is a bit of a slippery slope, as Superset currently supports some 40+ databases. As you will need to figure out a way to add drivers for your other databases anyway, prebaking psycopg2 into the image is not necessarily a good solution for the following reasons:
- It adds to the image size.
- It introduces an unnecessary attack vector for envs that don't need psycopg2 if an exploit exists in it.
- Users may run into dependency issues if the pre-baked version of psycopg2 has conflicting requirements with whichever db driver someone wants to install.
Comment From: villebro
> Put this in your bootstrapScript, before pip install:
> apt-get update && apt-get install -y build-essential
> I am surprised this is not included in the documentation.
@richard-fairthorne Pull requests improving the docs are always welcome!
Comment From: sfirke
Hm, I see both points here.
- I agree with @villebro that if we included support for the backend DB, people will still likely need to build their own image to include drivers for their data warehouse as well as a browser to take screenshots for Alerts & Reports, pyxl for Excel import/export, etc. (EDIT: I was wrong, no need to install pyxl on top of lean in 5.0.0)
- But for people who just want to spin up Superset the first time with example data to see what it feels like, including psycopg2 and mysqlclient might be beneficial to new users and ultimately the Superset project. We are seeing users getting stuck on docker build and installing pre-reqs for mysqlclient when they are just trying to test out Superset; that doesn't seem right.
At the very least we could document how to build an extended image with these drivers; I hope to do that in the coming months. I personally think it would be nice to offer a new docker image basics that has these drivers, pyxl, Pillow, and a headless browser installed. Then people who are extending can still extend from lean to avoid the bloat and security issues that Ville points out, but new users have a plug-and-play demo option.
Comment From: nfalco79
This also hit our Helm chart deployment. The suggestion in this thread resolved the issue. I'm not happy about overriding the bootstrapScript, because the override might not work in a future version, so I have to remember to remove it at the next update.
Comment From: JohnDietrich-Pepper
This is a really obnoxious change. The most common use case includes Superset running off of Postgres and it is a PITA to make the changes required to get an upgraded instance running again.
Comment From: sfirke
I have completed the docs addition on how to build a custom Superset image that extends lean with additional drivers: https://superset.apache.org/docs/installation/docker-builds/#building-your-own-production-docker-image
If anyone wants to open a PR that puts psycopg2 back into the lean Docker image, I will review. I don't think the slippery slope argument applies to that one single package and the benefit (many more people will be able to try out Superset without building a Docker image) outweighs the harm (psycopg2 adding size and a possible attack surface to deployments that don't use Postgres as their metadata DB).
It seems like there's also support for a basics image that I suggest above; I would also review a PR for that.
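As a rough illustration of the custom-image route, the helper below writes a minimal Dockerfile that extends the lean image with the Postgres driver. Everything specific here is an assumption to adjust: `write_superset_dockerfile` is a hypothetical function name, the `apache/superset:4.1.1` tag and the single `psycopg2-binary` package are placeholders for your own version and driver list, and the linked docs remain the authoritative reference.

```shell
#!/bin/sh
# Hypothetical helper: writes a small Dockerfile into the given directory.
# Assumptions: the base image is apache/superset:<tag> and the image provides
# a "superset" user to switch back to after installing packages as root.
write_superset_dockerfile() {
  out_dir="$1"
  base_tag="${2:-4.1.1}"
  mkdir -p "$out_dir"
  cat > "$out_dir/Dockerfile" <<EOF
FROM apache/superset:${base_tag}
USER root
RUN pip install --no-cache-dir psycopg2-binary
USER superset
EOF
}
```

After generating the file, the image can be built with docker build against that directory and pushed to a registry your cluster can reach. The chart's image settings in values.yaml would then need to point at it, which also removes the need for a pip install in bootstrapScript.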
Comment From: 1yuv
It's sad it was not documented clearly. The comments above are helpful and, as @villebro mentioned, adding build-essential above pip install is also required. I was able to upgrade from 4.0.1 to 4.1.2 with the following changes to my bootstrap script:
bootstrapScript: |
  #!/bin/bash
  apt-get update && apt-get install -y build-essential
  pip install \
    "authlib" \
    "psycopg2-binary>=2.9.10" \
  &&\
  if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi