Bug description
When using `superset import_datasources`, the `sqlalchemy_uri` is imported as-is, without encrypting the password, keeping it as clear text.
Steps to Reproduce
- Create `/app/configs/import_datasources.yaml` (e.g. using `extraConfigs` (just for reproduction, only use secrets for credentials!)). Include the full `sqlalchemy_uri` with a clear text password.
  ```yaml
  databases:
  - database_name: Example
    sqlalchemy_uri: 'postgresql://user:secret-password@db.example.org:5432/superset_data'
    cache_timeout: null
    expose_in_sqllab: true
    allow_run_async: false
    allow_ctas: true
    allow_cvas: true
    allow_dml: true
    allow_csv_upload: false
    extra: |
      {
        "metadata_params": {},
        "engine_params": {},
        "metadata_cache_timeout": {},
        "schemas_allowed_for_csv_upload": []
      }
  ```
- Import via `init.initScript` or manually: `superset import_datasources -p /app/configs/import_datasources.yaml`
Expectation
- The password in the `sqlalchemy_uri` field in the `dbs` table is redacted, as if editing via the web UI. Example: `postgresql://user:XXXXXXXXXX@db.example.org:5432/superset_data`
- The `password` field in the `dbs` table is set to the respective encrypted password.
- The DB connection works.
Actual
- The password in the `sqlalchemy_uri` field in the `dbs` table is kept as-is, readable in clear text, even in the web UI. Example: `postgresql://user:secret-password@db.example.org:5432/superset_data`
- The `password` field in the `dbs` table is kept as `NULL`.
- The DB connection works nevertheless. ✅
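To make the difference between the expected and the actual value concrete, here is a minimal sketch of the redaction using only the Python standard library. `mask_password` is a hypothetical helper written for this report, not Superset's implementation, and the `XXXXXXXXXX` mask is simply copied from the example URI in this report.

```python
from urllib.parse import urlsplit, urlunsplit

MASK = "XXXXXXXXXX"  # mask string copied from the example in this report


def mask_password(uri: str) -> str:
    """Return the URI with its password replaced by MASK (hypothetical helper)."""
    parts = urlsplit(uri)
    if parts.password is None:
        # No password present, nothing to redact.
        return uri
    # Split "user:password@host:port" into user info and host info.
    userinfo, _, hostinfo = parts.netloc.rpartition("@")
    user, _, _ = userinfo.partition(":")
    return urlunsplit(parts._replace(netloc=f"{user}:{MASK}@{hostinfo}"))


if __name__ == "__main__":
    print(mask_password(
        "postgresql://user:secret-password@db.example.org:5432/superset_data"
    ))
```

Applied to the imported URI, this yields the redacted form listed under Expectation. The encrypted password would still have to be stored separately in the `password` field.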
Screenshots/recordings
Superset version
4.1.1
Python version
Not applicable
Node version
Not applicable
Browser
Not applicable
Additional context
Use Case / Justification
Automatically setting up data sources as part of a Helm chart deployment.
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hey, @buschtoens! I'm here to help you with this issue. I can assist you with bugs, answer questions, and guide you to become a contributor. Let's tackle this together!
I found a similar unsolved discussion regarding the exposure of username and password when importing data sources using the `import_datasource` command. You can view the discussion here [1].
Regarding your issue, the `superset import_datasources` command currently does not encrypt the database password in the `sqlalchemy_uri`. The password remains in clear text, which is not the expected behavior if you want it to be redacted as it is when editing via the web UI. The current implementation does not include logic for handling or encrypting passwords in the `sqlalchemy_uri` during the import process [2].
The Superset web UI handles password encryption by masking the password in the API response, and similar logic could potentially be applied to the import process. However, this would require changes to the current implementation to ensure that passwords are encrypted or masked during import [2].
Comment From: jeanpommier
Nice catch!
Actually, it should also not be stored as a configMap but as a secret, since the file contains passwords.
Comment From: buschtoens
Thanks!
Yes, hard agree on not storing secrets in a configMap. I was just trying to show a quick and easy minimal reproduction. :)
Comment From: fpfuetsch
Can confirm this with Superset version 3.1.3.
Comment From: ciaranj
This is also a problem in 4.1.1.
In case it helps anyone in the meantime, `superset set-database-uri` does appear to ensure that the password is masked for the end user in the UI.
I want to both keep the UUID of the database and control the username/password through environment secrets in my container, so I'm currently running `superset import-directory` (to enforce the UUID) followed by `superset set-database-uri` (to enforce the password masking). I'm doing this by rewriting the `initscript` in the Helm chart, which lets the container change the username/password it's using.
YMMV, but here's mine:
init:
  initscript: |-
    #!/bin/sh
    set -eu
    echo "Upgrading DB schema..."
    superset db upgrade
    echo "Initializing roles..."
    superset init

    # Percent-encode every character outside [a-zA-Z0-9.~_-]
    urlencode() {
      string="$1"
      encoded=""
      pos=0
      while [ "$pos" -lt "${#string}" ]; do
        c=$(printf "%s" "$string" | cut -c $((pos + 1)))
        case "$c" in
          [a-zA-Z0-9.~_-]) o="$c" ;;
          *) o=$(printf '%%%02X' "'$c") ;;
        esac
        encoded="$encoded$o"
        pos=$((pos + 1))
      done
      echo "$encoded"
    }

    echo "Configuring default database"
    mkdir /tmp/datasources
    date_now=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    user=$(urlencode "$USER")
    password=$(urlencode "$PASSWORD")
    server=$(urlencode "$DB_SERVER")
    dbName=$(urlencode "$DB_NAME")
    conn_str="mssql+pymssql://$user:$password@$server:1433/$dbName"
    mkdir /tmp/datasources/databases
    cat > /tmp/datasources/metadata.yaml <<EOT
    version: 1.0.0
    type: Database
    timestamp: '$date_now'
    EOT
    cat > /tmp/datasources/databases/DB.yaml <<EOT
    database_name: DB
    sqlalchemy_uri: $conn_str
    cache_timeout: null
    expose_in_sqllab: true
    allow_run_async: false
    allow_ctas: false
    allow_cvas: false
    allow_dml: false
    allow_file_upload: false
    extra:
      allows_virtual_table_explore: true
    uuid: 738f72b0-651b-4d60-9968-dc3d3064c149
    version: 1.0.0
    EOT
    echo "Importing database connections from constructed local files"
    superset import-directory /tmp/datasources -o
    echo "Re-Importing database connections to work around https://github.com/apache/superset/issues/31983"
    # The -d value must match database_name in DB.yaml above
    superset set-database-uri -d DB -u "$conn_str" -s
(Please be aware that because of #21256 still being present in 4.1.1, this also brings in an extraneous examples database connection)
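The hand-rolled `urlencode` shell function in the script above percent-encodes every character outside `[a-zA-Z0-9.~_-]`. If Python is available in the container (an assumption, although Superset itself runs on Python), the same connection string could be built with `urllib.parse.quote`. This is only a sketch: `build_uri` is a hypothetical helper name, and the port `1433` and the `mssql+pymssql` scheme are taken from the script above.

```python
from urllib.parse import quote


def build_uri(user: str, password: str, server: str, db_name: str, port: int = 1433) -> str:
    """Build a percent-encoded SQLAlchemy URI (hypothetical helper)."""

    def enc(value: str) -> str:
        # safe="" encodes everything except letters, digits and "_.-~",
        # which matches the character class used by the shell function.
        return quote(value, safe="")

    return f"mssql+pymssql://{enc(user)}:{enc(password)}@{enc(server)}:{port}/{enc(db_name)}"


if __name__ == "__main__":
    print(build_uri("sa", "p@ss:w/rd", "db.example.org", "sales"))
```

Unlike the shell loop, `quote` encodes non-ASCII characters as UTF-8 bytes, so results may differ for passwords containing such characters.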
Comment From: watercraft
My workaround also uses the init script with these commands to call the API after the import:
echo "Encrypt database connection URI.... "
CSRF=`curl --silent -c cookies -H"<<some authentication>>" -X GET http://dataviz-superset:8088/api/v1/security/csrf_token/ | python3 -c 'import json, sys; print(json.dumps(json.loads("".join([l for l in sys.stdin]))["result"]))' | sed 's/"//g' `
curl --silent -b cookies -H"<<some authentication>>" -X GET http://dataviz-superset:8088/api/v1/database/1/connection | python3 -c 'import json, sys; print(json.dumps(json.loads("".join([l for l in sys.stdin]))["result"]))' > result
curl --silent -b cookies -H"Content-Type: application/json" -H"X-Csrftoken: $CSRF" -H"<<some authentication>>" -d@result -X PUT http://dataviz-superset:8088/api/v1/database/1
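The three `curl` calls above (fetch a CSRF token, fetch the connection, `PUT` it back) can also be sketched in Python with only the standard library. This is a rough, untested-against-a-live-server sketch: the host and endpoint paths are copied from the commands above, while `result_of`, `put_request`, `reencrypt` and the `auth` header dictionary are hypothetical names, with `auth` standing in for the elided `<<some authentication>>` header.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

BASE = "http://dataviz-superset:8088"  # host taken from the curl commands above


def result_of(body: bytes):
    """Return the "result" member of a Superset API JSON response."""
    return json.loads(body)["result"]


def put_request(db_id: int, payload: dict, csrf: str, auth: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT that re-saves the connection."""
    headers = {"Content-Type": "application/json", "X-CSRFToken": csrf, **auth}
    return urllib.request.Request(
        f"{BASE}/api/v1/database/{db_id}",
        data=json.dumps(payload).encode(),
        headers=headers,
        method="PUT",
    )


def reencrypt(db_id: int, auth: dict) -> None:
    """Fetch the CSRF token and the connection, then PUT the connection back."""
    # The cookie jar plays the role of curl's "-c cookies" / "-b cookies" file.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )

    def get(path: str):
        req = urllib.request.Request(f"{BASE}{path}", headers=auth)
        with opener.open(req) as resp:
            return result_of(resp.read())

    csrf = get("/api/v1/security/csrf_token/")
    connection = get(f"/api/v1/database/{db_id}/connection")
    with opener.open(put_request(db_id, connection, csrf, auth)) as resp:
        resp.read()
```

Calling `reencrypt(1, auth)` would mirror the commands above for database ID 1; as in the original, the ID is hard-coded by the caller and has to match the imported database.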
Comment From: rusackas
CC @betodealmeida @dpgaspar @msyavuz - seems like we should look more closely at this if it's an issue.