Bug description

When using superset import_datasources the sqlalchemy_uri is imported as-is, without encrypting the password, keeping it as clear text.

Steps to Reproduce

  1. Create /app/configs/import_datasources.yaml (e.g. using extraConfigs; just for reproduction, only use secrets for credentials!). Include the full sqlalchemy_uri with the clear-text password:

     ```yaml
     databases:
       - database_name: Example
         sqlalchemy_uri: 'postgresql://user:secret-password@db.example.org:5432/superset_data'
         cache_timeout: null
         expose_in_sqllab: true
         allow_run_async: false
         allow_ctas: true
         allow_cvas: true
         allow_dml: true
         allow_csv_upload: false
         extra: |
           {
             "metadata_params": {},
             "engine_params": {},
             "metadata_cache_timeout": {},
             "schemas_allowed_for_csv_upload": []
           }
     ```

  2. Import via init.initScript or manually:

     ```sh
     superset import_datasources -p /app/configs/import_datasources.yaml
     ```

Expectation

  • The password in the sqlalchemy_uri field in the dbs table is redacted, as it is when the connection is edited via the web UI. Example: postgresql://user:XXXXXXXXXX@db.example.org:5432/superset_data
  • The password field in the dbs table is set to the respective encrypted password.
  • The DB connection works.

Actual

  • The password in the sqlalchemy_uri field in the dbs table is kept as-is, readable in clear text, even in the web UI. Example: postgresql://user:secret-password@db.example.org:5432/superset_data
  • The password field in the dbs table is kept as NULL.
  • The DB connection works nevertheless. ✅

Screenshots/recordings

(Screenshot: "Apache Superset superset import_datasources does not encrypt DB password")

Superset version

4.1.1

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

Use Case / Justification

Automatically setting up data sources as part of a Helm chart deployment.

Checklist

  • [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Hey, @buschtoens! I'm here to help you with this issue. I can assist you with bugs, answer questions, and guide you to become a contributor. Let's tackle this together!

I found a similar unsolved discussion regarding the exposure of username and password when importing data sources using the import_datasource command. You can view the discussion here [1].

Regarding your issue, the superset import_datasources command currently does not encrypt the database password in the sqlalchemy_uri. The password remains in clear text, which is not the expected behavior if you want it to be redacted as it is when editing via the web UI. The current implementation does not include logic for handling or encrypting passwords in the sqlalchemy_uri during the import process [2].

The Superset web UI handles password encryption by masking the password in the API response, and similar logic could potentially be applied to the import process. However, this would require changes to the current implementation to ensure that passwords are encrypted or masked during import [2].
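For illustration, the masking behavior described above can be approximated with the Python standard library alone. This is a sketch of the observable behavior, not Superset's actual implementation, and the mask string is an assumption based on the example in this report:

```python
from urllib.parse import urlsplit, urlunsplit

def mask_uri_password(uri: str, mask: str = "XXXXXXXXXX") -> str:
    """Replace the password component of a SQLAlchemy URI with a mask."""
    parts = urlsplit(uri)
    if parts.password is None:
        return uri  # no password component, nothing to mask
    # Swap only the password portion of the netloc, keeping user/host/port.
    netloc = parts.netloc.replace(f":{parts.password}@", f":{mask}@", 1)
    return urlunsplit(parts._replace(netloc=netloc))

print(mask_uri_password(
    "postgresql://user:secret-password@db.example.org:5432/superset_data"
))
# → postgresql://user:XXXXXXXXXX@db.example.org:5432/superset_data
```

During import, the clear-text password would then be stored (encrypted) in the separate password column instead of remaining inside sqlalchemy_uri.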


Comment From: jeanpommier

Nice catch !

Actually, it should also not be stored as a configMap but as a secret, since the file contains passwords.

Comment From: buschtoens

Thanks!

Yes, hard agree on not storing secrets in a configMap. I was just trying to show a quick and easy minimal reproduction. :)

Comment From: fpfuetsch

Can confirm this with Superset version 3.1.3.

Comment From: ciaranj

This is also a problem in 4.1.1.

In case it helps anyone in the meantime: superset set-database-uri does appear to mask the password shown to the end user in the UI.

Because I want to both maintain the UUID of the database and be able to control the username/password through environment secrets in my container, I'm currently doing an import directory (to enforce the UUID) and a call to set-database-uri (to enforce the password masking.) I'm doing this by re-writing the initscript in the helm chart, to allow for the container to change the username/password it's using.

YMMV, but here's mine:

```yaml
init:
  initscript: |-
    #!/bin/sh
    set -eu
    echo "Upgrading DB schema..."
    superset db upgrade
    echo "Initializing roles..."
    superset init

    # Percent-encode everything outside the unreserved set [A-Za-z0-9.~_-]
    urlencode() {
      string="$1"
      encoded=""
      pos=0
      while [ "$pos" -lt "${#string}" ]; do
        c=$(printf "%s" "$string" | cut -c $((pos + 1)))
        case "$c" in
          [a-zA-Z0-9.~_-]) o="$c" ;;
          *)               o=$(printf '%%%02X' "'$c") ;;
        esac
        encoded="$encoded$o"
        pos=$((pos + 1))
      done
      echo "$encoded"
    }

    echo "Configuring default database"
    mkdir -p /tmp/datasources/databases
    date_now=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    user=$(urlencode "$USER")
    password=$(urlencode "$PASSWORD")
    server=$(urlencode "$DB_SERVER")
    dbName=$(urlencode "$DB_NAME")
    conn_str="mssql+pymssql://$user:$password@$server:1433/$dbName"

    cat > /tmp/datasources/metadata.yaml <<EOT
    version: 1.0.0
    type: Database
    timestamp: '$date_now'
    EOT

    cat > /tmp/datasources/databases/DB.yaml <<EOT
    database_name: DB
    sqlalchemy_uri: $conn_str
    cache_timeout: null
    expose_in_sqllab: true
    allow_run_async: false
    allow_ctas: false
    allow_cvas: false
    allow_dml: false
    allow_file_upload: false
    extra:
        allows_virtual_table_explore: true
    uuid: 738f72b0-651b-4d60-9968-dc3d3064c149
    version: 1.0.0
    EOT

    echo "Importing database connections from constructed local files"
    superset import-directory /tmp/datasources -o
    echo "Re-Importing database connections to work around https://github.com/apache/superset/issues/31983"
    # Target the same database_name as in DB.yaml above
    superset set-database-uri -d DB -u "$conn_str" -s
```

(Please be aware that because of #21256 still being present in 4.1.1, this also brings in an extraneous examples database connection)
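As an aside, the percent-encoding step in the script above has a compact Python stdlib equivalent. This is only an illustrative sketch; the argument names mirror the environment variables used in the script:

```python
from urllib.parse import quote

def build_conn_str(user: str, password: str, server: str, db_name: str) -> str:
    # quote(..., safe="") percent-encodes everything outside the unreserved
    # set [A-Za-z0-9._~-], matching the shell urlencode() function above.
    def e(s: str) -> str:
        return quote(s, safe="")
    return f"mssql+pymssql://{e(user)}:{e(password)}@{e(server)}:1433/{e(db_name)}"

print(build_conn_str("user", "p@ss:word", "db.example.org", "superset_data"))
# → mssql+pymssql://user:p%40ss%3Aword@db.example.org:1433/superset_data
```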

Comment From: watercraft

My workaround also uses the init script, with these commands calling the API after the import:

```sh
echo "Encrypt database connection URI.... "
CSRF=$(curl --silent -c cookies -H "<<some authentication>>" -X GET \
  http://dataviz-superset:8088/api/v1/security/csrf_token/ \
  | python3 -c 'import json, sys; print(json.dumps(json.loads("".join([l for l in sys.stdin]))["result"]))' \
  | sed 's/"//g')
curl --silent -b cookies -H "<<some authentication>>" -X GET \
  http://dataviz-superset:8088/api/v1/database/1/connection \
  | python3 -c 'import json, sys; print(json.dumps(json.loads("".join([l for l in sys.stdin]))["result"]))' > result
curl --silent -b cookies -H "Content-Type: application/json" -H "X-Csrftoken: $CSRF" \
  -H "<<some authentication>>" -d @result -X PUT \
  http://dataviz-superset:8088/api/v1/database/1
```

Comment From: rusackas

CC @betodealmeida @dpgaspar @msyavuz - seems like we should look more closely at this if it's an issue.