Dear lovely people of Apache Superset,
first things first: Thanks a stack for conceiving and maintaining Apache Superset. It is truly a gem.
Foreword
~This is not meant to be an actual bug report. Maybe you can slap an info
label on it, or just tuck it away into the "Discussions" section?~ After so many people confirming the problem is also hitting them, I think it actually qualifies as a bug.
Introduction
I am trying to create a data source using the HTTP API of Apache Superset without setting WTF_CSRF_ENABLED = False, and I think I have taken all input from #2488, #4018, #8382, #10354, #16003, #17206, #19343, #19356, and further information referenced below into consideration.
#16003 was the most helpful of all resources, outlining how to send both Authorization and X-CSRFToken headers appropriately. However, people are still struggling to replicate this workflow from the command line, for example using curl.
In this post, I would like to demonstrate that, beyond properly sending the corresponding tokens, you also need to maintain a session between requests. I will use HTTPie for that purpose.
Walkthrough
This is meant to be exercised on a standard vanilla installation of Apache Superset, where the authentication credentials are still admin/admin and no other pieces have been modified. If you adjusted your installation, you will need to modify some bits accordingly.
You will need to install both HTTPie and jq, e.g. by typing {apt,brew,yum} install httpie jq.
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)
# Acquire a CSRF token.
CSRF_TOKEN=$(http --session=superset http://localhost:8088/api/v1/security/csrf_token/ Authorization:"Bearer ${AUTH_TOKEN}" | jq -r .result)
# Create a data source item / database connection.
http --session=superset http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}" X-CSRFToken:"${CSRF_TOKEN}"
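For readers who prefer Python over HTTPie, the three steps above could be sketched with the standard library alone. This is a minimal sketch that has not been verified against a running Superset instance. The helper names json_request and create_database are made up for illustration, while BASE_URL, the admin/admin credentials, and the database payload are copied from the walkthrough. The shared CookieJar plays the role of HTTPie's --session option.

```python
import json
import urllib.request
from http.cookiejar import CookieJar

# Assumption: default local Superset address, as used in the walkthrough.
BASE_URL = "http://localhost:8088"


def json_request(url, payload=None, headers=None):
    """Build a request; it becomes a JSON POST when a payload is given."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(url, data=data, headers=headers or {})
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


def create_database(base_url=BASE_URL, username="admin", password="admin"):
    """Run login, CSRF token retrieval, and database creation on one cookie jar."""
    # All three requests go through the same opener, so the session cookie
    # received along with the CSRF token is sent back on the final POST.
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(CookieJar())
    )

    def call(request):
        with opener.open(request) as response:
            return json.load(response)

    # Step 1: authenticate and acquire a JWT token.
    login = call(json_request(
        f"{base_url}/api/v1/security/login",
        {"username": username, "password": password, "provider": "db"},
    ))
    auth = {"Authorization": f"Bearer {login['access_token']}"}

    # Step 2: acquire a CSRF token.
    csrf = call(json_request(f"{base_url}/api/v1/security/csrf_token/", headers=auth))

    # Step 3: create the database connection.
    return call(json_request(
        f"{base_url}/api/v1/database/",
        {
            "database_name": "PostgreSQL Example",
            "engine": "postgres",
            "sqlalchemy_uri": "postgres://postgres@host.docker.internal:5432",
        },
        headers={**auth, "X-CSRFToken": csrf["result"]},
    ))
```

Calling create_database() against a local instance would perform all three requests. Nothing is sent when the module is merely imported.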
Enquiry
Somehow, I would have expected this procedure to also work without maintaining a session. However, when running the commands from the example above while omitting the --session= option, the last command fails with the notorious
400 Bad Request: The CSRF session token is missing.
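Why the session matters can be illustrated with a deliberately simplified model of a session-based CSRF check. This is only a sketch of the general technique, assuming that Flask-WTF behaves roughly like this. The real implementation additionally signs tokens and enforces a time limit. The names issue_csrf_token, validate_csrf, and CSRFError are chosen for illustration, and the messages mirror the two errors quoted in this thread.

```python
import hmac
import secrets
from typing import Optional


class CSRFError(Exception):
    """Raised when the simplified CSRF check fails."""


def issue_csrf_token(session: dict) -> str:
    """Remember a fresh token in the session and hand it to the client."""
    token = secrets.token_hex(20)
    session["csrf_token"] = token
    return token


def validate_csrf(session: dict, header_token: Optional[str]) -> None:
    """Compare the X-CSRFToken header value with the token kept in the session."""
    if not header_token:
        raise CSRFError("The CSRF token is missing.")
    if "csrf_token" not in session:
        raise CSRFError("The CSRF session token is missing.")
    if not hmac.compare_digest(session["csrf_token"], header_token):
        raise CSRFError("The CSRF tokens do not match.")


# Request 1 fetches the token; it is also remembered in this client's session,
# which the client has to present again via its session cookie.
session_with_cookie = {}
token = issue_csrf_token(session_with_cookie)

# Request 2 sent WITH the session cookie: the same session is found, check passes.
validate_csrf(session_with_cookie, token)

# Request 2 sent WITHOUT the session cookie: the server sees an empty session.
try:
    validate_csrf({}, token)
except CSRFError as error:
    missing_session_message = str(error)  # "The CSRF session token is missing."
```

In this model, sending the X-CSRFToken header alone is not enough: the header is only ever compared against a value that travels with the session.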
Conclusion
So, this post is meant to be both an informational reference for the community on how to actually create data source items using the HTTP API from the command line, and an enquiry to the developers: is my expectation of being able to hold a conversation with the API without maintaining a session actually inappropriate?
Thank you in advance for taking the time to look into this topic.
With kind regards, Andreas.
Further references
https://stackoverflow.com/questions/66015739/use-apache-superset-api-to-feed-a-dataset
https://stackoverflow.com/questions/68614350/cannot-post-a-new-db-to-apache-superset-400-error-with-csrf
https://solveforum.com/forums/threads/solved-cannot-post-a-new-db-to-apache-superset-400-error-with-csrf.49375/
https://groups.google.com/g/airbnb_superset/c/3H7SZma4ZEE
Comment From: stupid-yu
Hello, I have the same problem when using curl to create a database.
[root@superset]# token=$(curl -X 'POST' \
'http://'${HOSTNAME}':'${PORT}'/api/v1/security/login' \
-H 'accept: */*' \
-H 'Content-Type: application/json' \
-d '{
"username": "admin",
"password": "admin",
"refresh": true,
"provider": "db"
}')
[root@superset]# function parse_json { echo "${1//\"/}" | sed "s/.*$2:\([^,}]*\).*/\1/" ; }
[root@superset]# csrf=$(curl -X 'GET' 'http://'${HOSTNAME}':'${PORT}'/api/v1/security/csrf_token/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'')
[root@superset]# curl -vvvv -X 'POST' 'http://'${HOSTNAME}':'${PORT}'/api/v1/database/' -H 'Authorization: Bearer '$(parse_json $token "access_token")'' -H 'X-CSRFToken: '$(parse_json $csrf "result")'' -H 'accept: */*' -H 'Content-Type: application/json' -d '{
"database_name": "kyuubi-jdbc",
"sqlalchemy_uri": "hive://bcdp@dwh-htwsxrv9-kyuubi-kyuubi",
"expose_in_sqllab": true,
"allow_ctas": true,
"allow_cvas": true,
"allow_dml": true,
"allow_multi_schema_metadata_fetch": true
}'
* About to connect() to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf port 58093 (#0)
* Trying 192.168.11.173...
* Connected to dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf (192.168.11.173) port 58093 (#0)
> POST /api/v1/database/ HTTP/1.1
> User-Agent: curl/7.29.0
> Host: dwh-htwsxrv9-kyuubi-superset-dc959bbbd-lhkcf:58093
> Authorization: Bearer eyJ0eXAiOiJKV1QiLCJhbGciOiJIUzI1NiJ9.eyJpYXQiOjE2NTcxODI2MTcsIm5iZiI6MTY1NzE4MjYxNywianRpIjoiMzYxMjA4YmEtMThjZC00MDY0LTgxOTQtNjdiZjI3ZmY1ZjI2IiwiZXhwIjoxNjU3MTgzNTEJlc2giOnRydWUsInR5cGUiOiJhY2Nlc3MifQ.a7sFispKsyUD3FDo47HuuCtq9jP7xpWy3ZaeI1bVpuc
> X-CSRFToken: ImY2ZmUxNDIzNGQ2YTUwYjI2NDg3ZDc0YjRjOGUxZGMwMDAzODA3Zjgi.YsaZsQ.SrP1_NXVfnSZ6uW16V25vPE7yqo
> accept: */*
> Content-Type: application/json
> Content-Length: 222
>
* upload completely sent off: 222 out of 222 bytes
* HTTP 1.0, assume close after body
< HTTP/1.0 400 BAD REQUEST
< Content-Type: text/html; charset=utf-8
< Content-Length: 150
< Server: Werkzeug/1.0.1 Python/3.7.10
< Date: Thu, 07 Jul 2022 08:33:01 GMT
<
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>400 Bad Request</title>
<h1>Bad Request</h1>
<p>The CSRF session token is missing.</p>
* Closing connection 0
Comment From: vishaltps
@amotl It's clearly a bug. I have tried to create a guest user token from my Rails app, and I keep getting the error that the CSRF session token is missing. However, when I try it from Postman, it works fine.
Comment From: amotl
Hi again,
using Superset 2.1.3, on a vanilla installation, I verified that maintaining a session, and supplying a CSRF token, is no longer needed to work with the HTTP API.
# Authenticate and acquire a JWT token.
AUTH_TOKEN=$(http http://localhost:8088/api/v1/security/login username=admin password=admin provider=db | jq -r .access_token)
# Create a data source item / database connection.
http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}"
Thanks a stack for improving the situation in this regard.
With kind regards, Andreas.
Comment From: amotl
Hi again. After upgrading to the most recent Superset 3, the problem is back! Cheers, Andreas.
Request
http http://localhost:8088/api/v1/database/ database_name="PostgreSQL Example" engine=postgres sqlalchemy_uri=postgres://postgres@host.docker.internal:5432 Authorization:"Bearer ${AUTH_TOKEN}" --print hHbB
Response
{
"errors": [
{
"error_type": "GENERIC_BACKEND_ERROR",
"extra": {
"issue_codes": [
{
"code": 1011,
"message": "Issue 1011 - Superset encountered an unexpected error."
}
]
},
"level": "error",
"message": "400 Bad Request: The CSRF token is missing."
}
]
}
Comment From: amotl
I see. With Superset 3, you need to configure WTF_CSRF_ENABLED = False in superset_config.py. Then, communicating with the HTTP API works without needing a corresponding CSRF token. That's fine for my specific purpose, but I am wondering whether CSRF protection would then be turned off completely, also for requests from browsers?
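For reference, the configuration change described here would look roughly as follows. This is a sketch, not a recommendation. As far as Flask-WTF's documented behaviour goes, WTF_CSRF_ENABLED is an application-wide switch, so setting it to False would also drop CSRF checks for browser-originated requests. Superset's default configuration also carries a WTF_CSRF_EXEMPT_LIST setting, which may allow exempting individual views instead. Whether and how the database API endpoints can be listed there is not confirmed in this thread, so the commented-out entry is only a placeholder.

```python
# superset_config.py

# From the thread: turn off Flask-WTF's CSRF checks so that bearer-token API
# calls work without a session. Assumption based on Flask-WTF's documentation:
# this disables the checks for the whole application, browsers included.
WTF_CSRF_ENABLED = False

# Possible narrower alternative (unverified): keep CSRF enabled and exempt
# selected views only. The entry below is a hypothetical placeholder, not a
# known-good value.
# WTF_CSRF_ENABLED = True
# WTF_CSRF_EXEMPT_LIST = ["<module.path.of.the.view.to.exempt>"]
```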
Comment From: ghost
I have this with latest Superset Docker image from the Docker hub.
Please, sort this out, this is ridiculous!
Comment From: rusackas
This has gone silent for upward of a year, and is a bit confusing at this point, since it was originally reported in an older (unsupported) version. Maybe @dosu-bot can give us some advice and help summarize the current state of affairs.
Comment From: amotl
Hi Evan. Unless anything has been fixed, I guess nothing has changed/improved in this regard.
After upgrading to the most recent Superset 3, the problem is back!
We had to use WTF_CSRF_ENABLED = False in order to make pure HTTP API conversations possible, see https://github.com/crate/cratedb-examples/commit/e49671eb6dc62ff0adb009160d5d5d5ecc57b532. We think it should not be required to turn that off, because doing so would, on the other hand, make web-based conversations more vulnerable?
Comment From: babaMar
I got it working by persisting a session and updating its headers (also the 'Referer' one):
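import requests

BASE_URL = '...'  # your base Superset URL
LOGIN = '/api/v1/security/login'
CSRF_TOKEN = '/api/v1/security/csrf_token/'
DATASET = '/api/v1/dataset/'

# Not defined in the original snippet; assumed to match the login payload
# used earlier in this thread.
auth_payload = {'username': 'admin', 'password': 'admin', 'provider': 'db', 'refresh': True}

# One session object keeps cookies and headers across all requests.
session = requests.Session()
session.headers.update({'Referer': BASE_URL})

# Log in and attach the JWT access token to every subsequent request.
res = session.post(BASE_URL + LOGIN, json=auth_payload)
AUTH_HEADER = {
    'Authorization': f'Bearer {res.json()["access_token"]}'
}
session.headers.update(AUTH_HEADER)

# Fetch the CSRF token; the session keeps the cookie that comes with it.
res = session.get(BASE_URL + CSRF_TOKEN)
CSRF_TOKEN_HEADER = {"X-CSRFToken": f"{res.json()['result']}"}
session.headers.update(CSRF_TOKEN_HEADER)
After this, creating a dataset via a POST request to /api/v1/dataset/ works.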
Comment From: amotl
Yeah, this works. However, it's difficult to maintain a session on the command line. HTTP sessions are mostly meant for interactive users in a browser, and do not really belong in the same box as API-style access.
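On the command line, a session essentially boils down to a cookie file that is reused between invocations, which is what HTTPie's --session option and curl's --cookie-jar / --cookie options provide. A rough Python equivalent using only the standard library might look like the sketch below. The function names open_persistent_session and save_session, as well as any cookie file path passed in, are hypothetical.

```python
import os
import urllib.request
from http.cookiejar import MozillaCookieJar


def open_persistent_session(cookie_file):
    """Return (opener, jar); cookies saved by an earlier run are reloaded."""
    jar = MozillaCookieJar(cookie_file)
    if os.path.exists(cookie_file):
        # ignore_discard=True also loads session cookies without an expiry date.
        jar.load(ignore_discard=True)
    opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
    return opener, jar


def save_session(jar):
    """Write the cookies to disk so that the next invocation can reuse them."""
    # Session cookies would be dropped on save unless ignore_discard=True is set.
    jar.save(ignore_discard=True)
```

A script would call open_persistent_session at startup, send its requests through the returned opener, and call save_session before exiting, so that a later invocation presents the same session cookie.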
Comment From: silwyne
I have the same problem! Can we discuss this further to find a solution? I think I can fix it. Can anyone assign this to me?
Comment From: amotl
Hi again. As we observed this had been working well at least once in the past, it would be sweet if someone could tackle this again for the more recent versions of Superset. Thank you for looking into this!
Comment From: silwyne
Hi everyone! I made it! To create a database or a data source in Superset using the HTTP API, you can use this Python code:
import json
import logging
import sys

import requests

TIMEOUT = 3  # seconds, applied to every HTTP request


class Utils:

    @staticmethod
    def abort(message: str):
        logging.error(message)
        logging.error("Aborting the operation.")
        sys.exit(1)


class SupersetHTTPService:

    @staticmethod
    def _get_access_token(superset_host: str, user: str, password: str) -> str:
        # This helper was missing from the original post. It is reconstructed
        # here as an assumption, using the login endpoint and payload shown
        # earlier in this thread.
        try:
            res = requests.post(
                superset_host + "/api/v1/security/login",
                json={"username": user, "password": password, "provider": "db", "refresh": True},
                timeout=TIMEOUT,
            )
            res.raise_for_status()
            return res.json()["access_token"]
        except Exception as e:
            Utils.abort(f"Error while getting access token: {e}")

    @staticmethod
    def get_csrf_authenticated_session(
        superset_host: str,
        user: str,
        password: str,
        access_token: str = None,
    ) -> requests.Session:
        if not access_token:
            access_token = SupersetHTTPService._get_access_token(
                superset_host=superset_host,
                user=user,
                password=password,
            )
        # The session keeps the cookie that belongs to the CSRF token.
        session = requests.Session()
        session.headers.update({'Referer': superset_host})
        session.headers.update({'Authorization': f'Bearer {access_token}'})
        csrf_token = SupersetHTTPService._get_csrf_token(
            session=session,
            superset_host=superset_host,
        )
        session.headers.update({
            'X-CSRFToken': csrf_token
        })
        return session

    @staticmethod
    def _get_csrf_token(
        session: requests.Session,
        superset_host: str,
    ) -> str:
        try:
            res = session.get(superset_host + "/api/v1/security/csrf_token/", timeout=TIMEOUT)
            res.raise_for_status()
            csrf_token = res.json()['result']
            return csrf_token
        except Exception as e:
            Utils.abort(f"Error while getting csrf token: {e}")

    @staticmethod
    def create_database(
        superset_host: str,
        user: str,
        password: str,
        database_name: str,
        engine: str,
        sqlalchemy_uri: str,
        other_parameters: str,
        access_token: str = None,
    ):
        if not access_token:
            access_token = SupersetHTTPService._get_access_token(
                superset_host=superset_host,
                user=user,
                password=password,
            )
        session = SupersetHTTPService.get_csrf_authenticated_session(
            superset_host=superset_host,
            user=user,
            password=password,
            access_token=access_token,
        )
        json_data = {
            "database_name": database_name,
            "engine": engine,
            "sqlalchemy_uri": sqlalchemy_uri,
        }
        # other_parameters is an optional JSON object given as a string;
        # an empty string means "no extra parameters".
        if other_parameters:
            json_data.update(json.loads(other_parameters))
        print(f"json_data: \n{json_data}")
        # Trailing slash as used by the other requests in this thread.
        response = session.post(
            url=f"{superset_host}/api/v1/database/",
            json=json_data,
            timeout=TIMEOUT,
        )
        try:
            response.raise_for_status()
            return response
        except Exception as e:
            Utils.abort(f"Error while creating database {e}")


def create_database(
    superset_host: str = None,
    user: str = None,
    password: str = None,
    database_name: str = None,
    engine: str = None,
    sqlalchemy_uri: str = None,
    other_parameters: str = None,
):
    response = SupersetHTTPService.create_database(
        superset_host=superset_host,
        user=user,
        password=password,
        database_name=database_name,
        engine=engine,
        sqlalchemy_uri=sqlalchemy_uri,
        other_parameters=other_parameters,
    )
    return response


response = create_database(
    superset_host="http://localhost:8080",
    user="admin",
    password="admin_pass_1234",
    database_name="Some_DB",
    engine="Clickhouse",
    sqlalchemy_uri="clickhousedb+connect://admin:XXXXXXXXXX@192.168.8.140:8124/Some_DB",
    other_parameters="",
)
Comment From: amotl
Hi @silwyne. Thank you for sharing your solution. However, I think you are maintaining a session, right? Our request, on the other hand, was about being able to hold a conversation without maintaining a session when using API authentication (Authorization: Bearer). We also think CSRF tokens are not applicable in this situation.