Bug description
Description
I find that my Superset instance is not working properly: the async queries and cache config seem messed up. When opening Superset for the first time, all dashboards are stuck on loading for however long you stay there. Then, when I switch to Datasets or apply a different filter value and come back to the original dashboard, it loads, which makes it seem like the cache is working intermittently; sometimes the cached data also seems to be wrong.
If I take one of the queries from the dashboard and run it in SQL Lab, it works correctly and takes approximately 3 seconds, but in the dashboard it stays loading forever.
Evidence
Environment
- superset-helm-chart-0.12.11
Values file (used by the Helm chart):
```yaml
superset:
  imagePullSecrets:
    - name: artifactory
  resources:
    # We usually recommend not to specify default resources and to leave this as a conscious
    # choice for the user. This also increases chances charts run on environments with little
    # resources, such as Minikube. If you do want to specify resources, uncomment the following
    # lines, adjust them as necessary, and remove the curly braces after 'resources:'.
    # The limits below will apply to all Superset components. To set individual resource limitations refer to the pod specific values below.
    # The pod specific values will overwrite anything that is set here.
    limits:
      cpu: 1000m
      memory: 2000Mi
    requests:
      cpu: 500m
      memory: 1000Mi
  image:
    repository: (edited)
    tag: 4.0.2
    pullPolicy: "IfNotPresent"
    pullSecrets:
      - artifactory
  initImage:
    repository: (edited)/jwilder/dockerize
    tag: "latest"
    pullPolicy: "IfNotPresent"
    pullSecrets:
      - artifactory
  postgresql:
    # Skip to use own managed instance instead
    enabled: false
    image:
      registry: (edited)
      repository: bitnami/postgresql
      tag: "14.6.0-debian-11-r13"
      pullPolicy: "IfNotPresent"
      pullSecrets:
        - artifactory
  redis:
    image:
      registry: (edited)
      repository: bitnami/redis
      tag: 7.2.5-debian-12-r2
      pullPolicy: "IfNotPresent"
      pullSecrets:
        - artifactory
  extraEnv:
    AZURE_APPLICATION_ID: (edited)
    AZURE_TENANT_ID: (edited)
    SMTP_HOST: (edited)
    SMTP_MAIL_FROM: (edited)
    SMTP_PASSWORD: null
    SMTP_PORT: 25
    SMTP_USER: null
    DB_SCHEMA: superset
    EMAIL_REPORTS_SUBJECT_PREFIX:
      - Metrics Report
    TALISMAN_ENABLED: false
  configOverrides:
    enable_oauth: |
      # This will make sure the redirect_uri is properly computed, even with SSL offloading
      ENABLE_PROXY_FIX = True
      from flask_appbuilder.security.manager import AUTH_OAUTH
      AUTH_TYPE = AUTH_OAUTH
      OAUTH_PROVIDERS = [
          {
              "name": "azure",
              "icon": "fa-windows",
              "token_key": "access_token",
              "remote_app": {
                  "client_id": os.getenv("AZURE_APPLICATION_ID"),
                  "client_secret": os.getenv("AZURE_SECRET"),
                  "api_base_url": f"https://login.microsoftonline.com/{env('AZURE_TENANT_ID')}/oauth2",
                  "client_kwargs": {
                      "scope": "User.read name preferred_username email profile upn",
                      "resource": os.getenv("AZURE_APPLICATION_ID"),
                  },
                  "request_token_url": None,
                  "access_token_url": f"https://login.microsoftonline.com/{env('AZURE_TENANT_ID')}/oauth2/token",
                  "authorize_url": f"https://login.microsoftonline.com/{env('AZURE_TENANT_ID')}/oauth2/authorize",
              },
          }
      ]
      HTML_SANITIZATION_SCHEMA_EXTENSIONS = {
          "attributes": {
              "*": ["style", "className", "class"],
          },
          "tagNames": ["style"],
      }
      # Map Authlib roles to superset roles
      AUTH_ROLE_ADMIN = 'Admin'
      AUTH_ROLE_PUBLIC = 'Public'
      # Will allow user self registration, allowing to create Flask users from Authorized User
      AUTH_USER_REGISTRATION = True
      # The default user self registration role
      AUTH_USER_REGISTRATION_ROLE = "Gamma"
    extra_features: |
      import ast
      CORS_OPTIONS = {
          "supports_credentials": True,
          "allow_headers": ['*'],
          "resources": ['*'],
          "origins": [
              "http://localhost:8088",
              "http://localhost:8888",
              "https:\/\/.*\.(edited)\.com"
          ]
      }
      FEATURE_FLAGS = {
          "ALERT_REPORTS": True,
          "DASHBOARD_RBAC": True,
          "EMBEDDABLE_CHARTS": True,
          "EMBEDDED_SUPERSET": True,
          "HORIZONTAL_FILTER_BAR": True,
          "DRILL_BY": True,
          "DRILL_TO_DETAIL": True,
          "ENABLE_TEMPLATE_PROCESSING": True,
          "DASHBOARD_VIRTUALIZATION": True,
          "GLOBAL_ASYNC_QUERIES": True
      }
    celery_conf: |
      from celery.schedules import crontab
      class CeleryConfig(object):
          broker_url = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
          imports = ('superset.sql_lab', "superset.tasks.scheduler",)
          result_backend = f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
          worker_prefetch_multiplier = 10
          task_acks_late = True
          task_annotations = {
              'sql_lab.get_sql_results': {
                  'rate_limit': '100/s',
              },
          }
      CELERY_CONFIG = CeleryConfig
      GLOBAL_ASYNC_QUERIES_JWT_SECRET = os.getenv("JWT_SECRET")
      GLOBAL_ASYNC_QUERIES_TRANSPORT: 'polling'
      GLOBAL_ASYNC_QUERIES_POLLING_DELAY: 50
      from flask_caching.backends.rediscache import RedisCache
      RESULTS_BACKEND = RedisCache(
          host=f"{env('REDIS_HOST')}", port=f"{env('REDIS_PORT')}", key_prefix='superset_results')
      RESULTS_BACKEND_USE_MSGPACK = True
      CACHE_NO_NULL_WARNING = True
      CACHE_CONFIG = {
          'CACHE_TYPE': 'RedisCache',
          'CACHE_DEFAULT_TIMEOUT': 86400,
          'CACHE_KEY_PREFIX': 'superset_cache',
          'CACHE_REDIS_URL': f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      }
      DATA_CACHE_CONFIG = {
          'CACHE_TYPE': 'RedisCache',
          'CACHE_DEFAULT_TIMEOUT': 86400,
          'CACHE_KEY_PREFIX': 'superset_results',
          'CACHE_REDIS_URL': f"redis://{env('REDIS_HOST')}:{env('REDIS_PORT')}/0"
      }
      # WebDriver configuration
      EMAIL_PAGE_RENDER_WAIT = 60
      WEBDRIVER_BASEURL = "http://{{ template "superset.fullname" . }}:{{ .Values.service.port }}/"
      WEBDRIVER_BASEURL_USER_FRIENDLY = "${hostname}"
    smtp: |
      import ast
      SMTP_HOST = os.getenv("SMTP_HOST")
      SMTP_PORT = os.getenv("SMTP_PORT")
      SMTP_STARTTLS = ast.literal_eval(os.getenv("SMTP_STARTTLS", "True"))
      SMTP_SSL = ast.literal_eval(os.getenv("SMTP_SSL", "False"))
      SMTP_USER = os.getenv("SMTP_USER")
      SMTP_PASSWORD = os.getenv("SMTP_PASSWORD")
      SMTP_MAIL_FROM = os.getenv("SMTP_MAIL_FROM")
      EMAIL_REPORTS_SUBJECT_PREFIX = os.getenv("EMAIL_REPORTS_SUBJECT_PREFIX")
    general: |
      SQLALCHEMY_DATABASE_URI = f"postgresql+psycopg2://{env('DB_USER')}:{env('DB_PASS')}@{env('DB_HOST')}:{env('DB_PORT')}/{env('DB_NAME')}?options=-csearch_path={env('DB_SCHEMA')}"
      SQLALCHEMY_POOL_SIZE = 20
      SQLALCHEMY_MAX_OVERFLOW = 40
      SQLALCHEMY_POOL_TIMEOUT = 180
    secret: |
      SECRET_KEY = os.getenv("SECRET_KEY")
      PREVIOUS_SECRET_KEY = os.getenv("PREVIOUS_SECRET_KEY")
  bootstrapScript: |
    #!/bin/bash
    rm -rf /var/lib/apt/lists/* && \
    pip install \
      psycopg2-binary==2.9.1 \
      Authlib && \
    if [ ! -f ~/bootstrap ]; then echo "Running Superset with uid {{ .Values.runAsUser }}" > ~/bootstrap; fi
  supersetWorker:
    replicaCount: 2
    command:
      - /bin/sh
      - -c
      - |
        # Install firefox webdriver
        # See https://github.com/apache/superset/blob/4fa3b6c7185629b87c27fc2c0e5435d458f7b73d/docs/src/pages/docs/installation/email_reports.mdx
        apt update
        apt-get install --no-install-recommends -y firefox-esr wget
        wget -q https://github.com/mozilla/geckodriver/releases/download/v${geckodriver_version}/geckodriver-v${geckodriver_version}-linux64.tar.gz
        tar -x geckodriver -zf geckodriver-v${geckodriver_version}-linux64.tar.gz -O > /usr/bin/geckodriver
        chmod 755 /usr/bin/geckodriver
        rm geckodriver-v${geckodriver_version}-linux64.tar.gz
        # Run
        . {{ .Values.configMountPath }}/superset_bootstrap.sh; celery --app=superset.tasks.celery_app:app worker --pool=prefork -O fair -c 4
    forceReload: true
  ingress:
    enabled: true
    ingressClassName: nginx
    annotations:
      "cert-manager.io/cluster-issuer": "(edited)"
    path: /
    pathType: ImplementationSpecific
    extraHostsRaw: []
  # Superset beat configuration (to trigger scheduled jobs like reports)
  supersetCeleryBeat:
    # -- This is only required if you intend to use alerts and reports
    enabled: true
```
Screenshots/recordings
Dashboard stuck: (screenshot)
Correct connectivity to database: (screenshot)
Superset version
4.0.2
Python version
3.10
Node version
I don't know
Browser
Chrome
Additional context
No response
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [ ] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hey there, @jorgehn52! 👋 I'm here to help you with any bugs, questions, or even becoming a contributor. Let's squash those bugs together!
I found a similar unsolved discussion regarding async queries in Superset, which might be related to the issue you're experiencing. You can check out the discussion titled Async Queries in Dashboards & Charts [1].
Additionally, there was a closed issue titled Async queries sometimes fail when retrieving cached query results, which was addressed by a pull request that aimed to fix unintended cache misses with async queries [2]. This might provide some insights or solutions relevant to your problem.
For your specific issue, ensure that your Redis configuration is correctly set up in `superset_config.py`. Here are some key configurations to check (a combined sketch follows the list):
- **Cache Configuration**: Ensure that `CACHE_CONFIG` and `DATA_CACHE_CONFIG` are correctly pointing to your Redis instance and that the `CACHE_TYPE` is set to `'RedisCache'`.
- **Async Queries Configuration**: Verify that the `GLOBAL_ASYNC_QUERIES` feature flag is enabled and that `GLOBAL_ASYNC_QUERIES_TRANSPORT` is set to `'polling'` or `'ws'` as per your setup.
- **Celery Configuration**: Make sure that the Celery broker and result backend are correctly configured to use Redis, and that the Celery workers are running properly.
- **JWT Secret**: Ensure that `GLOBAL_ASYNC_QUERIES_JWT_SECRET` is set correctly, as this is crucial for async query authentication.
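Putting those together, a minimal sketch of the relevant block of `superset_config.py` might look like this (the host, port, and secret values here are placeholders, not your actual settings):

```python
import os

FEATURE_FLAGS = {"GLOBAL_ASYNC_QUERIES": True}

GLOBAL_ASYNC_QUERIES_TRANSPORT = "polling"  # or "ws"
GLOBAL_ASYNC_QUERIES_JWT_SECRET = os.getenv("JWT_SECRET")  # a long random string

# Placeholder Redis location; adjust to your service name and port.
REDIS_URL = f"redis://{os.getenv('REDIS_HOST', 'redis')}:{os.getenv('REDIS_PORT', '6379')}/0"

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 86400,
    "CACHE_KEY_PREFIX": "superset_cache",
    "CACHE_REDIS_URL": REDIS_URL,
}
# Same backend, different key prefix, so data-cache entries stay distinguishable.
DATA_CACHE_CONFIG = {**CACHE_CONFIG, "CACHE_KEY_PREFIX": "superset_results"}
```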
If the issue persists, you might want to look into the specific error messages or logs related to async queries and caching in Superset 4.0.2, which can provide more detailed insights into what might be going wrong [3][4].
To continue talking to Dosu, mention @dosu.
Comment From: jorgehn52
Hello @dosu. Yes, those configurations are in place, and it is still not working. As an example: I have a dashboard that takes ~20 seconds to load, and within that dashboard there are graphs whose SQL queries take only 3 seconds. That is why I was configuring async queries, hoping that would resolve it.
I also noticed that turning "GLOBAL_ASYNC_QUERIES" to True makes the graphs in the dashboard load forever. EDIT: I also noticed that even after making sure that redis-headless is up and configured, some calls are trying to reach Redis at "127.0.0.1:6379".
Comment From: padbk
Can't see GLOBAL_ASYNC_QUERIES_REDIS_CONFIG in your config. Try adding it and pointing it at your Redis instance.
https://github.com/apache/superset/blob/f11fa091e261a35f4d39d8567a859fad07547d84/superset/config.py#L1536
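Something along these lines, for example (this reuses the REDIS_HOST/REDIS_PORT env vars already in your values file; the db number, empty password, and ssl flag are assumptions to adjust for your instance):

```python
import os

# Sketch: point async queries at the same Redis the rest of Superset uses.
GLOBAL_ASYNC_QUERIES_REDIS_CONFIG = {
    "host": os.getenv("REDIS_HOST", "redis"),
    "port": int(os.getenv("REDIS_PORT", "6379")),
    "db": 0,          # assumption: default DB
    "password": "",   # assumption: no auth
    "ssl": False,
}
```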
Comment From: dosubot[bot]
It seems like there might be a misconfiguration in your Redis setup. Superset determines the Redis host and port using the `CACHE_REDIS_HOST` and `CACHE_REDIS_PORT` configuration keys. If these are not explicitly set, it defaults to "redis" for the host and 6379 for the port [1][2].
To resolve the issue, ensure that your Redis configuration in `superset_config.py` or your Helm chart values file is correctly pointing to the Redis service you intend to use. Specifically, check that `CACHE_REDIS_HOST` and `CACHE_REDIS_PORT` are set to the correct values for your environment, rather than defaulting to `127.0.0.1:6379`.
Additionally, verify that `GLOBAL_ASYNC_QUERIES_REDIS_CONFIG` is also correctly configured to point to your Redis instance. This should help ensure that Superset is connecting to the correct Redis server for caching and async queries.
To continue talking to Dosu, mention @dosu.
Comment From: padbk
@dosu CACHE_REDIS_HOST and CACHE_REDIS_PORT only apply on latest master, where GLOBAL_ASYNC_QUERIES_CACHE_BACKEND is the new flag to use.
For 4.1.0 and below, GLOBAL_ASYNC_QUERIES_REDIS_CONFIG is needed.
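Roughly, the two variants look like this (a sketch only; for the master-branch form, the key names are assumed to follow the flask-caching `RedisCache` convention mentioned above, so check `superset/config.py` on your branch for the authoritative set):

```python
# Superset 4.1.0 and below: configure Redis for async queries directly.
GLOBAL_ASYNC_QUERIES_REDIS_CONFIG = {
    "host": "your-redis-host",  # placeholder
    "port": 6379,
    "db": 0,
    "password": "",
    "ssl": False,
}

# Latest master: the setting moved to a flask-caching style cache backend.
GLOBAL_ASYNC_QUERIES_CACHE_BACKEND = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_REDIS_HOST": "your-redis-host",  # placeholder
    "CACHE_REDIS_PORT": 6379,
    "CACHE_REDIS_DB": 0,
}
```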
Comment From: jorgehn52
Hello @padbk. Thank you for your message and suggestion.
So I added the portion I was missing, and also enabled debug mode on the worker. In the UI it still takes ~20 seconds to load, but what I see in the logs is that it is also taking a long time in "load_chart_data_into_cache", ~20 seconds for some of them.
Also worth mentioning: I tested with versions 4.0.1 and 4.0.2, with the same 0.12.11 helm chart version. This is how my values file looks now.
```yaml
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,
    "DASHBOARD_RBAC": True,
    "EMBEDDABLE_CHARTS": True,
    "EMBEDDED_SUPERSET": True,
    "HORIZONTAL_FILTER_BAR": True,
    "DRILL_BY": True,
    "DRILL_TO_DETAIL": True,
    "ENABLE_TEMPLATE_PROCESSING": True,
    "DASHBOARD_VIRTUALIZATION": False,
    "GLOBAL_ASYNC_QUERIES": True
}
celery_conf: |
  from celery.schedules import crontab
  import redis
  REDIS_HOST = os.getenv("REDIS_HOST")
  REDIS_PORT = os.getenv("REDIS_PORT")
  REDIS_CELERY_DB = os.getenv("REDIS_CELERY_DB")
  class CeleryConfig(object):
      broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
      imports = ('superset.sql_lab', "superset.tasks.thumbnails", "superset.tasks.scheduler",)
      result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
      worker_prefetch_multiplier = 10
      task_acks_late = True
      task_annotations = {
          'sql_lab.get_sql_results': {
              'rate_limit': '100/s',
          },
      }
  CELERY_CONFIG = CeleryConfig
  CACHE_CONFIG = {
      'CACHE_TYPE': 'RedisCache',
      'CACHE_DEFAULT_TIMEOUT': 900,
      'CACHE_KEY_PREFIX': 'superset_results',
      'CACHE_REDIS_URL': f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}",
  }
  GLOBAL_ASYNC_QUERIES_JWT_SECRET = os.getenv("JWT_SECRET")
  GLOBAL_ASYNC_QUERIES_TRANSPORT: 'polling'
  GLOBAL_ASYNC_QUERIES_REDIS_CONFIG = {
      "port": f"{REDIS_PORT}",
      "host": f"{REDIS_HOST}",
      "password": "",
      "db": f"{REDIS_CELERY_DB}",
      "ssl": False,
  }
  from flask_caching.backends.rediscache import RedisCache
  RESULTS_BACKEND = RedisCache(host=f"{REDIS_HOST}", port=f"{REDIS_PORT}", key_prefix='superset_results')
  RESULTS_BACKEND_USE_MSGPACK = True
  CACHE_NO_NULL_WARNING = True
```
Is it OK to use REDIS_CELERY_DB (which is 0 by default) for every config? During installation, Redis is only configured with the REDIS_DB=1 and REDIS_CELERY_DB=0 env values.
Do you have any other recommendations, or do you see something odd in my config? It has changed because of all the posts I have seen. Thanks.
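To make the question concrete, this is the kind of split I am wondering about, with Celery on REDIS_CELERY_DB=0 and the caches on REDIS_DB=1 (just a sketch using the env var names from my install):

```python
import os

REDIS_HOST = os.getenv("REDIS_HOST")
REDIS_PORT = os.getenv("REDIS_PORT")
REDIS_CELERY_DB = os.getenv("REDIS_CELERY_DB", "0")  # broker + task results
REDIS_DB = os.getenv("REDIS_DB", "1")                # chart/data caches

class CeleryConfig(object):
    broker_url = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"
    result_backend = f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}"

CELERY_CONFIG = CeleryConfig

CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_KEY_PREFIX": "superset_cache",
    "CACHE_REDIS_URL": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_DB}",
}
```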
Comment From: dosubot[bot]
To address the issue with async queries and caching, ensure that you have `GLOBAL_ASYNC_QUERIES_REDIS_CONFIG` properly set in your configuration for Superset 4.0.2. This configuration should point to your Redis instance, ensuring that Superset is not defaulting to `127.0.0.1:6379`.
Here's an example of how you might configure it:

```python
GLOBAL_ASYNC_QUERIES_REDIS_CONFIG = {
    'host': 'your-redis-host',
    'port': 6379,
    'db': 0,
    'password': 'your-redis-password',  # if applicable
}
```
Make sure that this configuration is included in your `superset_config.py` or your Helm chart values file. This should help Superset connect to the correct Redis server for handling async queries and caching effectively. If you continue to experience issues, double-check the network connectivity and firewall settings to ensure that Superset can reach the Redis server at the specified host and port; a quick sanity check is sketched below.
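For instance, from inside a Superset pod you could verify reachability with redis-py (the host/port values are whatever you configured above; `db=0` is an assumption):

```python
import os
import redis

# Raises redis.exceptions.ConnectionError if the host/port are unreachable.
client = redis.Redis(
    host=os.getenv("REDIS_HOST", "redis"),
    port=int(os.getenv("REDIS_PORT", "6379")),
    db=0,
)
print(client.ping())  # True when Redis answers
```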
To continue talking to Dosu, mention @dosu.
Comment From: padbk
We use the same Redis DB for both.
Comment From: rusackas
@jorgehn52 sorry most of us weren't able to help here (and thank you @padbk for your help!). Are you still facing this issue in 4.1.2? I'm tempted to close this out as inactive since it's been silent for a long time, or to move it to a Discussion, since it's a config issue rather than a bug (as far as I can tell), but I wanted to get your input first.
Comment From: rusackas
Haven't heard back, so closing this as inactive, but happy to reopen if needed!