Bug description

When exporting a dashboard with a high number of charts (e.g., 33 charts) as a PDF or PNG, the exported file does not include all charts. In one example, only the first 24 charts are visible in the downloaded file, and the last visible chart is partially cut off. In another attempt, some charts were still missing from the export, but none of the exported charts were partially cut off.

Repro steps:

  1. Make sure the following feature flags are on: DASHBOARD_VIRTUALIZATION, ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS, ENABLE_DASHBOARD_DOWNLOAD_WEBDRIVER_SCREENSHOT.
  2. Open a dashboard with a large number of charts (e.g., 33 charts, predominantly table and pivot table charts).
  3. Make most of the tables full width and ensure the dashboard layout requires scrolling to see the charts.
  4. Export the dashboard as a PDF or PNG.

Expected: The exported file should include all charts present in the dashboard, fully visible and not cut off.

Current: The export doesn't contain all charts, or contains charts that are cut off.
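For anyone trying to reproduce this, the three flags from step 1 could be switched on with a fragment like the one below. It is a minimal sketch that assumes the flags are set through the FEATURE_FLAGS dictionary in superset_config.py; any other flags already present in a deployment would need to be merged in.

```python
# superset_config.py (fragment): flags named in the repro steps.
# Assumption: the instance reads feature flags from this dictionary.
FEATURE_FLAGS = {
    "DASHBOARD_VIRTUALIZATION": True,
    "ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS": True,
    "ENABLE_DASHBOARD_DOWNLOAD_WEBDRIVER_SCREENSHOT": True,
}
```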

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: DachiCharkviani

Hello, I would like to work on this issue.

Comment From: geido


🎉 Preset Bounty Available: $250 USD 🎉

To claim this bounty, please carefully follow the steps below.


📋 Steps to Participate

  1. Review Guidelines:
    Read through the Preset Bounty Program Contribution Guide for complete details on bounty requirements.

  2. Show Your Interest:
    Complete the Preset Bounty Program Survey and comment on this issue to express your interest.

  3. Join the Slack Channel:
    After completing the survey, you’ll receive an invitation to the dedicated Apache Superset Slack channel.

  4. Get Assigned:
    To officially start, ensure a Bounty Program Manager has assigned you to this issue.

  5. Submit Your Solution:
    When ready, submit your solution with the Fixes #{issue_number} notation in your Pull Request description.

  6. Claim Your Bounty:
    Sign up at GitPay.me and submit your solution via: https://gitpay.me/#/task/987


💡 Additional Notes

  • Only developers assigned by a Bounty Program Manager should start working on this issue to win the bounty.
  • Be sure to follow the guide closely to avoid any delays in payment. Please allow a few days after your PR has been merged for the bounty to be released.

Good luck, and happy coding! 🎉

Comment From: alexandrusoare

Hello, I am interested in working on this.

Comment From: geido

> Hello, I would like to work on this issue.

Hi @DachiCharkviani, have you joined our bounty program? This issue is going out as a bounty and I don't see you in the program yet. Feel free to join, as we have a bunch more issues up for grabs!

Comment From: geido

We are currently holding off on this, as we could not reproduce the issue.

Comment From: OOub

I'm running into the same issue for large dashboards.

Comment From: geido

> I'm running into the same issue for large dashboards.

Hello @OOub, it would be great if you could provide some repro steps, as we are having a hard time reproducing this issue. Thanks!

Comment From: OOub

Hello, sure, let me know if you need additional details.

It works fine on small dashboards, but the generated PDF is cut off on dashboards that require scrolling.

With the screenshot endpoint disabled, the PDF is in vertical format and all pages are present.

With the screenshot endpoint feature flags enabled, the PDF seems to be in horizontal format, only the first page is downloaded, and the last visible chart is cut off.

  • superset version: 4.1.1

Enabled feature flags:

  • ALERT_REPORT_SLACK_V2
  • ALERT_REPORTS
  • ALERTS_ATTACH_REPORTS
  • ALLOW_ADHOC_SUBQUERY
  • DASHBOARD_NATIVE_FILTERS
  • DASHBOARD_VIRTUALIZATION (tried both ON and OFF)
  • DASHBOARD_RBAC
  • DRILL_BY
  • DRILL_TO_DETAIL
  • EMBEDDABLE_CHARTS
  • EMBEDDED_SUPERSET
  • ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS
  • ENABLE_DASHBOARD_DOWNLOAD_WEBDRIVER_SCREENSHOT
  • ENABLE_JAVASCRIPT_CONTROLS
  • ENABLE_TEMPLATE_PROCESSING
  • HORIZONTAL_FILTER_BAR
  • SLACK_ENABLE_AVATARS
  • TAGGING_SYSTEM
  • THUMBNAILS
  • THUMBNAILS_SQLA_LISTENERS

Deployed on Kubernetes via the helm chart (version 0.13.4)

Comment From: Maissacrement

I'm interested too.

Comment From: OOub

I managed to solve my issue by using Firefox instead of Chrome as the webdriver in my custom Docker image. Hope this helps.

Comment From: geido

Thanks. This is still on hold while we investigate the problem. Thank you

Comment From: rusackas

@geido do you know if this one is still an issue that needs tackling?

Comment From: geido

> @geido do you know if this one is still an issue that needs tackling?

We are still trying to repro consistently

Comment From: rusackas

These all seem like they might effectively be the same issue:

  • https://github.com/apache/superset/issues/29394
  • https://github.com/apache/superset/issues/31158
  • https://github.com/apache/superset/issues/29719
  • https://github.com/apache/superset/issues/28713
  • https://github.com/apache/superset/issues/27532

Can anyone here test this with DASHBOARD_VIRTUALIZATION set to false? I have a suspicion that may fix it. If so, we can probably just disable viewport virtualization when the headless browser is building the screencap.
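To illustrate the suspicion, here is a toy model of viewport virtualization. It is not Superset's actual implementation: the function name, the overscan margin, and the chart sizes are all hypothetical. It only shows why a layout that mounts charts solely near the viewport would leave most of a long dashboard unrendered when a headless browser captures it without scrolling.

```python
from typing import List


def rendered_charts(chart_heights: List[int], viewport_height: int,
                    scroll_top: int = 0, overscan: int = 0) -> List[int]:
    """Return indices of charts whose vertical extent intersects the
    viewport (plus an overscan margin), i.e. the charts a virtualized
    layout would mount in this toy model."""
    top_edge = scroll_top - overscan
    bottom_edge = scroll_top + viewport_height + overscan
    rendered = []
    offset = 0  # y-position of the current chart's top edge
    for index, height in enumerate(chart_heights):
        chart_top, chart_bottom = offset, offset + height
        if chart_bottom > top_edge and chart_top < bottom_edge:
            rendered.append(index)
        offset = chart_bottom
    return rendered


# 33 full-width charts stacked vertically, 400 px each (hypothetical sizes).
heights = [400] * 33
visible = rendered_charts(heights, viewport_height=2000, overscan=200)
print(len(visible), "of", len(heights), "charts rendered")
```

In this model only the charts at the top of the page are mounted unless the viewport is made as tall as the whole dashboard, which would match the "first N charts, last one cut off" symptom. Testing the hypothesis on a real instance would mean setting "DASHBOARD_VIRTUALIZATION" to False in FEATURE_FLAGS and exporting again.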

Comment From: LisaHusband

@msyavuz can I get assigned to this?

Comment From: f-teyssier

Hi, is there anything new on this one? I have the same issue on Superset 4.1.3 with Playwright and Firefox: Le PDF Misc Charts-1.pdf

I have tried many combinations of these settings, but they are only effective for PNG export, not PDF. Here is my config file:


import logging
import os
import sys

from celery.schedules import crontab
from flask_caching.backends.filesystemcache import FileSystemCache
from flask_caching.backends.rediscache import RedisCache

###Configurations####
logger = logging.getLogger()

#Version
TAG=os.getenv("TAG")


SQLALCHEMY_ECHO = True
# Environment variables for the DB
DATABASE_DIALECT = os.getenv("DATABASE_DIALECT")
DATABASE_USER = os.getenv("POSTGRES_USER")
DATABASE_PASSWORD = os.getenv("POSTGRES_PASSWORD")
DATABASE_HOST = os.getenv("DATABASE_HOST")
DATABASE_PORT = os.getenv("DATABASE_PORT")
DATABASE_DB = os.getenv("POSTGRES_DB")

# SQLAlchemy connection
SQLALCHEMY_DATABASE_URI = (
    f"{DATABASE_DIALECT}://"
    f"{DATABASE_USER}:{DATABASE_PASSWORD}@"
    f"{DATABASE_HOST}:{DATABASE_PORT}/{DATABASE_DB}"
)

EXAMPLES_USER = os.getenv("EXAMPLES_USER")
EXAMPLES_PASSWORD = os.getenv("EXAMPLES_PASSWORD")
EXAMPLES_HOST = os.getenv("EXAMPLES_HOST")
EXAMPLES_PORT = os.getenv("EXAMPLES_PORT")
EXAMPLES_DB = os.getenv("EXAMPLES_DB")

SQLALCHEMY_EXAMPLES_URI = (
    f"{DATABASE_DIALECT}://"
    f"{EXAMPLES_USER}:{EXAMPLES_PASSWORD}@"
    f"{EXAMPLES_HOST}:{EXAMPLES_PORT}/{EXAMPLES_DB}"
)

# Redis variables
REDIS_HOST = os.getenv("REDIS_HOST", "redis")
REDIS_PORT = os.getenv("REDIS_PORT", "6379")
REDIS_CELERY_DB = os.getenv("REDIS_CELERY_DB", "0")
REDIS_RESULTS_DB = os.getenv("REDIS_RESULTS_DB", "1")

# Celery configuration
CELERY_CONFIG = {
    "broker_url": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_CELERY_DB}",
    "result_backend": f"redis://{REDIS_HOST}:{REDIS_PORT}/{REDIS_RESULTS_DB}",
    "imports": ("superset.sql_lab", "superset.tasks.scheduler"),
    "worker_log_level": "INFO",
    "task_log_prefix": "superset.tasks",
    "beat_schedule": {
        "reports.scheduler": {
            "task": "reports.scheduler",
            "schedule": crontab(minute="*", hour="*"),
        },
        "reports.prune_log": {
            "task": "reports.prune_log",
            "schedule": crontab(minute=10, hour=0),
        },
    },
}

# General cache
CACHE_CONFIG = {
    "CACHE_TYPE": "RedisCache",
    "CACHE_DEFAULT_TIMEOUT": 300,
    "CACHE_KEY_PREFIX": "superset_",
    "CACHE_REDIS_HOST": REDIS_HOST,
    "CACHE_REDIS_PORT": REDIS_PORT,
    "CACHE_REDIS_DB": REDIS_RESULTS_DB,
}
DATA_CACHE_CONFIG = CACHE_CONFIG

# Backend for SQL Lab results
RESULTS_BACKEND = RedisCache(
    host=REDIS_HOST,
    port=int(REDIS_PORT),
    db=int(REDIS_RESULTS_DB),
    key_prefix="superset_results"
)

# Set the secret key from an environment variable
SECRET_KEY = os.getenv("SUPERSET_SECRET_KEY")

#### Superset tags ####

# SQL Lab
SQLLAB_CTAS_NO_LIMIT = True

# Localization and security
BABEL_DEFAULT_LOCALE = 'fr'
ENABLE_PROXY_FIX = True
PREVENT_UNSAFE_DB_CONNECTIONS = True
SESSION_COOKIE_SECURE = True 
WTF_CSRF_ENABLED = True

# Logs
log_level_text = os.getenv("SUPERSET_LOG_LEVEL", "INFO")
LOG_LEVEL = getattr(logging, log_level_text.upper(), logging.INFO)
logging.basicConfig(level=LOG_LEVEL)

# SMTP
# (hidden)

# Feature Flags
FEATURE_FLAGS = {
    "ALERT_REPORTS": True,  # Enables automatic alerts and reports
    "ALERTS_ATTACH_REPORTS": True,
    "ALERT_REPORT_TABS": True,
    "EMAIL_NOTIFICATIONS": True,  # Enables notification emails
    "EMBEDDED_SUPERSET": True,  # Enables embedding Superset in other web pages
    "EMBEDDABLE_CHARTS": True,  # Enables embedding charts in other web pages
    "ENABLE_TEMPLATE_PROCESSING": True,  # Enables Jinja templating (Python-like, in queries)
    "DASHBOARD_RBAC": True,  # Access rights for dashboards
    # "DASHBOARD_NATIVE_FILTERS": True,  # Enables native dashboard filters
    # "VERSIONED_EXPORT": True,  # Enables versioned exports
    "ALLOW_ADHOC_SUBQUERY": True,  # Enables subqueries in filters
    "ALLOW_FULL_CSV_EXPORT": True,
    "DATE_FORMAT_IN_EMAIL_SUBJECT": True,  # Enables date formatting in email subjects, https://strftime.org/
    "PLAYWRIGHT_REPORTS_AND_THUMBNAILS": True,
    "ENABLE_SUPERSET_META_DB": True,  # Cross-DB queries; requires the config at https://superset.apache.org/docs/configuration/databases#querying-across-databases
    "SCHEDULED_QUERIES": True,
    "DASHBOARD_VIRTUALIZATION": False,  # Disables dashboard virtualization
    "ENABLE_DASHBOARD_SCREENSHOT_ENDPOINTS": True,
    "ENABLE_DASHBOARD_DOWNLOAD_WEBDRIVER_SCREENSHOT": True,
}

# Disable dry-run mode so that notifications are actually sent
ALERT_REPORTS_NOTIFICATION_DRY_RUN = False
SCREENSHOT_LOCATE_WAIT = 100
SCREENSHOT_LOAD_WAIT = 600
SCREENSHOT_PLAYWRIGHT_DEFAULT_TIMEOUT = 60_000
# Base URL for screenshots
WEBDRIVER_BASEURL = "http://superset_app:8088"
# User-friendly URL for links in emails
WEBDRIVER_BASEURL_USER_FRIENDLY = "hidden"  # value redacted by the poster
# Browser type to use for screenshots (firefox by default)
WEBDRIVER_TYPE = "firefox"
WEBDRIVER_WINDOW = {
    "dashboard": (1600, 2000),  # Increased width x height
    "slice": (1600, 2000),
    "pixel_density": 3,  # Increased pixel density
}

Comment From: lilotter45

I've also run into this issue on Superset 4.1.2 and 5.0.0. It affects both manual dashboard downloads and scheduled/emailed reports, and both PDF and PNG file types. When emailing reports, I have tried it with the "Ignore cache when generating report" option both enabled and disabled. The problem is more noticeable with emailed reports: instead of loading only a portion of the report and screenshotting the loading icon for the remainder, only a blank image/PDF is sent.

  • For the 4.1.2 instance: Playwright is not enabled and the browser is chromium (for emailed reports).
  • For the 5.0.0 instance: Playwright is enabled, chromium is installed as the headless browser, and DASHBOARD_VIRTUALIZATION has been tried both enabled and disabled without a difference.
  • When manually downloading the reports, only those charts currently on screen, or those slightly beyond, include their data; the remainder appear with the loading icon.
  • When emailing reports, a blank pdf or image is sent; there are not even placeholders for the charts.
  • It takes about 20 seconds for the dashboard I tested this with to load when manually opened; when the scheduled report runs, it also takes about 20 seconds to execute (21 seconds, to be exact; based on the logs), so it seems to be waiting long enough to load the dashboard.
  • I have attempted to discern if opening the dashboard impacts the emailed reports and this doesn't appear to be the case, unless the cache is not ignored.
  • On the same instances displaying this behavior, if you email a smaller report, e.g. a single chart, the pdf/image are generated as expected.

PS - I looked at all of the issues listed in https://github.com/apache/superset/issues/31158#issuecomment-2957568579 and it seems all are reporting a similar issue, but this seems to be the most recently active.

Comment From: tahvane1

I did some more troubleshooting: if I take the 5.0 branch, the export works okay. If I take 6.0, the export is cut off.

Comment From: aukfood

Hello @tahvane1, I have the same problem with 6.0.0rc2.

Image

Comment From: tahvane1

I think I know now how to fix dashboard export. Could you assign this to me?