Bug description
We have a Snowflake connection that works fine in SQL Lab (queries succeed). When creating a new dataset or running “Test Connection,” Superset fails with:
An Error Occurred
Unable to load columns for the selected table. Please select a different table.
Logs show underlying errors from the Snowflake connector when trying to fetch staged results:
HTTPSConnectionPool(host='<Snowflake staging S3 bucket in us-west-2>', port=443):
Max retries exceeded … Remote end closed connection without response
Repro steps
- Go to Datasets
- Click + Dataset
- Pick Snowflake DB connection
- Choose schema + table
- Observe error
Expected
Columns load, Test Connection succeeds.
Actual
- Dataset creation fails.
- Test Connection fails.
- SQL Lab queries continue to work (likely because small results don’t hit S3 staging).
Environment
- Superset 4.1.1 (dockerized, AWS ECS Fargate)
- Python 3.9 (default in base image)
- Snowflake connector: 3.16.0
- Snowflake SQLAlchemy: 1.7.6
- Snowflake region: AWS us-west-2
Troubleshooting performed
- Confirmed Snowflake network policy allows our NAT egress IPs.
- Verified no failed logins in Snowflake login history (issue occurs before auth).
- Increased Superset/Gunicorn/ALB timeouts.
- Disabled proxy variables, set NO_PROXY for Snowflake/AWS domains.
- Tried connector options: ocsp_fail_open, insecure_mode, session params (CLIENT_PREFETCH_THREADS, CLIENT_RESULT_CHUNK_SIZE, USE_S3_REGIONAL_URL, etc.).
- Added curl sidecar: Snowflake account host + generic S3 reachable, but staging S3 bucket sometimes fails.
- Tested with minimal 40-row tables; still fails.
- Confirmed the issue is specific to metadata/reflection queries, not result size (see the repro sketch after this list).
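To isolate the reflection path outside Superset, here is a minimal sketch of the same SQLAlchemy call Superset makes when loading columns; all credentials and identifiers below are placeholders:

```python
# Hypothetical standalone repro: run the same SQLAlchemy reflection path
# Superset uses when you pick a table for a new dataset. USER, PASSWORD,
# ACCOUNT, DATABASE, SCHEMA, WAREHOUSE, ROLE, and MY_TABLE are placeholders.
from sqlalchemy import create_engine, inspect

engine = create_engine(
    "snowflake://USER:PASSWORD@ACCOUNT/DATABASE/SCHEMA"
    "?warehouse=WAREHOUSE&role=ROLE"
)

inspector = inspect(engine)
# Superset calls get_columns() via the inspector; if the staging S3 bucket
# is the problem, this should fail the same way outside Superset.
for column in inspector.get_columns("MY_TABLE", schema="SCHEMA"):
    print(column["name"], column["type"])
```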
Why this might be Superset-related
- SQL Lab queries work, but inspector/metadata queries consistently fail.
- Looks like a difference in how Superset uses the connector for reflection/metadata (larger results, staging) vs. SQL Lab.
- Want to confirm whether this is a known issue with staging downloads in Superset's Snowflake integration, and whether there are recommended config flags, retries, or version pins.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hi @oliverbell-klv! I'm Dosu and I'm helping the Superset team.
This is a known challenge with Superset's Snowflake integration: metadata reflection and dataset creation use SQLAlchemy's inspector methods (like `get_columns`), which can trigger result sets that Snowflake stages in S3. If your infrastructure intermittently fails to reach the staging S3 bucket, these operations will fail even though SQL Lab queries work, since small results are returned inline and avoid S3 staging.
Superset does not have built-in retry logic or self-healing for Snowflake-specific network failures. Mitigation relies on tuning pool and timeout settings in the connector's configuration or your infrastructure setup. You can adjust these via the database's "Extra" field (`engine_params` and `connect_args`). Enabling the `CLIENT_SESSION_KEEP_ALIVE` parameter in Snowflake may also improve connection stability.
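As a sketch, that Extra payload (a JSON text field in the database settings) might look like the following; the timeout values are arbitrary, and `client_session_keep_alive`, `login_timeout`, and `network_timeout` are standard snowflake-connector-python arguments rather than Superset flags:

```json
{
  "engine_params": {
    "connect_args": {
      "client_session_keep_alive": true,
      "login_timeout": 60,
      "network_timeout": 120
    }
  }
}
```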
If the issue persists, review your AWS/container networking and resource limits, and enable detailed logging (`LOG_LEVEL = logging.DEBUG`) to capture more diagnostic info. There are no Superset-specific config flags or version pins that directly address S3 staging download errors in metadata reflection. This is primarily an infrastructure/networking issue surfaced by how Superset interacts with the Snowflake connector for metadata queries.
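For the logging suggestion, a minimal `superset_config.py` sketch; the connector-logger line is an assumption on my part, not something Superset sets for you:

```python
# superset_config.py -- sketch only
import logging

# Raise Superset's own log level to DEBUG.
LOG_LEVEL = logging.DEBUG

# Assumption: also surface the Snowflake connector's internal logging,
# which includes staged/chunked result download activity.
logging.getLogger("snowflake.connector").setLevel(logging.DEBUG)
```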
Comment From: rusackas
Tempted to move this to a Help discussion. @dosu do you think there's a Superset bug here, or is this an infra/config issue?
Comment From: dosubot[bot]
Based on a deep review of Superset's code and recent changes, this isn't a Superset bug; it's an infrastructure or connector config issue. Superset uses SQLAlchemy inspector methods for metadata reflection, which can trigger Snowflake to stage results in S3. If your infra can't reliably access the staging S3 bucket, metadata queries (like dataset creation and column introspection) will fail, while SQL Lab queries may work because they often avoid S3 staging.

Superset doesn't implement custom error handling or retries for these S3 staging errors; failures are surfaced directly from the Snowflake connector or network stack, not introduced by Superset itself. No recent PRs or docs indicate a regression or bug in Superset related to this behavior. Troubleshooting should focus on network reliability, connector options, and Snowflake session parameters. If new evidence emerges, it's worth revisiting, but right now this is infra/config, not a Superset bug.