Superset enables users to run SQL queries directly against large-scale data warehouses such as Google BigQuery, Snowflake, and Amazon Redshift. A significant risk in this environment is that a user, particularly one less familiar with SQL optimization, could inadvertently write a query that scans terabytes of data, incurring substantial and unexpected financial costs. Superset currently lacks a built-in "guardrail" to prevent this.

Proposed Solution: This feature would integrate Superset with the "dry run" or query cost estimation APIs provided by most major cloud data warehouses. Before executing a query from SQL Lab or a dashboard, Superset would first send it to the warehouse's estimation endpoint. If the estimated data to be scanned or the computed cost exceeds a user-configurable threshold, Superset would present a clear warning, such as: "This query will process an estimated 5 TB of data. Are you sure you wish to continue?"

Impact: This contribution would add a critical layer of financial governance and safety to Superset, making it a more robust and trustworthy tool for enterprise environments. It would prevent costly mistakes, encourage more efficient query writing, and provide administrators with greater control over resource usage.

Comment From: betodealmeida

I like the idea!

We do have a query cost estimation feature, supported only in BigQuery these days; it requires the ESTIMATE_QUERY_COST feature flag to be enabled. We could extend it with another flag that sets a threshold, adding the behavior you suggested.

Comment From: mistercrunch

I ran an analysis in Claude Code; here's what it had to say after parsing through the code:


Analysis

Current Implementation

The ESTIMATE_QUERY_COST feature already exists in Superset but requires two conditions:

  1. Feature flag: ESTIMATE_QUERY_COST must be enabled in superset/config.py
  2. Per-database setting: cost_estimate_enabled: true must be added to the database's extra attributes
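
As a sketch, the two conditions above might look like this in practice (the flag name and the extra attribute come from the points above; the superset_config.py layout is the standard override convention):

```python
# superset_config.py -- enable the cost estimation feature flag globally.
FEATURE_FLAGS = {
    "ESTIMATE_QUERY_COST": True,
}

# In addition, each database must opt in via its "Extra" field
# (Advanced settings in the database connection UI):
#
# {
#     "cost_estimate_enabled": true
# }
```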

Currently Supported Databases

  • BigQuery - Full support with dry run API (returns data processed in B/KB/MB/GB)
  • PostgreSQL - Uses EXPLAIN command (returns startup and total cost)
  • Presto/Trino - Uses EXPLAIN (TYPE IO, FORMAT JSON) (returns detailed metrics)
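
To make the PostgreSQL case concrete: EXPLAIN output embeds the start-up and total cost in a `cost=X..Y` annotation. A minimal parser could look like the sketch below (illustrative only; this is not Superset's actual implementation, and the function name is hypothetical):

```python
import re

def parse_postgres_explain_cost(explain_line: str) -> dict:
    """Extract start-up and total cost from a PostgreSQL EXPLAIN plan line.

    EXPLAIN lines look like:
        Seq Scan on users  (cost=0.00..431.00 rows=21000 width=36)
    """
    match = re.search(r"cost=(\d+\.\d+)\.\.(\d+\.\d+)", explain_line)
    if not match:
        return {}
    return {
        "Start-up cost": float(match.group(1)),
        "Total cost": float(match.group(2)),
    }
```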

How It Works

  1. User clicks "Estimate cost" button in SQL Lab
  2. Frontend calls /api/v1/sqllab/estimate/ endpoint
  3. Database engine spec's estimate_query_cost() method executes the appropriate estimation command
  4. Results are formatted and displayed in a modal

Key Limitation

The feature currently only displays costs - it doesn't implement the threshold warning system suggested in this issue. This would be a valuable enhancement.

Implementation Path for Threshold Warnings

To implement the proposed warning system:

  1. Add a new config parameter like QUERY_COST_WARNING_THRESHOLD with sub-settings per metric type
  2. Modify QueryEstimationCommand to check thresholds after estimation
  3. Update the frontend to display warnings before query execution
  4. Consider making thresholds configurable per database or per user role
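
The threshold check in step 2 could be sketched as follows. This is a hypothetical design, not existing Superset code: the QUERY_COST_WARNING_THRESHOLD shape and the estimate row format (a list of metric-name-to-value dicts) are assumptions.

```python
# Hypothetical config: per-metric limits (names/values are illustrative).
QUERY_COST_WARNING_THRESHOLD = {
    "Total cost": 1_000_000.0,
}

def check_cost_thresholds(estimate: list, thresholds: dict) -> list:
    """Compare each estimated metric against its configured limit.

    Returns human-readable warning strings for any metric over its limit;
    an empty list means the query is under all thresholds.
    """
    warnings = []
    for row in estimate:
        for metric, limit in thresholds.items():
            value = row.get(metric)
            if value is not None and float(value) > limit:
                warnings.append(
                    f"{metric} ({value}) exceeds the configured limit ({limit})"
                )
    return warnings
```

The frontend (step 3) would then surface any returned warnings in a confirmation dialog before running the query.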

Extending to Other Databases

The architecture is well-designed for extension. To add support for databases like Snowflake or Redshift:

  1. Override get_allow_cost_estimate() to return True
  2. Implement estimate_statement_cost() to execute the database's cost estimation command
  3. Parse and format the results appropriately
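
Structurally, the two overrides could look like the sketch below. The method names match the steps above, but this is a standalone illustration, not a working engine spec: the real methods live on Superset's BaseEngineSpec, their exact signatures may differ across versions, and the EXPLAIN handling here is a placeholder rather than Snowflake's actual plan format.

```python
class SnowflakeCostEstimateSketch:
    """Illustrative shape of the overrides an engine spec would need."""

    @classmethod
    def get_allow_cost_estimate(cls, extra: dict) -> bool:
        # Step 1: opt this database engine into cost estimation.
        return True

    @classmethod
    def estimate_statement_cost(cls, statement: str, cursor) -> dict:
        # Step 2: run the database's estimation command; step 3 would
        # parse this raw plan into displayable metrics.
        cursor.execute(f"EXPLAIN {statement}")
        return {"plan": cursor.fetchall()}
```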

Many major databases could be supported, since they expose EXPLAIN or similar cost estimation capabilities: Snowflake, Redshift, MySQL, Oracle, SQL Server, Databricks, and ClickHouse.