Bug description

Sql query for line chart with time grain is wrong in superset from master version. 1. Install superset 2. set OPTIMIZE_SQL = True in feature flags 3. Create line chart with time column on X-axis 4. Choose hour time grain Data not groupped properly by hours, caused by SQL query

Screenshots/recordings

Dataset has +3 hour offset Samples: Image Chart and result data:

Image SQL:

Image

Expected result:

Dataset has +3 hour offset Samples for chart: Image Chart and result data: Image SQL:

Image

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Привет, @vestrik! Я — бот, который может помочь вам решить баги, ответить на вопросы и стать участником проекта. Я здесь, чтобы помочь, пока вы ждете человека-администратора.

The issue you're experiencing with the SQL query for the hourly time grain in the master branch of Superset might be related to the way the time grain is being handled in the SQL query generation. In the master branch, the SQL expression for hourly time grains is defined as:

TimeGrain.HOUR: "date_trunc('hour', CAST({col} AS TIMESTAMP))"

This expression truncates the timestamp to the hour level by casting the column as a timestamp and applying the date_trunc function with 'hour' as the specified time grain [1].

If the data is not grouped properly by hours, it could be due to differences in how the SQL expression is applied or interpreted by the database engine you are using. You might want to verify that the database engine supports the date_trunc function as expected and that the column data types are compatible with this operation.

Additionally, ensure that the configuration and any custom time grain expressions are correctly set up in your Superset instance. If the issue persists, comparing the SQL query generated in the master branch with the one from version 4.1.1 might help identify specific differences that could be causing the problem.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

Comment From: vestrik

@dosu Both datasets are based on one table in one database (it is MySql), data type is DATETIME in superset dataset for both. For MySql there is superset/db_engine_specs/mysql.py: TimeGrain.HOUR: "DATE_ADD(DATE({col}), INTERVAL HOUR({col}) HOUR)", but in docker version it is not applied properly.

Comment From: dosubot[bot]

In the MySQL engine specification for Superset, the time grain expression for the hour is defined as:

TimeGrain.HOUR: "DATE_ADD(DATE({col}), INTERVAL HOUR({col}) HOUR)"

This expression is intended to adjust the datetime column to the start of the hour by adding the hour component to the date part of the column [1].

If this expression is not being applied properly in your Docker setup, it could be due to a configuration issue or a bug in the master branch. You might want to check if there are any recent changes or known issues related to time grain expressions for MySQL in the master branch. Additionally, ensure that your Superset instance is correctly configured to use this expression for MySQL.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

Comment From: vestrik

@dosu this happens only if i set OPTIMIZE_SQL = True in feature flags

Comment From: dosubot[bot]

When OPTIMIZE_SQL is set to True in Superset's feature flags, it changes the SQL query generation by moving filtering conditions from the outer query to the inner query, which might affect how time grain expressions are applied. This optimization is intended to improve query performance by applying predicate pushdown, specifically for virtual datasources [1][2].

If you're experiencing issues with time grain expressions in MySQL when OPTIMIZE_SQL is enabled, it could be due to how these optimizations interact with the MySQL-specific time grain expressions. You might want to verify if the optimization is affecting the application of the DATE_ADD function in your queries.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

Comment From: rusackas

Is this still an issue on master today, with all the SQLParse -> SQLGlot migrations in place? CC @betodealmeida in case of any additional input here.