Bug description
Sql query for line chart with time grain is wrong in superset from master version. 1. Install superset 2. set OPTIMIZE_SQL = True in feature flags 3. Create line chart with time column on X-axis 4. Choose hour time grain Data not groupped properly by hours, caused by SQL query
Screenshots/recordings
Dataset has +3 hour offset
Samples:
Chart and result data:
SQL:
Expected result:
Dataset has +3 hour offset
Samples for chart:
Chart and result data:
SQL:
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Привет, @vestrik! Я — бот, который может помочь вам решить баги, ответить на вопросы и стать участником проекта. Я здесь, чтобы помочь, пока вы ждете человека-администратора.
The issue you're experiencing with the SQL query for the hourly time grain in the master branch of Superset might be related to the way the time grain is being handled in the SQL query generation. In the master branch, the SQL expression for hourly time grains is defined as:
TimeGrain.HOUR: "date_trunc('hour', CAST({col} AS TIMESTAMP))"
This expression truncates the timestamp to the hour level by casting the column as a timestamp and applying the date_trunc
function with 'hour' as the specified time grain [1].
If the data is not grouped properly by hours, it could be due to differences in how the SQL expression is applied or interpreted by the database engine you are using. You might want to verify that the database engine supports the date_trunc
function as expected and that the column data types are compatible with this operation.
Additionally, ensure that the configuration and any custom time grain expressions are correctly set up in your Superset instance. If the issue persists, comparing the SQL query generated in the master branch with the one from version 4.1.1 might help identify specific differences that could be causing the problem.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: vestrik
@dosu
Both datasets are based on one table in one database (it is MySql), data type is DATETIME in superset dataset for both.
For MySql there is superset/db_engine_specs/mysql.py:
TimeGrain.HOUR: "DATE_ADD(DATE({col}), INTERVAL HOUR({col}) HOUR)",
but in docker version it is not applied properly.
Comment From: dosubot[bot]
In the MySQL engine specification for Superset, the time grain expression for the hour is defined as:
TimeGrain.HOUR: "DATE_ADD(DATE({col}), INTERVAL HOUR({col}) HOUR)"
This expression is intended to adjust the datetime column to the start of the hour by adding the hour component to the date part of the column [1].
If this expression is not being applied properly in your Docker setup, it could be due to a configuration issue or a bug in the master branch. You might want to check if there are any recent changes or known issues related to time grain expressions for MySQL in the master branch. Additionally, ensure that your Superset instance is correctly configured to use this expression for MySQL.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: vestrik
@dosu this happens only if i set OPTIMIZE_SQL = True in feature flags
Comment From: dosubot[bot]
When OPTIMIZE_SQL
is set to True
in Superset's feature flags, it changes the SQL query generation by moving filtering conditions from the outer query to the inner query, which might affect how time grain expressions are applied. This optimization is intended to improve query performance by applying predicate pushdown, specifically for virtual datasources [1][2].
If you're experiencing issues with time grain expressions in MySQL when OPTIMIZE_SQL
is enabled, it could be due to how these optimizations interact with the MySQL-specific time grain expressions. You might want to verify if the optimization is affecting the application of the DATE_ADD
function in your queries.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: rusackas
Is this still an issue on master
today, with all the SQLParse -> SQLGlot migrations in place? CC @betodealmeida in case of any additional input here.