Bug description
Selecting a subquery calculated column as a dimension breaks charts
How to reproduce the bug
Add a subquery calculated column on a dataset Try to use it as a dimension
Screenshots/recordings
No response
Superset version
4.0.2
Python version
Not applicable
Node version
Not applicable
Browser
Not applicable
Additional context
Simple fix would be to only use the column alias instead of the entire subquery in the generated SQL query
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hey @ThChatz! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's tackle this issue together!
To address the issue where selecting a subquery calculated column as a dimension breaks charts in Superset version 4.0.2, you need to handle the sql_mode=only_full_group_by
setting. This setting enforces strict SQL standards, which can cause issues when using subqueries in SELECT statements without proper GROUP BY clauses.
Here are the steps to fix this:
-
Disable
ONLY_FULL_GROUP_BY
Mode: Modify your MySQL configuration file (my.cnf
ormy.ini
) to removeONLY_FULL_GROUP_BY
from thesql_mode
setting, then restart your MySQL server.ini [mysqld] sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"
-
Modify the Query in Superset: Ensure that your subquery includes all necessary columns in the
GROUP BY
clause. This can be done by modifying the query generation logic in Superset to include all selected columns in theGROUP BY
clause.Example modification in the query generation logic:
py if need_groupby: # dedup columns while preserving order columns = groupby or columns for selected in columns: if isinstance(selected, str): # if groupby field/expr equals granularity field/expr if selected == granularity: table_col = columns_by_name[selected] outer = table_col.get_timestamp_expression( time_grain=time_grain, label=selected, template_processor=template_processor, ) # if groupby field equals a selected column elif selected in columns_by_name: outer = self.convert_tbl_column_to_sqla_col( columns_by_name[selected], template_processor=template_processor, ) else: selected = validate_adhoc_subquery( selected, self.database_id, self.schema, ) outer = literal_column(f"({selected})") outer = self.make_sqla_column_compatible(outer, selected) else: outer = self.adhoc_column_to_sqla( col=selected, template_processor=template_processor ) groupby_all_columns[outer.name] = outer if ( is_timeseries and not series_column_labels ) or outer.name in series_column_labels: groupby_series_columns[outer.name] = outer select_exprs.append(outer)
This code ensures that all selected columns are included in the GROUP BY
clause, which should prevent issues related to the ONLY_FULL_GROUP_BY
setting [1].
To continue talking to Dosu, mention @dosu.
Comment From: ThChatz
Bad bot
Comment From: rusackas
I do this all the time and haven't had problem. Can you give more details about this, or provide a reproducible test case using some of the example datasets?
Comment From: ThChatz
Probably has to do with only_full_group_by but still the issue is visible with default datasets:
create a calculated column on the main.channel
dataset with the query
(SELECT name FROM channels WHERE channels.id=channel_id)
Create a table with that column as the dimension and the default count metric
the query you get is:
SELECT
(SELECT name
FROM channels
WHERE channels.id=channel_id) AS test,
COUNT(*) AS count
FROM main.channel_members
GROUP BY
(SELECT name
FROM channels
WHERE channels.id=channel_id)
ORDER BY count DESC
LIMIT 1000
OFFSET 0;
while a more compatible query would be
SELECT
(SELECT name
FROM channels
WHERE channels.id=channel_id) AS test,
COUNT(*) AS count
FROM main.channel_members
GROUP BY test
ORDER BY count DESC
LIMIT 1000
OFFSET 0;
Comment From: rusackas
Curious if this is still an issue in 5.0.0rc2, as there has been a lot of SQL parsing revamping. We'll need to validate this is still an issue in the current codebase, since this has been silent for quite a while, and may otherwise be closed as stale.
Comment From: ThChatz
Well I changed jobs so I haven't worked with superset for a while now. Good luck!
Comment From: rusackas
OK, thanks! Closing this as not-planned, but happy to revisit/reopen if others encounter it in 5.0.