Bug description

Selecting a subquery calculated column as a dimension breaks charts

How to reproduce the bug

Add a subquery calculated column on a dataset Try to use it as a dimension

Screenshots/recordings

No response

Superset version

4.0.2

Python version

Not applicable

Node version

Not applicable

Browser

Not applicable

Additional context

Simple fix would be to only use the column alias instead of the entire subquery in the generated SQL query

Checklist

  • [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Hey @ThChatz! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let's tackle this issue together!

To address the issue where selecting a subquery calculated column as a dimension breaks charts in Superset version 4.0.2, you need to handle the sql_mode=only_full_group_by setting. This setting enforces strict SQL standards, which can cause issues when using subqueries in SELECT statements without proper GROUP BY clauses.

Here are the steps to fix this:

  1. Disable ONLY_FULL_GROUP_BY Mode: Modify your MySQL configuration file (my.cnf or my.ini) to remove ONLY_FULL_GROUP_BY from the sql_mode setting, then restart your MySQL server.

    ini [mysqld] sql_mode = "STRICT_TRANS_TABLES,NO_ZERO_IN_DATE,NO_ZERO_DATE,ERROR_FOR_DIVISION_BY_ZERO,NO_AUTO_CREATE_USER,NO_ENGINE_SUBSTITUTION"

  2. Modify the Query in Superset: Ensure that your subquery includes all necessary columns in the GROUP BY clause. This can be done by modifying the query generation logic in Superset to include all selected columns in the GROUP BY clause.

    Example modification in the query generation logic:

    py if need_groupby: # dedup columns while preserving order columns = groupby or columns for selected in columns: if isinstance(selected, str): # if groupby field/expr equals granularity field/expr if selected == granularity: table_col = columns_by_name[selected] outer = table_col.get_timestamp_expression( time_grain=time_grain, label=selected, template_processor=template_processor, ) # if groupby field equals a selected column elif selected in columns_by_name: outer = self.convert_tbl_column_to_sqla_col( columns_by_name[selected], template_processor=template_processor, ) else: selected = validate_adhoc_subquery( selected, self.database_id, self.schema, ) outer = literal_column(f"({selected})") outer = self.make_sqla_column_compatible(outer, selected) else: outer = self.adhoc_column_to_sqla( col=selected, template_processor=template_processor ) groupby_all_columns[outer.name] = outer if ( is_timeseries and not series_column_labels ) or outer.name in series_column_labels: groupby_series_columns[outer.name] = outer select_exprs.append(outer)

This code ensures that all selected columns are included in the GROUP BY clause, which should prevent issues related to the ONLY_FULL_GROUP_BY setting [1].

To continue talking to Dosu, mention @dosu.

Comment From: ThChatz

Bad bot

Comment From: rusackas

I do this all the time and haven't had problem. Can you give more details about this, or provide a reproducible test case using some of the example datasets?

Comment From: ThChatz

Probably has to do with only_full_group_by but still the issue is visible with default datasets: create a calculated column on the main.channel dataset with the query (SELECT name FROM channels WHERE channels.id=channel_id) Create a table with that column as the dimension and the default count metric

the query you get is: SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY (SELECT name FROM channels WHERE channels.id=channel_id) ORDER BY count DESC LIMIT 1000 OFFSET 0;

while a more compatible query would be SELECT (SELECT name FROM channels WHERE channels.id=channel_id) AS test, COUNT(*) AS count FROM main.channel_members GROUP BY test ORDER BY count DESC LIMIT 1000 OFFSET 0;

Comment From: rusackas

Curious if this is still an issue in 5.0.0rc2, as there has been a lot of SQL parsing revamping. We'll need to validate this is still an issue in the current codebase, since this has been silent for quite a while, and may otherwise be closed as stale.

Comment From: ThChatz

Well I changed jobs so I haven't worked with superset for a while now. Good luck!

Comment From: rusackas

OK, thanks! Closing this as not-planned, but happy to revisit/reopen if others encounter it in 5.0.