Bug description
Starting from Superset 4.1 (and confirmed in 4.1.1), probably related to the heatmaps updates, it no longer works to define Y axis as a function on a field.
Having Y axis set to foobar
(with foobar
a field in your dataset) works, but setting it to REPLACE('foobar', '', '')
(a SQL function of your field) results in duplicating all the fields (X axis and Y axis) in the produced SQL query and then crashing.
I'm using Spark as a datasource, with Apache Hive connector.
Best
Screenshots/recordings
No response
Superset version
4.1.1
Python version
3.11
Node version
18 or greater
Browser
Firefox
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hey @Phyks! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!
The issue you're experiencing with duplicate columns when using a SQL function in the Y-axis configuration for heatmaps in Superset 4.1.1 seems to be related to changes in the heatmap SQL query generation logic. While I couldn't find specific details about changes in version 4.1 that might cause this, the SQL query generation for heatmaps is handled by the buildQuery
function in the file superset-frontend/plugins/plugin-chart-echarts/src/Heatmap/buildQuery.ts
[1].
The expected behavior when using SQL functions in axis configurations is that the SQL expression should be supported and incorporated into the query without causing duplication [2]. However, it seems there might be a bug affecting this functionality in your case.
Since you're using Spark with the Apache Hive connector, it's also possible that there might be specific issues or limitations with this setup, although I couldn't find any documented ones related to heatmaps.
To address this, you might want to check the buildQuery
logic for any recent changes or consider creating a minimal reproducible example to further investigate the issue.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other
Comment From: rusackas
Seems to be working on master
when I try it. Can you give us an example/screenshot with an example dataset? Here I'm using Vehicle Sales.
Before REPLACE:
After:
Comment From: Phyks
Non-working example for me in 4.1.1:
- In SQLLab, run:
SELECT 'Jan' AS Qtr1, 'Apr' AS Qtr2
UNION ALL SELECT 'Feb' AS Qtr1, 'May' AS Qtr2
UNION ALL SELECT 'Mar' AS Qtr1, 'Jun' AS Qtr2
and save as a new dataset.
- In Charts:
It works with the "legacy" heatmap though. When looking at the generated SQL query, it looks like
SELECT `Qtr1` AS `Qtr1`, REPLACE(Qtr2, '', '') AS `My column`, COUNT(*) AS `count`, `Qtr1` AS `Qtr1`, REPLACE(Qtr2, '', '') AS `My column`
FROM (SELECT 'Jan' AS Qtr1, 'Apr' AS Qtr2
UNION ALL SELECT 'Feb' AS Qtr1, 'May' AS Qtr2
UNION ALL SELECT 'Mar' AS Qtr1, 'Jun' AS Qtr2
) AS `virtual_table` GROUP BY `Qtr1`, REPLACE(Qtr2, '', '') ORDER BY `Qtr1` ASC, `My column` ASC
LIMIT 1000;
which has duplicates in the outer SELECT.
Thanks
Comment From: Phyks
It also seems impossible to import a legacy heatmap in latest superset (4.1.1):
- If I export a new heatmap chart, the YAML file has a viz type "heatmap v2".
- If I export a legacy heatmap chart, the YAML file has a viz type "heatmap".
But when importing a "heatmap" type into Superset 4.1.1, it seems to always be promoted to a "heatmap v2" (migration logic?). Therefore, it seems impossible to export and reimport a legacy heatmap chart as is.
Comment From: Phyks
Hi @rusackas! Were you able to reproduce with the provided extra details? Thanks!
Comment From: Phyks
Hi,
I managed to narrow down the issue. It is reproducible only with a "Apache Hive" backend, following the instructions on https://github.com/apache/superset/issues/31974#issuecomment-2611891720. For some reason, in this context, the function expression is generated twice in the resulting SQL query.
On the other hand, if using an SQL backend (e.g. SQLite "examples" database), then the issue cannot be reproduced as pointed out by @rusackas in https://github.com/apache/superset/issues/31974#issuecomment-2610733404.
This issue is still present in the latest v4.1.2 release.
Thanks
Comment From: rusackas
Can anyone still reproduce this in 5.0, or on master
now that SQLGlot is in the picture there?
I don't know anyone with Hive access... perhaps @bkyryliuk?