Bug description

Starting from Superset 4.1 (and confirmed in 4.1.1), probably related to the heatmaps updates, it no longer works to define Y axis as a function on a field.

Having Y axis set to foobar (with foobar a field in your dataset) works, but setting it to REPLACE('foobar', '', '') (a SQL function of your field) results in duplicating all the fields (X axis and Y axis) in the produced SQL query and then crashing.

I'm using Spark as a datasource, with Apache Hive connector.

Best

Screenshots/recordings

No response

Superset version

4.1.1

Python version

3.11

Node version

18 or greater

Browser

Firefox

Additional context

No response

Checklist

  • [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Hey @Phyks! I'm here to assist you with any bugs, questions, or contributions. Let's tackle this issue together!

The issue you're experiencing with duplicate columns when using a SQL function in the Y-axis configuration for heatmaps in Superset 4.1.1 seems to be related to changes in the heatmap SQL query generation logic. While I couldn't find specific details about changes in version 4.1 that might cause this, the SQL query generation for heatmaps is handled by the buildQuery function in the file superset-frontend/plugins/plugin-chart-echarts/src/Heatmap/buildQuery.ts [1].

The expected behavior when using SQL functions in axis configurations is that the SQL expression should be supported and incorporated into the query without causing duplication [2]. However, it seems there might be a bug affecting this functionality in your case.

Since you're using Spark with the Apache Hive connector, it's also possible that there might be specific issues or limitations with this setup, although I couldn't find any documented ones related to heatmaps.

To address this, you might want to check the buildQuery logic for any recent changes or consider creating a minimal reproducible example to further investigate the issue.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Other

Comment From: rusackas

Seems to be working on master when I try it. Can you give us an example/screenshot with an example dataset? Here I'm using Vehicle Sales.

Image

Before REPLACE: Image

After: Image

Comment From: Phyks

Non-working example for me in 4.1.1:

  • In SQLLab, run:
SELECT 'Jan' AS Qtr1, 'Apr' AS Qtr2
UNION ALL SELECT 'Feb' AS Qtr1, 'May' AS Qtr2
UNION ALL SELECT 'Mar' AS Qtr1, 'Jun' AS Qtr2

and save as a new dataset.

  • In Charts:

Image

It works with the "legacy" heatmap though. When looking at the generated SQL query, it looks like

SELECT `Qtr1` AS `Qtr1`, REPLACE(Qtr2, '', '') AS `My column`, COUNT(*) AS `count`, `Qtr1` AS `Qtr1`, REPLACE(Qtr2, '', '') AS `My column` 
FROM (SELECT 'Jan' AS Qtr1, 'Apr' AS Qtr2
UNION ALL SELECT 'Feb' AS Qtr1, 'May' AS Qtr2
UNION ALL SELECT 'Mar' AS Qtr1, 'Jun' AS Qtr2
) AS `virtual_table` GROUP BY `Qtr1`, REPLACE(Qtr2, '', '') ORDER BY `Qtr1` ASC, `My column` ASC
 LIMIT 1000;

which has duplicates in the outer SELECT.

Thanks

Comment From: Phyks

It also seems impossible to import a legacy heatmap in latest superset (4.1.1):

  • If I export a new heatmap chart, the YAML file has a viz type "heatmap v2".
  • If I export a legacy heatmap chart, the YAML file has a viz type "heatmap".

But when importing a "heatmap" type into Superset 4.1.1, it seems to always be promoted to a "heatmap v2" (migration logic?). Therefore, it seems impossible to export and reimport a legacy heatmap chart as is.

Comment From: Phyks

Hi @rusackas! Were you able to reproduce with the provided extra details? Thanks!

Comment From: Phyks

Hi,

I managed to narrow down the issue. It is reproducible only with a "Apache Hive" backend, following the instructions on https://github.com/apache/superset/issues/31974#issuecomment-2611891720. For some reason, in this context, the function expression is generated twice in the resulting SQL query.

On the other hand, if using an SQL backend (e.g. SQLite "examples" database), then the issue cannot be reproduced as pointed out by @rusackas in https://github.com/apache/superset/issues/31974#issuecomment-2610733404.

This issue is still present in the latest v4.1.2 release.

Thanks

Comment From: rusackas

Can anyone still reproduce this in 5.0, or on master now that SQLGlot is in the picture there?

I don't know anyone with Hive access... perhaps @bkyryliuk?