Bug description
In chart creator, I can't apply a filter on a Boolean column (e.g. to just return rows where the column is True
) when using Snowflake.
The generated Snowflake SQL isn't valid, because IS True
isn't valid in Snowflake. This is the generated SQL:
SELECT COUNT(*) AS "count"
FROM flights.fct_flight_stats
WHERE "IS_CANCELLED" IS true
LIMIT 5000
IS
can only be used to evaluate NULL-ness in Snowflake, not for filtering values (read more here and here). I suspect this is also true for some other databases, but I haven't researched thoroughly.
The alternative would be to have menu options for = True
and = False
, for Boolean columns.
Version
Preset Cloud // Apache Superset 4.2.0.5
Workarounds
- The
IN
option can be used in combination with and limiting the values toTrue
. -
However, if the generated SQL errored previously, the IN option mysteriously disappears in the UI for this Boolean column. I suspect this is because Superset isn't aware of valid values to fetch that would work for
IN
. I'm happy to open a separate issue for this if the community would like! -
Use the Custom SQL tab and manually specify
= True
.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hello @srinify! I'm Dosu, here to help you solve bugs, answer questions, and guide you in becoming a contributor. I'm on standby while you wait for a human maintainer. How can I assist you today?
The issue you're encountering is due to the way Superset generates SQL queries for Boolean columns, which isn't compatible with Snowflake's SQL syntax. Superset uses IS True
for filtering Boolean columns, but Snowflake requires = True
instead [1].
To address this, you can use the SQL_QUERY_MUTATOR
configuration in Superset to customize the SQL query before execution. This function allows you to modify the SQL query, potentially replacing IS True
with = True
for Snowflake. Here's a basic example of how you might implement this:
def SQL_QUERY_MUTATOR(sql, security_manager=security_manager, database=database):
if 'IS True' in sql:
sql = sql.replace('IS True', '= True')
return sql
This function checks for the presence of IS True
in the SQL query and replaces it with = True
, making it compatible with Snowflake [2].
Alternatively, you can manually use the Custom SQL tab to specify the correct syntax (= True
) for your queries.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: mistercrunch
DOSU solution probably works but we need better/cleaner support. I know many dialects expect WHERE {{ bolelan_field }}
and WHERE NOT {{ bolelan_field }}
. Not sure what ANSI SQL says, WHERE {{ boolean_field }} IS TRUE
smells ANSI to me but idk. @dosubot what does ANSI say about boolean fields?
Comment From: dosubot[bot]
I wasn't able to find specific information within the Superset repository regarding ANSI SQL's specifications for filtering Boolean fields. However, ANSI SQL typically allows filtering Boolean fields directly using expressions like WHERE {{ boolean_field }}
for true values and WHERE NOT {{ boolean_field }}
for false values. The use of IS TRUE
or IS FALSE
is not standard for Boolean fields in ANSI SQL, as these constructs are generally used for checking NULL values.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: mistercrunch
Is this an already-merged solution? -> https://github.com/apache/superset/pull/32701. Seems it may not be, or perhaps a regression (?) Seems easy-ish to fix though.
Comment From: OOub
I'm having a similar issue with superset 5.0 and a clickhouse database:
the generated clickhouse SQL query returns:
WHERE is_cancelled IS false
instead of WHERE is_cancelled = 0
OR WHERE NOT is_cancelled
Comment From: mistercrunch
Unleashed Claude Code on this issue, burning through tokens ... Will report back.
Comment From: mistercrunch
Here -> https://github.com/apache/superset/pull/34199 . Claude did well except its first suggestion was to post-process and regex-mutate the generated SQL. Horrible idea. After a bit of nudging it did all the work in that PR.
Comment From: mistercrunch
@betodealmeida we might want to try to get our new best buddy to refactor / reorganize the large methods in helpers.py
, looks like it would do amazing at that.