Bug description
We are getting an error in Superset when using ClickHouse:
"SupersetError(message='clickhouse error:
CREATE TABLE mysy.dim_anchor_id --
ON CLUSTER cluster_1S_2R --
(
`anchor_id` String,
`anchor_name` String,
`institution` String
) ENGINE = S3(
'https://seayoo-analytics.s3.cn-northwest-1.amazonaws.com.cn/dim_anchor_id.csv',
'CSV'
) SETTINGS input_format_csv_skip_first_lines = 1;
Screenshots/recordings
Superset version
5.0.0rc3
Python version
3.10
Node version
18
Browser
Chrome
Additional context
No response
Checklist
- [ ] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [ ] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
This error is coming from the SQL parsing layer (specifically, sqlglot) used by the ClickHouse driver, not from Superset itself. Superset passes your SQL directly to the database driver, which then tries to parse it. The ON CLUSTER clause in your SQL is not currently supported by sqlglot, so it raises an error when it encounters it. Any fix for this would need to happen in sqlglot or the ClickHouse driver, not in Superset itself. There is no built-in workaround in Superset to rewrite or pre-process ClickHouse SQL to handle ON CLUSTER statements (reference).
If you can, try running your CREATE TABLE statement without the ON CLUSTER clause, or execute this kind of DDL directly in ClickHouse rather than through Superset. If you need ON CLUSTER support via Superset, you would need to wait for upstream support in sqlglot or the ClickHouse driver, or implement a custom query mutator/validator hook in your deployment to handle this clause before parsing.
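The workaround of running the statement without ON CLUSTER can be sketched in Python. `strip_on_cluster` is a hypothetical helper, not part of Superset or sqlglot. It uses a regular expression and assumes the cluster name is a bare identifier (like cluster_1S_2R), a backtick-quoted name, or a single-quoted string (which also covers macros such as '{cluster}'). It does not understand SQL, so an `ON CLUSTER` inside a string literal or comment would also be removed. Keep in mind that without ON CLUSTER the DDL runs only on the server Superset is connected to, not on every host in the cluster.

```python
import re
from typing import Any

# Hypothetical helper (not part of Superset or sqlglot).
# Matches "ON CLUSTER <name>" plus the whitespace before it, where <name> is
# assumed to be a bare identifier (cluster_1S_2R), a `backtick-quoted` name,
# or a 'single-quoted' string such as the '{cluster}' macro.
_ON_CLUSTER = re.compile(
    r"\s+ON\s+CLUSTER\s+(?:`[^`]+`|'[^']+'|[A-Za-z_][A-Za-z0-9_]*)",
    re.IGNORECASE,
)


def strip_on_cluster(sql: str, **kwargs: Any) -> str:
    """Return `sql` with every ON CLUSTER clause removed.

    Extra keyword arguments are accepted and ignored, so the function has
    the (sql, **kwargs) -> str shape of a query-mutator hook.
    This is a plain regex, not a SQL parser: it would also rewrite
    matching text inside string literals or comments.
    """
    return _ON_CLUSTER.sub("", sql)


if __name__ == "__main__":
    ddl = (
        "CREATE TABLE mysy.dim_anchor_id\n"
        "ON CLUSTER cluster_1S_2R\n"
        "(\n"
        "    `anchor_id` String\n"
        ") ENGINE = Memory"
    )
    print(strip_on_cluster(ddl))
```

Its `(sql, **kwargs) -> str` shape is meant to match the `SQL_QUERY_MUTATOR` hook in `superset_config.py` (an assumption; check the `config.py` of your Superset version), so it could be wired up as `SQL_QUERY_MUTATOR = strip_on_cluster`. This has not been verified against 5.0.0: if Superset parses the SQL before applying the mutator, the error would still occur. It would also rewrite queries sent to every database, not only ClickHouse.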
Let me know if you want more details on possible workarounds or tracking upstream support.
Comment From: zhangdong1015
However, the same statement executes successfully with clickhouse-client.
Comment From: zhangdong1015
@dosu
Comment From: dosubot[bot]
The reason your SQL works in clickhouse-client but fails in Superset is that Superset uses the sqlglot library to parse SQL, and sqlglot currently doesn't support the ON CLUSTER clause. Superset passes your SQL to the driver, and if sqlglot can't parse a clause, you'll get errors like the one you saw. There isn't any special handling for ON CLUSTER in Superset's ClickHouse integration; unsupported syntax just results in errors surfaced to the user (cite).
Workarounds discussed in the community include: running DDL like CREATE TABLE ... ON CLUSTER directly in ClickHouse (not through Superset), creating a view in ClickHouse that encapsulates your logic and querying that from Superset, or extending sqlglot to support the clause (which requires Python development) (cite).
If you need to run this kind of DDL from Superset, there isn't a built-in workaround; it's a limitation of the current SQL parsing layer.
Comment From: zhangdong1015
This statement executed successfully in previous versions. @dosu
Comment From: rusackas
Hmmm... 5.0.0 is official now (as of rc4), so I'm curious whether it happens there, or on master, where the SQLGlot migration has been completed. CC @betodealmeida in case they have any insights.
Comment From: betodealmeida
Taking a look.