How to reproduce the bug
- Store Russian text in a Postgres database.
- Make sure you can fetch that textual data into Superset and properly display it via SQL lab or chart explorer.
- Modify the request: add a `group by` clause and use the `array_agg` function to create an array of strings of Russian text.
- Problem: the strings become "unicode gibberish". Example: `["\u041b\u043e\u043d\u0433\u0441\u043b\u0438\u0432\u044b", "\u0421\u0432\u0438\u0442\u0448\u043e\u0442\u044b"]`. They get displayed like this in SQL Lab, in the Chart Explorer and on the dashboard.
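The escaped form in the example is exactly what Python's standard `json.dumps` emits by default for non-ASCII text. One plausible cause (an assumption, not something confirmed in this report) is that array values get JSON-serialized with `ensure_ascii=True` somewhere before display. A minimal sketch using only the standard library; the sample values are simply the decoded strings from the example:

```python
import json

# Sample values decoded from the escaped example in this report.
values = ["Лонгсливы", "Свитшоты"]

# By default json.dumps escapes every non-ASCII character as \uXXXX.
escaped = json.dumps(values)

# ensure_ascii=False keeps the Cyrillic characters readable.
readable = json.dumps(values, ensure_ascii=False)

# The escaped form is lossless: parsing it restores the original strings.
restored = json.loads(escaped)

print(escaped)
print(readable)
print(restored == values)
```

If that assumption holds, the data itself is intact and only the string shown to the user is escaped.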
Environment
- browser type and version: updated Brave (Chrome-based) and updated Libre Office (Firefox-based)
- helm info:
  - chart: `superset-0.7.7`
  - app version: `1.0`
Checklist
Make sure to follow these steps before submitting your issue - thank you!
- [ ] I have checked the superset logs for python stacktraces and included it here as text if there are any.
- [x] I have reproduced the issue with at least the latest released version of superset.
- [x] I have checked the issue tracker for the same issue and I haven't found one similar.
Comment From: rusackas
This sounds different, yet potentially related to https://github.com/apache/superset/issues/19982
Comment From: rusackas
@jinghua-qa / @sadpandajoe do we have any tests using Russian, Chinese, etc.? Not sure who's best suited to look into this with Postgres and other DBs to sort out the risk factors.
Comment From: m-ocean-it
> This sounds different, yet potentially related to #19982

It actually seems to be identical.
Comment From: m-ocean-it
From #19982:

> when you click on the data it shows a popup window that displays the characters correctly, but when you download it to csv it shows up incorrectly.

Exactly the same for me.
Comment From: G0Dzilla1984
Hi, folks. We've got the same problem with array data from ClickHouse. In ClickHouse the data is stored as UTF-8-encoded values, and the `clickhouse_connect` driver returns correct data. The issue appears only in the Superset interface. Can someone point me to where I might find the bug?
Thanks.
Comment From: sonfire186
UP!
Comment From: Irrichie
I've solved the same problem by converting data type into text (PostgreSQL):
select array_agg(smth)::text
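A possible explanation for why the cast helps, sketched under assumptions: if the display layer JSON-encodes non-string cells (such as lists) but passes plain strings through, then casting the array to text sidesteps the escaping. The `stringify_cell` helper below is hypothetical and is not Superset code; it only illustrates that behavior.

```python
import json


def stringify_cell(value):
    """Hypothetical display helper: pass strings through, JSON-encode the rest."""
    if isinstance(value, str):
        return value
    # json.dumps uses ensure_ascii=True by default, which escapes Cyrillic.
    return json.dumps(value)


# What a driver might hand back for array_agg(smth): a Python list.
array_cell = ["Лонгсливы", "Свитшоты"]

# What array_agg(smth)::text yields: Postgres array text, a plain string.
text_cell = "{Лонгсливы,Свитшоты}"

print(stringify_cell(array_cell))  # escaped \uXXXX sequences
print(stringify_cell(text_cell))   # readable, unchanged
```

The trade-off is that the column is no longer an array, so anything downstream that expects list values has to parse the text form.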
Comment From: pixelky
+1
Comment From: rusackas
While this is getting a lot of support (noting also that you can 👍 the original description rather than adding messages), it doesn't seem to have anyone investigating it. It's been open for over a year and a half, and at some point we'll have to close it in the name of steering toward an actionable backlog. Any takers? Meanwhile hopefully @dosu-bot can shed some light on the subject.
Comment From: rusackas
Let's try @dosubot - maybe I got the command wrong :/ If anyone is willing to tackle this, great, but consider this the last warning before closing as not-planned.
Comment From: dosubot[bot]
Hey @rusackas! I'm Dosu, your friendly bot here to help you tackle bugs, answer questions, and navigate your way to becoming a contributor. I'm here while you wait for a human maintainer. Let's get started!
The issue you're experiencing with the `array_agg` function in Postgres and the display of Russian (Cyrillic) text as "unicode gibberish" in Superset seems to be related to how Superset handles character encoding when fetching and displaying data. This is a known issue that has been discussed in the Superset community, as seen in the related GitHub issue #19982.
A potential workaround, as suggested by a user, is to convert the array data into text using a SQL query like `SELECT array_agg(smth)::text`. This conversion might help in displaying the characters correctly in Superset's interface.
Additionally, it might be helpful to check if there are any configuration settings in Superset or the database connection that could affect Unicode handling. Unfortunately, I couldn't find specific configuration settings related to Unicode handling in the Superset codebase.
If this workaround doesn't resolve the issue, further investigation into the Superset codebase, particularly in areas handling query results and their conversion for display, might be necessary. The `querySuccess` and `fetchQueryResults` functions in `superset-frontend/src/SqlLab/actions/sqlLab.js` and the `ResultSet` component in `superset-frontend/src/SqlLab/components/ResultSet/index.tsx` are key parts of this process [1][2].
Comment From: rusackas
I think we're close to merging a fix for this (https://github.com/apache/superset/pull/33720/)... stay tuned!