Apache Superset 4.0.1 - Exported CSV Contains Garbled Chinese Characters and Numbers

Bug description

I have installed the latest version of Superset (4.0.1). However, when exporting query results to a CSV file, the Chinese characters and numbers in the file are always garbled. I have tried changing the encoding to utf-8, utf-8-sig, and gbk, but the issue persists.

How to reproduce the bug

1. Install Superset 4.0.1.
2. Run a query that includes Chinese characters and numbers.
3. Export the query results to a CSV file.
4. Open the CSV file and observe that the Chinese characters and numbers are garbled.
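To tell whether the exported file itself is wrongly encoded or is only being misread by the program that opens it, it can help to look at the raw bytes. The following is a minimal sketch using only the Python standard library; `describe_csv_bytes` and `CANDIDATE_ENCODINGS` are hypothetical names chosen for illustration and are not part of Superset.

```python
import codecs

# Encodings mentioned in this report; extend the tuple as needed.
CANDIDATE_ENCODINGS = ("utf-8-sig", "utf-8", "gbk")


def describe_csv_bytes(raw: bytes) -> dict:
    """Report whether the bytes start with a UTF-8 BOM and which
    candidate encodings can decode them without raising an error."""
    decodable = []
    for name in CANDIDATE_ENCODINGS:
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        decodable.append(name)
    return {
        "has_utf8_bom": raw.startswith(codecs.BOM_UTF8),
        "decodable_as": decodable,
    }
```

For an exported file, something like `describe_csv_bytes(open("export.csv", "rb").read())` would show whether a BOM is present (the file name here is a placeholder). A successful decode only means the bytes are valid in that encoding, not that it is the intended one, so the result is a hint rather than proof.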

Screenshots/recordings

No response

Superset version

4.0.1

Python version

3.11

Node version

20.12.2

Browser

Chrome

Additional context

Expected Behavior: The Chinese characters and numbers should be displayed correctly in the exported CSV file.

Actual Behavior: The Chinese characters and numbers are garbled in the exported CSV file.

Environment:

Superset version: 4.0.1
Encoding attempted: utf-8, utf-8-sig, gbk

Checklist

[X] I have searched Superset docs and Slack and didn't find a solution to my problem.
[X] I have searched the GitHub issue tracker and didn't find a similar bug report.
[X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

Hey @TommyLeung-gj! I'm here to help you with the issue you're experiencing with Superset 4.0.1. I'm a bot that can assist you with bug fixes, answer your questions, and guide you on becoming a contributor. Let's dive into resolving the problem together!

To resolve the issue of Chinese characters and numbers being garbled when exporting query results to a CSV file in Superset 4.0.1, you need to ensure that the CSV file is encoded in UTF-8. The df_to_escaped_csv function in the superset/utils/csv.py file should be called with the appropriate encoding settings.

Here is the relevant part of the code with the necessary modification:

csv_data = df_to_escaped_csv(df, index=False, encoding='utf-8', **config["CSV_EXPORT"])

Ensure that the encoding='utf-8' parameter is included in the call to df_to_escaped_csv. This will help in correctly encoding Chinese characters and numbers in the exported CSV file ^[1]^[2].

Comment From: rusackas

Related issue linked above ^^^

Might even be considered a duplicate, really, but they seem kind of unique.

Comment From: glenn-gj-liang

No, it still doesn't work after I changed the call to use encoding="utf-8".

Comment From: glenn-gj-liang

Maybe something is going wrong in the async query path: Celery, or the Redis results backend cache.

Comment From: bionexit

Comment From: Habeeb556

Having the same issue with version 4.0.1, even though version 2.1.3 works fine with the same parameter config.

Comment From: Habeeb556

The issue has been resolved by downgrading the package with the following command: pip install Werkzeug==2.3.8.
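Before and after a downgrade like this, it may be worth confirming which Werkzeug version the running environment actually has. Below is a small sketch using the standard library's `importlib.metadata`; the helper names `installed_version` and `major_version` are hypothetical and only for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def major_version(version_string: str) -> int:
    """Extract the leading major number from a version like '2.3.8'."""
    return int(version_string.split(".")[0])


werkzeug = installed_version("Werkzeug")
if werkzeug is None:
    print("Werkzeug is not installed in this environment")
else:
    print(f"Werkzeug {werkzeug} (major version {major_version(werkzeug)})")
```

The check only describes the interpreter it runs in, so it would need to be run in every environment that serves exports: the web server and, for async queries, the Celery workers.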

Comment From: bionexit

I downgraded Werkzeug to 2.3.8, but no luck. What's your encoding option?

Mine is as follows:

CSV_EXPORT = {"encoding": "utf-8-sig"}
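For context on what the `utf-8-sig` setting is expected to change: it is ordinary UTF-8 with a 3-byte byte-order mark (BOM) in front, which spreadsheet applications such as Excel commonly use to recognise the file as UTF-8. The sketch below uses only the standard library; `ensure_utf8_bom` is a hypothetical helper for illustration, not something Superset provides.

```python
import codecs

text = "名称,数量\n苹果,3\n"

plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# utf-8-sig is the same UTF-8 payload with the BOM bytes prepended.
assert with_bom == codecs.BOM_UTF8 + plain


def ensure_utf8_bom(raw: bytes) -> bytes:
    """Return raw with a UTF-8 BOM prepended unless one is already there."""
    if raw.startswith(codecs.BOM_UTF8):
        return raw
    return codecs.BOM_UTF8 + raw
```

If an export configured with `utf-8-sig` arrives without these leading bytes, the setting was probably not applied somewhere between the query result and the HTTP response.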

Comment From: Habeeb556

Yes, the same encoding. But did you face the same problem with version 2.1.3 or 3.1.3? Also, I got Chinese characters, but not in the English text.

Comment From: bionexit

It worked after I reloaded the Celery service. Thanks a lot, bro.

Comment From: Habeeb556

++ @TommyLeung-gj, could you confirm if this downgrade solves your issue or not? Also, what language are you using?

++ @bionexit, we would appreciate your feedback on which language's characters you encountered, so that it can be reported to the Werkzeug team.

Comment From: foretony5211

I use Docker. How do I reload the Celery service? Thanks.

Comment From: wuqicyber

I tried downgrading to Werkzeug==2.3.8, and it works. Thanks.

Comment From: ruifpedro

@glenn-gj-liang, are you by chance downloading a CSV from a Table-type chart that has server-side pagination enabled? I found that server-side pagination caused Chinese characters / garbled characters to appear in the CSV file (will report this as a bug later).

Comment From: rusackas

Related, I think: https://github.com/apache/superset/pull/33720

Is anyone able to reproduce this on the 5.0.0 release candidates or on the master branch?