Bug description

Hi All

We are creating a superset dataset and creating a chart from the dataset, we are creating a dataset from the parquet file, using duckdb engine.

When our parquet file is empty (But columns exists) and we are creating the dataset using the empty parquet. We use this dataset to create a chart Once chart is created, chart will show the error saying that columns are missing. Because my parquet file is empty, superset is not able to read the column, if parquet/dataset records are zero.

Please help us how we can fix this issue in superset.

Please find the screenshot for referance

Image

Screenshots/recordings

No response

Superset version

4.1.2

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [ ] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [ ] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

This happens because Superset relies on the database engine (in this case, DuckDB via SQLAlchemy) to provide column metadata. When your Parquet file is empty, DuckDB/SQLAlchemy may not return any columns, so Superset treats the dataset as having no columns, which leads to the error when building charts. This is a known limitation and has been discussed in the community—other users have run into similar issues when syncing columns for datasets that return no data [discussion].

A common workaround is to temporarily add a dummy row to your Parquet file or use a fake query that always returns at least one row, sync the columns in Superset, and then revert to your original (empty) dataset [issue]. Superset currently does not have built-in logic to infer or synthesize columns from an empty table; it simply exposes what the database engine reports [code reference].

If you need to automate this, you could script the addition and removal of a dummy row as part of your data pipeline. Otherwise, keep an eye on updates to DuckDB and Superset, as improvements in schema introspection for empty files may be added in the future.

To reply, just mention @dosu.


How did I do? Good | Irrelevant | Incorrect | Verbose | Hallucination | Report 🐛 | Other  Join Discord Share on X

Comment From: rusackas

I think the bot is right, but paging Dr. @betodealmeida in case they have a better idea of how we can support this.

Comment From: nagarajmmu

Hi @rusackas

We are using DUCK DB to process the data, as we verified DUCKDB is always return the column names, even though data is Zero. But superset is not able to recognize the columns, if data is zero in table.

If superset is able to get the columns form the underlying dataset, when data is zero, we will not face this issue.

Adding dummy records will mislead the user how this data came in to picture, this where we are bit hesitate to use the dummy row when there is zero data.

As of now we are planning to use below query, please let me know, is this recommended to use or also let us know, if this is not the right way to use query (Considering performance).

Query planning to use: Below query will add the null row, if data is zero, else it will show only data

SELECT * FROM your_table
WHERE EXISTS (SELECT 1 FROM your_table limit 1)
UNION BY NAME
SELECT NULL as <Existing_Column>
WHERE NOT EXISTS (SELECT 1 FROM your_table limit 1);

Thanks Nagaraj M M