Bug description
Describe the bug
When configuring a BigQuery database connection in Superset using a service account, the generated SQL queries fail if the service account's project ID is different from the project ID where the target BigQuery dataset and table reside.
The BigQuery connector appears to generate table references in the format `dataset`.`table_name`. While this works when querying tables within the same project as the service account, cross-project queries require the format `projectId`.`dataset`.`table_name`. The missing project ID in the generated query prevents successful execution.
To Reproduce
Steps to reproduce the behavior:
1. Configure a new BigQuery database connection in Superset.
2. Select "Service Account" as the authentication method.
3. Modify SQLAlchemy to separate billing_project_id and project_id (this change is still awaiting approval).
4. Use a service account whose project ID is project-A.
5. Add a dataset and table from a different project, say project-B, to Superset.
6. Attempt to run a query or create a chart using the table from project-B.
Actual behavior
The query fails with an error (likely related to permissions or table not found), because the generated SQL omits the project ID required for cross-project access, for example:
SELECT `column` FROM `dataset`.`table_name`
Proposed Solution
A query that includes the project ID works correctly:
SELECT `example` AS `example` FROM `projectId`.`dataset`.`table_name`
Environment:
Superset version: 4.1.2
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
To address the issue of missing project IDs in SQL queries for cross-project access in BigQuery when using a service account, you need to ensure that the SQLAlchemy URI is correctly configured to include the project ID. This can be done by setting the sqlalchemy_uri_placeholder to include the project ID format:
sqlalchemy_uri_placeholder = "bigquery://{project_id}"
Additionally, when building the SQLAlchemy URI, ensure that the build_sqlalchemy_uri method uses the project_id from the service account credentials. The encrypted_extra should contain valid service account credentials with the project_id field. This setup allows Superset to construct the correct SQLAlchemy URI for cross-project access in BigQuery [1].
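For reference, a minimal sketch of what that setup could look like (the credentials_info key, project names, and email are illustrative assumptions, not values taken from this report):
# Hypothetical values for illustration; the service account lives in project-A
encrypted_extra = {
    "credentials_info": {  # assumed key; paste the full service account JSON here
        "type": "service_account",
        "project_id": "project-A",
        "client_email": "superset@project-A.iam.gserviceaccount.com",
    }
}
# URI built from the project_id in the credentials, matching the placeholder above
sqlalchemy_uri = "bigquery://project-A"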
If the SQLAlchemy modification to separate billing_project_id and project_id is still pending approval, you might need to manually adjust the query format or wait for the update to be merged and released.
Comment From: rusackas
@mistercrunch @Vitor-Avila got any pro tips on this one?
Comment From: mistercrunch
Maybe not ideal, but you should be able to create a virtual dataset that's a simple
SELECT `example` AS `example` FROM `projectId`.`dataset`.`table_name`
Comment From: esgungor
@mistercrunch Yes, that works! However, if there are any limitations on virtual datasets, the query generation should be modified. Will it break anything?
Comment From: Vitor-Avila
I believe this is handled if you have Allow changing catalogs enabled and then create a physical dataset selecting the appropriate catalog?
Comment From: mistercrunch
All features should work, but Superset will run predicates against the SELECT * FROM as a subquery, which can have perf implications.
SELECT dim1, count(1)
FROM (SELECT * FROM tbl)
WHERE {{ data_filters }}
GROUP BY 1
Some engines will generate the right plans (pushing the predicate down onto the inner table operator), some won't; unclear what Snowflake will do.
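For illustration only (not SQL that Superset emits), an engine that pushes the predicate down would effectively execute something like:
-- predicate applied directly to the base table instead of the outer subquery
SELECT dim1, count(1)
FROM tbl
WHERE {{ data_filters }}
GROUP BY 1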
Comment From: betodealmeida
I have a fix for this here: https://github.com/apache/superset/pull/34360