Bug description

Describe the bug When configuring a BigQuery database connection in Superset using a service account, the generated SQL queries fail if the service account's project ID is different from the project ID where the target BigQuery dataset and table reside.

The BigQuery connector appears to generate table references in the format `dataset`.`table_name`. While this works when querying tables within the same project as the service account, cross-project queries require the format `projectId`.`dataset`.`table_name`. The missing project ID in the generated query prevents successful execution.

To Reproduce Steps to reproduce the behavior: 1. Configure a new BigQuery database connection in Superset. 2. Select "Service Account" as the authentication method. 3. Modify SQLAlchemy to seperate billing_project_id and project_id. It still waiting approve 4. Use a service account whose project ID is project-A. 5. Add a dataset and table from a different project, say project-B, to Superset. 6. Attempt to run a query or create a chart using the table from project-B.

Actual behavior The query fails with an error (likely related to permissions or table not found), because the generated SQL omits the required project ID for cross-project access, like:

SELECT `column` FROM `dataset`.`table_name`

Proposed Solution A query that includes the project ID format works correctly:

 SELECT `example` AS `example` FROM `projectId`.`dataset`.`table_name`

Environment:

Superset version:4.1.2

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [x] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [x] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: dosubot[bot]

To address the issue of missing project IDs in SQL queries for cross-project access in BigQuery when using a service account, you need to ensure that the SQLAlchemy URI is correctly configured to include the project ID. This can be done by setting the sqlalchemy_uri_placeholder to include the project ID format:

sqlalchemy_uri_placeholder = "bigquery://{project_id}"

Additionally, when building the SQLAlchemy URI, ensure that the build_sqlalchemy_uri method uses the project_id from the service account credentials. The encrypted_extra should contain valid service account credentials with the project_id field. This setup allows Superset to construct the correct SQLAlchemy URI for cross-project access in BigQuery [1].

If the SQLAlchemy modification to separate billing_project_id and project_id is still pending approval, you might need to manually adjust the query format or wait for the update to be merged and released.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

Comment From: rusackas

@mistercrunch @Vitor-Avila got any pro tips on this one?

Comment From: mistercrunch

Maybe not ideal, but you should be able to create a virtual dataset that's a simple

 SELECT `example` AS `example` FROM `projectId`.`dataset`.`table_name`

Comment From: esgungor

@mistercrunch Yes, that works! However, if there are any limitations on virtual dataset, the query generation should be modified. Will it broke smth?

Comment From: Vitor-Avila

I believe this is handled if you have Allow changing catalogs enabled and then create a physical dataset selecting the appropriate catalog?

Comment From: mistercrunch

All features should work, but Superset will run predicates against the SELECT * FROM as a subquery, which can have perf implications.

SELECT dim1, count(1)
FROM (SELECT * FROM tbl)
WHERE {{ data_filters }}
GROUP BY 1

Some engines will generate the right plans (pushing the predicate on the inner table operator), some won't, unclear what snowflake will do.

Comment From: betodealmeida

I have a fix for this here: https://github.com/apache/superset/pull/34360