Bug description
This bug describes the cause for a bug previously described in #22467 and later closed without explicit resolution. It offers a possible patch, but not a full fix.
This is the issue I encountered in superset v. 4.1.1 using Duckdb.
In the superset database connection (edit database --> Advanced --> Other additional settings), I wanted to pass some engine parameters, however this was failing.
What I found is that the parameters weren't being decoded correctly as part of the get_extra_params()
method of the base database engine class. This meant that when duckdb.py calls that method to get the extra parameters, you don't get a properly formed nested dictionary, instead you get a dictionary with the first key and then all other info as a string for the value.
So, if my input was:
{'connect_args': {'read_only': True}}
superset is passing back:
{"engine_params":"{'connect_args': {'read_only': True}}"}
(Note the extra quotes around the value)
It should be passing:
{"engine_params":{'connect_args': {'read_only': True}}}
This means that duckdb.py is failing on its get_extra_params()
method because it's trying to operate on a string instead of a dictionary.
One solution that worked for me was to fix the immediate problem - this worked for my use case:
# /superset/db_engine_specs/duckdb.py
from ast import literal_eval
@staticmethod
def get_extra_params(
database: Database, source: QuerySource | None = None
) -> dict[str, Any]:
"""
Add a user agent to be used in the requests.
"""
extra: dict[str, Any] = BaseEngineSpec.get_extra_params(database)
# >>> Start new code
for key in extra.key():
try:
extra[key] = literal_eval(extra[key])
except ValueError:
pass
# >>> End new code
engine_params: dict[str, Any] = extra.setdefault("engine_params", {})
connect_args: dict[str, Any] = engine_params.setdefault("connect_args", {})
config: dict[str, Any] = connect_args.setdefault("config", {})
custom_user_agent = config.pop("custom_user_agent", "")
delim = " " if custom_user_agent else ""
user_agent = get_user_agent(database, source)
user_agent = user_agent.replace(" ", "-").lower()
user_agent = f"{user_agent}/{VERSION_STRING}{delim}{custom_user_agent}"
config.setdefault("custom_user_agent", user_agent)
return extra
However, as literal_eval
is just turning a presumed string into its representation (or passing if not a string) the best solution is probably to fix the root cause problem of the malformed json being passed in the first place. Unfortunately, I don't know the code base well enough to know where to make that change.
Also, I assume that this could be a problem for databases other than duckdb. Given that it appears to be a problem from the Database object and base database engine class.
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [ ] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: danlewis85
@sapcode - might be of interest to you if still relevant.
Comment From: dosubot[bot]
Hey @danlewis85! I'm here to assist you with any bugs, questions, or contributions while you wait for a human maintainer. Let me know how I can help! Dosu
To address the issue of malformed JSON/dictionary in the get_extra_params()
method affecting the DuckDB connection, you should ensure that the extra
field of the database
object is correctly parsed as a JSON object before accessing it. The test case for the DuckDBEngineSpec
class demonstrates how the get_extra_params
method should handle the extra
field:
from superset.db_engine_specs.duckdb import DuckDBEngineSpec
database = mocker.MagicMock()
# Case when extra is an empty dictionary
database.extra = {}
assert DuckDBEngineSpec.get_extra_params(database) == {
"engine_params": {
"connect_args": {
"config": {"custom_user_agent": f"apache-superset/{VERSION_STRING}"}
}
}
}
# Case when extra is a JSON string
database.extra = json.dumps(
{"engine_params": {"connect_args": {"config": {"custom_user_agent": "my-app"}}}}
)
assert DuckDBEngineSpec.get_extra_params(database) == {
"engine_params": {
"connect_args": {
"config": {
"custom_user_agent": f"apache-superset/{VERSION_STRING} my-app"
}
}
}
}
Ensure that the extra
field is correctly parsed using json.loads()
if it is a JSON string. This will prevent issues with malformed JSON/dictionary structures [1].
Unfortunately, I didn't find any similar threads that might provide additional context or solutions.
To continue talking to Dosu, mention @dosu.
Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other
Comment From: rusackas
@betodealmeida @guenp might have interest/advice here too.
Comment From: rusackas
Is this still an issue as of 5.0.0 or on current master
? We're certainly open to a PR and/or further investigation.
Comment From: danlewis85
I've checked this on 5.0.0 and it is still an issue.
Comment From: danlewis85
Ok. I've figured outwhat's going on here by looking at the actual api call that gets made, there are 2 things happening, here is an example of what the api returns:
'{"allows_virtual_table_explore":true,"engine_params":"{\\"connect_args\\":{\\"read_only\\": false}}","metadata_cache_timeout":{"schema_cache_timeout":"10","table_cache_timeout":"10"}}'
- There is an implicit expectation of json formatting for the parameters, not python. This means: you must use double quotes, you can't leave trailing commas, and booleans must be lower case.
- This is probably user error, even if it feels implicitly like you should be using python here. Logically I assumed python knowing it's sqlalchemy underneath. The biggest gotcha is lower casing booleans.
- The engine parameters themselves get added as a string, which doesn't get decoded as it's effectively a nested json.
There are 2 logical paths to fix this, you could change the api response, but I think the easiest is probably to update: BaseEngineSpec.get_extra_params()
Currently this effectively just calls json.loads()
, we could update this to recursively load jsons, e.g.:
def _hook(obj):
for key, value in obj.items():
if isinstance(value, str):
try:
obj[key] = json.loads(value, object_hook=_hook)
except ValueError:
pass
return obj
json.loads(val, object_hook=_hook)
If this makes sense to you I can submit a PR. The thing I don't know is whether this recursion would break anything for other databases where you might have a json-like object that you don't want decoding.
Comment From: danlewis85
The design to key engine parameters specifically as a string appears purposeful. This means it could probably be handled specifically rather than recursing over all key-value pairs.
Comment From: danlewis85
Closing as couldn't reproduce issue in master branch. Appears to work fine. Will reopen if it reappears.