Bug description

This bug describes the cause for a bug previously described in #22467 and later closed without explicit resolution. It offers a possible patch, but not a full fix.

This is the issue I encountered in superset v. 4.1.1 using Duckdb.

In the superset database connection (edit database --> Advanced --> Other additional settings), I wanted to pass some engine parameters, however this was failing.

What I found is that the parameters weren't being decoded correctly as part of the get_extra_params() method of the base database engine class. This meant that when duckdb.py calls that method to get the extra parameters, you don't get a properly formed nested dictionary, instead you get a dictionary with the first key and then all other info as a string for the value.

So, if my input was: {'connect_args': {'read_only': True}}

superset is passing back: {"engine_params":"{'connect_args': {'read_only': True}}"} (Note the extra quotes around the value) It should be passing: {"engine_params":{'connect_args': {'read_only': True}}}

This means that duckdb.py is failing on its get_extra_params() method because it's trying to operate on a string instead of a dictionary.

One solution that worked for me was to fix the immediate problem - this worked for my use case:

# /superset/db_engine_specs/duckdb.py

from ast import literal_eval

@staticmethod
    def get_extra_params(
        database: Database, source: QuerySource | None = None
    ) -> dict[str, Any]:
        """
        Add a user agent to be used in the requests.
        """
        extra: dict[str, Any] = BaseEngineSpec.get_extra_params(database)
        # >>> Start new code
        for key in extra.key():
            try:
                extra[key] = literal_eval(extra[key])
            except ValueError:
                pass
        # >>> End new code
        engine_params: dict[str, Any] = extra.setdefault("engine_params", {})
        connect_args: dict[str, Any] = engine_params.setdefault("connect_args", {})
        config: dict[str, Any] = connect_args.setdefault("config", {})
        custom_user_agent = config.pop("custom_user_agent", "")
        delim = " " if custom_user_agent else ""
        user_agent = get_user_agent(database, source)
        user_agent = user_agent.replace(" ", "-").lower()
        user_agent = f"{user_agent}/{VERSION_STRING}{delim}{custom_user_agent}"
        config.setdefault("custom_user_agent", user_agent)

        return extra

However, as literal_eval is just turning a presumed string into its representation (or passing if not a string) the best solution is probably to fix the root cause problem of the malformed json being passed in the first place. Unfortunately, I don't know the code base well enough to know where to make that change.

Also, I assume that this could be a problem for databases other than duckdb. Given that it appears to be a problem from the Database object and base database engine class.

Screenshots/recordings

No response

Superset version

master / latest-dev

Python version

3.9

Node version

16

Browser

Chrome

Additional context

No response

Checklist

  • [x] I have searched Superset docs and Slack and didn't find a solution to my problem.
  • [ ] I have searched the GitHub issue tracker and didn't find a similar bug report.
  • [ ] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.

Comment From: danlewis85

@sapcode - might be of interest to you if still relevant.

Comment From: dosubot[bot]

Hey @danlewis85! I'm here to assist you with any bugs, questions, or contributions while you wait for a human maintainer. Let me know how I can help! Dosu

To address the issue of malformed JSON/dictionary in the get_extra_params() method affecting the DuckDB connection, you should ensure that the extra field of the database object is correctly parsed as a JSON object before accessing it. The test case for the DuckDBEngineSpec class demonstrates how the get_extra_params method should handle the extra field:

from superset.db_engine_specs.duckdb import DuckDBEngineSpec

database = mocker.MagicMock()

# Case when extra is an empty dictionary
database.extra = {}
assert DuckDBEngineSpec.get_extra_params(database) == {
    "engine_params": {
        "connect_args": {
            "config": {"custom_user_agent": f"apache-superset/{VERSION_STRING}"}
        }
    }
}

# Case when extra is a JSON string
database.extra = json.dumps(
    {"engine_params": {"connect_args": {"config": {"custom_user_agent": "my-app"}}}}
)
assert DuckDBEngineSpec.get_extra_params(database) == {
    "engine_params": {
        "connect_args": {
            "config": {
                "custom_user_agent": f"apache-superset/{VERSION_STRING} my-app"
            }
        }
    }
}

Ensure that the extra field is correctly parsed using json.loads() if it is a JSON string. This will prevent issues with malformed JSON/dictionary structures [1].

Unfortunately, I didn't find any similar threads that might provide additional context or solutions.

To continue talking to Dosu, mention @dosu.


Help Dosu learn! Give it feedback: Great Response | Irrelevant Answer | Incorrect Sources | Too Verbose | Hallucination | Bug Report | Other

Comment From: rusackas

@betodealmeida @guenp might have interest/advice here too.

Comment From: rusackas

Is this still an issue as of 5.0.0 or on current master? We're certainly open to a PR and/or further investigation.