Bug description
I have added 2 data source. One ElaticSearch (named as es) and one MySQL (named as mysql). I have enabled the "querying across database" feature using the feature FLAG as described in "https://superset.apache.org/docs/configuration/databases/#querying-across-databases" and named the meta db as "superset_meta"
Now when I go to sqllab and select "superset_meta" and then a simple query like below
SELECT
customer_id as CustomerID,
customer_name as CustomerName
FROM
"es.customer_availability"
I get below error
Superset meta database Error
superset error: 'str' object has no attribute 'tzinfo'
This may be triggered by:
Issue 1002 - The database returned an unexpected error.
And in the docker-compose logs, I see below
superset_app | 2024-07-12 09:52:20,424:DEBUG:superset.stats_logger:[stats_logger] (timing) sqllab.query.time_executing_query | 1077.662841796875
superset_app | 2024-07-12 09:52:20,427:DEBUG:superset.sql_lab:Query 92: 'str' object has no attribute 'tzinfo'
superset_app | 2024-07-12 09:52:20,450:DEBUG:superset.stats_logger:[stats_logger] (incr) SqlLabRestApi.execute_sql_query.error
superset_app | 2024-07-12 09:52:20,451:WARNING:superset.views.base:SupersetErrorsException
superset_app | Traceback (most recent call last):
superset_app | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1823, in full_dispatch_request
superset_app | rv = self.dispatch_request()
superset_app | File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1799, in dispatch_request
superset_app | return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
superset_app | File "/usr/local/lib/python3.10/site-packages/flask_appbuilder/security/decorators.py", line 95, in wraps
superset_app | return f(self, *args, **kwargs)
superset_app | File "/app/superset/views/base_api.py", line 127, in wraps
superset_app | raise ex
superset_app | File "/app/superset/views/base_api.py", line 121, in wraps
superset_app | duration, response = time_function(f, self, *args, **kwargs)
superset_app | File "/app/superset/utils/core.py", line 1463, in time_function
superset_app | response = func(*args, **kwargs)
superset_app | File "/app/superset/views/base_api.py", line 93, in wraps
superset_app | return f(self, *args, **kwargs)
superset_app | File "/app/superset/utils/log.py", line 255, in wrapper
superset_app | value = f(*args, **kwargs)
superset_app | File "/app/superset/sqllab/api.py", line 409, in execute_sql_query
superset_app | command_result: CommandResult = command.run()
superset_app | File "/app/superset/commands/sql_lab/execute.py", line 121, in run
superset_app | raise ex
superset_app | File "/app/superset/commands/sql_lab/execute.py", line 103, in run
superset_app | status = self._run_sql_json_exec_from_scratch()
superset_app | File "/app/superset/commands/sql_lab/execute.py", line 163, in _run_sql_json_exec_from_scratch
superset_app | raise ex
superset_app | File "/app/superset/commands/sql_lab/execute.py", line 158, in _run_sql_json_exec_from_scratch
superset_app | return self._sql_json_executor.execute(
superset_app | File "/app/superset/sqllab/sql_json_executer.py", line 111, in execute
superset_app | if data["errors"]: # type: ignore
superset_app | superset.exceptions.SupersetErrorsException: [SupersetError(message="superset error: 'str' object has no attribute 'tzinfo'", error_type=<SupersetErrorType.GENERIC_DB_ENGINE_ERROR: 'GENERIC_DB_ENGINE_ERROR'>, level=<ErrorLevel.ERROR: 'error'>, extra={'engine_name': 'Superset meta database', 'issue_codes': [{'code': 1002, 'message': 'Issue 1002 - The database returned an unexpected error.'}]})]
superset_app | 172.18.0.1 - - [12/Jul/2024:09:52:20 +0000] "POST /api/v1/sqllab/execute/ HTTP/1.1" 500 297 "http://127.0.0.1:8088/sqllab/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
superset_app | 2024-07-12 09:52:21,530:DEBUG:superset.stats_logger:[stats_logger] (incr) log
superset_app | 172.18.0.1 - - [12/Jul/2024:09:52:21 +0000] "POST /superset/log/?explode=events HTTP/1.1" 200 1 "http://127.0.0.1:8088/sqllab/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/126.0.0.0 Safari/537.36"
In the logs, I can clearly see that it fetched the data but for some reason it is trying to find tzinfo
in the fetched json
from the elasticsearch data source.
Also in the logs, when it stared to process this query, it made a call like below
superset_app | 2024-07-12 09:52:20,052:DEBUG:elasticsearch:> {"query":"SHOW COLUMNS FROM \"customer_availability\"","fetch_size":10000}
And after this I see the log showing the fetched columns.
I am not sure why is it still trying to look for tzinfo
.
Please let me know if any more information is required from my side related to this issue.
How to reproduce the bug
- Enable
ENABLE_SUPERSET_META_DB
insuperset_config.py
- Add
elasticsearch-dbapi
indocker/requirements-local.txt
- Start docker
docker compose -f docker-compose-image-tag.yml up
- Add ElasticSearch data source and MySQL data source and
superset
meta db - Run query as shown in the main description
Screenshots/recordings
No response
Superset version
master / latest-dev
Python version
3.9
Node version
18 or greater
Browser
Chrome
Additional context
No response
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
@mudasirmirza Hello! I'm here to assist you with any issues you're facing. Let's work together to resolve the problem.
The error 'str' object has no attribute 'tzinfo'
occurs because the code expects a datetime object with timezone information but encounters a string instead. To resolve this, you need to ensure that datetime objects in your data have the tzinfo
attribute set correctly.
Here are the steps to fix this issue:
- Preprocess Data to Ensure Timezone Information: Convert string representations of datetime to actual datetime objects with timezone information before they are processed. You can modify the data preprocessing step as follows:
```python from datetime import datetime import pytz
def preprocess_data(data): for row in data: for i, value in enumerate(row): if isinstance(value, str): try: # Attempt to parse the string as a datetime object dt = datetime.fromisoformat(value) # Add timezone information if missing if dt.tzinfo is None: dt = dt.replace(tzinfo=pytz.UTC) row[i] = dt except ValueError: # If parsing fails, keep the original string pass return data
# Example usage data = [ ("2023-10-01T12:00:00", "some other value"), ("2023-10-02T13:00:00+00:00", "another value"), ]
preprocessed_data = preprocess_data(data) ```
- Ensure
convert_dttm
Method Handles Datetime Objects Correctly: Theconvert_dttm
method in theElasticSearchEngineSpec
class should convert datetime objects to the appropriate string format for Elasticsearch queries. Here is the relevant implementation:
```python class ElasticSearchEngineSpec(BaseEngineSpec): # pylint: disable=abstract-method engine = "elasticsearch" engine_name = "ElasticSearch (SQL API)" time_groupby_inline = True allows_joins = False allows_subqueries = True allows_sql_comments = False
@classmethod
def convert_dttm(
cls, target_type: str, dttm: datetime, db_extra: Optional[dict[str, Any]] = None
) -> Optional[str]:
db_extra = db_extra or {}
sqla_type = cls.get_sqla_column_type(target_type)
if isinstance(sqla_type, types.DateTime):
es_version = db_extra.get("version")
supports_dttm_parse = False
try:
if es_version:
supports_dttm_parse = Version(es_version) >= Version("7.8")
except Exception as ex: # pylint: disable=broad-except
logger.error("Unexpected error while convert es_version", exc_info=True)
logger.exception(ex)
if supports_dttm_parse:
datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds")
return (
f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')"""
)
return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)"""
return None
```
By ensuring that your datetime objects have the tzinfo
attribute set and are correctly formatted for Elasticsearch queries, you can avoid the 'str' object has no attribute 'tzinfo'
error when querying an Elasticsearch data source in Apache Superset [1][2].
To continue talking to Dosu, mention @dosu.
Comment From: mudasirmirza
Just to add to it. If I run the same query directly selecting "ElasticSearch" as the data source and selecting the "Index"
SELECT
customer_id as CustomerID,
customer_name as CustomerName
FROM
"customer_availability"
This query returns proper results.
Comment From: hardikjain178
Hello, were you able to solve the above issue? I am also running into the same problem when trying to get elasticsearch table data using superset meta db. The query works fine when using "ElasticSearch" as a datasource. Let me know if you were able to solve it and how?
Comment From: mudasirmirza
Hi @hardikjain178 ,
I was unable to resolve this and because of this I just dropped working on this.
Comment From: rusackas
Is anyone still facing this, and able to replicate it in 4.1.2/5.0.0? I don't have Elasticsearch access, and @dpgaspar is the only one mentioned in the rolodex who does. Feel free to add yourselves to the list!
Comment From: rusackas
Well, unfortunately, I think we're going to have to call this one "inactive" since it hasn't seen any real action or corroboration in nearly a year. If anyone is still facing this in 5.0 or on master
we can certainly revisit/reopen at any time. Thanks!