Bug description
- I first tried connecting Elasticsearch as Superset's metadata store, but it failed with a parsing error.
- On a second attempt I used a Postgres metadata database and added Elasticsearch as a data source from the UI. That worked, but when I tried to add a dataset from Elasticsearch, columns of type array were not listed and I could not access that data.
- When I try to select data from SQL Lab in the Superset UI, I get errors like the following (a minimal repro sketch follows below):

```
error: Error (invalid_argument_exception): {'error': {'root_cause': [{'type': 'invalid_argument_exception', 'reason': 'Arrays (returned by [actors]) are not supported'}], 'type': 'invalid_argument_exception', 'reason': 'Arrays (returned by [actors]) are not supported'}, 'status': 400}
```
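For context, a minimal sketch of the failing query, assuming the `elasticsearch-dbapi` driver and a cluster at `localhost:9200` (the connection URL and table name here are illustrative):

```python
from sqlalchemy import create_engine, text

# SQLAlchemy URL format used by the elasticsearch-dbapi dialect
engine = create_engine("elasticsearch+http://localhost:9200/")

with engine.connect() as conn:
    # Selecting the array-typed "actors" field raises
    # invalid_argument_exception: the Elasticsearch SQL API
    # does not support returning array values.
    conn.execute(text("SELECT name, actors FROM movies")).fetchall()
```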
Elasticsearch document sample:
```json
{
  "_index": "movies",
  "_id": "1",
  "_score": 1,
  "_source": {
    "name": "Inception",
    "actors": [
      "Leonardo DiCaprio",
      "Joseph Gordon-Levitt",
      "Ellen Page"
    ],
    "director": "Christopher Nolan",
    "rating": 8.8,
    "genre": "Sci-Fi",
    "budget": 160000000,
    "box_office": 828300000,
    "length": 148,
    "release_year": 2010
  }
}
```
How to reproduce the bug
Connect to Elasticsearch, index a document with a nested/array field, and try to create a report from it. None of the nested fields will be present in the dataset (see the indexing sketch below).
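To reproduce, the sample document above can be indexed with the official Python client (a sketch assuming elasticsearch-py 8.x and a local cluster; the index name and URL are illustrative):

```python
from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

# Index a document whose "actors" field is an array; any field holding
# multiple values looks like an array to the Elasticsearch SQL API.
es.index(
    index="movies",
    id="1",
    document={
        "name": "Inception",
        "actors": ["Leonardo DiCaprio", "Joseph Gordon-Levitt", "Ellen Page"],
        "director": "Christopher Nolan",
        "rating": 8.8,
        "release_year": 2010,
    },
)
```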
Screenshots/recordings
Superset version
master / latest-dev
Python version
3.10
Node version
16
Browser
Chrome
Additional context
No response
Checklist
- [X] I have searched Superset docs and Slack and didn't find a solution to my problem.
- [X] I have searched the GitHub issue tracker and didn't find a similar bug report.
- [X] I have checked Superset's logs for errors and if I found a relevant Python stacktrace, I included it here as text in the "additional context" section.
Comment From: dosubot[bot]
Hey @santoshmallah! I'm here to help you with any bugs, questions, or contributions you have for the repository. Let me know how I can assist you!
To resolve the issue of unsupported array types in Elasticsearch when trying to create a report from a dataset in Apache Superset, you can modify the `SupersetResultSet` class to handle nested array types by converting them to strings. This is already partially implemented in the provided code. Specifically, you need to ensure that nested types are stringified before being added to the `pa_data` list.

Here is the relevant part of the code that handles this:
```python
if pa_data:  # pylint: disable=too-many-nested-blocks
    for i, column in enumerate(column_names):
        if pa.types.is_nested(pa_data[i].type):
            # TODO: revisit nested column serialization once nested types
            # are added as a natively supported column type in Superset
            # (superset.utils.core.GenericDataType).
            stringified_arr = stringify_values(array[column])
            pa_data[i] = pa.array(stringified_arr.tolist())
```
This code snippet checks whether the column type is nested and, if so, converts the values to strings using the `stringify_values` function. This should help resolve the issue with unsupported array types in Elasticsearch [1].
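To illustrate the mechanism (this is a standalone sketch, not Superset's exact code), nested-type detection and stringification with pyarrow looks roughly like this:

```python
import json

import pyarrow as pa

# A list<string> column, as an array-typed Elasticsearch field would arrive
arr = pa.array([["Leonardo DiCaprio", "Joseph Gordon-Levitt"], ["Ellen Page"]])
assert pa.types.is_nested(arr.type)

# Roughly what stringify_values does: serialize each nested value to a string
stringified = pa.array([json.dumps(scalar.as_py()) for scalar in arr])
print(stringified.type)  # string
print(stringified[0])    # '["Leonardo DiCaprio", "Joseph Gordon-Levitt"]'
```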
Additionally, the `ElasticSearchEngineSpec` class in `superset/db_engine_specs/elasticsearch.py` provides specific configurations and methods for handling Elasticsearch data types and exceptions. For instance, it includes methods to convert datetime types and to map Elasticsearch-specific exceptions:
```python
class ElasticSearchEngineSpec(BaseEngineSpec):  # pylint: disable=abstract-method
    engine = "elasticsearch"
    engine_name = "ElasticSearch (SQL API)"
    time_groupby_inline = True
    allows_joins = False
    allows_subqueries = True
    allows_sql_comments = False

    @classmethod
    def get_dbapi_exception_mapping(cls) -> dict[type[Exception], type[Exception]]:
        # pylint: disable=import-outside-toplevel
        import es.exceptions as es_exceptions

        return {
            es_exceptions.DatabaseError: SupersetDBAPIDatabaseError,
            es_exceptions.OperationalError: SupersetDBAPIOperationalError,
            es_exceptions.ProgrammingError: SupersetDBAPIProgrammingError,
        }

    @classmethod
    def convert_dttm(
        cls, target_type: str, dttm: datetime, db_extra: Optional[dict[str, Any]] = None
    ) -> Optional[str]:
        db_extra = db_extra or {}
        sqla_type = cls.get_sqla_column_type(target_type)

        if isinstance(sqla_type, types.DateTime):
            es_version = db_extra.get("version")
            # DATETIME_PARSE is only available in Elasticsearch >= 7.8;
            # older versions fall back to a plain CAST.
            supports_dttm_parse = False
            try:
                if es_version:
                    supports_dttm_parse = Version(es_version) >= Version("7.8")
            except Exception as ex:  # pylint: disable=broad-except
                logger.error("Unexpected error while convert es_version", exc_info=True)
                logger.exception(ex)

            if supports_dttm_parse:
                datetime_formatted = dttm.isoformat(sep=" ", timespec="seconds")
                return (
                    f"""DATETIME_PARSE('{datetime_formatted}', 'yyyy-MM-dd HH:mm:ss')"""
                )

            return f"""CAST('{dttm.isoformat(timespec="seconds")}' AS DATETIME)"""
        return None
```
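For illustration, the version gate in `convert_dttm` behaves like this hypothetical helper (`supports_datetime_parse` is not a real Superset function; `packaging.version.Version` is what the code above compares with):

```python
from typing import Optional

from packaging.version import Version


def supports_datetime_parse(es_version: Optional[str]) -> bool:
    """Hypothetical helper mirroring the gate in convert_dttm above."""
    try:
        return bool(es_version) and Version(es_version) >= Version("7.8")
    except Exception:  # invalid version strings fall back to CAST
        return False


print(supports_datetime_parse("7.10.2"))  # True  -> DATETIME_PARSE(...)
print(supports_datetime_parse("6.8.0"))   # False -> CAST(... AS DATETIME)
```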
By ensuring that nested types are converted to strings and by leveraging the specific configurations in `ElasticSearchEngineSpec`, you can avoid issues with unsupported array types in Elasticsearch [2].
Comment From: rusackas
Is this still an issue in 4.1.2/5.0.0rc2? This has gone silent for 250+ days, so I'm tempted to close it as stale, but I think that's mostly due to a shortage of (known/volunteering) ElasticSearch users who can repro/diagnose these things.
Comment From: rusackas
Closing as inactive/not-planned, but MORE than happy to revisit/reopen if anyone finds this to be the case on current master and/or 5.0.0.