Feature Type
-
[ ] Adding new functionality to pandas
-
[X] Changing existing functionality in pandas
-
[ ] Removing existing functionality in pandas
Problem Description
Observation
After upgrading to Pandas 2.2.0 calling read_sql_query
with a SQLAlchemy query on an SQLAlchemy connection yielded "Query must be a string unless using sqlalchemy."
without any indication why or what may have broken with the upgrade.
Analysis
Tracking the problem down it was found that pandasSQL_builder
(code) calls sqlalchemy = import_optional_dependency("sqlalchemy", errors="ignore")
to resolve the optional SQLAlchemy package (which was installed in my case)
Due to errors="ignore"
the violation of the minimal required version for SQLAlchemy does not lead to a warn or raise (code) but just silently returns with None
.
This in turn lets pandasSQL_builder
silently default to SQLiteDatabase(con)
.
Feature Description
Proposed improvement
Do not ignore the minimal version violation in import_optional_dependency
in this context but make it obvious. For example by introducing an additional "errors"-mode like import_optional_dependency("sqlalchemy", errors="raise-on-version-violation")
.
Alternative Solutions
./.
Additional Context
No response
Comment From: rhshadrach
Thanks for the report! It looks like this was changed in #45679 to allow for other dependencies. Perhaps the logic could be changed there so that when SQLAlchemy is used, we do check that the version is satisfied.
Further investigations and PRs to fix are welcome!
Comment From: luke396
I have attempted to recreate the issue in a fresh environment with pandas==2.2.0; sqlalchemy==2.0.25
, and the output seems to be correct. I may require assistance in reproducing the issue.
from sqlalchemy import create_engine
import pandas as pd
engine = create_engine("sqlite:///:memory:")
df = pd.DataFrame({
"id": [1, 2, 3],
"name": ["Alice", "Bob", "Charlie"]
})
df.to_sql("Person", engine, index=False)
sql_query = "SELECT * FROM Person"
result = pd.read_sql_query(sql_query, engine)
print(result)
id name
0 1 Alice
1 2 Bob
2 3 Charlie
Comment From: bebach
The issue is not that it works with SQLAlchemy 2 but that it gives no clue why it does not work with SQLAlchemy < 2.x
To replicate install SQLAlchemy==1.4 and rerun the code-snippet from above. This results in..
$ python pandas_test.py
D:\temp\pandas-issue-57178\pandas_test.py:10: UserWarning: pandas only supports SQLAlchemy connectable (engine/connection) or database string URI or sqlite3 DBAPI2 connection. Other DBAPI2 objects are not tested. Please consider using SQLAlchemy.
df.to_sql("Person", engine, index=False)
Traceback (most recent call last):
File "D:\temp\pandas-issue-57178\pandas_test.py", line 10, in <module>
df.to_sql("Person", engine, index=False)
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\util\_decorators.py", line 333, in wrapper
return func(*args, **kwargs)
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\core\generic.py", line 3081, in to_sql
return sql.to_sql(
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 842, in to_sql
return pandas_sql.to_sql(
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 2851, in to_sql
table.create()
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 984, in create
if self.exists():
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 970, in exists
return self.pd_sql.has_table(self.name, self.schema)
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 2866, in has_table
return len(self.execute(query, [name]).fetchall()) > 0
File "D:\temp\pandas-issue-57178\venv\lib\site-packages\pandas\io\sql.py", line 2673, in execute
cur = self.con.cursor()
AttributeError: 'Engine' object has no attribute 'cursor'
which is surprising given that we just created the engine with SQLAlchemy. And it gives the user no clue that this is due to violation of version requirement.
It would be obvious what is the issue if the version check in import_optional_dependency
raises the
msg = (
f"Pandas requires version '{minimum_version}' or newer of '{parent}' "
f"(version '{version}' currently installed)."
)
https://github.com/pandas-dev/pandas/blob/f538741432edf55c6b9fb5d0d496d2dd1d7c2457/pandas/compat/_optional.py#L152
... instead of just silently returning None (which only later leads to the miss-leading error from above.)
Comment From: luke396
Seems related #57049, whether to support Sqlalchemy 1.4 is under discussion.
Comment From: turbotimon
I think this is related. It should state that sqlalchemy <2.0 is not supported instead of "not installed".
import sqlalchemy
print("sqlalchemy Version:", sqlalchemy.__version__)
pd.read_sql("select * from osm_point limit 1", connection_string)
sqlalchemy Version: 1.4.52
...
ImportError: Using URI string without sqlalchemy installed.
Comment From: SoulofAkuma
take
Comment From: SoulofAkuma
I have so far implemented something that will raise for import_optional_dependency
if the min_version
of what is defined for sqlalchemy
in requirements-dev.txt
is not met and `isinstance(con, str).
The part about what should be done for sqlalchemy.engine.Connectable
objects is a little more tricky. In an ideal case:
- There should be an import error for both str
and sqlalchemy.engine.Connectable
if sqlalchemy
is not found or does not meet the version requirements
- There should be no warning if sqlalchemy
is not required, no matter if it is installed
The problem is that I would either have to check whether the passed connection is of instance sqlalchemy.engine.Connectable
before actually calling import_optional_dependency
for sqlalchemy
or would have to do the version check manually after calling/call import_optional_dependency twice to do the version check for me and raise an error.
The latter options kind of go against what import_optional_dependency
is for.
But this seems to be an issue in multiple places that there is no paved path for how an optional import should be handled that should throw in case a condition is met that can only be available after the import. Any suggestions from contributors welcome, if I am missing something or someone has an idea.