Pandas BUG: SQL connection - Aurora Blog|java/go/python

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

from sqlalchemy import create_engine
import pandas as pd

engine = create_engine('mysql+pymysql://user:pwd@host/Data', echo=True)
with db.engine.begin() as conn:
    df = pd.read_sql_table('LocationIDs', conn.conection)

Traceback (most recent call last):
  File "example.py", line 148, in <module>
    df = pd.read_sql_table('LocationIDs', conn.connection)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniforge3/envs/web/lib/python3.11/site-packages/pandas/io/sql.py", line 385, in read_sql_table
    if not pandas_sql.has_table(table_name):
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniforge3/envs/web/lib/python3.11/site-packages/pandas/io/sql.py", line 2865, in has_table
    return len(self.execute(query, [name]).fetchall()) > 0
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/miniforge3/envs/web/lib/python3.11/site-packages/pandas/io/sql.py", line 2686, in execute
    raise ex from exc
pandas.errors.DatabaseError: Execution failed on sql '
        SELECT
            name
        FROM
            sqlite_master
        WHERE
            type IN ('table', 'view')
            AND name=?;
        ': not all arguments converted during string formatting
python-BaseException

Issue Description

The connection to the database does not work. Moreover, this error is similar for all SQL methods that use the internal method pandasSQL_builder(). https://github.com/pandas-dev/pandas/blob/d96646219618e007f64ee49e0a6e20f4aea761b5/pandas/io/sql.py#L870
Although I explicitly specify a working MySQL connection, the internal pandasSQL_builder() method changes it to an SQLite connection without any notification. Only after several hours of research and googling I noticed on stackoverflow a post about this problem and guessed that the problem refers with SQLite.
pandasSQL_builder() requires con to be a sqlalchemy.engine.Connectable instance . Such instances in sqlalchemy are the engine and conn objects (with db.engine.begin() as conn:), but they do not have a cursor object, which is required. Whereas conn.connection has a cursor object, but it is rejected because it is not a Connectable instance. As a result, connecting via sqlalchemy is impossible. https://github.com/pandas-dev/pandas/blob/d96646219618e007f64ee49e0a6e20f4aea761b5/pandas/io/sql.py#L890
Also pandasSQL_builder() does not detect the sqlalchemy module I have installed. This detection is necessary for the above check and for the ability to specify the database URL in the form of a literal (like pd.read_sql_table(tablename, con='mysql+pymysql://user:pwd@host/Data' )). https://github.com/pandas-dev/pandas/blob/d96646219618e007f64ee49e0a6e20f4aea761b5/pandas/io/sql.py#L885

As a result, no connection via sqlalchemy is possible.

I have a suspicion that the problem may be partitial linked that I'm using SQLAlchemy v1.4.49, as required by some of my code's dependencies, since many Python modules stlll not support SQLAlchemy v2. But then I would like Pandas support for the previous version of Sqlalchemy, or locking it as a dependency on a specific version up to which Pandas has support.

Also I found issue in the method that looks for installed SQLAlchemy, but it doesn't find it. This detects the SQLAlchemy installed. But while the minimum version is v2 set in it (VERSIONS = {"sqlalchemy": "2.0.0",}) and a notification is assigned there that Pandas requires a version no lower than this (msg at line 175). But since the detection is made with the ignore argument (sqlalchemy = import_optional_dependency("sqlalchemy", errors="ignore") and mod = import_optional_dependency(modname, errors="ignore") in pd.show_versions()), the message is not shown and is returned the result is None, as if no installed SQLAlchemy was found. https://github.com/pandas-dev/pandas/blob/d96646219618e007f64ee49e0a6e20f4aea761b5/pandas/compat/_optional.py#L171

Expected Behavior

Should work: * Connect via connection SQLAlchemy. * Connect via literal database URL. * There should be no implicit silent connection to non-existent SQLite, when a MySQL (or other) connection is explicitly specified. * Should show SQLAlchemy installed, whereas currently it shows no installation when SQLAlchemy is installed. * If Pandas requires SQLAlchemy to be a "minimum version" of 2.0.0, this must be reported.

Installed Versions

pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.3 bottleneck : None dataframe-api-compat : None fastparquet : None fsspec : None gcsfs : None matplotlib : None numba : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : 16.1.0 pyreadstat : None python-calamine : None pyxlsb : None s3fs : None scipy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : 2.0.1 zstandard : None tzdata : 2024.1 qtpy : None pyqt5 : None

Comment From: vladiscripts

Curiously, another method works: df = pd.read_sql('SELECT * FROM LocationIDs', conn.connection). Although it also does not define the connection correctly, PandasSQL_builder() gives the MySQL connection the wrong sqlite class SQLiteDatabase. https://github.com/pandas-dev/pandas/blob/d96646219618e007f64ee49e0a6e20f4aea761b5/pandas/io/sql.py#L690

Comment From: Animesh-Shukla

take