Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import gcsfs
fs = gcsfs.GCSFileSystem(project="***")
path = 'gs://***/2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path, filesystem=fs)
Issue Description
When a parquet file contains a logical type such as map, read_parquet
behaves differently for local files and GCS files.
The same parquet file can be loaded into a pandas DataFrame from local disk, but raises an error when loaded from GCS:
ArrowTypeError: Unable to merge: Field type has incompatible types: binary vs dictionary<values=string, indices=int32, ordered=0>
load from local file:
import pandas as pd
path = '2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path)
print("Row count:", len(df))
---------------------------------------------------------------------------
Row count: 327827
load from gcs:
import pandas as pd
import gcsfs
fs = gcsfs.GCSFileSystem(project="***")
path = 'gs://***/2025-09-01T00_15_30_events.parquet'
df = pd.read_parquet(path, filesystem=fs)
---------------------------------------------------------------------------
ArrowTypeError Traceback (most recent call last)
Cell In[6], line 5
3 fs = gcsfs.GCSFileSystem(project="***")
4 path = 'gs://****/2025-09-01T00_15_30_events.parquet'
----> 5 df = pd.read_parquet(path, filesystem=fs)
File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pandas/io/parquet.py:669, in read_parquet(path, engine, columns, storage_options, use_nullable_dtypes, dtype_backend, filesystem, filters, **kwargs)
666 use_nullable_dtypes = False
667 check_dtype_backend(dtype_backend)
--> 669 return impl.read(
670 path,
671 columns=columns,
672 filters=filters,
673 storage_options=storage_options,
674 use_nullable_dtypes=use_nullable_dtypes,
675 dtype_backend=dtype_backend,
676 filesystem=filesystem,
677 **kwargs,
678 )
File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pandas/io/parquet.py:265, in PyArrowImpl.read(self, path, columns, filters, use_nullable_dtypes, dtype_backend, storage_options, filesystem, **kwargs)
258 path_or_handle, handles, filesystem = _get_path_or_handle(
259 path,
260 filesystem,
...
File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pyarrow/error.pxi:155, in pyarrow.lib.pyarrow_internal_check_status()
File ~/Apps/python_virtual_envs/cursor/lib/python3.13/site-packages/pyarrow/error.pxi:92, in pyarrow.lib.check_status()
ArrowTypeError: Unable to merge: Field type has incompatible types: binary vs dictionary<values=string, indices=int32, ordered=0>
Expected Behavior
read_parquet should load the same parquet file from GCS exactly as it does from a local file.
Installed Versions
INSTALLED VERSIONS
------------------
commit : c888af6d0bb674932007623c0867e1fbd4bdc2c6
python : 3.13.3
python-bits : 64
OS : Darwin
OS-release : 24.6.0
Version : Darwin Kernel Version 24.6.0: Mon Jul 14 11:30:40 PDT 2025; root:xnu-11417.140.69~1/RELEASE_ARM64_T6041
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : None
LOCALE : en_US.UTF-8
pandas : 2.3.1
numpy : 2.3.2
pytz : 2025.2
dateutil : 2.9.0.post0
pip : 25.2
Cython : None
sphinx : None
IPython : 9.5.0
adbc-driver-postgresql: None
...
zstandard : 0.23.0
tzdata : 2025.2
qtpy : None
pyqt5 : None