Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
# BUG Behavior 1:
df = pd.DataFrame(np.zeros((4, 1)))
# error, expected
# ValueError: Expected a 1D array, got an array with shape (4, 2)
df['A'] = np.zeros((4, 2))
# no error, not expected
df["A"] = np.zeros((4, 2, 3))
print(df) # exception here
'''
>>> print(df)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/azuk/pandasdev/pandas/pandas/core/frame.py", line 1096, in __repr__
return self.to_string(**repr_params)
File "/home/azuk/pandasdev/pandas/pandas/core/frame.py", line 1273, in to_string
return fmt.DataFrameRenderer(formatter).to_string(
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1099, in to_string
string = string_formatter.to_string()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/string.py", line 30, in to_string
text = self._get_string_representation()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/string.py", line 45, in _get_string_representation
strcols = self._get_strcols()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/string.py", line 36, in _get_strcols
strcols = self.fmt.get_strcols()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 614, in get_strcols
strcols = self._get_strcols_without_index()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 878, in _get_strcols_without_index
fmt_values = self.format_col(i)
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 892, in format_col
return format_array(
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1295, in format_array
return fmt_obj.get_result()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1328, in get_result
fmt_values = self._format_strings()
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1576, in _format_strings
return list(self.get_result_as_array())
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1543, in get_result_as_array
formatted_values = format_values_with(float_format)
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1523, in format_values_with
result = _trim_zeros_float(values, self.decimal)
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1972, in _trim_zeros_float
while should_trim(trimmed):
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1969, in should_trim
numbers = [x for x in values if is_number_with_decimal(x)]
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1969, in <listcomp>
numbers = [x for x in values if is_number_with_decimal(x)]
File "/home/azuk/pandasdev/pandas/pandas/io/formats/format.py", line 1959, in is_number_with_decimal
return re.match(number_regex, x) is not None
File "/home/azuk/.conda/envs/pandas-dev/lib/python3.10/re.py", line 190, in match
return _compile(pattern, flags).match(string)
TypeError: cannot use a string pattern on a bytes-like object
'''
# BUG Behavior 2:
df = pd.DataFrame(np.zeros((4, 1)))
# ok
df['A'] = np.zeros((4, 1))
# no error, not expected here
# expected ValueError: Expected a 1D array, got an array with shape (4, 2)
df['A'] = np.zeros((4, 2))
Issue Description
The dimension of the input array is only checked in BlockManager.insert.
https://github.com/pandas-dev/pandas/blob/f5a5c8d7f0d1501e5d8ff31b3b5f24c916137d9c/pandas/core/internals/managers.py#L1404-L1409
- The check only rejects 2D ndarrays, so an ndarray with 3 or more dimensions can be assigned and then crashes the DataFrame later (for example, when it is printed).
- The check only runs when a new column is inserted, so replacing the values of an existing column does not raise an exception.
This issue shares the same root cause as #51925.
Expected Behavior
A ValueError should be raised in both situations.
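As an illustration of the expected behavior, here is a minimal user-side sketch of a shape check that rejects both the 2D and the 3D case before assignment. The helper name `check_column_values` is hypothetical and not part of pandas, and the rule it applies (accept 1D arrays and single-column 2D arrays, reject everything else) is an assumption based on the error message quoted above, not the actual pandas implementation.

```python
import numpy as np
import pandas as pd


def check_column_values(value):
    """Hypothetical guard: return a 1D array or raise ValueError.

    Non-ndarray inputs (lists, scalars) are passed through untouched,
    since the issue only concerns numpy arrays.
    """
    if not isinstance(value, np.ndarray):
        return value
    # Accept shapes like (4,) and (4, 1); flatten the latter to 1D.
    if value.ndim == 1 or (value.ndim == 2 and value.shape[1] == 1):
        return value.reshape(-1)
    raise ValueError(
        f"Expected a 1D array, got an array with shape {value.shape}"
    )


df = pd.DataFrame(np.zeros((4, 1)))
# A single-column 2D array is flattened and assigned as usual.
df["A"] = check_column_values(np.zeros((4, 1)))
```

Calling `check_column_values(np.zeros((4, 2)))` or `check_column_values(np.zeros((4, 2, 3)))` raises the ValueError up front, for both new and existing columns, instead of failing later when the frame is printed.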
Installed Versions
Comment From: adrien-berchet
I had the same issue, anything new on this?
Also, note that it only happens for numpy.ndarray objects. Casting the array to a list works properly:
df = pd.DataFrame({"a": np.zeros(4)})
df["b"] = np.zeros((4, 2, 3))
crashes as reported while
df = pd.DataFrame({"a": np.zeros(4)})
df["b"] = np.zeros((4, 2, 3)).tolist()
works and the DF is:
In [2]: df
Out[2]:
a b
0 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
1 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
2 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
3 0.0 [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
which is what I wanted to achieve before I got this issue.
Finally, trying to use the constructor detects the issue and crashes with a more understandable error:
pd.DataFrame({"a": np.zeros(4), "b": np.zeros((4, 2, 3))})
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[3], line 1
----> 1 pd.DataFrame({"a": np.zeros(4), "b": np.zeros((4, 2, 3))})
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/frame.py:664, in DataFrame.__init__(self, data, index, columns, dtype, copy)
658 mgr = self._init_mgr(
659 data, axes={"index": index, "columns": columns}, dtype=dtype, copy=copy
660 )
662 elif isinstance(data, dict):
663 # GH#38939 de facto copy defaults to False only in non-dict cases
--> 664 mgr = dict_to_mgr(data, index, columns, dtype=dtype, copy=copy, typ=manager)
665 elif isinstance(data, ma.MaskedArray):
666 import numpy.ma.mrecords as mrecords
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:493, in dict_to_mgr(data, index, columns, dtype, typ, copy)
489 else:
490 # dtype check to exclude e.g. range objects, scalars
491 arrays = [x.copy() if hasattr(x, "dtype") else x for x in arrays]
--> 493 return arrays_to_mgr(arrays, columns, index, dtype=dtype, typ=typ, consolidate=copy)
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:118, in arrays_to_mgr(arrays, columns, index, dtype, verify_integrity, typ, consolidate)
115 if verify_integrity:
116 # figure out the index, if necessary
117 if index is None:
--> 118 index = _extract_index(arrays)
119 else:
120 index = ensure_index(index)
File ~/.virtualenvs/AxonSynthesis/lib/python3.10/site-packages/pandas/core/internals/construction.py:653, in _extract_index(data)
651 raw_lengths.append(len(val))
652 elif isinstance(val, np.ndarray) and val.ndim > 1:
--> 653 raise ValueError("Per-column arrays must each be 1-dimensional")
655 if not indexes and not raw_lengths:
656 raise ValueError("If using all scalar values, you must pass an index")
ValueError: Per-column arrays must each be 1-dimensional
(and again, casting to a list also works properly in this case)
EDIT: I can reproduce this issue with pandas==1.5.3 and pandas==2.2.1.
Comment From: determ1ne
The last code snippet, pd.DataFrame({"a": np.zeros(4), "b": np.zeros((4, 2, 3))}), worked properly and raised the corresponding error. Casting to a list works because lists are always 1-D.
I didn't dig into how pandas deals with memory when PR #53367 was opened, but I managed to block the invalid DataFrame.__setitem__, consistent with the ValueError you mentioned.
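To illustrate the point about lists, here is a small sketch showing that a nested list is seen as a flat sequence of Python objects, so each row simply holds one inner list. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

arr = np.zeros((4, 2, 3))
nested = arr.tolist()  # a list of 4 items, each a list of lists

# The outer list has one entry per row; the nesting is not a "dimension"
# of the column, so the resulting Series is 1D with object dtype.
s = pd.Series(nested)
first_cell = s.iloc[0]
```

Here `s` has 4 entries and `first_cell` is an ordinary Python list, which matches the DataFrame output shown in the earlier comment.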
Comment From: ebo
I have started working with CryoSat-2 data, where the radar waveform is 3D. While I can convert it with tolist(), it would be nice if there were some way to use non-1-dimensional data within a DataFrame.
For reference, here is a trivial script to replicate. The data is available from ESA: https://earth.esa.int/eogateway/missions/cryosat/data
import pandas as pd
from netCDF4 import Dataset

fname = "CS_OFFL_SIR_SAR_1B_20220302T004121_20220302T004737_E001.nc"
with Dataset(fname, mode='r') as CS:
    test_df = pd.DataFrame({
        'Waveform': CS.variables['pwr_waveform_20_ku'][:]
    })
This gives the error: "ValueError: Per-column arrays must each be 1-dimensional"
If anyone knows of any tricks to get pandas to work with multidimensional data without casting to a list, please let me know.
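Two possible workarounds, sketched with synthetic data rather than the CryoSat-2 file. The array `waveform` is a stand-in for the real 3D variable, and the level names `sample` and `bin` are made up for illustration. The first option keeps one sub-array per row in an object column. The second flattens the trailing dimensions into columns labelled by a MultiIndex. Neither is an official pandas recommendation, and the object-column approach gives up vectorized operations on that column.

```python
import numpy as np
import pandas as pd

# Stand-in for a 3D variable with shape (records, samples, bins).
waveform = np.arange(24, dtype=float).reshape(4, 2, 3)

# Option 1: object column, one (2, 3) sub-array per row.
# Filling the object array element by element keeps it 1D.
cells = np.empty(len(waveform), dtype=object)
for i, sub in enumerate(waveform):
    cells[i] = sub
df_obj = pd.DataFrame({"Waveform": cells})
# Rebuild the original 3D array when needed.
restored = np.stack(list(df_obj["Waveform"]))

# Option 2: flatten the trailing dimensions into 2 * 3 = 6 columns.
flat = waveform.reshape(len(waveform), -1)
columns = pd.MultiIndex.from_product(
    [range(waveform.shape[1]), range(waveform.shape[2])],
    names=["sample", "bin"],
)
df_wide = pd.DataFrame(flat, columns=columns)
```

For data that is inherently N-dimensional, a labelled-array library such as xarray may be a better fit than either workaround.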