Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas 
import numpy as np 
p = np.array([
              [1,2,3], 
              [1,2,3]
             ])

df = pandas.DataFrame(columns=['n', 'p'])
df['n'] = np.array([0,0])
df['p'] = p 

print(df)
   n  p
0  0  1
1  0  1

Issue Description

The columns are passed as separate arrays, but one of the arrays has the wrong shape: NxM (here 2x3) instead of Nx1. In that case, the DataFrame column 'p' silently accepts the NxM array, but only the first column of the array is actually assigned to 'p'.

While this is unlikely to happen by accident, it is not impossible.

Expected Behavior

Either store each row of the array in the corresponding row of the DataFrame, or raise a warning/error when an NxM array is assigned to a single column.
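The "raise" option could be sketched roughly as follows. This is only an illustration of the proposed behavior, not pandas API; the helper name and error message are made up:

```python
import numpy as np

def check_column_values(values):
    """Hypothetical guard: reject a multi-dimensional ndarray before it
    is silently truncated to its first column on column assignment."""
    if isinstance(values, np.ndarray) and values.ndim > 1:
        raise ValueError(
            f"Cannot assign an array of shape {values.shape} to a single "
            "column; expected 1-dimensional data. Pass a list of rows "
            "(e.g. values.tolist()) to store one object per row."
        )
    return values
```

Note the `isinstance` check: a plain list of lists (as in the second example below) is left alone, since pandas already stores it as one list object per row, while an NxM ndarray is rejected instead of being silently truncated.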

See for example:

import pandas 
import numpy as np 

p = [
     [1,2],
     [1,2]
    ]

df = pandas.DataFrame(columns=['n', 'p'])
df['n'] = [0,0]
df['p'] = p 

print(df)
   n       p
0  0  [1, 2]
1  0  [1, 2]

Installed Versions

INSTALLED VERSIONS
------------------
commit                : bdc79c146c2e32f2cab629be240f01658cfb6cc2
python                : 3.10.12.final.0
python-bits           : 64
OS                    : Linux
OS-release            : 6.1.58+
Version               : #1 SMP PREEMPT_DYNAMIC Sat Nov 18 15:31:17 UTC 2023
machine               : x86_64
processor             : x86_64
byteorder             : little
LC_ALL                : en_US.UTF-8
LANG                  : en_US.UTF-8
LOCALE                : en_US.UTF-8
pandas                : 2.2.1
numpy                 : 1.25.2
pytz                  : 2023.4
dateutil              : 2.8.2
setuptools            : 67.7.2
pip                   : 23.1.2
Cython                : 3.0.8
pytest                : 7.4.4
hypothesis            : None
sphinx                : 5.0.2
blosc                 : None
feather               : None
xlsxwriter            : None
lxml.etree            : 4.9.4
html5lib              : 1.1
pymysql               : None
psycopg2              : 2.9.9
jinja2                : 3.1.3
IPython               : 7.34.0
pandas_datareader     : 0.10.0
adbc-driver-postgresql: None
adbc-driver-sqlite    : None
bs4                   : 4.12.3
bottleneck            : None
dataframe-api-compat  : None
fastparquet           : None
fsspec                : 2023.6.0
gcsfs                 : 2023.6.0
matplotlib            : 3.7.1
numba                 : 0.58.1
numexpr               : 2.9.0
odfpy                 : None
openpyxl              : 3.1.2
pandas_gbq            : 0.19.2
pyarrow               : 14.0.2
pyreadstat            : None
python-calamine       : None
pyxlsb                : None
s3fs                  : None
scipy                 : 1.11.4
sqlalchemy            : 2.0.28
tables                : 3.8.0
tabulate              : 0.9.0
xarray                : 2023.7.0
xlrd                  : 2.0.1
zstandard             : None
tzdata                : 2024.1
qtpy                  : None
pyqt5                 : None

Comment From: rhshadrach

Thanks for the report. The current behavior does look problematic to me, and the expected behavior seems reasonable, but I think this needs some more discussion.

Comment From: jbrockmendel

Should raise

Comment From: TinusChen

This is indeed an interesting issue: the implicit handling inside pandas is worth exploring, as is how the strategy might be adjusted. I will keep an eye on it. In the meantime, the following is just a workaround.

The array p is two-dimensional, which is what triggers the issue. If you want each sub-array of p added as an independent row value in the 'p' column, initialize it directly when creating the DataFrame instead of assigning the column afterwards.

import pandas as pd
import numpy as np

p = np.array([
    [1, 2, 3],
    [1, 2, 3]
])

# Create a DataFrame containing the 'p' array and initialize the 'n' column
df = pd.DataFrame({
    'n': [0, 0],
    'p': list(map(list, p))  # Convert each sub-array to a list and assign as independent rows
})

print(df)
   n          p
0  0  [1, 2, 3]
1  0  [1, 2, 3]

In this corrected code, list(map(list, p)) converts each sub-array of the two-dimensional NumPy array to a list. These lists are then assigned as independent row values in the 'p' column of the DataFrame.
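As a side note, NumPy's ndarray.tolist() produces the same list of row lists a bit more directly, so the workaround above can also be written as:

```python
import numpy as np
import pandas as pd

p = np.array([[1, 2, 3],
              [1, 2, 3]])

# tolist() converts the 2-D array into a list of row lists, so each
# row is stored as a single list object in the 'p' column.
df = pd.DataFrame({'n': [0, 0], 'p': p.tolist()})

print(df)
```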

Comment From: kgourgou

Thank you, @TinusChen, that is reasonable.

The context for this issue is that I was inspecting a codebase that accidentally passed an Nx2 array instead of an Nx1 array to the frame. Because no warning was raised and only the first column of the array was assigned, nobody was the wiser about what was happening.

Comment From: phofl

I think I agree with @jbrockmendel