Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas
import numpy as np
p = np.array([
[1,2,3],
[1,2,3]
])
df = pandas.DataFrame(columns=['n', 'p'])
df['n'] = np.array([0,0])
df['p'] = p
print(df)
n p
0 0 1
1 0 1
Issue Description
The columns are assigned as separate arrays, but one of the arrays has the wrong shape: it is two-dimensional (NxM, for example Nx2) instead of Nx1. In that case the assignment to column 'p' succeeds without complaint, but only the first column of the array is actually stored in 'p'; the remaining columns are silently dropped.
While this is unlikely to happen by accident, it is not impossible.
Expected Behavior
Either store each row of the array in the corresponding row of the dataframe, or raise a warning/error when an NxM array is assigned as a single column.
See, for example, what happens when a nested list is assigned instead of a NumPy array:
import pandas
import numpy as np
p = [
[1,2],
[1,2]
]
df = pandas.DataFrame(columns=['n', 'p'])
df['n'] = [0,0]
df['p'] = p
print(df)
n p
0 0 [1, 2]
1 0 [1, 2]
Installed Versions
Comment From: rhshadrach
Thanks for the report. The current behavior does look problematic to me, and the expected behavior seems reasonable, but I think this needs some more discussion.
Comment From: jbrockmendel
Should raise
Comment From: TinusChen
It is indeed an interesting issue. The implicit handling within pandas is worth exploring, as is adjusting the strategy. I will keep an eye on it. The following is only a workaround.
The array p is two-dimensional, which is what causes the issue. If you want each sub-array in p to be stored as its own row in the 'p' column, you can convert the sub-arrays to lists and pass them directly when creating the DataFrame, instead of assigning the two-dimensional array as a column afterwards.
import pandas as pd
import numpy as np
p = np.array([
[1, 2, 3],
[1, 2, 3]
])
# Create a DataFrame containing the 'p' array and initialize the 'n' column
df = pd.DataFrame({
'n': [0, 0],
'p': list(map(list, p)) # Convert each sub-array to a list and assign as independent rows
})
print(df)
n p
0 0 [1, 2, 3]
1 0 [1, 2, 3]
In this corrected code, list(map(list, p)) converts each sub-array in the two-dimensional NumPy array to a list. Then, we can assign these lists as independent rows to the 'p' column of the DataFrame.
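As a rough sketch of two related variations on the workaround above: `p.tolist()` produces the same nested lists as `list(map(list, p))`, and the array can alternatively be expanded into one DataFrame column per array column. The column names `p_0`, `p_1`, `p_2` are made up for illustration and are not something pandas generates.

```python
import numpy as np
import pandas as pd

p = np.array([
    [1, 2, 3],
    [1, 2, 3],
])

# Variation A: one object-dtype column whose cells are Python lists.
# p.tolist() yields [[1, 2, 3], [1, 2, 3]], like list(map(list, p)).
as_lists = pd.DataFrame({
    "n": [0, 0],
    "p": p.tolist(),
})

# Variation B: one numeric column per array column.
# The names p_0, p_1, ... are hypothetical, chosen only for this example.
wide_names = [f"p_{i}" for i in range(p.shape[1])]
as_wide = pd.concat(
    [pd.DataFrame({"n": [0, 0]}), pd.DataFrame(p, columns=wide_names)],
    axis=1,
)

print(as_lists)
print(as_wide)
```

Variation A keeps the data in a single column but stores Python objects, so vectorized numeric operations on 'p' are no longer available. Variation B keeps numeric dtypes at the cost of extra columns.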
Comment From: kgourgou
Thank you, @TinusChen, that is reasonable.
The context for this issue is that I was inspecting a codebase that was accidentally passing an Nx2 array instead of an Nx1 array to the frame. Because no warning was raised and only the first column of the array was assigned, nobody was any the wiser about what was happening.
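For codebases in a similar situation, a defensive check before the assignment is one way to surface the mistake regardless of how pandas handles it. The following is only a sketch: `assign_1d_column` is a hypothetical helper, not part of pandas, and it checks NumPy arrays only.

```python
import numpy as np
import pandas as pd


def assign_1d_column(df: pd.DataFrame, name: str, values) -> pd.DataFrame:
    """Hypothetical helper (not a pandas API): assign `values` to df[name],
    refusing NumPy arrays that are not one-dimensional."""
    if isinstance(values, np.ndarray) and values.ndim != 1:
        raise ValueError(
            f"column {name!r} expects a 1-D array, got shape {values.shape}"
        )
    df[name] = values
    return df


df = pd.DataFrame({"n": [0, 0]})

# A 1-D array of matching length is accepted.
assign_1d_column(df, "p", np.array([1, 2]))

# A 2-D array is rejected before it ever reaches pandas.
try:
    assign_1d_column(df, "p", np.array([[1, 2, 3], [1, 2, 3]]))
except ValueError as exc:
    print(exc)
```

Raising here mirrors the "Should raise" suggestion in this thread, but it is enforced in user code rather than inside pandas.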
Comment From: phofl
I think I agree with @jbrockmendel