Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
To reproduce it:
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3]})
lst = np.split(df, 3)
Issue Description
The above code raises a FutureWarning:
FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.
As far as I understand, np.split uses np.swapaxes, which is raising this warning.
Expected Behavior
No warning should be shown.
Installed Versions
Comment From: VISWESWARAN1998
swapaxes is called from shape_base.py in the numpy package.
Comment From: rhshadrach
Thanks for the report - is there a reason you prefer to use np.swapaxes(df) over df.T?
Comment From: amanlai
@rhshadrach the use case here is not np.swapaxes(df) itself, it's np.split(df), which apparently uses np.swapaxes under the hood.
Comment From: rhshadrach
Ah, thanks @amanlai. On main, DataFrame.swapaxes has been removed and the OP gives the output:
[array([[1]]), array([[2]]), array([[3]])]
On 2.2.x, I am seeing:
[   a
0  1,    a
1  2,    a
2  3]
cc @jorisvandenbossche @phofl @mroeschke
Comment From: rhshadrach
Marking this as a blocker for 3.0 so a decision is made.
Comment From: Aloqeely
I don't think enough people use np.split on a DataFrame to justify reverting this deprecation. (The Stack Overflow question did not have many interactions.)
And as stated in numpy/numpy#24889 (comment), it's possible to calculate the start/stop slices and then slice manually using df.iloc[start:stop].
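The start/stop approach described above could look roughly like the sketch below. `split_rows` is a hypothetical helper name, not a pandas or NumPy API; the block sizing is an assumption chosen to mirror how np.array_split distributes leftover rows (the first few blocks get one extra row).

```python
import pandas as pd


def split_rows(df, sections):
    """Hypothetical helper: split df into `sections` consecutive row blocks."""
    if sections < 1:
        raise ValueError("sections must be a positive integer")
    # Every block gets `base` rows; the first `extra` blocks get one more.
    base, extra = divmod(len(df), sections)
    parts = []
    start = 0
    for i in range(sections):
        stop = start + base + (1 if i < extra else 0)
        # Positional slicing keeps each part a DataFrame with its original index.
        parts.append(df.iloc[start:stop])
        start = stop
    return parts


df = pd.DataFrame({"a": [1, 2, 3]})
parts = split_rows(df, 3)  # three one-row DataFrames
```

Because it only relies on df.iloc, this avoids the call to swapaxes entirely, so the deprecation warning is never triggered.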
Comment From: WillAyd
I agree with @Aloqeely. Is there an upstream issue for NumPy on this topic? I think it should be resolved there, and users can use df.T like @rhshadrach suggests.
Comment From: WillAyd
Ah, ignore my previous comment - I thought they were calling our swapaxes implementation but misread the OP. I assume they call swapaxes generically for their internal use, so it's not as easy as changing the call.
Even so, I don't think we should revert this deprecation.
Comment From: Aloqeely
Is it sensible to implement a DataFrame.split function for convenience? Since np.split doesn't work appropriately on DataFrames anymore.
I can work on it next week if you are all ok with it.
Comment From: jorisvandenbossche
> Is there an upstream issue for NumPy on this topic?
https://github.com/numpy/numpy/issues/24889#issuecomment-1895503977
Comment From: WillAyd
> Is it sensible to implement a DataFrame.split function for convenience? Since np.split doesn't work appropriately on DataFrames anymore.
I don't think so - generally our goal is to reduce the footprint of our API, and I don't see this as a huge value add over the other method you have suggested:
> And as stated in numpy/numpy#24889 (comment) it's possible to calculate the start/stop slices and then manually slicing using df.iloc[start:stop]
Comment From: ddelange
> Is it sensible to implement a DataFrame.split function for convenience? Since np.split doesn't work appropriately on DataFrames anymore.
> I can work on it next week if you are all ok with it.
+1 :+1: splitting a dataframe into equally sized chunks (except for the trailing chunk) is a routine task in ML and other batching applications.
requiring end-users to replace np.split/np.array_split with some bespoke iloc helper function sounds to me like a lot of duplicated boilerplate, wasted brain cycles over time, and overall increased bug surface for the ecosystem.
Comment From: ddelange
another big reason to use np.array_split is the optional axis=1 argument, which is probably why it is calling np.swapaxes in the first place.
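For reference, the batching use case with an axis option can be approximated with positional slicing alone. This is only a sketch: `chunk_frame`, `chunk_size`, and the `axis` convention are hypothetical names for illustration, not an existing or proposed pandas API.

```python
import pandas as pd


def chunk_frame(df, chunk_size, axis=0):
    """Hypothetical helper: yield blocks of at most chunk_size rows (axis=0) or columns (axis=1)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be a positive integer")
    if axis not in (0, 1):
        raise ValueError("axis must be 0 or 1")
    length = df.shape[axis]
    for start in range(0, length, chunk_size):
        stop = start + chunk_size  # iloc clips slices that run past the end
        if axis == 0:
            yield df.iloc[start:stop]
        else:
            yield df.iloc[:, start:stop]


wide = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
col_chunks = list(chunk_frame(wide, 2, axis=1))  # columns a-b, then c
row_chunks = list(chunk_frame(pd.DataFrame({"x": range(5)}), 2))  # trailing chunk is shorter
```

All chunks have the requested size except possibly the trailing one, and no transpose or swapaxes call is needed for the column case.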