Pandas BUG: FutureWarning when splitting a dataframe using np.split

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

To reproduce it:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3]})
lst = np.split(df, 3)

Issue Description

The above code raises a FutureWarning:

FutureWarning: 'DataFrame.swapaxes' is deprecated and will be removed in a future version. Please use 'DataFrame.transpose' instead.

As far as I understand, np.split uses np.swapaxes internally, and that call triggers this warning.

Expected Behavior

No warning should be shown.

Installed Versions

python : 3.11.5
pandas : 2.2.0
numpy : 1.26.3

Comment From: VISWESWARAN1998

swapaxes is called from shape_base.py in the numpy package.
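For context, here is a simplified sketch modeled on the scalar-sections path of np.array_split in numpy's shape_base.py. np.split only checks that the sections divide the axis evenly and then delegates to np.array_split. The name array_split_sketch is hypothetical, and the code is my reading of NumPy 1.26's source rather than a copy of it. The point it illustrates is that np.swapaxes is called on the input and on every piece, even when axis=0.

```python
import numpy as np


def array_split_sketch(ary, sections, axis=0):
    """Hypothetical, simplified re-creation of np.array_split for an int `sections`."""
    total = ary.shape[axis]
    each, extras = divmod(total, sections)
    # The first `extras` pieces get one extra element, as in np.array_split.
    sizes = [0] + extras * [each + 1] + (sections - extras) * [each]
    bounds = np.cumsum(sizes)
    # Move the split axis to the front so plain slicing along axis 0 works...
    swapped = np.swapaxes(ary, axis, 0)
    # ...then swap each slice back. Both calls happen even when axis == 0.
    return [np.swapaxes(swapped[bounds[i]:bounds[i + 1]], axis, 0)
            for i in range(sections)]


arr = np.arange(12).reshape(6, 2)
ours = array_split_sketch(arr, 4)
theirs = np.array_split(arr, 4)
print([p.shape for p in ours])  # [(2, 2), (2, 2), (1, 2), (1, 2)]
```

When the input is a DataFrame, np.swapaxes hands off to the object's own swapaxes method. On pandas 2.2 that method is DataFrame.swapaxes, which is where the FutureWarning comes from.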

Comment From: rhshadrach

Thanks for the report - is there a reason you prefer to use np.swapaxes(df) over df.T?

Comment From: amanlai

@rhshadrach the use case here is not np.swapaxes(df) itself but np.split(df), which apparently uses np.swapaxes under the hood.

Comment From: rhshadrach

Ah, thanks @amanlai. On main, DataFrame.swapaxes has been removed, and the OP's example gives the output:

[array([[1]]), array([[2]]), array([[3]])]

On 2.2.x, I am seeing

[   a
0  1,    a
1  2,    a
2  3]

cc @jorisvandenbossche @phofl @mroeschke
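The change in output type on main is consistent with how np.swapaxes dispatches, as I read NumPy's source. The thread does not state this. np.swapaxes calls the argument's own swapaxes method when one exists. Otherwise it converts the argument to an ndarray first. A plain Python list has no swapaxes method, so it makes a minimal illustration of that fallback path:

```python
import numpy as np

nested = [[1, 2, 3]]
# A list has no .swapaxes method, so np.swapaxes cannot delegate to it.
assert not hasattr(nested, "swapaxes")

# NumPy converts the list to an ndarray and swaps axes on that array instead.
result = np.swapaxes(nested, 0, 1)
print(type(result).__name__, result.shape)  # ndarray (3, 1)
```

Once DataFrame.swapaxes is gone, a DataFrame takes a similar conversion path. That would explain why the pieces come back as plain arrays.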

Comment From: rhshadrach

Marking this as a blocker for 3.0 so a decision is made.

Comment From: Aloqeely

I don't think enough people use np.split on a DataFrame to justify reverting this deprecation (the Stack Overflow question did not get many interactions).

And as stated in numpy/numpy#24889 (comment), it's possible to calculate the start/stop positions and then slice manually using df.iloc[start:stop].
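A minimal sketch of that approach follows. The helper name split_frame is hypothetical, not a pandas API. It computes the start/stop positions with the same sizing rule np.array_split uses: the first len(df) % sections chunks get one extra row. Each chunk is taken with df.iloc, so the chunks stay DataFrames with their original index.

```python
import pandas as pd


def split_frame(df: pd.DataFrame, sections: int) -> list[pd.DataFrame]:
    """Hypothetical helper: split rows into `sections` chunks via df.iloc."""
    if sections <= 0:
        raise ValueError("sections must be positive")
    each, extras = divmod(len(df), sections)
    chunks, start = [], 0
    for i in range(sections):
        # Early chunks absorb the remainder, mirroring np.array_split.
        stop = start + each + (1 if i < extras else 0)
        chunks.append(df.iloc[start:stop])
        start = stop
    return chunks


df = pd.DataFrame({"a": range(10)})
parts = split_frame(df, 3)
print([len(p) for p in parts])  # [4, 3, 3]
```

Unlike np.split, this sketch does not reject uneven divisions. It behaves like np.array_split in that respect.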

Comment From: WillAyd

I agree with @Aloqeely. Is there an upstream issue for NumPy on this topic? I think it should be resolved there, using df.T as @rhshadrach suggests.

Comment From: WillAyd

Ah, ignore my previous comment: I misread the OP and thought they were calling our swapaxes implementation directly. I assume NumPy calls swapaxes generically for its internal use, so fixing this is not as easy as changing the call.

Even so, I don't think we should revert this deprecation.

Comment From: Aloqeely

Is it sensible to implement a DataFrame.split function for convenience, since np.split no longer works appropriately on DataFrames?

I can work on it next week if you are all ok with it.

Comment From: jorisvandenbossche

Is there an upstream issue for NumPy on this topic?

https://github.com/numpy/numpy/issues/24889#issuecomment-1895503977

Comment From: WillAyd

Is it sensible to implement a DataFrame.split function for convenience? Since np.split doesn't work appropriately on DataFrames anymore.

I don't think so - generally our goal is to reduce the footprint of our API, and I don't see this as a huge value add over the other method you have suggested:

And as stated in numpy/numpy#24889 (comment) it's possible to calculate the start/stop slices and then manually slicing using df.iloc[start:stop]

Comment From: ddelange

Is it sensible to implement a DataFrame.split function for convenience? Since np.split doesn't work appropriately on DataFrames anymore.

I can work on it next week if you are all ok with it.

+1. Splitting a DataFrame into equally sized chunks (except for the trailing chunk) is a routine task in ML and other batching applications.

Requiring end users to replace np.split/np.array_split with a bespoke iloc helper function sounds to me like a lot of duplicated boilerplate, wasted brain cycles over time, and an overall increased bug surface for the ecosystem.
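For the batching case described above (fixed-size chunks, with only the trailing chunk possibly shorter), a minimal sketch with df.iloc might look like this. The generator name iter_batches is hypothetical and not part of pandas.

```python
from typing import Iterator

import pandas as pd


def iter_batches(df: pd.DataFrame, batch_size: int) -> Iterator[pd.DataFrame]:
    """Hypothetical helper: yield consecutive row batches of `batch_size` rows."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    # Every batch is full except possibly the last one.
    for start in range(0, len(df), batch_size):
        yield df.iloc[start:start + batch_size]


df = pd.DataFrame({"x": range(7)})
print([len(b) for b in iter_batches(df, 3)])  # [3, 3, 1]
```

This differs from np.array_split in two ways. It takes a batch size rather than a number of sections. It also puts the short chunk at the end, whereas np.array_split puts the larger chunks first.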

Comment From: ddelange

Another big reason to use np.array_split is the optional axis=1 argument, which is probably why it calls np.swapaxes in the first place.
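The axis=1 case can also be covered with positional slicing, this time on columns via df.iloc[:, start:stop]. The following is a minimal sketch. The helper name split_columns is hypothetical. Its sizing rule follows np.array_split, so the leading pieces get the extra columns.

```python
import pandas as pd


def split_columns(df: pd.DataFrame, sections: int) -> list[pd.DataFrame]:
    """Hypothetical helper: split columns like np.array_split(..., axis=1)."""
    if sections <= 0:
        raise ValueError("sections must be positive")
    each, extras = divmod(df.shape[1], sections)
    pieces, start = [], 0
    for i in range(sections):
        stop = start + each + (1 if i < extras else 0)
        # Positional column slice; column labels and dtypes are preserved.
        pieces.append(df.iloc[:, start:stop])
        start = stop
    return pieces


df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})
print([list(p.columns) for p in split_columns(df, 2)])  # [['a', 'b'], ['c']]
```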