import pandas as pd

df = pd.DataFrame(
    {
        "B": [1, None, 3],
        "C": pd.array([1, None, 3], dtype="Int64"),
    }
)
result = df.skew()

>>> result
B    <NA>
C    <NA>
dtype: Float64

>>> df[["B"]].skew()
B   NaN
dtype: float64

Based on test_mixed_reductions: the presence of column "C" shouldn't affect the result we get for column "B".

Comment From: rhshadrach

Expected Behavior

NA

I would expect Float64 dtype with NaN and NA. One might also argue object, but so far it appears we coerce to the nullable dtypes.

import pandas as pd

df = pd.DataFrame(
    {
        "B": [1, 2, 3],
        "C": pd.array([1, 2, 3], dtype="Float64"),
    }
)
print(df.sum())
# B    6.0
# C    6.0
# dtype: Float64

Comment From: jbrockmendel

Hah, in the expected behavior section I put "NA" to mean "I'm not writing anything here", not pd.NA.

Comment From: jorisvandenbossche

I would say that the current behaviour is fine. We indeed currently coerce to the nullable dtype for the result if there are mixed nullable and non-nullable columns, and at that point converting NaN to NA seems the correct thing to do (if you would first cast the original non-nullable column to its nullable dtype, you would get the same result).
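
For concreteness, casting first and then reducing, with the df from the top of this issue, should give the same output as the mixed case:

>>> df.astype({"B": "Float64"}).skew()
B    <NA>
C    <NA>
dtype: Float64

i.e. the same result as df.skew() on the original mixed frame.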

Comment From: jbrockmendel

I would say that the current behaviour is fine

It's fine in a never-distinguish world. We're currently in a sometimes-distinguish world, in which I had previously thought the rule was "distinguish when NaNs are introduced through operations". We can make that rule more complicated ("distinguish when NaNs are introduced through operations but not reductions"), but I do think that makes it a worse rule.
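
To illustrate the rule I mean (current behaviour, as I understand it: a NaN produced by arithmetic like 0/0 is kept as NaN, while a missing value stays NA):

s = pd.Series([0.0, None], dtype="Float64")
print(s / 0)
# 0     NaN   <- NaN introduced by the operation 0/0, kept distinct
# 1    <NA>   <- missing value stays NA
# dtype: Float64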

if you would first cast the original non-nullable column to its nullable dtype

In this case the user specifically didn't do that.

Comment From: rhshadrach

We indeed currently coerce to the nullable dtype for the result if there are mixed nullable and non-nullable columns

Should coercion to Float64 change NaN to NA?

import numpy as np
import pandas as pd

ser = pd.Series([np.nan, pd.NA], dtype="object")
print(ser.astype("Float64"))
# 0    <NA>
# 1    <NA>
# dtype: Float64

Comment From: jbrockmendel

Should coercion to Float64 change NaN to NA?

That is expected behavior ATM, yes. In an always-distinguish world it would not (xref #62040)

Comment From: sharkipelago

take

Comment From: jorisvandenbossche

@sharkipelago this is not the best issue to try to tackle at this point, as there is not yet a clear action to be taken

Comment From: jorisvandenbossche

if you would first cast the original non-nullable column to its nullable dtype

In this case the user specifically didn't do that.

The user indeed did not specifically cast themselves, but if you do a reduction operation on a DataFrame with multiple columns (and multiple dtypes), you are always doing an implicit cast. Also, if you have an integer column and a float column, the integer result gets cast to float.
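
For comparison, the same implicit cast happens with only non-nullable dtypes:

>>> pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]}).sum()
a    3.0
b    4.0
dtype: float64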

If you consider a reduction on a DataFrame as reducing each column separately, and then a concat of the results, then again the current behaviour is kind of the expected behaviour, because concatting float64 and Float64 gives Float64, converting NaNs to NA:

>>> pd.concat([pd.Series([np.nan], dtype="float64"), pd.Series([pd.NA], dtype="Float64")])
0    <NA>
0    <NA>
dtype: Float64

Here the float64 series gets cast to the common dtype, i.e. Float64, and casting float64 to nullable float64 converts NaNs to NA. You can of course question this casting behaviour, but as you mention in the comment above, this is the expected behaviour at the moment.
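
And that cast in isolation:

>>> pd.Series([1.0, np.nan]).astype("Float64")
0     1.0
1    <NA>
dtype: Float64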

Comment From: sharkipelago

@sharkipelago this is not the best issue to try to tackle at this point, as there is not yet a clear action to be taken

Understood. Thanks for letting me know. I had mistakenly thought the behaviour should be updated so that (for the skew example above) column "B" always becomes NaN regardless of whether column "C" is present in the DataFrame. Is there a way to un-assign myself?

Comment From: rhshadrach

Agree with the OP that users will find it surprising that the presence of another column changes the result. But unless we're going to go to object dtype, this coercion needs to be part of general reductions, since we go from horizontal to vertical: the per-column results have to fit into a single Series with one dtype. So I think the "right" user expectation is for reducers to compute and then coerce as necessary (sketched below). Assuming this, current behavior is correct.
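
"Compute and then coerce" in code, reusing the df from the top of this issue:

parts = [df[[col]].skew() for col in df.columns]  # per-column: B -> NaN (float64), C -> NA (Float64)
print(pd.concat(parts))  # concat coerces to the common dtype Float64, turning NaN into NA
# B    <NA>
# C    <NA>
# dtype: Float64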

I do find coercion converting NaN to NA surprising but that's for a separate issue.

Comment From: jbrockmendel

Since the current behavior is "correct", I'm updating the title from BUG to API. I maintain that if I was surprised, users will be surprised.

Comment From: jorisvandenbossche

The solution to all this surprising behaviour is of course to eventually only have dtypes that use NA for missing data, so we don't run into those "mixed" dataframes with some NA-based columns and some NaN-based columns. (Or, as I think someone suggested in an earlier related discussion, we could prohibit such mixed dataframes for now until we get there, but personally I don't think that is going to be practical.)

Comment From: jbrockmendel

The solution to all this surprising behaviour is of course to eventually only have dtypes that use NA

That'll be great eventually. I would also consider the issue solved if we got to either "always distinguish" or "never distinguish" so there wouldn't be any confusion as to when we distinguish.