Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
df = pd.DataFrame({"x": [1], "y": [4]})
df.sum(numeric_only=None)

Issue Description

In 2.0 we changed the default of numeric_only to False, so there is no need to accept anything other than True/False.

Expected Behavior

We should raise an error
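As a rough illustration of the proposed behavior, here is a minimal sketch of a strict check. The helper name check_numeric_only and the choice of ValueError are assumptions made for illustration; this is not the pandas implementation.

```python
def check_numeric_only(value):
    """Hypothetical helper: accept only real booleans for numeric_only."""
    # isinstance(value, bool) rejects None, lists, and plain ints such as 0 or 1.
    # Note: it would also reject numpy.bool_, which a real check might allow.
    if not isinstance(value, bool):
        raise ValueError(
            'For argument "numeric_only" expected type bool, '
            f"received type {type(value).__name__}."
        )
    return value
```

With this sketch, check_numeric_only(False) passes the value through, while check_numeric_only(None) and check_numeric_only(["a"]) both raise.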

cc @rhshadrach thoughts?

Comment From: topper-123

DataFrame.sum just treats the None as a False-like value AFAICS, which is ok IMO.

Also, most other parameters in DataFrame.sum (name, skipna) are not type checked either AFAICS (outside of mypy), and we don't type check generally outside of mypy.

I'd prefer to not type check this parameter.

Comment From: phofl

It's not only about None, it's more like

df = DataFrame({"a": [1, 2, 3], "B": 1})
df.sum(numeric_only=["a"])

Which should raise

Comment From: topper-123

Boolean parameters are used in a lot of methods, so should we then check for this everywhere in the public interface? Or do you see something special about this case?

Comment From: rhshadrach

I don't think it's a good idea to check that everything a user passes agrees with our API documentation. This would be a lot of checks to maintain, and it would add what I think is unnecessary overhead. In my opinion it is, in general, the user's responsibility to ensure they are calling the functions in a supported fashion.

In the particular case of Boolean arguments, Python has the concept of truthy and falsey and some users, reasonably in my opinion, may expect pandas to adhere to this. While I personally do not like this style, users may have code:

# Only use numeric_only when string columns are present
string_columns: list[str]
df.sum(numeric_only=string_columns)
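
To make the truthiness argument concrete, here is a small sketch in plain Python. The function effective_flag is a hypothetical stand-in for code that only ever tests the argument with `if numeric_only:`; it assumes nothing about how pandas handles the value internally.

```python
def effective_flag(value):
    """Hypothetical stand-in for code that only tests `if numeric_only:`."""
    return bool(value)


# Values a user might pass, and what a pure truthiness test makes of them.
candidates = {
    "None": None,
    "empty list": [],
    "list of names": ["a"],
    "False": False,
    "True": True,
}
results = {label: effective_flag(value) for label, value in candidates.items()}
```

Under that assumption, None and an empty list behave like False, and a non-empty list of column names behaves like True.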

Comment From: jbrockmendel

In the numeric_only case I think there is a real danger that users pass the no-longer-supported None thinking it behaves differently from False.

Comment From: rhshadrach

I think that makes sense in this case, but I am wondering if we should have a standard way of handling this across methods. Do we want to leave the check of not None (or is_bool?) for only a certain time? For example, we could leave the check in for all of 2.x and remove in 3.x. Or is there a thought that we should be checking all Boolean (and other?) arguments?

Comment From: topper-123

> In the numeric_only case I think there is a real danger that users pass the no-longer-supported None thinking it behaves differently from False.

But then users should have gotten deprecation warnings in 1.x, and if they fixed those they should be in the clear. Nothing has changed since v2.0.0, correct? We have a lot of boolean parameters, and unless something has changed in v2.x, I don't see this as a bigger problem than other locations.

A compromise suggestion could be to add a warning for the duration of the 2.x cycle.
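
A minimal sketch of what that compromise could look like, assuming a hypothetical helper named warn_if_not_bool and FutureWarning as the warning category (neither is taken from the pandas code base):

```python
import warnings


def warn_if_not_bool(value, arg_name="numeric_only"):
    """Hypothetical helper: warn, rather than raise, on non-bool values."""
    if not isinstance(value, bool):
        warnings.warn(
            f"Passing {type(value).__name__} for {arg_name!r} is not supported; "
            "pass True or False instead.",
            FutureWarning,
            stacklevel=2,  # point the warning at the caller, not this helper
        )
    # Keep the existing truthiness-based behavior for now.
    return bool(value)
```

The value is still coerced with bool(), so existing code keeps working during the warning period; the warning could later be turned into an error or dropped.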