Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"x": [1], "y": [4]})
df.sum(numeric_only=None)
Issue Description
In 2.0 we changed the default of numeric_only to False, so there is no need to accept anything other than True or False.
Expected Behavior
We should raise an error.
cc @rhshadrach thoughts?
Comment From: topper-123
DataFrame.sum just treats the None as a False-like value AFAICS, which is ok IMO.
Also, most other parameters in DataFrame.sum (name, skipna) are not type checked either AFAICS (outside of mypy), and we don't type check generally outside of mypy.
I'd prefer to not type check this parameter.
Comment From: phofl
It's not only about None, it's more like
df = DataFrame({"a": [1, 2, 3], "B": 1})
df.sum(numeric_only=["a"])
which should raise.
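A minimal sketch of the kind of check being asked for here, written in plain Python so that it does not depend on pandas internals. The helper name `require_bool` is hypothetical and is not part of the pandas API; it rejects both None and a list such as ["a"].

```python
def require_bool(value, arg_name):
    """Return value unchanged if it is a real bool, otherwise raise.

    Hypothetical helper for illustration only; not a pandas function.
    """
    if not isinstance(value, bool):
        raise TypeError(
            f"{arg_name} must be True or False, got {type(value).__name__}"
        )
    return value


# Values taken from the examples in this thread
for candidate in (True, False, None, ["a"]):
    try:
        require_bool(candidate, "numeric_only")
        print(repr(candidate), "accepted")
    except TypeError as err:
        print(repr(candidate), "rejected:", err)
```

One design consequence of a strict isinstance check like this: it would also reject numpy.bool_ values, since numpy.bool_ is not a subclass of Python's bool, so a real implementation would have to decide whether those should pass.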
Comment From: topper-123
Boolean parameters are used in a lot of methods, so should we then check for this everywhere in the public interfaces? Or do you see something special with this case?
Comment From: rhshadrach
I don't think it's a good idea to check that everything a user passes agrees with our API documentation. This would be a lot of checks to maintain. It would add what I think is unnecessary overhead. In my opinion it is, in general, the user's responsibility to ensure they are calling the functions in a supported fashion.
In the particular case of Boolean arguments, Python has the concept of truthy and falsey and some users, reasonably in my opinion, may expect pandas to adhere to this. While I personally do not like this style, users may have code:
# Only use numeric_only when string columns are present
string_columns: list[str]
df.sum(numeric_only=string_columns)
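To make the truthy/falsey reading concrete, the sketch below shows how Python itself evaluates the kinds of values discussed in this thread. It uses only built-ins; the variable name `string_columns` is taken from the snippet above, and the column names in the lists are made up.

```python
# How Python's truth-value testing treats the values discussed here.
examples = {
    "None": None,
    "empty list": [],
    "non-empty list": ["name", "city"],
    "False": False,
    "True": True,
}

for label, value in examples.items():
    print(f"{label:15} -> bool(...) is {bool(value)}")

# A caller relying on truthiness can make the intent explicit instead:
string_columns = ["name", "city"]  # made-up column names
numeric_only = bool(string_columns)  # True only when string columns exist
```

Converting at the call site keeps the "only when string columns are present" intent while still passing a real bool to the method.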
Comment From: jbrockmendel
In the numeric_only case I think there is a real danger that users pass the no-longer-supported None thinking it behaves differently from False.
Comment From: rhshadrach
I think that makes sense in this case, but I am wondering if we should have a standard way of handling this across methods. Do we want to leave the check of not None (or is_bool?) for only a certain time? For example, we could leave the check in for all of 2.x and remove in 3.x. Or is there a thought that we should be checking all Boolean (and other?) arguments?
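The two candidate checks mentioned here are not equivalent. A rough sketch of the difference, using plain isinstance as a stand-in for an is_bool-style check (both function names below are hypothetical):

```python
def passes_not_none_check(value):
    # Only rejects the formerly supported None; everything else gets through.
    return value is not None


def passes_bool_check(value):
    # Stand-in for an is_bool-style check: accepts only True and False.
    return isinstance(value, bool)


for value in (True, False, None, ["a"], 0, ""):
    print(
        f"{value!r:8} not-None check: {passes_not_none_check(value)!s:5} "
        f"bool check: {passes_bool_check(value)}"
    )
```

The not-None check addresses only the removed default, while the bool check would also catch the list case raised earlier in the thread.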
Comment From: topper-123
> In the numeric_only case I think there is a real danger that users pass the no-longer-supported None thinking it behaves differently from False.
But then users should have gotten deprecation warnings in 1.x, and if they fixed those they should be in the clear. Nothing has changed since v2.0.0, correct? We have a lot of boolean parameters and unless something has changed in v2.x, I don't see this as a bigger problem than other locations.
A compromise suggestion could be to add a warning for the duration of the 2.x cycle.
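One possible shape for such a transitional warning, sketched in plain Python. The helper name `warn_if_not_bool`, the choice of FutureWarning, and the message text are assumptions for illustration, not something agreed in this thread or taken from pandas.

```python
import warnings


def warn_if_not_bool(value, arg_name):
    """Warn (rather than raise) when a non-bool is passed, then coerce.

    Hypothetical helper; the warning class and wording are assumptions.
    """
    if not isinstance(value, bool):
        warnings.warn(
            f"Passing {value!r} for {arg_name} is not supported; "
            "use True or False instead.",
            FutureWarning,
            stacklevel=2,
        )
    # Keep the existing truthy/falsey behaviour while the warning is in place.
    return bool(value)


# Record warnings so the effect can be inspected.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = warn_if_not_bool(None, "numeric_only")

print(result)       # False
print(len(caught))  # 1
```

Coercing after the warning means existing callers keep working during the warning period, and the check could later be turned into an error or dropped.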