Pandas version checks

- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example

    import pandas as pd

    df = pd.DataFrame({"x": pd.Categorical(["a", "b"]), "y": [1, 2]})
    df.groupby("x")
Issue Description
As of #51811, this fires a FutureWarning because the default value for `observed` is going to change. It seems like this is one of several pending deprecations in 2.1.0 intended to simplify and sand off the API.
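For reference, passing `observed` explicitly opts out of the deprecated default and silences the warning (a minimal sketch; `observed=True` matches the planned new default):

```python
import pandas as pd

df = pd.DataFrame({"x": pd.Categorical(["a", "b"]), "y": [1, 2]})

# An explicit observed= silences the FutureWarning about the changing
# default; observed=True keeps only category combinations that actually
# appear in the data.
result = df.groupby("x", observed=True)["y"].sum()
```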
Expected Behavior
(Sorry, this is not exactly a bug, but the other issue categories didn't fit either.)
Please consider introducing such deprecations with a DeprecationWarning rather than a FutureWarning, so that downstream packages have a chance to adjust their internal calls to the pandas API. Otherwise, users who update their pandas will immediately see confusing and non-actionable FutureWarning messages coming out of other libraries.
Pandas is often used interactively, so I agree that it's important to promote this to a more visible FutureWarning at some point, but that could happen after a few release cycles.
Installed Versions
2.1.0
Comment From: mwaskom
Bumping this as I am getting nearly daily false issue reports about these warnings (and doubt that my package is the only one affected).
Comment From: lithomas1
Sure, delaying the FutureWarning by 1 release seems reasonable to me.
Assuming this is for seaborn, I noticed you aren't testing against pandas nightly. (Ideally, it's best to have these kinds of things tracked before the release.) Do you want me to open a PR for this?
Comment From: mwaskom
Having nightly tests could be useful but not sufficient for this issue — I don't have resources to drop what I'm doing and cut a new release just because pandas has some breaking changes coming down the pike. DeprecationWarnings would be surfaced in tests but not to end users.
Comment From: mwaskom
(To be clear, pandas devs were very helpful about proactively sending PRs which was much appreciated; but seaborn wasn't able to get a release out before pandas did, and so now users are affected).
Comment From: lithomas1
Shoot, this somehow slipped off of my radar for 2.1.1. Sorry about that!
I will tag this as a blocker and put up a PR today so this doesn't get lost again.
Comment From: rhshadrach
Mea culpa.
While I don't agree that all deprecations need to go through the DeprecationWarning -> FutureWarning cycle, this is one that should have. It is significant enough, and we could have done a DeprecationWarning in 2.1.0 followed by a FutureWarning in 2.2.0. I'll keep this in mind for the future.
Does changing this to a DeprecationWarning in 2.1.2 actually help things? I'm afraid it might just make things more confusing, and since users might be using 2.1.0 or 2.1.1 it doesn't actually help downstream packages.
Comment From: mwaskom
FWIW this one is actually causing the most problems in seaborn.
I appreciate that there's some maintenance overhead to tracking the warning escalation, but the alternative is to leave downstream libraries somewhat in the lurch, forced to issue un-actionable warnings to their users.
I don't have a strong opinion about whether you need to roll back these specific changes; this issue is more to advocate for more gradual deprecations going forward.
Comment From: rhshadrach
Thanks for the feedback here @mwaskom. If you'd like consideration about doing something about #35793, I think it'd be good to open a new issue.
Comment From: mwaskom
👍 I’ve done a seaborn release with the adaptations so I don’t think pandas needs to walk back any of the deprecations
Comment From: jorisvandenbossche
As Richard also mentioned above, we do have some policy (or at least a habit, though we don't apply it consistently) of starting with DeprecationWarnings, specifically for those deprecations that we assume mostly affect libraries and less end users, exactly to avoid end users seeing warnings, through usage of a library, that they can't do anything about.
(I actually changed some of those to DeprecationWarning before the release, like https://github.com/pandas-dev/pandas/pull/53994 and https://github.com/pandas-dev/pandas/pull/54800, but missed the `is_.._dtype` ones.)
> FWIW this one is actually causing the most problems in seaborn.
I think the case of `is_categorical_dtype` (and the other `is_..` functions) is actually a good example of something that will typically be used more by libraries and less by end users in their scripts/notebooks. So that should have been a candidate for using DeprecationWarning.
While this is now fixed in seaborn, it might still be relevant for other libraries that haven't yet been updated. So personally I think it might still be worth switching to DeprecationWarning for those. Opened a PR for that -> https://github.com/pandas-dev/pandas/pull/55703
Comment From: mwaskom
IMO, even if a change (like the groupby `observed` default) is likely to affect end users, starting with a DeprecationWarning so that downstream libraries have time to ship updated code is still important. Users (of those downstream libraries, and hence also of pandas) are still negatively affected when they start seeing unactionable warnings. The fact that Python warnings come with relatively little context means it's often not obvious to the end user that they have done nothing wrong and have nothing they need to change, quite apart from the warnings being annoying in a notebook or other interactive context.
Comment From: mwaskom
Ping on this one again. It looks like (based on the logs for running seaborn tests against 2.2.0rc0) 2.2 deprecates several behaviors while jumping straight to a FutureWarning.
There are many libraries downstream of pandas and I really think that this degrades the overall experience for everyone in the ecosystem!
Comment From: lithomas1
> Ping on this one again. It looks like (based on the logs for running seaborn tests against 2.2.0rc0) 2.2 deprecates several behaviors while jumping straight to a FutureWarning. There are many libraries downstream of pandas and I really think that this degrades the overall experience for everyone in the ecosystem!
The next release in pandas will be pandas 3.0. I don't think there's a window to have a DeprecationWarning -> FutureWarning change (as there'll be no more minor release before 3.0).
Just to clarify, are these the warnings you're talking about:

> When using seaborn with the new 2.2.0rc0 release candidate, it throws two FutureWarnings:

    C:\Users\Ewout\.virtualenvs\Py312\Lib\site-packages\seaborn\_base.py:949: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass (name,) instead of name to silence this warning.
      data_subset = grouped_data.get_group(pd_key)
    C:\Users\Ewout\.virtualenvs\Py312\Lib\site-packages\seaborn\relational.py:293: FutureWarning: DataFrameGroupBy.apply operated on the grouping columns. This behavior is deprecated, and in a future version of pandas the grouping columns will be excluded from the operation. Either pass include_groups=False to exclude the groupings or explicitly select the grouping columns after groupby to silence this warning.
      sub_data = grouped.apply(agg, other).reset_index()
Looking at these, the first one seems OK to have gone straight to a FutureWarning.
I think we could've gone with a DeprecationWarning for the second though (`include_groups=False` doesn't exist in versions before 2.2.0, making the warning unavoidable).
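For reference, the second warning also names a remedy that works on older pandas versions: explicitly selecting the non-grouping columns before `apply` (a minimal sketch with made-up data):

```python
import pandas as pd

df = pd.DataFrame({"g": ["a", "a", "b"], "v": [1, 2, 3]})

# Selecting the value column before apply means the applied function
# never sees the grouping column, which avoids the deprecation without
# needing the include_groups keyword that only exists in pandas >= 2.2.
result = df.groupby("g")["v"].apply(lambda s: s.sum())
```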
cc @rhshadrach
Comment From: rhshadrach
> Ping on this one again. It looks like (based on the logs for running seaborn tests against 2.2.0rc0) 2.2 deprecates several behaviors while jumping straight to a FutureWarning.
As stated in my comment above, I do not agree that every deprecation needs to start with a DeprecationWarning before becoming a FutureWarning. And certainly not "after a few release cycles" - we release every 6 months, and so you're talking multiple years at that point to deprecate something.
It sounds like you wish to make it the case in pandas that every deprecation must go through DeprecationWarning -> FutureWarning. I'll reopen this issue for further discussion on this point.
cc @pandas-dev/pandas-core
Comment From: mwaskom
> we release every 6 months, and so you're talking multiple years at that point to deprecate something
Pandas is a mature, extremely widely used, base library. "A couple of years to deprecate something" doesn't actually strike me as unreasonable?
I haven't even had a chance to look closely at the nature of the new changes, but the seaborn tests run against 2.2.0rc0 are showing 3108 warnings, so I gather the impact is not minimal. As another data point, the single most visited issue on the seaborn GitHub remains one of the changes mentioned in this thread, despite a compatible version of seaborn having been released nearly four months ago. That's hundreds of people every week having their work disrupted by a warning they don't understand.
You're free of course to take whatever tack you want with pandas development, and I appreciate the desire to streamline the API or remove common sources of confusion. But please remember that each of these changes also has costs for downstream libraries and users.
Comment From: bashtage
> It sounds like you wish to make it the case in pandas that every deprecation must go through DeprecationWarning -> FutureWarning. I'll reopen this issue for further discussion on this point.
pandas has probably matured to the point where it needs a formal deprecation policy. IIRC NumPy, which is very old and mature, typically has a 4 release cycle, which is thought to be around 2 years. Ideally, this would be formalised in a PDEP, although I'm not volunteering.
Comment From: bashtage
> I haven't even had a chance to look closely at the nature of the new changes, but the seaborn tests run against 2.2.0rc0 are showing 3108 warnings, so I gather the impact is not minimal. As another data point, the single most visited issue on the seaborn GitHub remains one of the changes mentioned in this thread, despite a compatible version of seaborn having been released nearly four months ago. That's hundreds of people every week having their work disrupted by a warning they don't understand.
I have been future-proofing statsmodels against warnings in pre-release pandas. It has taken substantial effort, and many of the changes are pretty nontrivial, e.g., how assignment and dtype alteration have changed.
Comment From: MarcoGorelli
Thanks for bringing this up - I'd never understood the need for FutureWarning everywhere, so am glad to discuss.
Say we have the following in pandas:

    import warnings

    from pandas.util._exceptions import find_stacklevel  # pandas-internal helper

    def deprecated_function():
        warnings.warn(
            "deprecated_function is deprecated",
            DeprecationWarning,
            stacklevel=find_stacklevel(),
        )
Then:

- end users using `pd.deprecated_function` directly will see the warning
- if seaborn uses `pd.deprecated_function`, then seaborn will see the warning in their test suite
- if a user uses seaborn (which uses `pd.deprecated_function`), then the user won't see the warning
This looks good enough to me? I don't see the downside to just switching to using DeprecationWarning everywhere, nor the need for FutureWarning.
As an aside, if you search "python futurewarning" in a new incognito browser, the first result is "How to suppress Pandas Future warning?", which might suggest that FutureWarnings everywhere may not be the best course of action.
Comment From: mwaskom
@MarcoGorelli are you saying that because of the `stacklevel`? Do you have a citation for that? I wasn't aware that `stacklevel` could control whether a warning is visible at the top of the stack; I've only ever seen it used to control which line the warning gets attributed to (i.e. to say "this is a problem with how you're calling pandas, not a problem within pandas itself").
Comment From: MarcoGorelli
an incorrect stacklevel can result in a warning not being shown (e.g. https://github.com/pola-rs/polars/pull/7796 )
What I'm saying is - so long as the stacklevel is correct (which it typically is in pandas becaus of find_stacklevel
), I don't see why pandas can't just use DeprecationWarning everywhere instead of FutureWarning
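A minimal stdlib sketch of the mechanism being discussed (the function names here are illustrative, not pandas code): Python's default filters show a DeprecationWarning only when the frame it is attributed to via `stacklevel` lives in `__main__`, so with a correct stacklevel, direct callers see it while users of an intermediary library do not.

```python
import warnings

def deprecated_function():
    # stacklevel=2 attributes the warning to the immediate caller's
    # frame; the default filters then judge visibility by *that*
    # frame's module (__main__ vs. a library module).
    warnings.warn(
        "deprecated_function is deprecated",
        DeprecationWarning,
        stacklevel=2,
    )

def intermediary():
    # Stands in for a library such as seaborn calling the deprecated API.
    deprecated_function()

# Record the warning so we can inspect how it was classified.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    intermediary()
```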
Comment From: bashtage
I think at some point you need to make users aware that their code might break (soon) even if the use is through a library. This provides a mechanism to help reporting to the repo of the intermediate package, and can also motivate some community members to work on a patch (e.g., recent commits to pydata/patsy, which is in permanent maintenance mode).
Comment From: mwaskom
Got it @MarcoGorelli; you're saying the `stacklevel` ensures that end users of pandas will see a DeprecationWarning, not that seaborn or other intermediaries will (i.e. in their tests — that should happen regardless of the stacklevel). I agree! Getting that right sounds like the correct path to make it such that libraries and users see warnings if and only if they are actionable.
Comment From: Dr-Irv
pandas has probably matured to the point where it needs a formal deprecation policy.
We do have one: https://pandas.pydata.org/docs/development/policies.html
Maybe this discussion would lead to someone suggesting improvements.
Comment From: jorisvandenbossche
> I don't see the downside to just switching to using DeprecationWarning everywhere, nor the need for FutureWarning
Thanks Marco, I think you bring up some good arguments for (first) using DeprecationWarning for all our deprecations.
I am not 100% sure whether the stacklevel we set is always fool-proof: is it guaranteed to always be correct? (I know the find_stacklevel should help ensuring that)
Additionally, users can also have their own scripts with reusable functions, and also in those cases, they won't see the deprecation warning, because it is not direct usage. So I think that is still a reason to switch to a more visible FutureWarning in a second stage.
I think at some point you need to make users aware that their code might break (soon) even if the use is through a library. This provides a mechanism to help reporting to the repo of the intermediate package
I know this doesn't cover all usage, but just noting that any other project running tests will still see those warnings and can notify upstream (for example, for the case of patsy, any package (or project that has tests) that uses both pandas and patsy will see the warnings and can warn patsy about it).
Personally, I think a process of first using a DeprecationWarning and then later switching to a more visible FutureWarning might be the best compromise. Of course, that does require at least two releases, so deprecating something in the last release before a major release that enforces the deprecations (like 2.2 before 3.0) doesn't work. (Now, I think the time between 2.2 and 3.0 is very short anyway, and we might want to reconsider whether we can actually enforce those deprecations after a period of only 3-4 months.)
Comment From: mwaskom
Yes, a very fast turnaround on a major breaking change seems pretty unfortunate, independent of the main point here about who sees what warnings. FWIW (and you're not obligated to convince me) the argument that the deprecations couldn't have been made more smoothly doesn't strike me as very compelling. Is some external factor forcing the next release to be a "major version" with breaking changes?
Comment From: rhshadrach
> I am not 100% sure whether the stacklevel we set is always fool-proof: is it guaranteed to always be correct? (I know the find_stacklevel should help ensuring that)
No - there are cases that the stack goes:
- user code
- pandas
- NumPy
- pandas
I can dig up an example if it would be helpful. This might happen with other third-party libraries as well, I'm not sure. In such a case, a warning generated at the bottom of the stack will have its stacklevel pointing at the NumPy frame.
> Personally, I think a process of first using a DeprecationWarning and then later switching to a more visible FutureWarning might be the best compromise. Of course, that does require at least two releases...
In the future, let's say 3.2 is the last "regular" minor release before 4.0. Could we consider having all deprecations introduced as DeprecationWarnings as part of 3.x, and then half way between 3.2 and 4.0 we release 3.3 (not from main) where all we do is convert all DeprecationWarnings to FutureWarnings?
Comment From: jbrockmendel
If "deprecations being enforced too quickly" is a major pain point (particularly for deprecations introduced in the last minor version before a major version), we could consider going back to the pre-semver days of rolling deprecations. Not advocating.
Comment From: mwaskom
https://stackoverflow.com/questions/77867004/problem-when-running-the-hue-in-python-seaborn
This is a good example IMO of a common user reaction to these warnings. There's really nothing about them that indicates to an end user that they're not doing anything wrong and don't need to spend time thinking about it or asking on Stackoverflow. That's why I think that they degrade user experience across the ecosystem.
Comment From: MarcoGorelli
If FutureWarnings are deemed necessary at some point, then I'd be fine with saying that the policy should be:

- 3.0, 3.1: OK to introduce DeprecationWarning
- 3.2: DeprecationWarning becomes FutureWarning. No new deprecations allowed

and similarly for future releases.
If something doesn't make it in in time, then no worries - you wait 4 months and get it into the next major release series.
Comment From: mwaskom
That seems reasonable but the way you write it suggests that pandas currently has a policy of doing yearly major (i.e. semver-breaking) releases. Is that the case?
Comment From: MarcoGorelli
That's right - I don't know when it was decided though, I just heard it from others 😄
Comment From: mwaskom
That is the sort of thing that might be helpful to advertise to downstream projects, e.g. in the version policy document...
Comment From: jorisvandenbossche
AFAIK we don't have a policy of yearly major releases. The first one after 1.0 (1->2) took about 3 years. It's just that for 2->3.0, we decided we wanted to do it more quickly after 2.0 because we have a set of big breaking changes we want to get out, i.e. everything related to CoW (and also the pyarrow strings). I don't think we explicitly discussed the timeline for 3->4 (or which features would warrant it), although I think that some of the maintainers are assuming (or hoping ;)) we keep a similar faster pace.
Comment From: lithomas1
> If FutureWarnings are deemed necessary at some point, then I'd be fine with saying that the policy should be:
>
> - 3.0, 3.1: OK to introduce DeprecationWarning
> - 3.2: DeprecationWarning becomes FutureWarning. No new deprecations allowed
>
> and similarly for future releases. If something doesn't make it in in time, then no worries, you wait 4 months and get it into the next major release series
I would be -1 on this, as it would create a pretty large backlog of deprecation PRs and churn to change DeprecationWarnings to FutureWarnings, and it doesn't reduce the total burden (of maintainers/contributors plus downstream); it just shifts that burden onto contributors as opposed to downstream.
Personally, I think numpy's deprecation policy seems pretty reasonable.
To quote from NEP 23 (https://numpy.org/neps/nep-0023-backwards-compatibility.html), they

> shall use DeprecationWarning by default, and VisibleDeprecation for changes that need attention again after already having been deprecated or needing extra attention for some reason.

Also,

> For backwards incompatible changes that aren't "deprecate and remove" but for which code will start behaving differently, a FutureWarning should be used.
Based off this, a reasonable deprecation policy for us could be:

- For things like removing a keyword/method -> use DeprecationWarning
  - We do run into the issue that users wouldn't know that the current version of packages that they use would be incompatible with a future version of pandas, as @bashtage notes above
  - I think this is avoidable for numpy since they kinda do rolling deprecations, so you can pin your package to < curr version + 3 or something, and not have to worry about a future version of numpy breaking your package. (IIRC, scipy does or used to do this)
- For behavior changes -> use FutureWarning
EDIT: As discussed at the dev call, I'm fine with this if Marco is able to abstract away the complexities of changing DeprecationWarning -> FutureWarning.
Comment From: MarcoGorelli
From today's community call, I think there weren't objections to starting with DeprecationWarning and moving to FutureWarning, the main concern was that updating them can be a lot of work
I think it should be feasible to write some helper functions to do it though, which track which version a warning was introduced in, and then move to FutureWarning at the right time. This would need doing for tests too, but should be feasible
Comment From: lithomas1
What if we just had a PandasDeprecationWarning that aliased to FutureWarning/DeprecationWarning and then made all deprecation warnings raise PandasDeprecationWarnings?
(Then, we wouldn't really need helpers, the only work would be to change the alias in the release before a major release. If we wanted to do Irv's idea we could split into e.g. Pandas30DeprecationWarning, and Pandas40DeprecationWarning, and only alias the 3.0 ones.)
Comment From: mwaskom
That's a nice idea (though probably better to make `PandasDeprecationWarning` a derived class rather than a plain alias; e.g. people might add a warning filter to ignore pandas future warnings they can't control and find themselves having accidentally ignored all future warnings).
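A sketch of that suggestion, with hypothetical names (`PandasDeprecationWarning` does not exist in pandas today): a subclass can derive from DeprecationWarning early in a release cycle and be re-based on FutureWarning in the last minor release, while staying filterable on its own.

```python
import warnings

# Hypothetical: early in a release cycle this derives from
# DeprecationWarning (hidden from end users by default); in the last
# minor release before a major version, the base class would be
# switched to FutureWarning (always shown).
class PandasDeprecationWarning(DeprecationWarning):
    """Illustrative project-specific deprecation warning."""

def deprecated_api():
    warnings.warn(
        "deprecated_api will be removed in a future major release",
        PandasDeprecationWarning,
        stacklevel=2,
    )

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    deprecated_api()
```

Because it is a subclass rather than a bare alias, users can write `warnings.filterwarnings("ignore", category=PandasDeprecationWarning)` without silencing every DeprecationWarning (or, later, FutureWarning) coming from other libraries.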
Comment From: rhshadrach
> From today's community call, I think there weren't objections to starting with DeprecationWarning and moving to FutureWarning, the main concern was that updating them can be a lot of work
I did not realize how much work this would be in https://github.com/pandas-dev/pandas/pull/56952. There, after changing from FutureWarning to DeprecationWarning, I had to deal with multiple warnings being raised in the test suite that were not an issue when it was a FutureWarning. Perhaps improvements could be made to `tm.assert_produces_warning` to avoid this in the future.
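A stdlib-only sketch of the test-suite issue described above (this is not pandas' `tm.assert_produces_warning`): when a call emits extra warnings alongside the one under test, recording everything and filtering by category lets the assertion target only the expected warning.

```python
import warnings

def noisy_call():
    # Emits an unrelated warning alongside the deprecation under test.
    warnings.warn("unrelated warning", UserWarning)
    warnings.warn("api is deprecated", DeprecationWarning, stacklevel=2)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    noisy_call()

# Keep only the category the test actually cares about, tolerating
# any other warnings raised along the way.
deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
```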
Comment From: lithomas1
@MarcoGorelli
Are you working on this? I can try to put up a POC PR with my idea from above (hopefully by end of this week) if you're not working on it.
Comment From: MarcoGorelli
please do go ahead, thanks! your idea sounds a lot simpler than what I had in mind anyway
Comment From: jbrockmendel
@rhshadrach do your recent PRs close this?