Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
On pandas < 2.1 (e.g. 1.5.3, 2.0.3):
import pandas as pd
pd.Series(['a'], name='hi').to_pickle('G:/temp/test.pkl')
On pandas 2.3.0 and main:
import pandas as pd
ser = pd.read_pickle('G:/temp/test.pkl') # appears to work
ser2 = pd.Series(['a'], name='hi') # works
pd.testing.assert_series_equal(ser, ser2) # works
pd.testing.assert_series_equal(ser, ser.copy()) # fails: Attribute "name" are different
Issue Description
In doing a migration from 1.5.3 to the 2.x series we hit an issue where copying an unpickled Series drops its name (the actual operation was a .reindex_like, which called .copy under the hood). The bug first appears in the pandas 2.1 series; I believe it may have been introduced in #51784, when the Series metadata was changed from name to _name.
Expected Behavior
It seems like an unpickled Series and its copy should be equal in all attributes, since that's what .copy does. However, anything that makes a copy (including implicit copies, such as calling .reindex()) currently causes the name to be dropped inadvertently.
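To make the expectation concrete, here is a minimal sketch of the invariant as a reusable check. The helper name `name_survives_copy` is hypothetical and not part of pandas; it only uses the public `Series.copy` and `Series.reindex` methods. On a freshly constructed Series the check should pass; on the unpickled legacy Series from the example above I would expect it to fail.

```python
import pandas as pd


def name_survives_copy(ser: pd.Series) -> bool:
    """Hypothetical helper: True if explicit and implicit copies keep the name."""
    explicit = ser.copy()              # explicit copy
    implicit = ser.reindex(ser.index)  # reindex returns a new object
    return explicit.name == ser.name and implicit.name == ser.name


# A freshly constructed Series is the baseline that the unpickled one should match.
fresh = pd.Series(["a"], name="hi")
print(name_survives_copy(fresh))
```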
Now, I'm not sure to what extent read_pickle guarantees that all actions on an unpickled legacy object work the same way as on a newly created object. That said, one reason this may be worth fixing is that the problem persists after re-pickling, i.e. rewriting the pickle with the new version doesn't mitigate it:
# using version 2.3.0
# read legacy pickle
ser = pd.read_pickle('G:/temp/test.pkl')
# write out new pickle of the object
ser.to_pickle('G:/temp/ser_copy.pkl')
# read in new pickle
ser_copy = pd.read_pickle('G:/temp/ser_copy.pkl')
pd.testing.assert_series_equal(ser, ser_copy) # works
pd.testing.assert_series_equal(ser_copy, ser_copy.copy()) # fails, even though ser_copy was read from a pickle created in 2.3.0
And of course, calling ser.copy() to get a new pandas 2.3 object does not work either.
Thus it seems the only workaround is to: 1) read in the legacy pickle, 2) serialize the object to some other format, 3) deserialize from that format, 4) serialize the newly created object as a replacement pickle.
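The steps above can be sketched roughly as follows, using plain Python lists as the "other format". The helper `rebuild_series` and the file name are hypothetical. I have only run this shape of code on a freshly created Series, so whether it fully repairs a legacy pickle is an assumption; it relies on `ser.name` being readable on the unpickled object, which the first `assert_series_equal` in the example suggests. Note that it does not preserve a MultiIndex or other index metadata beyond the index name.

```python
import tempfile
from pathlib import Path

import pandas as pd


def rebuild_series(ser: pd.Series) -> pd.Series:
    """Hypothetical helper: rebuild a Series through the public constructor.

    Values and index labels pass through plain Python lists, so the result
    does not reuse the internal state of the original object.
    """
    index = pd.Index(ser.index.tolist(), name=ser.index.name)
    return pd.Series(ser.tolist(), index=index, dtype=ser.dtype, name=ser.name)


# Stand-in for `pd.read_pickle(<legacy pickle>)`; a fresh Series keeps this runnable.
legacy = pd.Series(["a"], name="hi")

rebuilt = rebuild_series(legacy)

with tempfile.TemporaryDirectory() as tmp:
    replacement = Path(tmp) / "replacement.pkl"  # hypothetical file name
    rebuilt.to_pickle(replacement)               # step 4: write replacement pickle
    reloaded = pd.read_pickle(replacement)

# The replacement pickle should survive a copy with its name intact.
pd.testing.assert_series_equal(reloaded, reloaded.copy())
```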