In[2]: import pandas as pd
...: import numpy as np
...: pd.__version__
Out[2]: u'0.23.4'
In[3]: ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])
In[4]: ts.pct_change(fill_method = None)
Out[4]:
0 NaN
1 NaN
2 1.0
3 0.5
4 NaN
5 NaN
6 NaN
dtype: float64
In[5]: ts.pct_change(fill_method = 'pad')
Out[5]:
0 NaN
1 NaN
2 1.000000
3 0.500000
4 0.000000
5 0.333333
6 0.000000
dtype: float64
In[6]: ts.pct_change(fill_method = 'pad').mask(ts.isnull())
Out[6]:
0 NaN
1 NaN
2 1.000000
3 0.500000
4 NaN
5 0.333333
6 NaN
dtype: float64
Hello,
After recently updating pandas, I noticed a change in how pct_change handles missing data. This is related to https://github.com/pandas-dev/pandas/issues/19873.
The first example (In[4]), which does no filling, behaves as expected. The second (In[5]) is the current result, and the third (In[6]) reproduces what pct_change used to return. I think users should be able to choose between the second and third behavior. I agree that the second result is correct, since it forward-fills as expected, but if the time series is a stock price, for example, the returns on missing days (holidays) were not actually 0, and reporting them as 0 can bias some statistics.
I would suggest adding a new parameter, such as skipna. I could not find a way to get the third behavior with the existing parameters alone; if I missed something, please let me know.
Thanks
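For reference, the third behavior can also be written without relying on fill_method: drop the missing values, compute the change between consecutive observations, and reindex back to the original index. A minimal sketch; the helper name pct_change_skipna is hypothetical (it is not a pandas function), and it assumes the index has no duplicate labels:

```python
import numpy as np
import pandas as pd


def pct_change_skipna(series):
    """Percent change between consecutive non-missing observations.

    Hypothetical helper, not part of pandas. Positions that are NaN in the
    input stay NaN in the output instead of reporting a change of 0.
    Assumes the index labels are unique (needed for reindex).
    """
    observed = series.dropna()
    # Nothing is left to fill after dropna(), so disable implicit filling
    changes = observed.pct_change(fill_method=None)
    # Restore the original index; the dropped positions come back as NaN
    return changes.reindex(series.index)


ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])
result = pct_change_skipna(ts)
print(result)
```

On the series above this should match Out[6]: the change at position 5 is measured against the last observed value (3 to 4), and positions 4 and 6 stay NaN.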
Comment From: WillAyd
Thanks for the issue and clear example. I think skipna could make sense as an argument here. PRs are always welcome!
Comment From: albertvillanova
@WillAyd What should the expected result be for a DataFrame with non-aligned NaNs?
df = pd.DataFrame({'a': [np.nan, 1., 2., 3., np.nan, 4., np.nan],
'b': [np.nan, np.nan, 1., 2., 3., np.nan, 4.]})
Comment From: WillAyd
@albertvillanova I'm not sure I understand the distinction you are trying to make; this should operate on each Series in a DataFrame individually.
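To make the column-wise reading concrete, here is a sketch on the DataFrame above. It forward-fills explicitly, takes the percent change with no further filling, and then masks each column with its own missing positions, so the non-aligned NaNs in a and b do not affect each other. This only illustrates the behavior discussed in this thread; it is not an existing skipna option:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [np.nan, 1., 2., 3., np.nan, 4., np.nan],
                   'b': [np.nan, np.nan, 1., 2., 3., np.nan, 4.]})

# Forward-fill each column, then take the percent change with no implicit filling
filled_changes = df.ffill().pct_change(fill_method=None)
# Blank out the positions that were missing in the original column
per_column = filled_changes.mask(df.isna())
print(per_column)
```

Column a should come out as in Out[6]; column b should have 1.0, 0.5 and 0.333333 at positions 3, 4 and 6, and NaN everywhere else.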
Comment From: jbrockmendel
fill_method was deprecated in pandas 2.1, so closing.
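For anyone landing here after the deprecation: as far as I understand, the replacement is to fill explicitly before calling pct_change and to pass fill_method=None. A sketch, assuming a pandas version where Series.ffill and fill_method=None are both available:

```python
import numpy as np
import pandas as pd

ts = pd.Series([np.nan, 1., 2., 3., np.nan, 4., np.nan])

# Explicit forward fill replaces fill_method='pad' (the second example, Out[5])
padded = ts.ffill().pct_change(fill_method=None)
# Masking the originally missing positions gives the third example (Out[6])
masked = padded.mask(ts.isna())

print(padded)
print(masked)
```

Whether the 0 returns on filled days are wanted is then an explicit choice in user code rather than a side effect of the default.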