Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [ ] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# the data file used in the example https://1drv.ms/u/s!AnPL7Q5hAP8rk12MkTUQZs2RnVwv?e=gmZMFI (CSV file stored in MS OneDrive)

>>> import pandas as pd
>>> pd.set_option('display.precision', 12)
>>> series = pd.read_csv(r"E:\pd_rollingstd_issue.csv", index_col=[0], parse_dates=["date"]).loc[:, '0']
>>> series.loc["2017-05-20":"2017-06-10"]
date
2017-05-22    10.0242441
2017-05-23    10.0242441
2017-05-24    10.0242441
2017-05-25    10.0242441
2017-05-26    10.0242441
2017-05-31    10.0242441
2017-06-01    10.0242436
2017-06-02    10.0242436
2017-06-05    10.0242436
2017-06-06    10.0242436
2017-06-07    10.0242436
2017-06-08    10.0242436
2017-06-09    10.0242436
Name: 0, dtype: float64
>>> series.loc["2017-05-20":"2017-06-10"].rolling(5).std()
date
2017-05-22               NaN
2017-05-23               NaN
2017-05-24               NaN
2017-05-25               NaN
2017-05-26    0.000000000000
2017-05-31    0.000000000000
2017-06-01    0.000000223607
2017-06-02    0.000000273861
2017-06-05    0.000000273861
2017-06-06    0.000000223607
2017-06-07    0.000000000000
2017-06-08    0.000000000000
2017-06-09    0.000000000000
Name: 0, dtype: float64
>>> series.rolling(5).std().loc["2017-05-20":"2017-06-10"]
date
2017-05-22    0.0
2017-05-23    0.0
2017-05-24    0.0
2017-05-25    0.0
2017-05-26    0.0
2017-05-31    0.0
2017-06-01    0.0
2017-06-02    0.0
2017-06-05    0.0
2017-06-06    0.0
2017-06-07    0.0
2017-06-08    0.0
2017-06-09    0.0
Name: 0, dtype: float64

Issue Description

As the example shows, a rolling std with window size 5 produces non-zero values between 2017-06-01 and 2017-06-06 when computed on a slice of the series. That is correct, because the source value changes slightly on 2017-06-01. However, when computed on the whole series, the result is 0 for every date in that range, which is incorrect.

Expected Behavior

Given the same values in each 5-element window, the whole series should produce the same non-zero results for the dates between 2017-06-01 and 2017-06-06 as the slice does.
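
As a sanity check on the expected values, the per-window standard deviations can be recomputed directly from the 13 values printed in the example, without pandas. This is only a sketch: `window_std` is a helper name introduced here for illustration, and it relies on the standard library's `statistics.stdev` (sample standard deviation, the same `ddof=1` convention that `rolling(...).std()` uses by default).

```python
import math
import statistics

# Values copied from the slice printed in the example (6 + 7 = 13 rows).
values = [10.0242441] * 6 + [10.0242436] * 7


def window_std(data, window):
    """Sample std (ddof=1) of each trailing window; NaN until the window is full."""
    return [
        statistics.stdev(data[i + 1 - window : i + 1]) if i + 1 >= window else math.nan
        for i in range(len(data))
    ]


expected = window_std(values, 5)
for value in expected:
    print(f"{value:.12f}")
```

Each window is computed from scratch here, so the result cannot depend on anything that precedes the window. The four windows that straddle the change on 2017-06-01 should come out at roughly 2.236e-07 and 2.739e-07, in line with the slice-based output.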

Installed Versions

INSTALLED VERSIONS
------------------
commit : e86ed377639948c64c429059127bcf5b359ab6be
python : 3.9.16.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United States.936
pandas : 2.1.1
numpy : 1.24.2
pytz : 2022.7.1
dateutil : 2.8.2
setuptools : 65.6.3
pip : 22.3.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 4.9.2
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.10.0
pandas_datareader : None
bs4 : 4.11.2
bottleneck : None
dataframe-api-compat: None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.7.0
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
pyxlsb : None
s3fs : None
scipy : 1.10.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Comment From: PiotrekB416

take

Comment From: PiotrekB416

Could you recheck whether the problem still exists? I just checked and got this output:

>>> import pandas as pd
>>> pd.set_option('display.precision', 12)
>>> series = pd.read_csv(r"./test.csv", index_col=[0], parse_dates=["date"]).loc[:, '0']
>>> series.loc["2017-05-20":"2017-06-10"]
date
2017-05-22    10.0242441
2017-05-23    10.0242441
2017-05-24    10.0242441
2017-05-25    10.0242441
2017-05-26    10.0242441
2017-05-31    10.0242441
2017-06-01    10.0242436
2017-06-02    10.0242436
2017-06-05    10.0242436
2017-06-06    10.0242436
2017-06-07    10.0242436
2017-06-08    10.0242436
2017-06-09    10.0242436
Name: 0, dtype: float64
>>> series.loc["2017-05-20":"2017-06-10"].rolling(5).std()
date
2017-05-22               NaN
2017-05-23               NaN
2017-05-24               NaN
2017-05-25               NaN
2017-05-26    0.000000000000
2017-05-31    0.000000000000
2017-06-01    0.000000223607
2017-06-02    0.000000273861
2017-06-05    0.000000273861
2017-06-06    0.000000223607
2017-06-07    0.000000000000
2017-06-08    0.000000000000
2017-06-09    0.000000000000
Name: 0, dtype: float64
>>> series.rolling(5).std().loc["2017-05-20":"2017-06-10"]
date
2017-05-22               NaN
2017-05-23               NaN
2017-05-24               NaN
2017-05-25               NaN
2017-05-26    0.000000000000
2017-05-31    0.000000000000
2017-06-01    0.000000223607
2017-06-02    0.000000273861
2017-06-05    0.000000273861
2017-06-06    0.000000223607
2017-06-07    0.000000000000
2017-06-08    0.000000000000
2017-06-09    0.000000000000
Name: 0, dtype: float64

Comment From: qianyun210603

@PiotrekB416

Thanks for the prompt reply!

I retried on both Linux and Windows and confirmed that the issue exists with the latest release. Sorry, I had some trouble building the main branch that I cannot resolve quickly.

I suspect you assigned series.loc["2017-05-20":"2017-06-10"] to series somewhere before running series.rolling(5).std().loc["2017-05-20":"2017-06-10"]. Otherwise there shouldn't be NaNs at the beginning of the results: when the whole series participates in the rolling, there are enough preceding elements to fill the 5-element window for the dates from 2017-05-22 to 2017-05-25.

Also, maybe it has a cause similar to #54380?
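
The NaN argument can be checked on toy data. The sketch below is pure Python rather than pandas; `rolling_std`, `full`, and `start` are names made up for this illustration, and the 12-point series is a stand-in, not the data from the issue.

```python
import math
import statistics


def rolling_std(data, window):
    """Sample std of each trailing window; NaN while fewer than `window` points exist."""
    return [
        statistics.stdev(data[i + 1 - window : i + 1]) if i + 1 >= window else math.nan
        for i in range(len(data))
    ]


full = [float(v) for v in range(1, 13)]  # toy stand-in for the whole series
start, window = 4, 5                     # the "slice" is full[start:]

# Analogous to series.rolling(5).std().loc[...]: roll first, then slice.
roll_then_slice = rolling_std(full, window)[start:]
# Analogous to series.loc[...].rolling(5).std(): slice first, then roll.
slice_then_roll = rolling_std(full[start:], window)

print(roll_then_slice)  # no NaN: earlier points fill the first windows
print(slice_then_roll)  # the first window - 1 entries are NaN
```

Once both variants have a full window, they see exactly the same five values, so their results should agree from that point on. Leading NaNs in a roll-then-slice result therefore suggest that the rolling was applied to an already sliced series.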

Comment From: qianyun210603

Another odd behavior I observed: when you run rolling with a different start date, the numbers from 2017-06-01 to 2017-06-06 change, e.g.

>>> series.loc["2015-05-20":"2017-06-10"].rolling(5).std().loc["2017-05-20":"2017-06-10"]
date
2017-05-22    0.000000000000
2017-05-23    0.000000000000
2017-05-24    0.000000000000
2017-05-25    0.000000000000
2017-05-26    0.000000000000
2017-05-31    0.000000000000
2017-06-01    0.000000504119
2017-06-02    0.000000528333
2017-06-05    0.000000528333
2017-06-06    0.000000504119
2017-06-07    0.000000000000
2017-06-08    0.000000000000
2017-06-09    0.000000000000
Name: 0, dtype: float64
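
One mechanism that could make the result depend on the start date is a sliding computation that carries running state (mean and sum of squared deviations) from window to window, so rounding error from earlier values stays in the state. This is an assumption, not something confirmed in this thread. The sketch below is a generic Welford-style add/remove update, not pandas' actual implementation; `sliding_std` and the `prefix` of large values are hypothetical and chosen only to make the effect visible.

```python
import math


def sliding_std(data, window):
    """Windowed sample std via running add/remove updates (Welford-style).

    Illustrative only; NOT pandas' actual implementation.
    """
    n, mean, m2 = 0, 0.0, 0.0
    out = []
    for i, x in enumerate(data):
        # Add the newest value to the running state.
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if i >= window:
            # Remove the value that just left the window.
            old = data[i - window]
            n -= 1
            delta = old - mean
            mean -= delta / n
            m2 -= delta * (old - mean)
        # Clamp negative rounding residue before taking the square root.
        out.append(math.sqrt(max(m2, 0.0) / (n - 1)) if n == window else math.nan)
    return out


values = [10.0242441] * 6 + [10.0242436] * 7  # slice printed in the issue
prefix = [1e8] * 5                            # hypothetical earlier history

fresh = sliding_std(values, 5)                            # state starts at the slice
carried = sliding_std(prefix + values, 5)[len(prefix):]   # state carried over from prefix

for a, b in zip(fresh, carried):
    print(f"{a:.12f}  {b:.12f}")
```

The first column starts from clean state and should reproduce the slice-based numbers. The second column inherits whatever rounding residue the large prefix left in `mean` and `m2`, so it will generally not match, even though every window contains the same five values in both runs.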

Comment From: PiotrekB416

Could you send the CSV data in a reply? The link didn't work, so I just used the data from the issue, and I'd like to verify that it is not the data that's causing the weird behavior.

Comment From: qianyun210603

Attached please find the whole csv file.

From: Piotr Bartoszewicz
Sent: Sunday, October 1, 2023 23:13
To: pandas-dev/pandas
Cc: YQ Tsui; Author
Subject: Re: [pandas-dev/pandas] BUG: roll_std compute different result when input same data with diffrent length (Issue #55343)

Comment From: qianyun210603

@PiotrekB416

I replied to the email with the CSV attached, but it's a GitHub address, so I'm not sure whether you can receive it.

I re-uploaded the CSV to OneDrive, zipped to avoid unwanted conversion by MS: https://1drv.ms/u/s!AnPL7Q5hAP8rk12MkTUQZs2RnVwv?e=gmZMFI

Comment From: qianyun210603

@PiotrekB416 Any findings on this, please?