Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
idx1 = pd.date_range("2024-01-01", periods=24*12, freq="5min", unit="us")
idx2 = pd.date_range("2024-01-02", periods=24*12, freq="5min", unit="us")
idx3 = pd.date_range("2024-01-03", periods=24*12, freq="5min", unit="us")
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
ts3 = pd.Series(range(len(idx3)), index=idx3)
df = pd.concat([ts1, ts2, ts3], axis=1)
print(ts1.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00',
# '2024-01-01 00:10:00', '2024-01-01 00:15:00',
# '2024-01-01 00:20:00', '2024-01-01 00:25:00',
# '2024-01-01 00:30:00', '2024-01-01 00:35:00',
# '2024-01-01 00:40:00', '2024-01-01 00:45:00',
# ...
# '2024-01-01 23:10:00', '2024-01-01 23:15:00',
# '2024-01-01 23:20:00', '2024-01-01 23:25:00',
# '2024-01-01 23:30:00', '2024-01-01 23:35:00',
# '2024-01-01 23:40:00', '2024-01-01 23:45:00',
# '2024-01-01 23:50:00', '2024-01-01 23:55:00'],
# dtype='datetime64[us]', length=288, freq='5min')
print(ts2.index)
# DatetimeIndex(['2024-01-02 00:00:00', '2024-01-02 00:05:00',
# '2024-01-02 00:10:00', '2024-01-02 00:15:00',
# '2024-01-02 00:20:00', '2024-01-02 00:25:00',
# '2024-01-02 00:30:00', '2024-01-02 00:35:00',
# '2024-01-02 00:40:00', '2024-01-02 00:45:00',
# ...
# '2024-01-02 23:10:00', '2024-01-02 23:15:00',
# '2024-01-02 23:20:00', '2024-01-02 23:25:00',
# '2024-01-02 23:30:00', '2024-01-02 23:35:00',
# '2024-01-02 23:40:00', '2024-01-02 23:45:00',
# '2024-01-02 23:50:00', '2024-01-02 23:55:00'],
# dtype='datetime64[us]', length=288, freq='5min')
print(ts3.index)
# DatetimeIndex(['2024-01-03 00:00:00', '2024-01-03 00:05:00',
# '2024-01-03 00:10:00', '2024-01-03 00:15:00',
# '2024-01-03 00:20:00', '2024-01-03 00:25:00',
# '2024-01-03 00:30:00', '2024-01-03 00:35:00',
# '2024-01-03 00:40:00', '2024-01-03 00:45:00',
# ...
# '2024-01-03 23:10:00', '2024-01-03 23:15:00',
# '2024-01-03 23:20:00', '2024-01-03 23:25:00',
# '2024-01-03 23:30:00', '2024-01-03 23:35:00',
# '2024-01-03 23:40:00', '2024-01-03 23:45:00',
# '2024-01-03 23:50:00', '2024-01-03 23:55:00'],
# dtype='datetime64[us]', length=288, freq='5min')
print(df.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00',
# '2024-01-03 00:00:00', '2024-01-03 00:05:00',
# '2024-01-03 00:10:00', '2024-01-03 00:15:00',
# '2024-01-03 00:20:00', '2024-01-03 00:25:00',
# '2024-01-03 00:30:00', '2024-01-03 00:35:00',
# ...
# '2024-01-03 23:20:00', '2024-01-03 23:25:00',
# '2024-01-03 23:30:00', '2024-01-03 23:35:00',
# '2024-01-03 23:40:00', '2024-01-03 23:45:00',
# '2024-01-03 23:50:00', '2024-01-03 23:55:00',
# '2024-01-04 11:20:00', '2024-01-05 11:20:00'],
# dtype='datetime64[us]', length=292, freq=None)
print(df.to_string())
# 0 1 2
# 2024-01-01 00:00:00 0.0 NaN NaN
# 2024-01-02 00:00:00 NaN 0.0 NaN
# 2024-01-03 00:00:00 NaN NaN 0.0
# 2024-01-03 00:05:00 NaN NaN 1.0
# 2024-01-03 00:10:00 NaN NaN 2.0
# 2024-01-03 00:15:00 NaN NaN 3.0
# 2024-01-03 00:20:00 NaN NaN 4.0
# 2024-01-03 00:25:00 NaN NaN 5.0
# 2024-01-03 00:30:00 NaN NaN 6.0
# 2024-01-03 00:35:00 NaN NaN 7.0
# 2024-01-03 00:40:00 NaN NaN 8.0
# 2024-01-03 00:45:00 NaN NaN 9.0
# 2024-01-03 00:50:00 NaN NaN 10.0
# ...
# 2024-01-03 22:50:00 NaN NaN 274.0
# 2024-01-03 22:55:00 NaN NaN 275.0
# 2024-01-03 23:00:00 NaN NaN 276.0
# 2024-01-03 23:05:00 NaN NaN 277.0
# 2024-01-03 23:10:00 NaN NaN 278.0
# 2024-01-03 23:15:00 NaN NaN 279.0
# 2024-01-03 23:20:00 NaN NaN 280.0
# 2024-01-03 23:25:00 NaN NaN 281.0
# 2024-01-03 23:30:00 NaN NaN 282.0
# 2024-01-03 23:35:00 NaN NaN 283.0
# 2024-01-03 23:40:00 NaN NaN 284.0
# 2024-01-03 23:45:00 NaN NaN 285.0
# 2024-01-03 23:50:00 NaN NaN 286.0
# 2024-01-03 23:55:00 NaN NaN 287.0
# 2024-01-04 11:20:00 NaN NaN NaN
# 2024-01-05 11:20:00 NaN NaN NaN
df2 = pd.concat([ts1, ts2], axis=1)
print(df2.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-02 00:00:00',
# '2024-01-04 11:20:00', '2024-01-05 11:20:00'],
# dtype='datetime64[us]', freq=None)
print(df2.to_string())
# 0 1
# 2024-01-01 00:00:00 0.0 NaN
# 2024-01-02 00:00:00 NaN 0.0
# 2024-01-04 11:20:00 NaN NaN
# 2024-01-05 11:20:00 NaN NaN
Issue Description
When trying to concatenate by column few non overlapping timeseries dataframes, if the units of the original dataframes are not 'ns' then the resulting dataframe will have missing data (and lose it's frequency value, in case it's relevant).
The example given has 3 dataframes and for some reason the result has missed most of the data from the first and the second dataframe. In case of concatenating 2 dataframes we end up with almost no data at all.
If we set the units to 'ns' everything works as expected, the resulting df has all the data and kept its frequency='5min'. Every other unit I tried failed with similar results than the example.
Expected Behavior
import pandas as pd
idx1 = pd.date_range("2024-01-01", periods=24*12, freq="5min", unit="ns")
idx2 = pd.date_range("2024-01-02", periods=24*12, freq="5min", unit="ns")
idx3 = pd.date_range("2024-01-03", periods=24*12, freq="5min", unit="ns")
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
ts3 = pd.Series(range(len(idx3)), index=idx3)
df = pd.concat([ts1, ts2, ts3], axis=1)
print(ts1.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00',
# '2024-01-01 00:10:00', '2024-01-01 00:15:00',
# '2024-01-01 00:20:00', '2024-01-01 00:25:00',
# '2024-01-01 00:30:00', '2024-01-01 00:35:00',
# '2024-01-01 00:40:00', '2024-01-01 00:45:00',
# ...
# '2024-01-01 23:10:00', '2024-01-01 23:15:00',
# '2024-01-01 23:20:00', '2024-01-01 23:25:00',
# '2024-01-01 23:30:00', '2024-01-01 23:35:00',
# '2024-01-01 23:40:00', '2024-01-01 23:45:00',
# '2024-01-01 23:50:00', '2024-01-01 23:55:00'],
# dtype='datetime64[ns]', length=288, freq='5min')
print(ts2.index)
# DatetimeIndex(['2024-01-02 00:00:00', '2024-01-02 00:05:00',
# '2024-01-02 00:10:00', '2024-01-02 00:15:00',
# '2024-01-02 00:20:00', '2024-01-02 00:25:00',
# '2024-01-02 00:30:00', '2024-01-02 00:35:00',
# '2024-01-02 00:40:00', '2024-01-02 00:45:00',
# ...
# '2024-01-02 23:10:00', '2024-01-02 23:15:00',
# '2024-01-02 23:20:00', '2024-01-02 23:25:00',
# '2024-01-02 23:30:00', '2024-01-02 23:35:00',
# '2024-01-02 23:40:00', '2024-01-02 23:45:00',
# '2024-01-02 23:50:00', '2024-01-02 23:55:00'],
# dtype='datetime64[ns]', length=288, freq='5min')
print(ts3.index)
# DatetimeIndex(['2024-01-03 00:00:00', '2024-01-03 00:05:00',
# '2024-01-03 00:10:00', '2024-01-03 00:15:00',
# '2024-01-03 00:20:00', '2024-01-03 00:25:00',
# '2024-01-03 00:30:00', '2024-01-03 00:35:00',
# '2024-01-03 00:40:00', '2024-01-03 00:45:00',
# ...
# '2024-01-03 23:10:00', '2024-01-03 23:15:00',
# '2024-01-03 23:20:00', '2024-01-03 23:25:00',
# '2024-01-03 23:30:00', '2024-01-03 23:35:00',
# '2024-01-03 23:40:00', '2024-01-03 23:45:00',
# '2024-01-03 23:50:00', '2024-01-03 23:55:00'],
# dtype='datetime64[ns]', length=288, freq='5min')
print(df.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00',
# '2024-01-01 00:10:00', '2024-01-01 00:15:00',
# '2024-01-01 00:20:00', '2024-01-01 00:25:00',
# '2024-01-01 00:30:00', '2024-01-01 00:35:00',
# '2024-01-01 00:40:00', '2024-01-01 00:45:00',
# ...
# '2024-01-03 23:10:00', '2024-01-03 23:15:00',
# '2024-01-03 23:20:00', '2024-01-03 23:25:00',
# '2024-01-03 23:30:00', '2024-01-03 23:35:00',
# '2024-01-03 23:40:00', '2024-01-03 23:45:00',
# '2024-01-03 23:50:00', '2024-01-03 23:55:00'],
# dtype='datetime64[ns]', length=864, freq='5min')
print(df.to_string())
# 0 1 2
# 2024-01-01 00:00:00 0.0 NaN NaN
# 2024-01-01 00:05:00 1.0 NaN NaN
# 2024-01-01 00:10:00 2.0 NaN NaN
# 2024-01-01 00:15:00 3.0 NaN NaN
# 2024-01-01 00:20:00 4.0 NaN NaN
# 2024-01-01 00:25:00 5.0 NaN NaN
# 2024-01-01 00:30:00 6.0 NaN NaN
# 2024-01-01 00:35:00 7.0 NaN NaN
# ...
# 2024-01-01 23:40:00 284.0 NaN NaN
# 2024-01-01 23:45:00 285.0 NaN NaN
# 2024-01-01 23:50:00 286.0 NaN NaN
# 2024-01-01 23:55:00 287.0 NaN NaN
# 2024-01-02 00:00:00 NaN 0.0 NaN
# 2024-01-02 00:05:00 NaN 1.0 NaN
# 2024-01-02 00:10:00 NaN 2.0 NaN
# 2024-01-02 00:15:00 NaN 3.0 NaN
# ...
# 2024-01-02 23:35:00 NaN 283.0 NaN
# 2024-01-02 23:40:00 NaN 284.0 NaN
# 2024-01-02 23:45:00 NaN 285.0 NaN
# 2024-01-02 23:50:00 NaN 286.0 NaN
# 2024-01-02 23:55:00 NaN 287.0 NaN
# 2024-01-03 00:00:00 NaN NaN 0.0
# 2024-01-03 00:05:00 NaN NaN 1.0
# 2024-01-03 00:10:00 NaN NaN 2.0
# 2024-01-03 00:15:00 NaN NaN 3.0
# 2024-01-03 00:20:00 NaN NaN 4.0
# ...
# 2024-01-03 23:25:00 NaN NaN 281.0
# 2024-01-03 23:30:00 NaN NaN 282.0
# 2024-01-03 23:35:00 NaN NaN 283.0
# 2024-01-03 23:40:00 NaN NaN 284.0
# 2024-01-03 23:45:00 NaN NaN 285.0
# 2024-01-03 23:50:00 NaN NaN 286.0
# 2024-01-03 23:55:00 NaN NaN 287.0
df2 = pd.concat([ts1, ts2], axis=1)
print(df2.index)
# DatetimeIndex(['2024-01-01 00:00:00', '2024-01-01 00:05:00',
# '2024-01-01 00:10:00', '2024-01-01 00:15:00',
# '2024-01-01 00:20:00', '2024-01-01 00:25:00',
# '2024-01-01 00:30:00', '2024-01-01 00:35:00',
# '2024-01-01 00:40:00', '2024-01-01 00:45:00',
# ...
# '2024-01-02 23:10:00', '2024-01-02 23:15:00',
# '2024-01-02 23:20:00', '2024-01-02 23:25:00',
# '2024-01-02 23:30:00', '2024-01-02 23:35:00',
# '2024-01-02 23:40:00', '2024-01-02 23:45:00',
# '2024-01-02 23:50:00', '2024-01-02 23:55:00'],
# dtype='datetime64[ns]', length=576, freq='5min')
print(df2.to_string())
# 0 1
# 2024-01-01 00:00:00 0.0 NaN
# 2024-01-01 00:05:00 1.0 NaN
# 2024-01-01 00:10:00 2.0 NaN
# 2024-01-01 00:15:00 3.0 NaN
# 2024-01-01 00:20:00 4.0 NaN
# 2024-01-01 00:25:00 5.0 NaN
# 2024-01-01 00:30:00 6.0 NaN
# ...
# 2024-01-01 23:25:00 281.0 NaN
# 2024-01-01 23:30:00 282.0 NaN
# 2024-01-01 23:35:00 283.0 NaN
# 2024-01-01 23:40:00 284.0 NaN
# 2024-01-01 23:45:00 285.0 NaN
# 2024-01-01 23:50:00 286.0 NaN
# 2024-01-01 23:55:00 287.0 NaN
# 2024-01-02 00:00:00 NaN 0.0
# 2024-01-02 00:05:00 NaN 1.0
# 2024-01-02 00:10:00 NaN 2.0
# 2024-01-02 00:15:00 NaN 3.0
# 2024-01-02 00:20:00 NaN 4.0
# 2024-01-02 00:25:00 NaN 5.0
# ...
# 2024-01-02 23:15:00 NaN 279.0
# 2024-01-02 23:20:00 NaN 280.0
# 2024-01-02 23:25:00 NaN 281.0
# 2024-01-02 23:30:00 NaN 282.0
# 2024-01-02 23:35:00 NaN 283.0
# 2024-01-02 23:40:00 NaN 284.0
# 2024-01-02 23:45:00 NaN 285.0
# 2024-01-02 23:50:00 NaN 286.0
# 2024-01-02 23:55:00 NaN 287.0
### Installed Versions
<details>
INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.10.13.final.0
python-bits : 64
OS : Linux
OS-release : 4.18.0-372.9.1.el8.x86_64
Version : #1 SMP Fri Apr 15 22:12:19 EDT 2022
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : C
LOCALE : en_US.UTF-8
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 67.7.2
pip : 23.1.2
Cython : None
pytest : 8.1.1
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.3
IPython : 8.23.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.8.4
numba : None
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : 16.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : 0.9.0
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
</details>
**Comment From: nkonts**
I am not sure if it is only related to the unit. Here is an example with `ns` (and with `us` as well). Same thing happens with `freq="MS"`. But changing it to `freq="D"` fixes it.
Also notice that in both outputs the `freq` attribute of the index changes from `ME` to `None` by simply reordering the time series.
### Reproduceable Example
import pandas as pd
idx1 = pd.date_range(start="2015-01-31", end="2023-01-31", freq="ME", unit="ns") idx2 = pd.date_range(start="2014-01-31", end="2024-01-31'", freq="ME", unit="ns")
ts1 = pd.Series(range(len(idx1)), index=idx1) ts2 = pd.Series(range(len(idx2)), index=idx2)
Goes from 2014-01-31 (some values of ts1 are missingg) to 2023-01-31 (the values where ts1 is missing are dropped)
df1 = pd.concat([ts1, ts2], axis=1).sort_index() print(df1.index)
DatetimeIndex(['2014-01-31', '2014-02-28', '2014-03-31', '2014-04-30',
'2014-05-31', '2014-06-30', '2014-07-31', '2014-08-31',
'2014-09-30', '2014-10-31',
...
'2022-04-30', '2022-05-31', '2022-06-30', '2022-07-31',
'2022-08-31', '2022-09-30', '2022-10-31', '2022-11-30',
'2022-12-31', '2023-01-31'],
dtype='datetime64[ns]', length=109, freq=None)
As expected output from 2014-01-31 to 2024-01-31
df2 = pd.concat([ts2, ts1], axis=1).sort_index() print(df2.index)
DatetimeIndex(['2014-01-31', '2014-02-28', '2014-03-31', '2014-04-30',
'2014-05-31', '2014-06-30', '2014-07-31', '2014-08-31',
'2014-09-30', '2014-10-31',
...
'2023-04-30', '2023-05-31', '2023-06-30', '2023-07-31',
'2023-08-31', '2023-09-30', '2023-10-31', '2023-11-30',
'2023-12-31', '2024-01-31'],
dtype='datetime64[ns]', length=121, freq='ME')
INSTALLED VERSIONS
commit : f538741432edf55c6b9fb5d0d496d2dd1d7c2457 python : 3.11.5.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22621 machine : AMD64 processor : Intel64 Family 6 Model 140 Stepping 1, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : de_DE.cp1252
pandas : 2.2.0 numpy : 1.26.4 pytz : 2024.1 dateutil : 2.8.2 setuptools : 68.0.0 pip : 23.2.1 Cython : 3.0.3 pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : 3.1.8 lxml.etree : None html5lib : None pymysql : None psycopg2 : None jinja2 : 3.1.2 IPython : 8.16.1 pandas_datareader : None adbc-driver-postgresql: None adbc-driver-sqlite : None bs4 : 4.12.2 bottleneck : None dataframe-api-compat : None fastparquet : None fsspec : 2024.3.1 gcsfs : None matplotlib : 3.8.3 numba : 0.59.1 numexpr : None odfpy : None openpyxl : 3.1.2 pandas_gbq : None pyarrow : 13.0.0 pyreadstat : None python-calamine : None pyxlsb : None s3fs : None scipy : 1.12.0 sqlalchemy : 2.0.28 tables : None tabulate : None xarray : 2024.2.0 xlrd : None zstandard : None tzdata : 2024.1 qtpy : None pyqt5 : None
**Comment From: Doggie52**
This is happening for me even with `freq="D"`, but only failing with `unit="us"`.
```python
import pandas as pd
idx1 = pd.date_range("2013-08-16", "2024-05-01", freq="D", unit="us")
idx2 = pd.date_range("2015-09-19", "2024-05-01", freq="D", unit="us")
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
df1 = pd.concat([ts1, ts2], axis=1)
print(df1.index)
Output:
DatetimeIndex(['2013-08-16', '2015-09-19', '2016-05-12', '2018-06-15',
'2019-02-06', '2021-03-11', '2021-11-02', '2023-12-06',
'2024-07-29', '2026-09-01'],
dtype='datetime64[us]', freq=None)
Note how it loses frequency and also adds bogus dates at the end! Really scary bug, had me scratching my head for a long time.
INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 2.2.2
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.0
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Comment From: Doggie52
Can confirm this is still an issue on the latest commit in the master branch.
@jorgehv you may want to update your original post to tick the box?
import pandas as pd
print(pd.__version__)
idx1 = pd.date_range("2013-08-16", "2024-05-01", freq="D", unit="us")
idx2 = pd.date_range("2015-09-19", "2024-05-01", freq="D", unit="us")
ts1 = pd.Series(range(len(idx1)), index=idx1)
ts2 = pd.Series(range(len(idx2)), index=idx2)
df1 = pd.concat([ts1, ts2], axis=1)
print(df1.index)
Output:
3.0.0.dev0+1024.gb162331554
DatetimeIndex(['2013-08-16', '2015-09-19', '2016-05-12', '2018-06-15',
'2019-02-06', '2021-03-11', '2021-11-02', '2023-12-06',
'2024-07-29', '2026-09-01'],
dtype='datetime64[us]', freq=None)
INSTALLED VERSIONS
------------------
commit : b162331554d7c7f6fd46ddde1ff3908f2dc8bcce
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.22631
machine : AMD64
processor : AMD64 Family 25 Model 33 Stepping 2, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : None
LOCALE : English_United Kingdom.1252
pandas : 3.0.0.dev0+1024.gb162331554
numpy : 1.26.4
pytz : 2024.1
dateutil : 2.9.0.post0
setuptools : 69.5.1
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : None
IPython : None
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.3
bottleneck : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.0
numba : 0.59.1
numexpr : None
odfpy : None
openpyxl : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Comment From: Doggie52
Might also be worth clarifying in the title that this affects time series in general, not just non-overlapping ones (as @nkonts and I showed).
Comment From: SaneBow
Nice to find this issue before reporting it separately. It took me hours to trace the root cause of a strange behavior of my program to this exact bug.
Minimal example:
dates = pd.date_range('2021-01-01', periods=5, unit='ms')
df1 = pd.DataFrame(range(len(dates)), index=dates)
dates = pd.date_range('2021-01-01', periods=6, unit='ms')
df2 = pd.DataFrame(range(len(dates)), index=dates)
pd.concat([df1, df2], axis=1)
Comment From: jorgehv
Might also be worth clarifying in the title that this affects time series in general, not just non-overlapping ones (as @nkonts and I showed).
Updated
Comment From: rhshadrach
I'm seeing the expected result in all cases. Could likely use a test. PRs welcome!