Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
df = pd.DataFrame({"datecol": [1, 2]})
# Passes with Series
pd.to_datetime(df["datecol"])
# Fails with DataFrame
pd.to_datetime(df)
Issue Description
Converting integer timestamps with pd.to_datetime works on a Series, but fails on a DataFrame.
Error with DataFrame:
Traceback (most recent call last):
File "C:\Users\...\main.py", line 8, in <module>
pd.to_datetime(df)
File "C:\Users\...\.venv\Lib\site-packages\pandas\core\tools\datetimes.py", line 1075, in to_datetime
result = _assemble_from_unit_mappings(arg, errors, utc)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\...\.venv\Lib\site-packages\pandas\core\tools\datetimes.py", line 1191, in _assemble_from_unit_mappings
raise ValueError(
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
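For contrast, here is a minimal sketch that runs both calls side by side and captures the error instead of letting it propagate. The variable names `as_series` and `error_message` are only illustrative. It assumes that, with no `unit` argument, the Series call reads the integers as nanosecond offsets from the Unix epoch.

```python
import pandas as pd

df = pd.DataFrame({"datecol": [1, 2]})

# Series path: integers are read as offsets from the Unix epoch
# (nanoseconds here, because no `unit` argument is given).
as_series = pd.to_datetime(df["datecol"])

# DataFrame path: column names are treated as date components,
# so a frame without year/month/day columns raises ValueError.
try:
    pd.to_datetime(df)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(as_series)
print(error_message)
```

The two calls take different code paths: one parses values, the other assembles dates from named columns.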
Expected Behavior
to_datetime should have consistent behaviour between Series & DataFrame.
Installed Versions
Comment From: Alvaro-Kothe
This is taken from the docs
If a DataFrame is provided, the method expects minimally the following columns: "year", "month", "day". The column “year” must be specified in 4-digit format.
If you want to use pd.to_datetime on a DataFrame, you should use it like this:
import pandas as pd
df = pd.DataFrame({"day": [1, 2], "month": [1, 1], "year": [1970, 1970]})
print(pd.to_datetime(df))
# prints
# 0 1970-01-01
# 1 1970-01-02
# dtype: datetime64[s]
I don't think that this is a bug, since it is the expected behaviour.
Comment From: ldouteau
> This is taken from the docs
Oh right, I completely missed it.
> I don't think that this is a bug, since it is the expected behaviour.
I agree that the docs are very clear about this, hence not a bug (my bad for the report). But it feels weird that timestamps are supported when working on a Series and not on a DataFrame.
Comment From: Alvaro-Kothe
> But it feels weird that timestamps are supported when working on series & not on DataFrame
The key difference is that a DataFrame can have multiple columns. Hence, you should select the timestamp column.
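As a rough sketch of what selecting the timestamp column can look like: the frame below is hypothetical (the column names `created`, `updated`, and `value` are made up), and it assumes the integers are Unix timestamps in seconds, hence `unit="s"`. When several columns hold timestamps, one option is to apply `pd.to_datetime` to each of them with `DataFrame.apply`.

```python
import pandas as pd

# Hypothetical frame: two columns of Unix timestamps (seconds, by assumption)
# next to an unrelated numeric column.
df = pd.DataFrame(
    {
        "created": [0, 86_400],
        "updated": [3_600, 90_000],
        "value": [1.5, 2.5],
    }
)

# Convert a single column by selecting it first.
created = pd.to_datetime(df["created"], unit="s")

# Convert several columns: apply() calls pd.to_datetime once per column
# and forwards unit="s" to each call.
ts_cols = ["created", "updated"]
converted = df[ts_cols].apply(pd.to_datetime, unit="s")

print(created)
print(converted)
```

Naming the columns explicitly keeps non-timestamp data such as `value` out of the conversion.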
Comment From: rhshadrach
+1 on @Alvaro-Kothe's reasoning, closing.