Code Sample, a copy-pastable example if possible

Problem description

I tried to search to see if this issue was raised before, but could not find anything.

I notice that read_sas can convert SAS numbers formatted as DATE values (number of days since 1960) or DATETIME values (number of seconds since 1960) to datetime values supported by the datetime class in Python.

But it does not include any logic to convert numbers formatted as TIME values (number of seconds since midnight) even though it looks like the datetime class in Python does support time of day values.

Why does read_sas not also convert those numbers? Is it because nobody actually uses that Python data type? Nobody asked for that feature?

Comment From: TomAugspurger

But it does not include any logic to convert numbers formatted as TIME values (number of seconds since midnight)

Is it actually seconds since midnight? If so that could be a timedelta. But if its supposed to represent the actual time of day, then no, pandas (nor numpy) have a type for that.

Comment From: sasutils

Yes TIME is the number of seconds in 24 hours, ie the number of seconds since midnight.

Looks like the definition for datetime.time. https://docs.python.org/3/library/datetime.html

class datetime.time An idealized time, independent of any particular day, assuming that every day has exactly 246060 seconds (there is no notion of “leap seconds” here). Attributes: hour, minute, second, microsecond, and tzinfo.

Perhaps I am misunderstanding the Python documentation?

Comment From: troels

It could be converted to an ndarray of dtype object with time objects. It wouldn't be efficiently stored like datetime dtype=datetime, but it would be correct.

Comment From: TomAugspurger

@sasutils yes, I was asking whether SAS's TIME matches Python's datetime.time (no leap seconds) or whether it was more of a datetime.timedelta (with leap seconds).

Loading it as a ndarray of time objects would be unusual for pandas. AFAIK, we only do this type of conversion for dtypes that are actually supported.

I'd recommend users do this themselves by converting the loaded data into an object-dtype column holding datetime.time values (and if someone really wants to they can make an extension array for this).

Comment From: troels

Both e.g. the excel parser and the SQL module handle and do the conversion to and from time just fine, and as far as I know that is true for all other file formats supporting time as a type. Why should sas-files be treated differently?

Doing the conversion themselves force the users to manually keep track of the types of the respective columns in a way that is unnecessary, as all the information required is right there in the sas-file. Why would we force the user to do that?

Comment From: TomAugspurger

I didn’t realize we did that for excel and sql.

Comment From: ofajardo

pyreadstat converts SAS TIME to python datetime.time objects. You could add pyreadstat as a backend for read_sas (already used for read_spss) and solve this issue as well as others (https://github.com/pandas-dev/pandas/issues/37088, https://github.com/pandas-dev/pandas/issues/35545, https://github.com/pandas-dev/pandas/issues/31243, https://github.com/pandas-dev/pandas/issues/18198)

Comment From: jbrockmendel

might be able to use new pyarrow time dtype here

Comment From: PasamonteG

I am having an error with read_sas:

Traceback (most recent call last): File "pandas/_libs/tslibs/timedeltas.pyx", line 372, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit File "pandas/_libs/tslibs/conversion.pyx", line 126, in pandas._libs.tslibs.conversion.cast_from_unit OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:


Traceback (most recent call last):
  File "pandas/_libs/tslibs/timedeltas.pyx", line 372, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
  File "pandas/_libs/tslibs/conversion.pyx", line 126, in pandas._libs.tslibs.conversion.cast_from_unit
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/u/workspaces/pythononzos/src/python/sas_reader_pandas_1.py", line 25, in <module>
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/util/_decorators.py", line 331, in wrapper
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sasreader.py", line 161, in read_sas
    reader = SAS7BDATReader(
             ^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sas7bdat.py", line 207, in __init__
    self._get_properties()
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/io/sas/sas7bdat.py", line 294, in _get_properties
    self.date_created = epoch + pd.to_timedelta(x, unit="s")
                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 211, in to_timedelta
    return _coerce_scalar_to_timedelta_type(arg, unit=unit, errors=errors)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/u/workspaces/nrd_venv/lib/python3.11/site-packages/pandas/core/tools/timedeltas.py", line 219, in _coerce_scalar_to_timedelta_type
    result = Timedelta(r, unit)
             ^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1695, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 351, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 374, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 1.3033002481878809e+41 from s to 'ns' without overflow.

This is the code:


df = pd.read_sas(url,format = 'sas7bdat')

where url contains the path for the file.

The problem here is that I can’t avoid casting. This is what the table looks like: Pandas Reading SAS time variables

There might be a way to read the SAS file and assign a data type later.

Thanks