When parsing text into a `Timestamp` object we can specify a format string. Currently `%f` is documented with

> note that "%f" will parse all the way up to nanoseconds

See https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_datetime.html, in particular the description of the `format` parameter. The note about `%f` was added in this patch: https://github.com/pandas-dev/pandas/pull/8904
The fact that we can parse text using nanosecond precision is great, and here I make use of that behavior (showing two methods yielding the same result):
```python
# Implicit format:
>>> t = pd.to_datetime('2019-10-03T09:30:12.133333337')
>>> t
Timestamp('2019-10-03 09:30:12.133333337')
# Explicit format using %f:
>>> t = pd.to_datetime('2019-10-03T09:30:12.133333337', format='%Y-%m-%dT%H:%M:%S.%f')
>>> t
Timestamp('2019-10-03 09:30:12.133333337')
```
But when I now want to invert that process using `strftime()` then the fractional part is truncated to microsecond precision:
`t.strftime('%Y-%m-%dT%H:%M:%S.%f')` returns `'2019-10-03T09:30:12.133333'`.
On the one hand this is inconsistent with the meaning of `%f` while parsing. On the other hand it corresponds to what's documented in Python's stdlib documentation (which says that `%f` means "Microsecond as a decimal number, zero-padded on the left.").
In any case, I think it would make sense to have a format string specifier that allows us to turn the timestamp into a string with nanosecond precision.
If I am not mistaken, we otherwise have to work around the absence of that format specifier by using the `nanosecond` property:
`t.nanosecond` is `337`, so `t.strftime('%Y-%m-%dT%H:%M:%S.%f') + str(t.nanosecond)` yields `'2019-10-03T09:30:12.133333337'`.
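One caveat with this concatenation: `str(t.nanosecond)` is not zero-padded, so a nanosecond component of 7 would be appended as `7` rather than `007`. Below is a minimal sketch of a padded variant. It uses a stdlib `datetime` plus a separate sub-microsecond integer as a stand-in for `Timestamp`, and the helper name `format_with_nanoseconds` is made up for illustration:

```python
from datetime import datetime


def format_with_nanoseconds(dt: datetime, nanosecond: int,
                            fmt: str = "%Y-%m-%dT%H:%M:%S.%f") -> str:
    """Append a zero-padded sub-microsecond part (0..999) to strftime output.

    Assumes fmt ends in %f, so the three extra digits continue the fraction.
    """
    if not 0 <= nanosecond <= 999:
        raise ValueError("nanosecond must be in the range 0..999")
    return dt.strftime(fmt) + f"{nanosecond:03d}"


dt = datetime(2019, 10, 3, 9, 30, 12, 133333)

# Padded: the sub-microsecond part always occupies three digits.
padded = format_with_nanoseconds(dt, 7)

# Unpadded, as in the plain str() concatenation: only one digit is appended.
unpadded = dt.strftime("%Y-%m-%dT%H:%M:%S.%f") + str(7)
```

With a real `Timestamp` the same idea would read `t.strftime(fmt) + f"{t.nanosecond:03d}"`.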
Do you agree that we should have a format specifier for that? Or do we have one, and it's just not documented?
<details>
INSTALLED VERSIONS
------------------
commit : None
python : 3.7.4.final.0
python-bits : 64
OS : Linux
OS-release : 5.3.7-200.fc30.x86_64
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.2
numpy : 1.17.3
pytz : 2019.3
dateutil : 2.8.0
pip : 19.0.3
setuptools : 40.8.0
Cython : 0.29.13
pytest : 5.2.1
hypothesis : 4.41.3
sphinx : 2.2.0
blosc : 1.8.1
feather : None
xlsxwriter : 1.2.2
lxml.etree : 4.4.1
html5lib : 1.0.1
pymysql : None
psycopg2 : None
jinja2 : 2.10.3
IPython : 7.8.0
pandas_datareader: None
bs4 : 4.8.1
bottleneck : 1.2.1
fastparquet : 0.3.2
gcsfs : None
lxml.etree : 4.4.1
matplotlib : 3.1.1
numexpr : 2.7.0
odfpy : None
openpyxl : 3.0.0
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : 0.3.5
scipy : 1.3.1
sqlalchemy : 1.3.10
tables : 3.6.0
xarray : 0.14.0
xlrd : 1.2.0
xlwt : 1.3.0
xlsxwriter : 1.2.2
</details>
**Comment From: jgehrcke**
Note to self: interestingly, I just saw that Ruby's strftime has a `%N - Fractional seconds digits, default is 9 digits (nanosecond)`: https://ruby-doc.org/core-2.6.4/Time.html, and Go has no strftime, but a property `https://golang.org/pkg/time/#Time.Nanosecond` that can be zero-padded.
Another note to self: `Timestamp.timestamp()` returns microsecond precision, instead of nanosecond precision:
`pd.to_datetime('2019-10-03T09:30:12.133333337').timestamp()` returns `1570095012.133333`.
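Some loss is probably unavoidable for a float return value: a double near 1.57e9 has a spacing of roughly 2.4e-7 between adjacent values, so it cannot carry nanoseconds. Below is a sketch of an exact alternative that works on integer nanoseconds since the epoch. I am assuming that integer would come from something like `Timestamp.value`, and the helper name is hypothetical:

```python
def epoch_seconds_str(epoch_ns: int) -> str:
    """Render integer nanoseconds since the epoch as decimal seconds, exactly.

    Only non-negative inputs are handled in this sketch.
    """
    if epoch_ns < 0:
        raise ValueError("this sketch only handles non-negative epoch values")
    seconds, frac_ns = divmod(epoch_ns, 1_000_000_000)
    # Nine zero-padded fractional digits keep the full nanosecond part.
    return f"{seconds}.{frac_ns:09d}"


exact = epoch_seconds_str(1570095012133333337)
```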
Third note to self: for the specific format I want to emit in my example above there is this working shortcut: `t.isoformat()` yields `'2019-11-07T12:29:23.444348736'`.
**Comment From: matteosantama**
Ran into this issue today. Proposed solution is to add `.strftime()` method to `Timestamp` object in `pandas/_libs/tslibs/timestamps.pyx`.
Is the best route to fully reimplement the method? I think ideally it would look something like this
```python
def strftime(self, fmt: str) -> str:
    if self.nanosecond == 0:
        return super().strftime(fmt)
    # else
```
but I can't think of an elegant `else` clause (outside of reimplementing the entire method).
**Comment From: AlexeyDmitriev**

1. I don't think you need the `if`, because the user may need to output all 9 digits even if they are zeros.
2. You can avoid implementing the whole method this way (supposing `%9` prints 9 digits of nanoseconds): find all `%9`s (but you need to parse carefully, e.g. `%%9` is a literal percent sign followed by a literal 9), replace them all with the nine digits of nanoseconds, and then pass the resulting string to `datetime.strftime`.
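
A rough sketch of that substitution idea follows. It uses a stdlib `datetime` plus a separate sub-microsecond integer in place of `Timestamp`; `%9` is the hypothetical directive from this comment, and the function name is made up. It reads "nine digits of nanoseconds" as the six microsecond digits followed by three nanosecond digits:

```python
import re
from datetime import datetime

# Every strftime directive is "%" plus one character, so scanning for that
# pattern keeps "%%" intact and never mistakes "%%9" for the "%9" directive.
_DIRECTIVE = re.compile(r"%(.)", re.DOTALL)


def strftime_ns(dt: datetime, nanosecond: int, fmt: str) -> str:
    """Format dt with fmt, expanding the hypothetical %9 to 9 fractional digits.

    nanosecond is the sub-microsecond part (0..999).
    """
    if not 0 <= nanosecond <= 999:
        raise ValueError("nanosecond must be in the range 0..999")
    nine_digits = f"{dt.microsecond:06d}{nanosecond:03d}"

    def expand(match: "re.Match[str]") -> str:
        # Replace only %9; leave every other directive (including %%) as is.
        return nine_digits if match.group(1) == "9" else match.group(0)

    # The substituted text contains only digits, so it is safe to hand the
    # result to the regular strftime.
    return dt.strftime(_DIRECTIVE.sub(expand, fmt))


dt = datetime(2019, 10, 3, 9, 30, 12, 133333)
formatted = strftime_ns(dt, 337, "%Y-%m-%dT%H:%M:%S.%9")
escaped = strftime_ns(dt, 337, "%%9 vs %9")
```

Here `"%%9"` stays a literal `%9` in the output, while the bare `%9` expands to the nine fractional digits.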
**Comment From: AlexeyDmitriev**

@jgehrcke

> Third note to self: for the specific format I want to emit in my example above there is this working shortcut: `t.isoformat()` yields `'2019-11-07T12:29:23.444348736'`.

The problem with your shortcut, though, is that if `t` happens to be a whole number of microseconds, the trailing `000` is not printed.
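
The stdlib `datetime.isoformat()` shows the same variable-width behavior one level up (microseconds instead of nanoseconds), and its `timespec` argument forces a fixed width. The following is a small illustration with plain `datetime` objects, not pandas. Whether `Timestamp.isoformat()` accepts a comparable `timespec` value for nanoseconds depends on the pandas version and is not verified here:

```python
from datetime import datetime

whole_second = datetime(2019, 11, 7, 12, 29, 23)           # microsecond == 0
with_fraction = datetime(2019, 11, 7, 12, 29, 23, 444348)

# Default ("auto") output: the fractional part is dropped when it is zero,
# so the string width depends on the value.
auto_whole = whole_second.isoformat()
auto_fraction = with_fraction.isoformat()

# Forcing the precision gives a fixed-width string regardless of the value.
fixed_whole = whole_second.isoformat(timespec="microseconds")
```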
**Comment From: smarie**
Note that for `Period`s there is a different convention in pandas: `%u` means microseconds and `%n` means nanoseconds. This is mostly because for such objects, `%f` means "fiscal year".