- [x] I have searched the [pandas] tag on StackOverflow for similar questions.
- [ ] I have asked my usage related question on StackOverflow.
Question about pandas
I'm trying to optimise the dtypes of my DataFrame so that it consumes less memory. I noticed that describe() gives different output on the same column after casting it to float16. Why are those outputs different?
import pandas as pd

df = pd.DataFrame({'a': [11111, 22222, 3333]})
# Statistics on the original integer column
df['a'].describe()
# Statistics on the same column after casting to float16
df['a'].astype('float16').describe()
Comment From: MarcoGorelli
can you show the output as well please?
Comment From: phofl
And please explain why you would expect the same here.
Comment From: phofl
If you are referring to the inf, this is somewhat expected. We calculate the variance first, before taking the square root, and the variance is too large to be represented as a float16 number.
We could maybe document this better.
Comment From: dsaxton
> If you are referring to the inf, this is somewhat expected. We calculate the variance first, before taking the square root, and the variance is too large to be represented as a float16 number.
> We could maybe document this better.
Just to add a little to this answer with a smaller example:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: df = pd.DataFrame({'a':[11111,22222,3333]})
In [4]: df["a"].var() / np.finfo(np.float16).max
Out[4]: 1375.8598100879335
So the variance is roughly 1,376 times the largest finite float16 value, which is clearly too large to represent.
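The overflow can be reproduced with NumPy alone. This is a minimal sketch that mirrors the explanation above rather than pandas' internal code path: it computes the sample variance in float64, then casts it to float16, where it becomes inf, so the square root is inf as well. The variable names are illustrative.

```python
import numpy as np

values = np.array([11111, 22222, 3333], dtype=np.float64)

# Sample variance (ddof=1), the same estimator pandas uses by default
# for var() and std().
variance = values.var(ddof=1)
f16_max = np.finfo(np.float16).max  # largest finite float16 value

# Casting a value above the float16 range overflows to inf.
with np.errstate(over="ignore"):
    variance_f16 = np.float16(variance)

# The square root of inf is still inf, which is what shows up as std.
std_f16 = np.sqrt(variance_f16)
print(variance > f16_max, variance_f16, std_f16)
```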
Comment From: jreback
We have almost no support for float16.
I wouldn't be against raising outright on this dtype.
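One possible workaround, not proposed in this thread itself, is to keep the column as float16 for storage and upcast only when computing statistics. A minimal sketch, assuming the rounding that float16 already introduced into the values is acceptable:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [11111, 22222, 3333]})

# Compact storage: float16 uses 2 bytes per value, but the values are
# rounded to the nearest representable float16 number.
compact = df["a"].astype("float16")

# Upcast before describing so the variance fits in the result dtype.
stats = compact.astype("float32").describe()
print(stats)
```

The upcast creates a temporary float32 copy, so the memory saving applies to the stored column and not to the computation.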
Comment From: Leejung8763
> can you show the output as well please?

Here are the outputs:
df = pd.DataFrame({'a':[11111,22222,3333]})
df['a'].describe()
count        3.00
mean    12,222.00
std      9,493.38
min      3,333.00
25%      7,222.00
50%     11,111.00
75%     16,666.50
max     22,222.00
Name: a, dtype: float64

df['a'].astype('float16').describe()
count        3.00
mean    12,224.00
std           inf
min      3,332.00
25%      7,222.00
50%     11,112.00
75%     16,668.00
max     22,224.00
Name: a, dtype: float64
Comment From: Leejung8763
> We calculate the variance first, before taking the square root, and the variance is too large to be represented as a float16 number.
> We could maybe document this better.

That is probably the same thing you explained. Thank you for the answer.