• [x] I have searched the [pandas] tag on StackOverflow for similar questions.

  • [ ] I have asked my usage related question on StackOverflow.


Question about pandas

I'm trying to optimise my df dtypes to consume less memory. I noticed differences in the `describe()` output for that df after changing the dtype. Why are those outputs different?


```python
import pandas as pd

df = pd.DataFrame({'a': [11111, 22222, 3333]})
df['a'].describe()
df['a'].astype('float16').describe()
```
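Since the goal is to save memory, it helps to see the trade-off between dtypes directly. This is a minimal sketch: it compares the bytes used by the column under each float dtype and checks that `std` stays finite for `float32`. Suggesting `float32` as a middle ground is my own assumption, not something proposed in this thread.

```python
import pandas as pd

df = pd.DataFrame({'a': [11111, 22222, 3333]})

# Bytes used by the column's values only (index excluded) for each float dtype
sizes = {}
for dtype in ['float64', 'float32', 'float16']:
    sizes[dtype] = df['a'].astype(dtype).memory_usage(index=False)
    print(f'{dtype}: {sizes[dtype]} bytes')
# float64: 24 bytes, float32: 12 bytes, float16: 6 bytes

# float32's range (max about 3.4e38) easily holds the intermediate
# variance (about 9.0e7), so std stays finite
std32 = df['a'].astype('float32').describe()['std']
print(std32)
```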

Comment From: MarcoGorelli

can you show the output as well please?

Comment From: phofl

And please explain why you would expect the same here.

Comment From: phofl

If you are referring to the `inf`, this is somewhat expected. We calculate the variance first and then take its square root, and the variance is too large to be represented as a float16 number (the largest finite float16 is 65504), so it becomes `inf`.

We could maybe document this better.
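A minimal NumPy sketch of the order-of-operations issue described above. It assumes the variance is stored as float16 before the square root is taken; the variable names are illustrative and do not correspond to pandas internals.

```python
import numpy as np

# The three values after conversion to float16 (they get rounded on the way)
values = np.array([11111, 22222, 3333], dtype=np.float16)

# Sample variance (ddof=1), computed in float64 so its real size is visible
var64 = values.astype(np.float64).var(ddof=1)
f16_max = float(np.finfo(np.float16).max)  # 65504.0

with np.errstate(over='ignore'):
    # Variance first, squeezed back into float16: it overflows to inf ...
    std_var_first = np.sqrt(np.float16(var64))
    # ... whereas the square root alone would have fit comfortably
    std_sqrt_first = np.float16(np.sqrt(var64))

print(var64 > f16_max)  # True
print(std_var_first)    # inf
print(std_sqrt_first)   # finite, about 9.5e3
```

The standard deviation itself (about 9,500) is well within float16's range; only the intermediate variance overflows.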

Comment From: dsaxton


Just to add a little to this answer with a smaller example:

```python
In [1]: import pandas as pd

In [2]: import numpy as np

In [3]: df = pd.DataFrame({'a':[11111,22222,3333]})

In [4]: df["a"].var() / np.finfo(np.float16).max
Out[4]: 1375.8598100879335
```

So the variance is clearly too large: it is roughly 1,376 times the largest float16 value (65504).
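If the aim is simply to get sensible summary statistics from a float16 column, one possible workaround (my suggestion, not something pandas does automatically) is to keep float16 for storage and upcast to float64 only for the computation:

```python
import pandas as pd

df = pd.DataFrame({'a': [11111, 22222, 3333]})
df['a'] = df['a'].astype('float16')  # compact storage

# Upcast only for the summary; the variance now fits easily in float64
summary = df['a'].astype('float64').describe()
print(summary['std'])  # about 9494.85 (computed from the float16-rounded values)

print(df['a'].dtype)   # float16: the stored column is unchanged
```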

Comment From: jreback

We have almost no support for float16.

I wouldn't be against raising an error for this dtype entirely.
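Raising on float16 would be a change inside pandas; until then, a similar effect can be approximated on the user side with a small guard that runs before computing statistics. `reject_float16` is a hypothetical helper written for illustration, not a pandas API:

```python
import numpy as np
import pandas as pd


def reject_float16(df: pd.DataFrame) -> None:
    """Hypothetical helper (not part of pandas): raise if any column is float16."""
    bad = [col for col, dt in df.dtypes.items() if dt == np.float16]
    if bad:
        raise TypeError(f'float16 columns have limited pandas support: {bad}')


ok = pd.DataFrame({'a': [11111.0, 22222.0, 3333.0]})
reject_float16(ok)  # float64 column: passes silently

risky = ok.astype({'a': 'float16'})
try:
    reject_float16(risky)
except TypeError as exc:
    print(exc)  # float16 columns have limited pandas support: ['a']
```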

Comment From: Leejung8763


Here are the outputs:

```python
df = pd.DataFrame({'a':[11111,22222,3333]})
df['a'].describe()
```

```
count        3.00
mean    12,222.00
std      9,493.38
min      3,333.00
25%      7,222.00
50%     11,111.00
75%     16,666.50
max     22,222.00
Name: a, dtype: float64
```

```python
df['a'].astype('float16').describe()
```

```
count        3.00
mean    12,224.00
std           inf
min      3,332.00
25%      7,222.00
50%     11,112.00
75%     16,668.00
max     22,224.00
Name: a, dtype: float64
```
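Apart from the `inf`, the other differences (3,333 becoming 3,332, 11,111 becoming 11,112, and so on) come from float16's limited precision: its significand has 11 bits, so above 2,048 it cannot represent every integer and values are rounded to the nearest representable one. A quick check with NumPy:

```python
import numpy as np

original = [11111, 22222, 3333]
rounded = [int(np.float16(v)) for v in original]
print(rounded)  # [11112, 22224, 3332]

# Gap between adjacent representable float16 values near each number
for v in original:
    print(v, float(np.spacing(np.float16(v))))
# 11111 8.0
# 22222 16.0
# 3333 2.0
```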

Comment From: Leejung8763


Maybe it's the same as what you explained. Thank you for the answer.