Feature Type

  • [x] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

I wish to pass an argument to the skew calculation method like (ddof or bias) that allows calculation of the skew of a population instead of a sample (default).

Feature Description

Example:

df.skew(bias=True)

Note: I think it can be implemented using the kwargs of the method _stat_function or _stat_function_ddof

Alternative Solutions

Using scipy.stats.skew(df, bias=True) is an alternative, but it's better if the method of pandas can handle it.

Additional Context

No response

Comment From: lithomas1

This seems reasonable to me. Not at all an expert in this area though.

Do you know what would be the advantages of using ddof as opposed to a bias keyword?

Comment From: Alain-Godo

Hi @lithomas1 thanks for the quick reply!!! Well, there is not a big difference between bias and ddof in this context, the first two degrees of freedom (0 and 1) describe if the collection of data is a population or a sample, and scipy used the parameter bias for the same two things instead of ddof. I also see that this parameter is missed in kurtosis too.

I check it out looking at the code behind and I think it's not so difficult to add the feature. I would like to contribute my grain of sand :pray: , if you allow me I can make the modifications so that it accepts this parameter and do a MR.

Comment From: Alain-Godo

By the way, sorry if my English is not the best, it's not my mother tongue. I do my best.

Comment From: lithomas1

No worries, I think your English is fine :).

Feel free to give a shot (I think using bias should be fine for now, since there doesn't seem to be a need to specify arbritrary ddof if I undertand correctly).

Comment From: Alain-Godo

Thanks, I will try to add the parameter bias to both functions (skew and kurtosis) 😄 Let u know.

Comment From: Alain-Godo

Hi, @lithomas1 !!! Well, I spend a little time today and find the problem.

These two calculations (skew and kurtosis ) need a special method like _stat_function_ddof, maybe can be called _stat_function_bias:

3

Both functions are implemented using the nanops respectively:

1

And the problem is there, the implementation in both cases is for "sample" data ( I used to think that pandas use Scipy for the calculation behind 😄 ).

2

I can modify both functions for sample/population and add a new parameter bias and fix the problem, with the possibility of selecting True or False like Scipy does, also add a _stat_function_bias too, to handle the new parameter. What do u think?

Comment From: Alain-Godo

Hi @lithomas1 and anyone interested !!!, I made the changes for DataFrames and Series, but I can see that the thing with methods like .rolling() is different, the calculation functions are written in Cython. Can you tell me the main sites where I need to add this functionality like in rolling() that I'm not seen yet?

Comment From: lithomas1

Not sure about rolling, I haven't touched that code ever 🤣 , but maybe it's here https://github.com/pandas-dev/pandas/blob/main/pandas/_libs/window/aggregations.pyx#L482 (I haven't double checked this)