Problem description
I'd like to suggest a modification to df.pop(item). Currently, pop(item) deletes the column from the dataframe it's being called on and returns that column as a series. It doesn't accept multiple items.
It might be a nice convenience to:
- pop multiple columns at once (ex: pop(['A', 'B']))
- specify an axis parameter (default: axis=1) to allow popping rows and columns (ex: pop(1, axis=0))
- pop slices (ex: pop([1:3], axis=1))
Thought I'd throw it out there to the pandas gods and see if it is interesting. If it's not the best API design decision for pop, I completely understand.
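To make the first bullet concrete, here is a minimal sketch of what popping several columns at once could look like today with existing calls. The name pop_columns is hypothetical and is not part of pandas; it only combines column selection with DataFrame.drop.

```python
import pandas as pd


def pop_columns(df, columns):
    """Hypothetical helper (not a pandas API): remove `columns` from df
    in place and return them as a new DataFrame."""
    columns = list(columns)
    popped = df[columns].copy()             # the columns about to be removed
    df.drop(columns=columns, inplace=True)  # mutate the caller's frame
    return popped


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
popped = pop_columns(df, ['A', 'B'])
```

After the call, popped holds columns A and B and df is left with column C only.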
Common use-case
- you have one or multiple problem rows you want to delete from a dataframe but still keep for later evaluation. You'd just pop the rows and they'd be deleted from your existing dataframe and saved to a new variable.
- many times people seem to need to pop the last row, or second row. It is easy to pop the last row using .iloc[:-1], but popping the second row in one swoop isn't as easy, I think. It could be if you just pop it out of there using pop.
- sometimes people loop through a dataframe. Not recommended, I understand, but in such a scenario you could pop a row based on a condition while looping, perhaps in a complex manner.
Code Sample, a copy-pastable example if possible
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]},
                  columns=['A', 'B', 'C'])

def pop(df, values, axis=1):
    if axis == 0:
        if isinstance(values, (list, tuple)):
            # a tuple passed to .loc would be read as a (row, column) key
            values = list(values)
            popped_rows = df.loc[values]
            df.drop(values, axis=0, inplace=True)
            return popped_rows
        elif isinstance(values, int):
            # select with a one-element list so the row comes back as a DataFrame
            popped_row = df.loc[[values]]
            df.drop(values, axis=0, inplace=True)
            return popped_row
        else:
            raise TypeError('values parameter needs to be a list, tuple or int.')
    elif axis == 1:
        # current df.pop(values) logic here
        return df.pop(values)
Example Usage
# example df
>>> df
   A  B  C
0  1  4  7
1  2  5  8
2  3  6  9
# pop multiple indices, delete from df inplace, return popped rows
# the df param wouldn't exist in the pop method; it'd be self
# df param just shown here to illustrate the idea
>>> pop(df, [0, 2], axis=0)
   A  B  C
0  1  4  7
2  3  6  9
# pop one index value, delete from df, return row as a dataframe (not series)
>>> pop(df, 1, axis=0)
   A  B  C
1  2  5  8
Demand for such a feature
How to pop rows from a dataframe?
Comment From: TomAugspurger
Inplace popping of rows is going to be very inefficient, and I don't think we should encourage that.
I think the best way to do this is with boolean masking. That covers your use cases 1 and 2, and I don't think we should encourage 3 :)
This could be useful as a cookbook entry, "How do I pop rows from a DataFrame?", answering that you don't.
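As a rough illustration of the boolean-masking approach for use cases 1 and 2, the sketch below splits a frame into the rows matching a mask and the rest, without mutating the original. The condition and variable names are assumptions chosen for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})

# Use case 1: set aside "problem" rows chosen by a condition.
mask = df['A'] > 1
problem_rows = df[mask]    # rows kept for later evaluation
remaining = df[~mask]      # everything else

# Use case 2: take out the second row by position.
is_second = np.arange(len(df)) == 1
second_row = df[is_second]
without_second = df[~is_second]
```

Each result is a new DataFrame, so the original df stays intact and no in-place row deletion is needed.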
Comment From: jaradc
I don't disagree with anything you said here @TomAugspurger :) I believe pop is usually reserved for a concept of popping 1 of something (ex: 1 row, 1 item, 1 column, etc.) so I'm not sure modifying the existing pop function is appropriate in that context.
A cookbook entry would also be a great help if there's a better way to do this. My main idea is having a convenience method to be able to do this kind of action in one call - pop rows or columns in-place (delete from existing dataframe) and return
Comment From: ghost
@jaradc I would like to add that I too thought this would be useful.
I just distributed my first package https://github.com/kdggavkc/pandas-refract for this purpose, but would prefer to see syntax in pandas like:
target_df = df.pop(df['target_column'] == 'target_value', axis=0)
It's not providing functionality that doesn't exist in pandas, but to me it's syntax I would have thought existed already. Currently you have to slice based on a condition, and then slice on the inverse (mask and ~mask) to split a df this way.
@TomAugspurger any thoughts here? How open are we to allowing the above syntax?
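For comparison, a mask-based pop can be approximated with a small wrapper around existing pandas calls. The function name pop_rows is hypothetical (it is neither pandas API nor taken from the package linked above), and the sketch assumes the index labels are unique, since it drops by label.

```python
import pandas as pd


def pop_rows(df, mask):
    """Hypothetical helper: drop the rows where `mask` is True from df
    in place and return them. Assumes unique index labels."""
    popped = df[mask]                           # rows matching the condition
    df.drop(index=popped.index, inplace=True)   # remove them from the caller's frame
    return popped


df = pd.DataFrame({'target_column': ['x', 'target_value', 'y'],
                   'n': [1, 2, 3]})
target_df = pop_rows(df, df['target_column'] == 'target_value')
```

This is only a convenience over the mask and ~mask split; the in-place drop still rebuilds the underlying data.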
Comment From: TomAugspurger
@kdggavkc I may misunderstand, but your pop looks different.
dict.pop / DataFrame.pop take a label. This issue was about expanding pop to take multiple keys and an axis argument.
Your pop seems to take a mask.
Comment From: ghost
@TomAugspurger no you are correct. I piped in here because I saw someone had a similar idea and didn't want to make a separate issue. If you feel it's more appropriate for me to do so I certainly can.
Comment From: jmarshall9120
December calling.. did we ever get a cookbook entry for this?
Comment From: kevinbird15
I know this is pretty old, but this is how I solved this problem for me:
def pop_first_row(df):
    first_row = df.iloc[[0]]
    df = df.iloc[1:]
    return first_row, df
I only cared about the top row. This could also take an index argument, but I'm not 100% sure how I would rebuild the df in a case where it wasn't 0; I would need to think about that a bit. I guess if performance wasn't a concern, you could do a pd.concat with the two sides of the popped index, but I bet there is a smarter way to deal with it.
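One possible way to generalize this to an arbitrary position, sketched under the assumption that returning a new frame (rather than mutating in place) is acceptable, is to build a boolean "keep" array and index with it. The name pop_row_at is hypothetical.

```python
import numpy as np
import pandas as pd


def pop_row_at(df, pos):
    """Hypothetical generalization of pop_first_row: return the row at
    integer position `pos` and a new frame without it."""
    keep = np.ones(len(df), dtype=bool)
    keep[pos] = False                       # mark the popped position
    return df.iloc[[pos]], df.iloc[keep]    # one-row frame, remaining rows


df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6], 'C': [7, 8, 9]})
row, rest = pop_row_at(df, 1)
```

This avoids the pd.concat of the two sides, because the boolean array selects all remaining rows in a single indexing step.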
Comment From: jbrockmendel
I agree with @tomaugspurger: we don't want to encourage popping on rows. Closing.