Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
```python
import numpy as np
import geopandas as gpd
from shapely.geometry import Polygon, MultiPolygon
# Geometry coordinates are elided ("...") as in the original report.
gdf = gpd.GeoDataFrame([{"ID1": "3991b7ab", "ID2": np.nan, "geometry": Polygon(...), "area_m": 10720.28326},
                        {"ID1": "3991b7ab", "ID2": "dc4772ed", "geometry": MultiPolygon(...), "area": 0.24245}])
gdf.assign(area=lambda x: x.geometry.area).sort_values(by="area", ascending=False).query("ID1 == '3991b7ab'").groupby(by="ID1").first()
```
Issue Description
I have a (Geo)DataFrame with multiple UUID columns. After sorting, the first row of a group can have a None value in one of these columns. After grouping by ID1 and calling first(), the None value of ID2 is replaced with the next non-null value in that column.
Expected Behavior
The value of ID2 should be None.
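A minimal plain-pandas sketch of the same behavior, without geopandas. It assumes the geometry column is irrelevant to the bug, keeps only ID1, ID2 and area, and uses the area values listed in the report as a stand-in for what x.geometry.area would return (an assumption), so that the row with the missing ID2 sorts first.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the reporter's GeoDataFrame: geometry dropped,
# "area" given directly from the values listed in the report.
df = pd.DataFrame(
    {
        "ID1": ["3991b7ab", "3991b7ab"],
        "ID2": [np.nan, "dc4772ed"],
        "area": [10720.28326, 0.24245],
    }
)

# Sorting by area puts the row whose ID2 is missing first within the group.
ordered = df.sort_values(by="area", ascending=False)

# GroupBy.first() skips missing values column by column by default, so ID2
# is taken from the second row even though area comes from the first row.
result = ordered.groupby(by="ID1").first()
print(result)
```

On this input, result keeps area from the first row but fills ID2 with "dc4772ed" from the second row, which matches the description above.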
Installed Versions
Comment From: rhshadrach
Thanks for the report. Can you simplify this? Produce the DataFrame that goes into the groupby explicitly (i.e. remove the use of assign, sort_values, and query), and show the result you get.
Comment From: asishm
df.groupby.first has a skipna argument that defaults to True (https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.first.html). Try setting that to False.
```python
In [190]: pd.DataFrame({'a': [1, 1, 2, 2], 'b': [np.nan, '2', '3', '4']}).groupby('a').first(skipna=False)
Out[190]:
     b
a
1  NaN
2    3

In [191]: pd.DataFrame({'a': [1, 1, 2, 2], 'b': [np.nan, '2', '3', '4']}).groupby('a').first()
Out[191]:
   b
a
1  2
2  3
```
Comment From: sehHeiden
Okay, my bad.
I made sure that I have pandas > 2.2.1 on all machines, and skipna did help. The problem was that when I searched the documentation for first, I found DataFrame.first and not DataFrameGroupBy.first. I think this is a problem too, but a different one.
Comment From: rhshadrach
@sehHeiden - it's not clear to me if your issue is resolved or not. If it isn't resolved, can you post a reproducible example (see my previous comment).
Comment From: sehHeiden
Solved. @asishm's solution works. I tested it with 2.2.2; it does not work with 2.2.0, which does not have the skipna parameter.
Optional wish: please link to GroupBy.first from the DataFrame.first documentation.
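For pandas versions without the skipna parameter (such as 2.2.0, mentioned above), one possible workaround is GroupBy.head(1), which returns the first row of each group as it is and never skips missing values. This is a sketch on hypothetical data mirroring the report, not an officially recommended replacement; unlike first(), head(1) returns whole rows with the original index, so ID1 is moved to the index afterwards to mimic first()'s output.

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the report: the first row of the group has no ID2.
df = pd.DataFrame(
    {
        "ID1": ["3991b7ab", "3991b7ab"],
        "ID2": [np.nan, "dc4772ed"],
        "area": [10720.28326, 0.24245],
    }
)

ordered = df.sort_values(by="area", ascending=False)

# head(1) keeps the first row of each group unchanged (missing values
# included) and does not rely on skipna, so it also runs on older pandas.
first_rows = ordered.groupby(by="ID1").head(1).set_index("ID1")
print(first_rows)
```

Here ordered.drop_duplicates(subset="ID1", keep="first") would select the same rows.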