This is a doozy:

I'm working through the issues with the setitem-with-expansion tag, and about half of them are of the form "I added a row and got unwanted casting." Most of those are int->float, some are EA->object. At least one is object->non-object (we special-case when the original is empty).

Some of these (df.loc[new_row] = values) go through a path that uses concat. Others (df.loc[new_row, :] = values) go through a path that reindexes instead. These two paths can give subtly different behaviors:

import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df2 = df.copy()

df.loc[2] = 5
df2.loc[2, :] = 5

assert (df.dtypes == "int64").all()
assert (df2.dtypes == "float64").all()

I have a branch that updates the concat-codepath to call _cast_pointwise_result on the new values to try to cast to the column's original dtype, but there are some problems there:

1) Doing that without patching the reindex path introduces even more inconsistencies.

2) We have exactly one test case (test_partial_setting) that seems to be expecting non-casting:

import pandas._testing as tm

ser = pd.Series([1, 2, 3])
ser.loc[5] = 5.0
expected = pd.Series([1, 2, 3, 5.0], index=[0, 1, 2, 5], dtype="float64")
tm.assert_series_equal(ser, expected)

3) The elegant solution of using _cast_pointwise_result requires adding checks to NumpyEA._cast_pointwise_result and MaskedEA._cast_pointwise_result for "if the original is integer and the result is all-round floats, cast back to ints". This change affects a bunch of map/apply/combine behavior that may not be desired. Of course we could put those checks outside the _cast_pointwise_result calls; they just become much less elegant.
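
For illustration, a rough sketch of the kind of check described in 3); the helper name is made up here and this is not how _cast_pointwise_result is actually structured:

import numpy as np

def maybe_cast_back_to_int(result, original_dtype):
    # hypothetical check: if the original column was integer-dtyped and the
    # pointwise result is all finite, round floats, cast back to the ints
    arr = np.asarray(result)
    if np.issubdtype(original_dtype, np.integer) and arr.dtype.kind == "f":
        if np.isfinite(arr).all() and (arr == np.round(arr)).all():
            return arr.astype(original_dtype)
    return arr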

In the background there is also the consideration that I'm hoping that in 4.0 we will get to nullable-by-default, in which case the reindex codepath will not do any casting, so the issue will fix itself for those cases. Plausibly we could refactor to always go down that path and get consistency that way.

ATM I'm inclined to say that the test referenced in point 2) is wrong/undesired, and the setitem-with-expansion cases should behave like the non-expansion cases as much as possible (i.e. it will cast round floats to ints).
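
For reference, a minimal example of the non-expansion behaviour this would match (assuming the behaviour on current main):

import pandas as pd

ser = pd.Series([1, 2, 3])
ser.loc[1] = 5.0               # round float set at an existing label
assert ser.dtype == "int64"    # lossless value, so no upcast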

I'm taking a look at what it would take to patch the reindexing codepath with the same _cast_pointwise_result logic as the concat codepath, but I think it's a bit more involved. Assuming that isn't feasible, should we try to fix the concat codepaths alone?

Update: Two more differences in the reindex path are that a) we get the PDEP-6 behavior that disallows upcasting, and b) we don't get the special-casing of the empty cases.

Comment From: jbrockmendel

Not time-sensitive, but I will ask @mroeschke, @jorisvandenbossche, @rhshadrach for opinions on this at the next dev call.

Comment From: rhshadrach

I often can't make the dev call these days, so leaving some comments here. Very positive on the overall goal.

ATM I'm inclined to say that the test referenced in point 2) is wrong/undesired

+1, especially in light of PDEP-6.

... "if the original is integer and the result is all-round floats, cast back to ints". This change affects a bunch of map/apply/combine behavior that may not be desired.

I do fairly strongly believe that if a user gives us a UDF in e.g. map that returns floats on int data, the result should be float.
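
For concreteness, a minimal example of the behaviour being defended here (assuming current main):

import pandas as pd

ser = pd.Series([1, 2, 3])
result = ser.map(lambda x: x * 1.0)   # UDF returns round floats on int data
assert result.dtype == "float64"      # today the float result is kept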

Comment From: jorisvandenbossche

Regarding the exact casting behaviour, it's maybe good to consider the different cases:

  1. Setting with a same-dtype value (e.g. setting an int to an integer column, which currently can get cast to float because of the reindexing behaviour) -> I think this is a no-brainer that the dtype should be preserved (and the current behaviour is a bug / side-effect of reindexing non-nullable ints)
  2. Setting with a different dtype value, but where a cast could be lossless (e.g. setting with a round float to an integer column, i.e. the cases we still allow in normal setitem) -> this is the main discussion point raised above, I think? Personally I can see a case for both ways, essentially depending on whether you think about the operation as a reindex-then-set or as a concat of the object with the value being set (and then the common dtype rules apply). But given we are still in a setitem indexing operation, I am fine with going for the reindex+setitem casting logic.
  3. Setting with an incompatible value -> this currently works fine and upcasts (e.g. int to float, or typically to object dtype); the three cases are sketched below
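
To make the three cases concrete on an int64 column (the resulting dtypes are exactly what is under discussion, so none are asserted here):

import pandas as pd

df = pd.DataFrame({"A": [1, 2]})              # int64 column
df1, df2, df3 = df.copy(), df.copy(), df.copy()

df1.loc[2] = 3      # case 1: same-dtype value
df2.loc[2] = 3.0    # case 2: different dtype, but losslessly castable
df3.loc[2] = 3.5    # case 3: incompatible value; currently upcasts to float64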

I am assuming we want to keep the third case working. But if so, that also means that a pure "reindex+setitem" logic (which could resolve the inconsistency once we have all nullable dtypes, as you mentioned) wouldn't work for that case (because the setitem step no longer allows upcasting). We could of course still upcast when setitem fails, but that also introduces some inconsistencies between cases 2 and 3 (like df.loc[2] = 5.0 vs df.loc[2] = 5.1 would give different dtypes, even though both values are floats).
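
A hypothetical sketch of that fallback rule, using a nullable Int64 column so the reindex step itself preserves the dtype (set_with_expansion is made up for illustration, not an existing code path):

import pandas as pd

def set_with_expansion(ser, label, value):
    out = ser.reindex(ser.index.union([label]))   # Int64 reindex fills with pd.NA
    if value == int(value):          # lossless: keep the integer dtype
        out.loc[label] = int(value)
    else:                            # lossy: upcast before setting
        out = out.astype("Float64")
        out.loc[label] = value
    return out

ser = pd.Series([1, 2, 3], dtype="Int64")
set_with_expansion(ser, 3, 5.0).dtype   # Int64
set_with_expansion(ser, 3, 5.1).dtype   # Float64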

... "if the original is integer and the result is all-round floats, cast back to ints". This change affects a bunch of map/apply/combine behavior that may not be desired.

I do fairly strongly believe that if a user gives us a UDF in e.g. map that returns floats on int data, the result should be float.

+1

Comment From: jbrockmendel

I am assuming we want to keep the third case [setitem with incompatible value] working

I'd be inclined to not allow this and apply PDEP-6 rules even with expansion. Not implacably opposed.

Mostly I'd like to get a single code path with consistent behavior. In a nullable-by-default world I'd expect the reindex path to Just Work for free, so I'm inclined to make that the path that we retain.
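
For example, with a nullable dtype the reindex step already preserves the dtype today, which is what makes this plausible (assuming current main):

import pandas as pd

ser = pd.Series([1, 2], dtype="Int64")
assert ser.reindex([0, 1, 2]).dtype == "Int64"     # new label becomes <NA>, no cast

ser2 = pd.Series([1, 2])                           # non-nullable int64
assert ser2.reindex([0, 1, 2]).dtype == "float64"  # NaN forces the cast to float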

Comment From: rhshadrach

2. Setting with a different dtype value, but where a cast could be lossless (e.g. setting with a round float to an integer column, i.e. the cases we still allow in normal setitem) -> this is the main discussion point raised above, I think? Personally I can see a case for both ways

I think we should always be strict here for consistency. I think as long as there are some places where we need to differentiate 3 vs 3.0, then we should do it across the board. Otherwise users will have a hard time predicting where it matters and where it doesn't.

I am assuming we want to keep the third case working.

I do not want this to work - it'd be much safer to force users to cast to object or whatever parent type they desire. That is maybe harder for a user on small data who doesn't care, but it makes pandas safer to use.