If you combine two Index objects using append, and both are object dtype to start with, it seems we still infer the dtype for the result instead of preserving object dtype:
>>> idx1 = pd.Index(["b", "a"], dtype=object)
>>> idx2 = pd.Index(["c", "d"], dtype=object)
>>> idx1.append(idx2)
Index(['b', 'a', 'c', 'd'], dtype='object')
>>> pd.options.future.infer_string = True
>>> idx1.append(idx2)
Index(['b', 'a', 'c', 'd'], dtype='str') # <-- upcast to string
I would expect that to preserve the dtype of the calling / passed Index (or at least their "common" dtype, which in this case is clearly object dtype).
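To make the expectation concrete, here is a minimal sketch of the behavior described above. `append_keep_object` is a hypothetical helper written for illustration only, not a pandas API and not the proposed fix: when both inputs are object dtype it builds the result with an explicit `dtype=object`, so no inference can take place.

```python
import numpy as np
import pandas as pd


def append_keep_object(left, right):
    """Hypothetical helper (not part of pandas): append two Indexes and
    keep object dtype when both inputs are object dtype."""
    if left.dtype == object and right.dtype == object:
        # Concatenate the raw object arrays and pass dtype=object explicitly,
        # so the Index constructor does not infer a narrower dtype.
        values = np.concatenate(
            [left.to_numpy(dtype=object), right.to_numpy(dtype=object)]
        )
        return pd.Index(values, dtype=object)
    # Any other combination is left to the regular append logic.
    return left.append(right)


idx1 = pd.Index(["b", "a"], dtype=object)
idx2 = pd.Index(["c", "d"], dtype=object)
result = append_keep_object(idx1, idx2)
```

Here `result` holds the four labels in order and is object dtype, because the dtype is passed explicitly rather than inferred.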
Comment From: rhshadrach
Hopefully a rare case, but what about the behavior of:
idx1 = pd.Index([1, 2], dtype=object)
idx2 = pd.Index([3, 4])
print(idx1.append(idx2))
I would think this should also be object, as otherwise we get values-dependent behavior.
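A small illustration of what "values-dependent" means here, assuming the result dtype were chosen by inspecting the values. It uses the public `pd.api.types.infer_dtype` function only as a stand-in for whatever inference `append` performs internally: two arrays with the same declared dtype (object) are classified differently depending on their contents.

```python
import numpy as np
import pandas as pd

# Both arrays are declared as object dtype.
all_ints = np.array([1, 2, 3, 4], dtype=object)
mixed = np.array([1, 2, 3, "x"], dtype=object)

# Inference looks at the values, not at the declared dtype,
# so the two object arrays get different answers.
kind_all_ints = pd.api.types.infer_dtype(all_ints)
kind_mixed = pd.api.types.infer_dtype(mixed)
```

With inference, the first case would be a candidate for an integer result while the second would stay object, even though both started as object dtype.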
Comment From: Swati-Sneha
take
Comment From: jorisvandenbossche
> I would think this should also be object, as otherwise we get values-dependent behavior.
Indeed, I would also expect object dtype in that case.
Comment From: jbrockmendel
+1 for getting rid of dtype inference. want to Just Do It for 3.0?
Comment From: jbrockmendel
Trying this out:
============================================================================================== short test summary info ==============================================================================================
FAILED pandas/tests/frame/methods/test_combine_first.py::TestDataFrameCombineFirst::test_combine_first - AssertionError: DataFrame.columns are different
FAILED pandas/tests/frame/methods/test_info.py::test_memory_usage_empty_no_warning - AssertionError: Series.index are different
FAILED pandas/tests/indexes/interval/test_interval.py::TestIntervalIndex::test_insert[data2] - AssertionError: Index are different
FAILED pandas/tests/indexes/interval/test_interval.py::TestIntervalIndex::test_insert[data3] - AssertionError: Index are different
FAILED pandas/tests/indexes/test_setops.py::TestSetOps::test_symmetric_difference[object]
FAILED pandas/tests/indexing/test_categorical.py::TestCategoricalIndex::test_loc_scalar - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/concat/test_categorical.py::TestCategoricalConcat::test_categorical_index_preserver - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/concat/test_empty.py::TestEmptyConcat::test_concat_inner_join_empty - AssertionError: DataFrame.columns are different
FAILED pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_on_index_with_more_values[index14-expected_index14-right] - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_on_index_with_more_values[index14-expected_index14-outer] - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_on_index_with_more_values[index15-expected_index15-right] - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/merge/test_merge.py::TestMerge::test_merge_on_index_with_more_values[index15-expected_index15-outer] - AssertionError: DataFrame.index are different
FAILED pandas/tests/reshape/test_union_categoricals.py::TestUnionCategoricals::test_union_categoricals_nan - AssertionError: Categorical.categories are different
The union_categoricals tests look easy to update. The merge ones look like there might be something real going on (I suspect an empty object dtype Index is being created at some point in the process). Not sure about the others.
Comment From: jorisvandenbossche
We will probably have to ignore empty object dtype Indexes for determining the resulting dtype, like I did for Index.union et al.
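A rough sketch of that idea, under stated assumptions: `result_dtype_ignoring_empty_object` is a hypothetical helper, not the logic used by `Index.union` or `Index.append`, and the fallback to object for mixed dtypes is a simplification rather than pandas' actual common-dtype rules. The only point it illustrates is that empty object dtype Indexes are skipped before the result dtype is decided.

```python
import numpy as np
import pandas as pd


def result_dtype_ignoring_empty_object(indexes):
    """Hypothetical helper (not part of pandas): choose a result dtype for
    combining `indexes`, skipping Indexes that are empty and object dtype."""
    relevant = [
        idx for idx in indexes if not (len(idx) == 0 and idx.dtype == object)
    ]
    if not relevant:
        # Every input was an empty object-dtype Index.
        return np.dtype(object)
    dtypes = {idx.dtype for idx in relevant}
    if len(dtypes) == 1:
        # All remaining inputs agree, so preserve that dtype without inference.
        return dtypes.pop()
    # Simplification for this sketch: fall back to object for mixed dtypes.
    return np.dtype(object)


empty = pd.Index([], dtype=object)
ints = pd.Index([1, 2], dtype="int64")
strs = pd.Index(["a"], dtype=object)

dtype_empty_and_ints = result_dtype_ignoring_empty_object([empty, ints])
dtype_empty_and_strs = result_dtype_ignoring_empty_object([empty, strs])
dtype_strs_and_ints = result_dtype_ignoring_empty_object([strs, ints])
```

In this sketch the empty object Index does not force the integer case to object, while a non-empty object Index still does.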