• [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] (optional) I have confirmed this bug exists on the master branch of pandas.


Code Sample, a copy-pastable example

The sample at https://github.com/pandas-dev/pandas/blob/6210077d32a9e9675526ea896e6d1f9189629d4a/pandas/core/arrays/base.py#L1031-L1047 is buggy: it doesn't handle scalar indices properly.

import numpy as np
from pandas.api.extensions import ExtensionArray, ExtensionDtype

class MyDtype(ExtensionDtype):
    name = "name"

class MyArray(ExtensionArray):
    dtype = MyDtype()
    def __init__(self, data):
        self._data = data

    @classmethod
    def _from_sequence(cls, scalars, *, dtype=None, copy=False):
        return cls(np.array(scalars))

    def __getitem__(self, item):
        return self._data[item]

    def __len__(self):
        return len(self._data)

    def take(self, indices, allow_fill=False, fill_value=None):
        from pandas.core.algorithms import take
        # If the ExtensionArray is backed by an ndarray, then
        # just pass that here instead of coercing to object.
        data = self.astype(object)
        if allow_fill and fill_value is None:
            fill_value = self.dtype.na_value
        # fill value should always be translated from the scalar
        # type for the array, to the physical storage type for
        # the data, before passing to take.
        result = take(data, indices, fill_value=fill_value,
                      allow_fill=allow_fill)
        return self._from_sequence(result, dtype=self.dtype)

a = MyArray._from_sequence([1, 2, 3])
result = a.take(0)
assert result == 1

Problem description

Expected Output

.take(0) should return the scalar 1, rather than trying to wrap it in a new MyArray.


Comment From: TomAugspurger

Hmm I may have been mistaken. I confused myself since my extension array isn't naturally a "scalar", and is (somewhere) being converted back to a length-2 ndarray.

Comment From: jorisvandenbossche

My expectation was that take always requires a list-like of indices and shouldn't work for scalar indices (so that the return value is always a new EA of the same type). That seems to be confirmed by the docstring, but then maybe the example implementation should check for that?
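A minimal sketch of what such a check could look like, assuming a scalar is anything NumPy reports as zero-dimensional. The helper name `validate_take_indices` is hypothetical and is not part of pandas:

```python
import numpy as np


def validate_take_indices(indices):
    """Hypothetical helper: reject scalar indices, return an integer ndarray."""
    if np.ndim(indices) == 0:
        raise TypeError(f"indices must be list-like, got scalar {indices!r}")
    return np.asarray(indices, dtype=np.intp)


# List-like indices pass through as an integer ndarray.
print(validate_take_indices([0, 2]))

# A scalar index is rejected with an explicit message.
try:
    validate_take_indices(0)
except TypeError as exc:
    print(f"TypeError: {exc}")
```

Calling a helper like this at the top of `take` would keep the "always returns a new EA" contract explicit, at the cost of one extra check per call.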

Comment From: jorisvandenbossche

On the other hand, numpy's take also works with scalar indices (although it is also documented to accept an array-like of indices).
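For reference, a small illustration of that NumPy behaviour, using only `ndarray.take` and `np.ndim`:

```python
import numpy as np

values = np.array([1, 2, 3])

scalar_result = values.take(0)       # scalar index
listlike_result = values.take([0])   # list-like index

print(np.ndim(scalar_result))    # 0: a NumPy scalar, not an array
print(listlike_result.shape)     # (1,): a one-element array
```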

And for example IntegerArray also doesn't give a proper error message for it:

In [19]: arr = pd.array([1, 2, 3])

In [20]: arr.take(0)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-20-6b42d04f0add> in <module>
----> 1 arr.take(0)

~/scipy/pandas/pandas/core/arrays/masked.py in take(self, indexer, allow_fill, fill_value)
    324             mask = mask ^ fill_mask
    325 
--> 326         return type(self)(result, mask, copy=False)
    327 
    328     def copy(self: BaseMaskedArrayT) -> BaseMaskedArrayT:

~/scipy/pandas/pandas/core/arrays/integer.py in __init__(self, values, mask, copy)
    289         if not (isinstance(values, np.ndarray) and values.dtype.kind in ["i", "u"]):
    290             raise TypeError(
--> 291                 "values should be integer numpy array. Use "
    292                 "the 'pd.array' function instead"
    293             )

TypeError: values should be integer numpy array. Use the 'pd.array' function instead
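By contrast, the documented list-like form goes through. A short sketch, assuming a pandas version in which `pd.array` infers a nullable integer array from a list of ints:

```python
import pandas as pd

arr = pd.array([1, 2, 3])

# A list-like of indices returns a new array of the same type.
result = arr.take([0])

print(type(result) is type(arr))
print(len(result))
```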

So at least we should decide on the exact spec, and then update implementation or error checking / base tests / docs based on that.

Comment From: TomAugspurger

NumPy accepting scalars is somewhat new:

indices : array_like (Nj...)
    The indices of the values to extract.

    .. versionadded:: 1.8.0

    Also allow scalars for indices.

I don't have a strong preference, but it's probably best to match NumPy if it doesn't introduce any issues.
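As a rough sketch of what matching NumPy could mean for an ndarray-backed array: return the element for a scalar index and a new array otherwise. `ToyArray` is a hypothetical stand-in, not a real ExtensionArray subclass, and it ignores `allow_fill` and `fill_value` entirely:

```python
import numpy as np


class ToyArray:
    """Hypothetical ndarray-backed container; not a pandas ExtensionArray."""

    def __init__(self, data):
        self._data = np.asarray(data)

    def __len__(self):
        return len(self._data)

    def take(self, indices):
        # Delegate to ndarray.take, which accepts scalars and array-likes.
        result = self._data.take(indices)
        if np.ndim(indices) == 0:
            # Scalar index: return the element itself, as NumPy does.
            return result
        # List-like indices: wrap the result in a new array of the same type.
        return type(self)(result)


arr = ToyArray([1, 2, 3])
print(arr.take(0))             # a scalar element
print(len(arr.take([0, 2])))   # length of the new ToyArray
```

Whether the extra branch is worth it depends on the spec decision discussed above; it is shown here only to make the trade-off concrete.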

Comment From: jbrockmendel

1) no reason to support scalars, as they should never be passed
2) no reason to add a check, since it's just perf overhead that should never be needed
3) no harm in adding a note in the docs