Feature Type

  • [X] Adding new functionality to pandas

  • [ ] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

The current pandas ExtensionArray is a standard Python class, and throughout our code base we do things like isinstance(obj, ExtensionArray) to determine at runtime whether an object is an ExtensionArray.

While this works for classes implemented purely in Python, which can inherit from a Python class, it does not work for extension classes implemented in Cython, pybind11, nanobind, etc. See https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#subclassing for documentation of this limitation in Cython.

As such, unless you implement your extension purely in Python, it will not work correctly as an ExtensionArray.

Feature Description

PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)
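As a rough illustration of the idea, here is a minimal sketch of a runtime-checkable protocol. The names ArrayLike and ForeignArray are hypothetical, and the two-method interface is a heavily reduced stand-in for ExtensionArray, not the real pandas API. Note that the runtime check only verifies that the members exist; it does not validate their signatures.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArrayLike(Protocol):
    # Hypothetical, heavily reduced stand-in for the ExtensionArray interface.
    def copy(self): ...

    def __len__(self) -> int: ...


class ForeignArray:
    """Stands in for a type defined in a compiled extension.

    It does not inherit from ArrayLike or from any pandas class.
    """

    def __init__(self, values):
        self._values = list(values)

    def copy(self):
        return type(self)(self._values)

    def __len__(self):
        return len(self._values)


arr = ForeignArray([1, 2, 3])

# Structural check: passes because both protocol members are present.
is_array_like = isinstance(arr, ArrayLike)

# An int has neither copy nor __len__, so the check fails.
is_int_array_like = isinstance(42, ArrayLike)
```

With something along these lines, existing isinstance call sites would not need to change, only the definition of the class being checked against.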

Alternative Solutions

Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types

Additional Context

No response

Comment From: jbrockmendel

> Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types

Can't you do something like this:

cdef class ActualImplementation:
    # [most of the implementation]

class MyEA(ActualImplementation, ExtensionArray):
    pass

That's basically what we do with NDArrayBacked.

Comment From: WillAyd

That works until you try to call a method of the extension class: MyEA().copy() will return an instance of ActualImplementation, not of MyEA.

Comment From: WillAyd

Ah, I take that back. OK cool, I'll have to look more into what Cython is doing to maintain the MyEA type. I was not getting this with nanobind, so it must be a Cython feature:

import numpy as np

from pandas.api.extensions import ExtensionArray
from pandas._libs.arrays import NDArrayBacked


class MyEA(NDArrayBacked, ExtensionArray):
    ...

arr = MyEA(np.arange(3), np.int64)
# copy() should hand back a MyEA, not a bare NDArrayBacked
assert type(arr) is type(arr.copy())

Comment From: jbrockmendel

Yah I’m implicitly assuming the implementation returns type(self) instead of hard-coding the class there. That seems pretty harmless to me.
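To make the distinction concrete, here is a pure-Python sketch of the two behaviours. All class names are hypothetical; Marker merely stands in for ExtensionArray, and the base classes stand in for a compiled implementation.

```python
class HardCodedBase:
    """Implementation whose copy() hard-codes its own class."""

    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        # Always builds the base class, so subclass identity is lost.
        return HardCodedBase(self.values)


class TypeSelfBase:
    """Implementation whose copy() builds type(self)."""

    def __init__(self, values):
        self.values = list(values)

    def copy(self):
        # Builds whatever class the method was called on.
        return type(self)(self.values)


class Marker:
    """Hypothetical stand-in for ExtensionArray."""


class LossyEA(HardCodedBase, Marker):
    pass


class PreservingEA(TypeSelfBase, Marker):
    pass


lossy_copy = LossyEA([1, 2]).copy()
preserving_copy = PreservingEA([1, 2]).copy()

print(type(lossy_copy).__name__)       # HardCodedBase
print(type(preserving_copy).__name__)  # PreservingEA
```

Only the second variant keeps passing an isinstance check against Marker after a copy, which is the property the mixin approach relies on.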

Comment From: twoertwein

> PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)

I think isinstance checks on a protocol are more expensive than on concrete classes: a protocol check has to look up every protocol member on the object, whereas a concrete-class check only has to consult __mro__.
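One way to get a rough number for this is a small timeit comparison. This is only a sketch: the classes are hypothetical two-method stand-ins rather than the real ExtensionArray interface, and the results will vary by machine and Python version.

```python
import timeit
from typing import Protocol, runtime_checkable


class ConcreteBase:
    """Hypothetical nominal base class."""

    def copy(self): ...

    def take(self, indices): ...


@runtime_checkable
class StructuralBase(Protocol):
    """Hypothetical protocol with the same two members."""

    def copy(self): ...

    def take(self, indices): ...


class Impl(ConcreteBase):
    # Inherits copy and take, so it satisfies both checks.
    pass


obj = Impl()
n = 20_000

concrete_s = timeit.timeit(lambda: isinstance(obj, ConcreteBase), number=n)
protocol_s = timeit.timeit(lambda: isinstance(obj, StructuralBase), number=n)

print(f"concrete class: {concrete_s / n * 1e9:.0f} ns per check")
print(f"protocol:       {protocol_s / n * 1e9:.0f} ns per check")
```

A real protocol for ExtensionArray would have many more members than this one, so the gap seen here is probably a lower bound rather than a representative figure.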

Comment From: WillAyd

Yeah, there is going to be some performance overhead, I think especially before Python 3.12. How much that matters I don't know. I am under the impression we aren't doing these checks in a tight loop, but if you have ideas on what to benchmark, I'm happy to profile.

Comment From: jbrockmendel

We aren’t doing the checks in a tight loop, but we are doing them everywhere.