Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
The current pandas ExtensionArray is a standard Python class, and throughout our code base we do things like isinstance(obj, ExtensionArray) to determine at runtime if an object is an instance of the ExtensionArray.
While this works for classes implemented purely in Python that may inherit from a Python class, it does not work with extension classes implemented in Cython, pybind11, nanobind, etc. See https://cython.readthedocs.io/en/latest/src/userguide/extension_types.html#subclassing for documentation of this limitation in Cython.
As such, unless you implement your extension purely in Python, it will not work correctly as an ExtensionArray.
Feature Description
PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)
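For illustration, here is a minimal sketch of how a runtime_checkable protocol behaves with isinstance. The names ExtensionArrayLike, CompiledArray and NotAnArray are hypothetical and not part of pandas, and the three methods are an arbitrary subset chosen for the example. CompiledArray is a pure-Python stand-in for a type defined in an extension module; the point is only that it passes the check without inheriting from anything.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class ExtensionArrayLike(Protocol):
    """Hypothetical protocol; not the real pandas ExtensionArray interface."""

    def __len__(self) -> int: ...
    def copy(self): ...
    def take(self, indices): ...


class CompiledArray:
    """Stand-in for a type implemented in Cython/pybind11/nanobind.

    It does not inherit from ExtensionArrayLike.
    """

    def __init__(self, values):
        self._values = list(values)

    def __len__(self):
        return len(self._values)

    def copy(self):
        return type(self)(self._values)

    def take(self, indices):
        return type(self)(self._values[i] for i in indices)


class NotAnArray:
    """Has __len__ but lacks copy/take, so it should fail the check."""

    def __len__(self):
        return 0


arr = CompiledArray([1, 2, 3])
is_array_like = isinstance(arr, ExtensionArrayLike)
is_rejected = not isinstance(NotAnArray(), ExtensionArrayLike)
```

Note that a runtime_checkable isinstance check only verifies that the named members exist; it does not validate signatures or return types.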
Alternative Solutions
Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types
Additional Context
No response
Comment From: jbrockmendel
Not sure there are any - I may be wrong but I do not think extension types in Python can inherit from Python types
Can't you
cdef class ActualImplementation:
    [most of the implementation]

class MyEA(ActualImplementation, ExtensionArray):
    pass

That's basically what we do with NDArrayBacked.
Comment From: WillAyd
That works until you try to call a method of the extension class. MyEA().copy() will return an instance of ActualImplementation, not of MyEA.
Comment From: WillAyd
Ah I take that back - OK cool I'll have to look more into what Cython is doing to make that maintain the MyEA type. Was not getting this with nanobind so must be a Cython feature:
import numpy as np
from pandas.api.extensions import ExtensionArray
from pandas._libs.arrays import NDArrayBacked

class MyEA(NDArrayBacked, ExtensionArray):
    ...

arr = MyEA(np.arange(3), np.int64)
assert type(arr) == type(arr.copy())
Comment From: jbrockmendel
Yah I’m implicitly assuming the implementation returns type(self) instead of hard-coding the class there. That seems pretty harmless to me.
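A pure-Python sketch of the difference being described, under the assumption that the compiled base class behaves like an ordinary Python base class for this purpose. ActualImplementation and ExtensionArrayStub here are hypothetical stand-ins (the real ones would be a cdef class and pandas' ExtensionArray), and copy_hardcoded is added only to show the failure mode.

```python
class ActualImplementation:
    """Pure-Python stand-in for the compiled implementation class."""

    def __init__(self, values):
        self._values = list(values)

    def copy(self):
        # type(self) resolves to the most-derived class, so subclasses
        # get back an instance of their own type.
        return type(self)(self._values)

    def copy_hardcoded(self):
        # Hard-coding the class drops the subclass type.
        return ActualImplementation(self._values)


class ExtensionArrayStub:
    """Hypothetical stand-in for pandas' ExtensionArray."""


class MyEA(ActualImplementation, ExtensionArrayStub):
    pass


arr = MyEA([1, 2, 3])
preserved = type(arr.copy()) is MyEA
lost = type(arr.copy_hardcoded()) is ActualImplementation
```

With type(self), the copy still passes isinstance(obj, ExtensionArrayStub); with the hard-coded class it does not.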
Comment From: twoertwein
PEP 544 describes the runtime_checkable decorator that in theory can solve this issue without any major changes to our code base (ignoring any performance implications for now)
I think isinstance checks on a protocol are more expensive than on concrete classes: comparing all symbols (protocol) vs just checking __mro__ (concrete class)
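One rough way to measure this is a timeit micro-benchmark along the lines below. This is only a sketch: ConcreteBase, ArrayProtocol and Impl are hypothetical names, the two-method protocol is far smaller than the real ExtensionArray interface, and the numbers will vary with Python version and machine.

```python
import timeit
from typing import Protocol, runtime_checkable


class ConcreteBase:
    """Hypothetical concrete base class, checked by inheritance."""

    def copy(self): ...
    def take(self, indices): ...


@runtime_checkable
class ArrayProtocol(Protocol):
    """Hypothetical protocol with the same two members."""

    def copy(self): ...
    def take(self, indices): ...


class Impl(ConcreteBase):
    def copy(self):
        return self

    def take(self, indices):
        return self


obj = Impl()
n = 20_000

# Time the nominal (inheritance-based) check.
t_concrete = timeit.timeit(lambda: isinstance(obj, ConcreteBase), number=n)
# Time the structural (protocol-based) check on the same object.
t_protocol = timeit.timeit(lambda: isinstance(obj, ArrayProtocol), number=n)

print(f"concrete: {t_concrete / n * 1e9:.0f} ns per check")
print(f"protocol: {t_protocol / n * 1e9:.0f} ns per check")
```

A protocol with more members, or an object that fails the check, may behave differently, so both cases would be worth timing as well.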
Comment From: WillAyd
Yea, there is going to be some performance overhead, I think especially before Python 3.12. How much that matters I don't know. I am under the impression we aren't doing these checks in a tight loop, but if you have ideas on what to benchmark, I'm happy to profile.
Comment From: jbrockmendel
We aren’t doing the checks in a tight loop, but we are doing them everywhere.