Collecting TODOs for once we bump to cython3:
- [x] use conditional nogil e.g. here
- [x] change labels.shape[0] -> len(labels) e.g. groupbyL965
- [ ] update
searchsorted
usages to use fixed cython version - [ ] Move Timestamp methods to _Timestamp, assuming https://github.com/cython/cython/pull/3605 is merged
- [x] in
_libs.internals
usefrom cpython.pyport cimport PY_SSIZE_T_MAX
instead ofcdef extern ...
- [x] in
_libs.util
usefrom numpy cimport PyArray_CLEARFLAGS, NPY_ARRAY_C_CONTIGUOUS, NPY_ARRAY_F_CONTIGUOUS
instead ofcdef extern ...
- [x] in
_libs.parsers
usefrom cpython.unicode cimport PyUnicode_FromString
instead ofcdef extern ...
- [ ] in
tslibs.parsing
replacePyObject_Str
withstr
(xref https://github.com/cython/cython/pull/3478) - [ ]
from numpy cimport NPY_DATETIMEUNIT
- [ ] Audit usages of
arg : datetime
annotations in tslibs. Can interfere with non-nano usages ofTimestamp
(xref https://github.com/pandas-dev/pandas/pull/54169)
@WillAyd am i missing anything (const object[:]?)? If so, go ahead and edit the OP.
Comment From: rainwoodman
@jbrockmendelor @WillAyd Could any of you post an update of the current the status of the cython 3.0 support?
I'd like to work on this but would like to avoid running into duplicated work.
Also, which language level of cython are we currently on (2 or 3, or 3str)?
Comment From: jbrockmendel
Could any of you post an update of the current the status of the cython 3.0 support?
nothing really to do at the moment; we'll revisit once cython 3.0 is released
Also, which language level of cython are we currently on (2 or 3, or 3str)?
exclusively 3
Comment From: rainwoodman
Thanks for the heads up. So there will be no replication of work.
Just discovered an additional problem to the current list: in reduction.pyx, we are mutating the .data member of ndarray. This operation is no longer supported in cython 3.0 and also deprecated in numpy. Is there already a plan / work in progress for eliminating the mutation of ndarray.data?
Comment From: jbrockmendel
Is there already a plan / work in progress for eliminating the mutation of ndarray.data?
Yah that one is pretty ugly. Three approaches have been discussed:
1) find a supported numpy API to continue doing this mutation - I haven't looked for this and am not optimistic it exists 2) instead of mutating, just create new ndarrays - this will be less performant than the mutation, but not that bad since it is just slicing, and way less kludgy - but last time I tried this i got segfaults 3) rip out the relevant pieces of reduction.pyx and use the non-cython paths - we've recently significantly optimized the non-cython paths, so this isn't that bad; xref #40263
If you have another idea, or can make options 1 or 2 work, that'd be great.
Comment From: rainwoodman
Thanks. I have a PR along the lines of 1: The approach is more like find a supported Cython way to use the deprecated numpy API till we decide on 2 or 3.
After this PR there is a segment fault from an apparently infinite recursive looking up of a typeslot inside the tslib. I feel this is more likely a Cython bug than a pandas bug.
@jbrockmendel: any pointers as of where to look at on the pandas side?
@da-woods, @scoder: any pointers on the cython side?
Here is a piece of the stacktrace:
#13 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#14 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850,
right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#15 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#16 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850,
right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#17 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#18 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850,
right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#19 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#20 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850,
right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88043
#21 0x00007fffe7f4af97 in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset (left=0x7fffdf626850, right=0x7fffdf512360) at pandas/_libs/tslibs/offsets.c:88079
#22 0x00007fffe7f4adea in __pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot (type=0x7fffe7fcb580 <__pyx_type_6pandas_5_libs_6tslibs_7offsets_BusinessMixin>, left=0x7fffdf626850,
static CYTHON_INLINE PyObject *__pyx_nb_add_6pandas_5_libs_6tslibs_7offsets_BaseOffset_maybe_call_slot(PyTypeObject* type, PyObject *left, PyObject *right ) {
binaryfunc slot;
#if CYTHON_USE_TYPE_SLOTS || PY_MAJOR_VERSION < 3 || CYTHON_COMPILING_IN_PYPY
slot = type->tp_as_number ? type->tp_as_number->nb_add : NULL;
#else
slot = (binaryfunc) PyType_GetSlot(type, Py_nb_add);
#endif
return slot ? slot(left, right ) : __Pyx_NewRef(Py_NotImplemented);
}
Comment From: da-woods
@rainwoodman
Special methods for binary operators now follow Python semantics. Rather than e.g. a single
__add__
method for cdef classes, where "self" can be either the first or second argument, one can now define both__add__
and__radd__
as for standard Python classes. This behavior can be disabled with the c_api_binop_methods directive to return to the previous semantics in Cython code (available from Cython 0.29.20), or the reversed method (__radd__
) can be implemented in addition to an existing two-sided operator method (__add__
) to get a backwards compatible implementation. (Github issue :issue:2056
)
I don't quite see why it's looking forever now (it's possibly a bug...) but I think that's the relevant change and hopefully c_api_binop_methods
should disable it.
Not hugely confident of this diagnosis though...
Comment From: jbrockmendel
Special methods for binary operators now follow Python semantics
Neat! We'll be able to get rid of a few non-pythonic code paths in Timestamp, Timedelta, Period, NaT.
Comment From: rainwoodman
Adding c_api_binop_methods=True fixed all segfaults.
Comment From: da-woods
I believe it's a bug https://github.com/cython/cython/issues/4172. It sounds like you can work around it for now though
Comment From: jbrockmendel
@lithomas1 can you check off the relevant boxes in the OP and possibly close?
Comment From: jbrockmendel
@rhshadrach @lithomas1 is this closable?
Comment From: lithomas1
Haven't been up to date on this issue, did we put back #54482 changes?
Comment From: QuLogic
pyproject.toml
has Cython~=3.0.5
, and has since v2.2.0, so seems like this can be closed?
Comment From: rhshadrach
@QuLogic - there are unchecked items in the OP here. At a glance, it isn't clear to me what has been done and what remains. If we can confirm all tasks have been done, then yes, this can be closed.
Comment From: jbrockmendel
We’ve been on 3+ for a while now, closing.