Feature Type
- [x] Adding new functionality to pandas
- [x] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
Please regard
I took some free time this month for a run-of-the-mill architecture assessment of the pandas Python sources; this is the deliverable. For health-related reasons, I use LLM redaction to keep my public communications from becoming severely prolix. I'll mark personal remarks in italics. Engage at will, worry free.
Summary
This issue proposes an enhancement to the internal architecture of accessors in pandas, aiming to support persistent, interoperable accessors that can coexist with and respond to their host `NDFrame` or `Index` objects in a lifecycle-aware and low-copy fashion.
Motivation
Accessors in pandas today (e.g., `.str`, `.dt`, `.cat`, `.attrs`) are ephemeral, created on-demand for every access. While this is lightweight and efficient for simple use cases, it limits their utility in richer modeling contexts, especially for:
- Persisting contextual or intermediate state;
- Synchronizing behavior after host transformations (e.g., `.copy()`, slicing, chaining);
- Acting as first-class modeling layers inside pandas pipelines.
These limitations are especially relevant in data engineering and ML pipelines, where pandas often loses its centrality once feature computation becomes non-trivial — forcing users to fall back to NumPy, custom classes, or external orchestration layers.
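A minimal sketch of the limitation, using the public `pd.api.extensions.register_dataframe_accessor` decorator. The accessor name `notes_demo` and its `notes` attribute are hypothetical and exist only for this illustration. State stored on the accessor instance does not follow the host through `.copy()`:

```python
import pandas as pd


# "notes_demo" is a hypothetical accessor name used only for this sketch.
@pd.api.extensions.register_dataframe_accessor("notes_demo")
class NotesAccessor:
    def __init__(self, pandas_obj):
        self._obj = pandas_obj
        self.notes = []  # state we would like to keep with the data


df = pd.DataFrame({"x": [1, 2, 3]})

acc = df.notes_demo              # hold the instance so its state is observable
acc.notes.append("scaled by 10")

df_copy = df.copy()              # a new host object gets a brand-new accessor
print(acc.notes)                 # ['scaled by 10']
print(df_copy.notes_demo.notes)  # []
```

The copy starts with an empty accessor, so anything the accessor learned about the original host has to be rebuilt or carried by hand.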
_Also beware of the safe excess of over-implementing promising concepts around the topic. Obstructive inclinations drive punctual desertion, often followed by antithetical responses, the well-known crux of enhancement framework design. Structure that is expressive and interoperable, that supports continuity, comprehension, and reuse, and that grants high reach with a small set of familiar instruments should guide the approach._
Feature Description
Proposal
Introduce a model for persistent accessors, which:
- Are instantiated once per host instance and survive as long as the host exists;
- Tie their lifetime to the high-level pandas object, to the underlying data supply, or to a composite of both;
- Can respond to critical transformations on the host (e.g., copy, assignment, indexing);
- Are managed via hooks, weak references, or catalog mechanisms;
- Remain interoperable with existing pandas workflows;
- Allow extension via a structured API for third parties.
I advise a tripartite roadmap. First, ship persistence with minimal, lifetime-only hooks. Next, study careful "Goldilocks" hook placement for a fully managed event propagation design. Finally, design and build the provisioning infrastructure that gives accessors well-defined read access to source data entities.
This does not require a full architectural rewrite, and can initially be scoped to enable a formal accessor lifecycle with opt-in semantics and soft hooks.
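One way to picture "opt-in semantics and soft hooks" is a base class whose hooks are no-ops, so that overriding a hook is the opt-in. This is only a sketch under that assumption: `PersistentAccessor`, `on_copy`, `on_finalize`, `subscribed_hooks`, and `WindowAccessor` are hypothetical names and not part of pandas.

```python
class PersistentAccessor:
    """Hypothetical base class; none of these names exist in pandas."""

    def __init__(self, host):
        self.host = host

    # Soft hooks: no-ops by default, so overriding one is the opt-in.
    def on_copy(self, new_host):
        return None  # None means "do not follow the copy"

    def on_finalize(self):
        pass


def subscribed_hooks(accessor_cls):
    """Return the names of the hooks that a subclass actually overrides."""
    hooks = ("on_copy", "on_finalize")
    return tuple(
        name
        for name in hooks
        if getattr(accessor_cls, name) is not getattr(PersistentAccessor, name)
    )


class WindowAccessor(PersistentAccessor):
    def __init__(self, host, size=3):
        super().__init__(host)
        self.size = size

    def on_copy(self, new_host):
        # Carry the configuration over to the duplicated host.
        return WindowAccessor(new_host, size=self.size)


print(subscribed_hooks(WindowAccessor))  # ('on_copy',)
```

A host could consult `subscribed_hooks` once per accessor class and dispatch only the events that were opted into, which keeps the cost at zero for accessors that override nothing.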
Alternative Solutions
Proof of Concept (POC)
I'm currently prototyping an accessor extension under a sliding window use case. The idea is to offer vertical rolling windows of fixed size, supporting:
- Multiple successive rolls along axis 0 (depth stacking);
- Minimal memory footprint (zero-copy across views);
- "Scoped local views" for stateless metric computation, visual flattening, and full-dimensional window analysis;
- A lightweight dispatcher for `apply`-like routines, with hyperparameter support and batch-level threading (GIL-free environments only).
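The zero-copy and depth-stacking points can be illustrated with NumPy's `sliding_window_view`. This is a sketch of the general technique, not the prototype itself, and it leaves out the dispatcher and threading; the array contents and window sizes are arbitrary.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

data = np.arange(12, dtype=np.float64).reshape(6, 2)  # 6 rows, 2 columns

# First roll: windows of 3 consecutive rows; the window axis is appended last.
roll1 = sliding_window_view(data, 3, axis=0)   # shape (4, 2, 3)

# Second roll over the first one: windows of 2 consecutive 3-row windows.
roll2 = sliding_window_view(roll1, 2, axis=0)  # shape (3, 2, 3, 2)

# Both results are strided views over the original buffer, not copies.
zero_copy = np.shares_memory(data, roll1) and np.shares_memory(data, roll2)

# A stateless per-window metric: mean of each 3-row window, per column.
window_means = roll1.mean(axis=-1)             # shape (4, 2)
```

The views returned here are read-only by default, which suits stateless metric computation over windows.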
Key implementation details include:
- Accessors are indexed by a catalog based on `weakref.WeakKeyDictionary`, using `attrs` as the sole strong reference;
- A root hook object (UUID-backed immutable `set` subclass) is stored in `.attrs`, which carries the accessor identity;
- Copy operations trigger deep hooks via custom `__deepcopy__`, enabling propagation of the accessor along with host duplication;
- When the host NDFrame is garbage collected, the accessor is finalized via `weakref.finalize`;
- NDArray-level hooks are being tested to track mathematical and logical transformations on the underlying data;
- Implementation is being tested in a dev environment with no-GIL support, built using scientific nightly wheels via Mamba.
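The catalog, hook, and finalizer mechanics listed above can be sketched with the standard library alone. This is not the prototype's code. `Hook`, `Host`, `attach`, `CATALOG`, and `FINALIZED` are hypothetical names, a plain class stands in for the `set` subclass, and `Host` is a stand-in for an NDFrame that is assumed to deep-copy its `attrs` on `.copy()`.

```python
import copy
import gc
import uuid
import weakref

# Catalog: hook token -> accessor state. Entries vanish when the token dies.
CATALOG = weakref.WeakKeyDictionary()
FINALIZED = []  # uids of finalized hooks, recorded for demonstration only


class Hook:
    """Hypothetical identity token; a plain class replaces the set subclass."""

    def __init__(self):
        self.uid = uuid.uuid4()

    def __deepcopy__(self, memo):
        # Deep hook: a copied host gets a fresh token and copied state.
        new = Hook()
        attach(new, dict(CATALOG[self]))
        return new


def attach(hook, state):
    """Register accessor state under a hook and schedule its finalizer."""
    CATALOG[hook] = state
    # The callback must not reference the hook itself, or it would never die.
    weakref.finalize(hook, FINALIZED.append, hook.uid)


class Host:
    """Stand-in for an NDFrame: it only owns an attrs dict."""

    def __init__(self):
        self.attrs = {}

    def copy(self):
        new = Host()
        new.attrs = copy.deepcopy(self.attrs)  # assumption: attrs are deep-copied
        return new


host = Host()
hook = Hook()
host.attrs["accessor_hook"] = hook
attach(hook, {"window": 3})
uid = hook.uid
del hook  # host.attrs now holds the only strong reference

clone = host.copy()  # triggers Hook.__deepcopy__
clone_state = CATALOG[clone.attrs["accessor_hook"]]

del host      # the original token dies with its host
gc.collect()  # not needed on CPython, kept for other interpreters
```

After the last two lines, `FINALIZED` holds the original hook's uid and `CATALOG` keeps only the clone's entry. The sketch works only because neither the stored state nor the finalizer callback holds a strong reference to the hook.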
This approach is entirely backward-compatible and demonstrates how a structured accessor lifecycle could empower pandas to participate in more advanced modeling scenarios — without becoming a full modeling framework itself.
Additional Context
Use Case Implications
Persistent, interoperable accessors would allow:
- Declarative pipelines inside pandas (e.g., `.fe`, `.validate`, `.track`, etc.);
- Advanced feature engineering without leaving NDFrame;
- Safer data transformation propagation (accessor-aware copies and views);
- Reuse and encapsulation of logic across modeling contexts.
- _Architectural equidistance with_
It could also help pandas reclaim some conceptual territory currently dominated by _honestly, interface-wise dead-ringer doppelganger marvels_ (e.g., Polars, Modin, cuDF), by moving beyond isomorphic syntax and toward architectural expression.
Comment From: jbrockmendel
This looks like AI slop. Can you give me a human-only Tl;dr? An example of lack of persistence being a problem?
Comment From: dangreb
> This looks like AI slop. Can you give me a human-only Tl;dr? An example of lack of persistence being a problem?
There's no problem context, as far as I'm concerned. The current design should meet its intended requirements, and it does. I speak of new functionality motivated by hindered potential. My irrelevant sensibilities do question the instance spamming, weighed against the OO syntax addiction, or is it perhaps another Groundhog Day easter egg? Merely aesthetics, of course, don't be mad please.
My advice is for a revision standing on more ambitious grounds. The OP text lays out my proposal: a solution for component accessors as first-class citizens, federated into runtime conviviality and properly supplied with read privileges over data primitives owned by the host. The granular design of the event propagation mechanics serves as a natural gauge of its own availability, measured against criteria established as an equally granular ABC of handler interface entry points. A maximum number of propagation hooks per accessor would be enforced, both for performance and to encourage narrower, specialized accessors.
A successful implementation defeats the looming incentive to move into array-only scopes for computation-intensive operations. One can then alternate operations targeting DataFrames, ndarrays, Arrow data, etc. under the same modular composition. It upgrades DataFrames into potential cornerstones for modular compositions, enrolling a variety of independent functionalities into the range of their popular instruments. Add to that some essentials, like on-demand rendering of accessor-local data as DataFrames, an on-demand consolidated output composer, and so forth.
Anyway, I'll finish the demo I mentioned first, then I should act further on it, if you'd be so kind!