Feature Type
- [X] Adding new functionality to pandas
- [ ] Changing existing functionality in pandas
- [ ] Removing existing functionality in pandas
Problem Description
While reviewing the Arrow docs linked by @WillAyd, I spotted this:
https://arrow.apache.org/docs/format/CanonicalExtensions.html#variable-shape-tensor
The tensor type is exactly what I'm talking about in Additional Context [1]; it would give pandas users a column dtype for big blocks of a single underlying type.
Feature Description
Support the Arrow Tensor type in pandas.
Python https://arrow.apache.org/docs/python/generated/pyarrow.Tensor.html#pyarrow.Tensor
Rust https://github.com/apache/arrow-rs/blob/3715d5447e468a5a4dc631ae9aafec706c57aa20/arrow/src/tensor.rs#L115
Alternative Solutions
Just make everything an "object" column:
>>> import numpy as np
>>> import pandas as pd
>>> x = {'hello': 'world'}
>>> y = np.ones(3)
>>> df = pd.DataFrame({'X': [x], 'Y': [y]})
>>> df
                    X                Y
0  {'hello': 'world'}  [1.0, 1.0, 1.0]
>>> df.dtypes
X    object
Y    object
dtype: object
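A rough sketch of what this workaround costs, assuming every cell happens to hold an array of the same shape and dtype (pandas does not check this). The two-row frame is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Y": [np.ones(3), np.zeros(3)]})

# pandas only knows that each cell is some Python object; the element
# type and shape of the arrays are invisible at the column level.
col_dtype = df["Y"].dtype

# To work on the column as one numeric block, the caller has to stack
# the cells by hand, which copies the data.
stacked = np.stack(list(df["Y"]))
print(col_dtype, stacked.shape, stacked.dtype)
```

A tensor dtype would carry the shape and element type on the column itself, so no manual stacking or per-cell validation would be needed.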
Additional Context
[1] https://github.com/pandas-dev/pandas/pull/58455#issuecomment-2161603939 onward
Comment From: mroeschke
cc @jbrockmendel: if pyarrow plans to support its compute functions for pyarrow.Tensor, this may be a more appropriate 2D EA block backing for ArrowExtensionArray than pyarrow.Table.
Comment From: WillAyd
I think the nullability bitmap for the extension array only applies to the entire datum itself, not to individual records within each struct
Comment From: jbrockmendel
I don’t think this belongs in pandas.