This came up when reviewing #62118. There are too many StringDtypes and FooStringArrays. Apparently pd.ArrowDtype can accommodate some specific types of pyarrow strings that StringDtype(storage="pyarrow") cannot. I propose that we:
1) Extend pd.StringDtype (and ArrowStringArray) to support the specific variants of pyarrow strings we want to support.
2) Deprecate support for those in ArrowDtype/ArrowEA, moving users to the StringArray.
3) Try to refactor all the FooStringArray variants down to just one StringArray.
Comment From: jbrockmendel
Getting rid of the NumpySemantics classes is straightforward. Combining the ArrowStringArray into StringArray would take real effort/thought.
Comment From: mroeschke
Would e.g. Series[datetime[pyarrow]].dt.strftime start to return StringDtype instead of ArrowDtype?
I would prefer to do this once we have "the logical type system" defined for all our types.
Comment From: jbrockmendel
> Would e.g. Series[datetime[pyarrow]].dt.strftime start to return StringDtype instead of ArrowDtype?
Yes, it would return a StringDtype that behaves semantically just like the ArrowDtype does now. Or are there other differences I'm missing?
Comment From: mroeschke
> Or are there other differences I'm missing?
I'm not sure whether e.g. Series.str.count currently returns int64[pyarrow] or Int64 for StringDtype. I believe ArrowDtype at least consistently returns another Arrow-backed dtype (unless the EA definition says to return a NumPy-backed type).
Comment From: jbrockmendel
Fair point. Short term, are you on board with getting rid of the NumpySemantics classes?
Comment From: mroeschke
Sure, but I'm not familiar with what the ArrowStringArrayNumpySemantics was for. AFAICT it's just for using a different _na_value which should be available on the dtype?
Comment From: jbrockmendel
> AFAICT it's just for using a different _na_value which should be available on the dtype?
Yes. Basically the change would be to add dtype to __init__ so type(self)(values) becomes type(self)(values, dtype=self.dtype) in a bunch of places.
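A minimal sketch of that pattern, to make the idea concrete. ToyStringDtype and ToyStringArray are hypothetical stand-ins invented for illustration, not the actual pandas classes, and the real constructors take more arguments. The point is only that once the NA value lives on the dtype, a single array class can cover both behaviours as long as every re-wrap passes the dtype along:

```python
import math


class ToyStringDtype:
    """Hypothetical dtype that carries its own missing-value sentinel."""

    def __init__(self, na_value):
        self.na_value = na_value


class ToyStringArray:
    """Hypothetical array class; missing entries are stored as None internally."""

    def __init__(self, values, dtype):
        self._values = list(values)
        self._dtype = dtype

    @property
    def dtype(self):
        return self._dtype

    def upper(self):
        result = [v.upper() if v is not None else None for v in self._values]
        # Before: type(self)(result), which relies on a subclass to fix the NA value.
        # After: forward the dtype so the NA semantics survive the re-wrap.
        return type(self)(result, dtype=self.dtype)

    def to_list(self):
        # Missing entries are surfaced as whatever the dtype says NA is.
        return [self.dtype.na_value if v is None else v for v in self._values]


nan_dtype = ToyStringDtype(na_value=float("nan"))
arr = ToyStringArray(["a", None], dtype=nan_dtype)
out = arr.upper()
```

Here `out.dtype` is the same object as `arr.dtype`, so the result keeps the NaN-style missing value without needing a separate NumpySemantics subclass.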