This came up when reviewing #62118. There are too many StringDtypes and FooStringArrays. Apparently pd.ArrowDtype can accommodate some specific types of pyarrow strings that StringDtype(storage="pyarrow") cannot. I propose that we:

1) Extend pd.StringDtype (and ArrowStringArray) to support the specific variants of pyarrow strings we want to support.
2) Deprecate support for those in ArrowDtype/ArrowEA, moving users to the StringArray.
3) Try to refactor all the FooStringArray variants down to just one StringArray.

Comment From: jbrockmendel

Getting rid of the NumpySemantics classes is straightforward. Combining the ArrowStringArray into StringArray would take real effort/thought.

Comment From: mroeschke

Would e.g. Series[datetime[pyarrow]].dt.strftime start to return StringDtype instead of ArrowDtype?

I would prefer to do this once we have "the logical type system" defined for all our types.

Comment From: jbrockmendel

> Would e.g. Series[datetime[pyarrow]].dt.strftime start to return StringDtype instead of ArrowDtype?

Yes, it would return a StringDtype that behaves semantically just like the ArrowDtype does now. Or are there other differences I'm missing?

Comment From: mroeschke

> Or are there other differences I'm missing?

I'm not sure if currently e.g. Series.str.count would return int64[pyarrow] or Int64 for StringDtype. I believe at least ArrowDtype consistently returns another arrow backed dtype (unless the EA definition says to return a numpy backed type).

Comment From: jbrockmendel

Fair point. Short term on board for getting rid of the NumpySemantics classes?

Comment From: mroeschke

Sure, but I'm not familiar with what the ArrowStringArrayNumpySemantics was for. AFAICT it's just for using a different _na_value which should be available on the dtype?

Comment From: jbrockmendel

> AFAICT it's just for using a different _na_value which should be available on the dtype?

Yes. Basically the change would be to add dtype to __init__ so type(self)(values) becomes type(self)(values, dtype=self.dtype) in a bunch of places.
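
As a rough illustration of that pattern (not the actual pandas code), here is a minimal sketch using hypothetical stand-in classes. ToyStringDtype and ToyStringArray are invented for this example; the only assumption carried over from the discussion is that the NA sentinel lives on the dtype, so any method that rebuilds an array has to forward self.dtype or the sentinel is silently reset to the default.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyStringDtype:
    """Hypothetical stand-in for a string dtype that carries its NA sentinel."""

    na_value: object = None


class ToyStringArray:
    """Hypothetical array class; not the pandas implementation."""

    def __init__(self, values, dtype=None):
        self._values = list(values)
        # Fall back to a default dtype when the caller does not supply one.
        self._dtype = dtype if dtype is not None else ToyStringDtype()

    @property
    def dtype(self):
        return self._dtype

    def take_without_dtype(self, indices):
        # Old pattern: type(self)(values). The dtype is dropped, so the
        # result reverts to the default NA sentinel.
        return type(self)([self._values[i] for i in indices])

    def take(self, indices):
        # New pattern: type(self)(values, dtype=self.dtype). The dtype,
        # and with it the NA sentinel, survives the round trip.
        return type(self)([self._values[i] for i in indices], dtype=self.dtype)


nan_dtype = ToyStringDtype(na_value=float("nan"))
arr = ToyStringArray(["a", "b", "c"], dtype=nan_dtype)

kept = arr.take([0, 2])                # dtype forwarded
lost = arr.take_without_dtype([0, 2])  # dtype reset to the default
```

In this sketch kept.dtype is the same object as nan_dtype, while lost.dtype falls back to the default ToyStringDtype(). With separate NumpySemantics subclasses the sentinel rides along on the class itself, which is why type(self)(values) is enough there; once the sentinel moves onto the dtype, every such call site needs the extra argument.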