Pandas Change default string storage from "python" to "pyarrow" (if installed) for NA-variant of StringDtype

Historically, the default value for the string storage (globally configurable through pd.options.mode.string_storage) of StringDtype was "python", and users needed to explicitly ask for "pyarrow". For example:

>>> ser = pd.Series(["a", "b"], dtype="string")
>>> ser.dtype
string[python]

and this is still the behaviour on main.

For the new NaN-variant of StringDtype, however, we implemented "auto" as the default string storage option, meaning "use pyarrow if installed, otherwise use python". So on a system with pyarrow installed:

>>> pd.options.future.infer_string = True
>>> ser = pd.Series(["a", "b"], dtype="str")
>>> ser.dtype.storage
'pyarrow'

Essentially we interpret the default string_storage option setting of "auto" differently for the NaN vs NA variant of the string dtype, which you can see in the code here:

https://github.com/pandas-dev/pandas/blob/5f23aced2f97f2ed481deda4eaeeb049d6c7debe/pandas/core/arrays/string_.py#L152-L163

Proposal: I think it makes sense to also switch to "pyarrow" as the default string storage (if installed) for the nullable StringDtype. This is somewhat of a breaking change (although mostly for the dtype object itself, because behaviour-wise, for string operations, there should be hardly any difference between the two backends), so I would keep this for 3.0 and properly document it in the whatsnew notes.
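
To make the difference concrete, here is a rough sketch of the resolution logic as described above. It is not the actual pandas implementation: `resolve_storage` and its parameters are hypothetical names used only for illustration, and it assumes an explicit "python" or "pyarrow" setting is always used as-is.

```python
from importlib.util import find_spec


def resolve_storage(option, na_variant, pyarrow_installed=None, proposed=False):
    """Hypothetical helper: map a string_storage option to a concrete backend.

    option            -- "auto", "python" or "pyarrow"
    na_variant        -- True for the NA-variant, False for the NaN-variant
    pyarrow_installed -- override for testing; detected if left as None
    proposed          -- False: behaviour described in this issue; True: the proposal
    """
    if pyarrow_installed is None:
        # Only checks whether pyarrow can be found, without importing it.
        pyarrow_installed = find_spec("pyarrow") is not None

    if option != "auto":
        # Assumption: an explicit setting is taken at face value.
        return option

    if na_variant and not proposed:
        # Behaviour described above: "auto" still means "python" for the NA-variant.
        return "python"

    # NaN-variant today, and both variants under the proposal.
    return "pyarrow" if pyarrow_installed else "python"


current = resolve_storage("auto", na_variant=True, pyarrow_installed=True)
after_change = resolve_storage("auto", na_variant=True, pyarrow_installed=True, proposed=True)
print(current, after_change)
```

With `proposed=True`, the NA-variant and the NaN-variant resolve "auto" in the same way, which is the whole of the change being proposed.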

Comment From: jbrockmendel

+1, surprised we haven’t already done this

Comment From: jorisvandenbossche

-> https://github.com/pandas-dev/pandas/pull/62118