### Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

### Reproducible Example

import io

import pandas as pd

buf = io.StringIO("date,value\n2024-01-01 00:00:00,1\n2024-02-01 00:00:00,2")
df = pd.read_csv(buf, parse_dates=["date"])
df.set_index("date").loc["2024-01"] # works


buf = io.StringIO("date,value\n2024-01-01 00:00:00,1\n2024-02-01 00:00:00,2")
df = pd.read_csv(buf, parse_dates=["date"], dtype_backend="pyarrow", engine="pyarrow")
df.set_index("date").loc["2024-01"]  # KeyError


### Issue Description

The pyarrow timestamp type gets put into a generic `Index` when assigned via `set_index`, so datetime-specific lookups such as partial-string indexing (`.loc["2024-01"]`) do not work.

### Expected Behavior

The pyarrow timestamp type should be wrapped by a `DatetimeIndex`.

### Installed Versions

3.0.0.dev0+1696.gfae3e8034f

**Comment From: WillAyd**

I think this is another one to keep track of for PDEP-13 https://github.com/pandas-dev/pandas/pull/58455

**Comment From: AbhishekChaudharii**

take

**Comment From: robert-schmidtke**

Hi, I see that #58455 was closed but this is still open. What's the status of this or are there any recommended workarounds?

**Comment From: show981111**

@AbhishekChaudharii @WillAyd
Are you still working on this? If not, I would love to take this issue.


**Comment From: show981111**

take

**Comment From: show981111**

I tried the example and documented some findings here. I just want to make sure we are aligned on the direction. Let me know what you think, @WillAyd.

## The issue
When we call `set_index` on a column with a pyarrow timestamp dtype, it produces a regular `Index` instead of a `DatetimeIndex`.
For example, if I run:

import io
import pandas as pd

buf = io.StringIO("date,value\n2024-01-01 00:00:00,1\n2024-02-01 00:00:00,2")
df = pd.read_csv(buf, parse_dates=["date"])
res = df.set_index("date")
print(res.index)
# prints DatetimeIndex(['2024-01-01', '2024-02-01'], dtype='datetime64[s]', name='date', freq=None)

However, if I use pyarrow,

buf = io.StringIO("date,value\n2024-01-01 00:00:00,1\n2024-02-01 00:00:00,2")
df = pd.read_csv(buf, parse_dates=["date"], dtype_backend="pyarrow", engine="pyarrow")
res = df.set_index("date")
print(res.index)
# prints Index([2024-01-01 00:00:00, 2024-02-01 00:00:00], dtype='timestamp[s][pyarrow]', name='date')

Therefore, if I do `loc` with `2024-01`, since it is a regular index, it doesn't perform the range search like it does for `DatetimeIndex`.
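To make the difference concrete without depending on pyarrow, here is a minimal sketch of the same lookup against a `DatetimeIndex` and against a generic `Index`. The object-dtype index is only a stand-in I am assuming behaves like the `timestamp[s][pyarrow]` index for this purpose; the variable names are mine.

```python
import pandas as pd

stamps = pd.to_datetime(["2024-01-01", "2024-02-01"])

# DatetimeIndex: a partial date string is resolved to the whole month.
dt_frame = pd.DataFrame({"value": [1, 2]}, index=stamps)
january = dt_frame.loc["2024-01"]

# Generic Index holding the same timestamps (object dtype as a stand-in for
# the Arrow-backed index): the string is treated as an exact label.
obj_frame = pd.DataFrame({"value": [1, 2]}, index=stamps.astype(object))
try:
    obj_frame.loc["2024-01"]
    lookup_failed = False
except KeyError:
    lookup_failed = True

print(len(january), lookup_failed)
```

The range search is something `DatetimeIndex` adds on top of `Index`, so any code path that yields a plain `Index` loses it regardless of what the values are.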

## Problem

  1. The first issue is that index construction goes through https://github.com/pandas-dev/pandas/blob/e97a56e746f8cdeabf7e83ec83455cbf5386c909/pandas/core/indexes/base.py#L580, which ends up at https://github.com/pandas-dev/pandas/blob/e97a56e746f8cdeabf7e83ec83455cbf5386c909/pandas/core/indexes/base.py#L609 because the array is an `ArrowExtensionArray` (`isinstance(dtype, ExtensionDtype)` returns `True`). That line returns an `Index`, not a `DatetimeIndex`, even though the dtype is `timestamp[s][pyarrow]`. I assume the expectation here is a `DatetimeIndex` for the arrow timestamp dtype (correct me if I am wrong). I'm still debugging this and will add more findings once I find the cause.

  2. I tried just returning `DatetimeIndex` at the line linked in item 1, but it still doesn't solve the issue. It errors out here: https://github.com/pandas-dev/pandas/blob/e97a56e746f8cdeabf7e83ec83455cbf5386c909/pandas/core/indexes/base.py#L656, since `<class 'pandas.core.arrays.arrow.array.ArrowExtensionArray'>` is not an instance of `<class 'pandas.core.arrays.datetimes.DatetimeArray'>`. To address this, do we have to add another special class like `ArrowDatetimeArray`? I saw there is an `ArrowStringArray`. (PS: I even tried skipping this assert. `set_index` does work, but printing the result errors out with `'ArrowExtensionArray' object has no attribute 'freq'`, which makes sense since `ArrowExtensionArray` doesn't implement `DatetimeIndexOpsMixin`.)
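As a possible stopgap for the original report (not a fix for the constructor), one option is to build the index explicitly with `pd.DatetimeIndex`, which gives up the Arrow-backed dtype for that column. This is only a sketch: `with_datetime_index` is a hypothetical helper name, the data is built in memory rather than read with `read_csv`, and the snippet falls back to a NumPy-backed column when pyarrow is not installed so that it still runs.

```python
import pandas as pd

try:
    import pyarrow  # noqa: F401  (only needed for the Arrow-backed column)
    HAVE_PYARROW = True
except ImportError:
    HAVE_PYARROW = False


def with_datetime_index(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: move `column` into a NumPy-backed DatetimeIndex."""
    # pd.DatetimeIndex(...) builds a datetime64 index from the column's values,
    # so the result no longer carries the Arrow-backed dtype.
    idx = pd.DatetimeIndex(df[column], name=column)
    return df.drop(columns=column).set_axis(idx, axis=0)


df = pd.DataFrame(
    {"date": pd.to_datetime(["2024-01-01", "2024-02-01"]), "value": [1, 2]}
)
if HAVE_PYARROW:
    # Mimic the Arrow-backed column from the report (nanosecond unit here).
    df["date"] = df["date"].astype("timestamp[ns][pyarrow]")

res = with_datetime_index(df, "date")
january = res.loc["2024-01"]
print(type(res.index).__name__, len(january))
```

The trade-off is that the index values are converted to NumPy `datetime64`, so this only helps where keeping the Arrow dtype on the index is not a requirement.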