Pandas version checks
- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [x] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import re

import pandas as pd

DATA = ["applep", "bananap", "Cherryp", "DATEp", "eGGpLANTp", "123p", "23.45p"]

# Default (object) dtype: both calls accept a compiled pattern.
s = pd.Series(DATA)
s.str.fullmatch(re.compile(r"applep"))
s.str.match(re.compile(r"applep"))

# pyarrow-backed string dtype: both calls raise AttributeError.
sa = pd.Series(DATA, dtype="string[pyarrow]")
sa.str.fullmatch(re.compile(r"applep"))
sa.str.match(re.compile(r"applep"))
Issue Description
With pyarrow strings, both of the last two calls fail. The fullmatch call raises:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\strings\accessor.py", line 140, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\strings\accessor.py", line 1429, in fullmatch
result = self._data.array._str_fullmatch(pat, case=case, flags=flags, na=na)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\arrays\_arrow_string_mixins.py", line 320, in _str_fullmatch
if not pat.endswith("$") or pat.endswith("\\$"):
^^^^^^^^^^^^
AttributeError: 're.Pattern' object has no attribute 'endswith'
>>> sa.str.match(re.compile(r"applep"))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\strings\accessor.py", line 140, in wrapper
return func(self, *args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\strings\accessor.py", line 1388, in match
result = self._data.array._str_match(pat, case=case, flags=flags, na=na)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Condadirs\envs\pandasstubs311\Lib\site-packages\pandas\core\arrays\_arrow_string_mixins.py", line 309, in _str_match
if not pat.startswith("^"):
^^^^^^^^^^^^^^
AttributeError: 're.Pattern' object has no attribute 'startswith'
Expected Behavior
No exception
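For illustration, here is a rough sketch of one way the arrow code path could tolerate compiled patterns. The helper names `unwrap_pattern` and `anchor_for_match` are hypothetical and not part of pandas. The `startswith("^")` check mirrors the line in the traceback, but the prepending step is my assumption about what follows it.

```python
import re


def unwrap_pattern(pat, flags=0):
    """Hypothetical helper: normalize a str or re.Pattern to (pattern_string, flags)."""
    if isinstance(pat, re.Pattern):
        # Keep the flags the pattern was compiled with, plus any passed explicitly.
        return pat.pattern, flags | pat.flags
    return pat, flags


def anchor_for_match(pat, flags=0):
    """Hypothetical helper: apply the start-anchor check to the unwrapped string."""
    pat, flags = unwrap_pattern(pat, flags)
    if not pat.startswith("^"):
        # Assumption: the real code anchors the pattern at the start like this.
        pat = f"^{pat}"
    return pat, flags


anchored, merged_flags = anchor_for_match(re.compile(r"applep", re.IGNORECASE))
```

With something like this, a compiled pattern reaches the string checks as plain text, and its flags (here `re.IGNORECASE`) are preserved rather than dropped.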
Installed Versions
Comment From: khemkaran10
take
Comment From: jorisvandenbossche
FWIW, we don't actually document (or test, I think) that this is supported. But because pat is passed to re.compile(..) in https://github.com/pandas-dev/pandas/blob/e4a03b6e47a8ef9cd045902916289cbc976d3d33/pandas/core/strings/object_array.py#L249-L259 this works (since re.compile accepts that).
Given this works currently, I think it is certainly a good idea to keep this working for the string dtype as well. But then we should probably also update the typing and docs.
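The standard-library behavior behind this can be checked without pandas. The snippet below is only a small demonstration of `re.compile` itself, not of pandas internals: an already-compiled pattern is returned unchanged when no flags are given, while a compiled pattern has no `str` methods such as `startswith`.

```python
import re

compiled = re.compile(r"applep")

# re.compile hands back the same object when given a compiled pattern and no flags.
same_object = re.compile(compiled) is compiled

# A compiled pattern is not a str, so str methods are missing.
has_startswith = hasattr(compiled, "startswith")

# Passing extra flags together with a compiled pattern is rejected.
flags_error = None
try:
    re.compile(compiled, flags=re.IGNORECASE)
except ValueError as exc:
    flags_error = str(exc)
```

The last case suggests that a non-default `flags` argument combined with a compiled pattern needs its own decision if this gets documented.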