- [x] I have checked that this issue has not already been reported.
- [x] I have confirmed this bug exists on the latest version of pandas.
- [ ] (optional) I have confirmed this bug exists on the master branch of pandas.
Code Sample, a copy-pastable example
import pandas as pd
import numpy as np
i = pd.Index(["AbC", "de", "FGHI", "j", "kLm"])
other = (
    pd.Series(["f", "g", "h", "i", "j"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Series(["f", "g", "h", "i", "j"]),
    np.array(["f", "a", "b", "f", "a"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Index(["1", "2", "3", "4", "5"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Index(["f", "g", "h", "i", "j"]),
)
print(i.str.cat(other))
x = pd.Series(["f", "g", "h", "i", "j"])
print(i.str.cat(x))
0.25.3 output:
Index(['AbCfffff1ff', 'degagaa2ag', 'FGHIhbhbb3bh', 'jififf4fi',
'kLmjajaa5aj'],
dtype='object')
Index(['AbCf', 'deg', 'FGHIh', 'ji', 'kLmj'], dtype='object')
1.1.0 output:
Index([nan, nan, nan, nan, nan], dtype='object')
Index([nan, nan, nan, nan, nan], dtype='object')
Problem description
The 0.25.3 output seems to be the correct one: when we change i to a Series, the incorrect results go away:
import pandas as pd
import numpy as np
i = pd.Series(["AbC", "de", "FGHI", "j", "kLm"])
other = (
    pd.Series(["f", "g", "h", "i", "j"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Series(["f", "g", "h", "i", "j"]),
    np.array(["f", "a", "b", "f", "a"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Index(["1", "2", "3", "4", "5"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Index(["f", "g", "h", "i", "j"]),
)
print(i.str.cat(other))
x = pd.Series(["f", "g", "h", "i", "j"])
print(i.str.cat(x))
0 AbCfffff1ff
1 degagaa2ag
2 FGHIhbhbb3bh
3 jififf4fi
4 kLmjajaa5aj
dtype: object
0 AbCf
1 deg
2 FGHIh
3 ji
4 kLmj
dtype: object
Expected Output
The output should match the Series behaviour, which is also the 0.25.3 behaviour.
Output of pd.show_versions()
Comment From: dsaxton
I think this is expected, per the FutureWarning in 0.25.3 saying that a future version will perform index alignment first:
In [1]: import pandas as pd
In [2]: idx = pd.Index(["a", "b", "c", "d", "e"])
In [3]: ser = pd.Series(["f", "g", "h", "i", "j"])
In [4]: print(idx.str.cat(ser))
<ipython-input-4-157619096d44>:1: FutureWarning: A future version of pandas will perform index alignment when `others` is a Series/Index/DataFrame (or a list-like containing one). To disable alignment (the behavior before v.0.23) and silence this warning, use `.values` on any Series/Index/DataFrame in `others`. To enable alignment and silence this warning, pass `join='left'|'outer'|'inner'|'right'`. The future default will be `join='left'`.
print(idx.str.cat(ser))
/Users/danielsaxton/opt/miniconda3/envs/pandas-0.25.3/lib/python3.8/site-packages/numpy/core/fromnumeric.py:87: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
Index(['af', 'bg', 'ch', 'di', 'ej'], dtype='object')
In [5]: print(pd.__version__)
0.25.3
Following the suggestion from the warning and casting to NumPy first, we get the desired output:
In [1]: import pandas as pd
In [2]: idx = pd.Index(["a", "b", "c", "d", "e"])
In [3]: ser = pd.Series(["f", "g", "h", "i", "j"])
In [4]: print(idx.str.cat(ser))
Index([nan, nan, nan, nan, nan], dtype='object')
In [5]: print(pd.__version__)
1.2.0.dev0+29.ga4203cf8d
In [6]: idx.str.cat(ser.to_numpy())
Out[6]: Index(['af', 'bg', 'ch', 'di', 'ej'], dtype='object')
After taking a look I'm not sure if this is explicit enough in the documentation.
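To make the two behaviours concrete, here is a minimal sketch (not taken from the pandas docs) that contrasts the default aligned call with positional and label-based concatenation. It assumes, as the results above suggest, that the calling Index is aligned using its own values as labels; the `relabelled` Series is a hypothetical helper introduced only for illustration.

```python
import pandas as pd

idx = pd.Index(["a", "b", "c", "d", "e"])
ser = pd.Series(["f", "g", "h", "i", "j"])  # labelled 0..4 by default

# Default call: `ser` is aligned against `idx`. None of the labels 0..4
# matches "a".."e", so every row comes back missing.
aligned_default = idx.str.cat(ser)

# Positional concatenation: a NumPy array carries no labels, so there is
# nothing to align and elements are paired by position.
positional = idx.str.cat(ser.to_numpy())

# Label-based concatenation: relabel the Series with the values of `idx`
# (an assumption about how the caller is aligned), then join explicitly.
relabelled = pd.Series(ser.to_numpy(), index=idx)
by_label = idx.str.cat(relabelled, join="left")

print(aligned_default.isna().all())
print(list(positional))
print(list(by_label))
```

If the order of the elements is what matters, the `to_numpy()` form is the simpler choice; the relabelled form is only useful when the labels are meaningful.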
Comment From: simonjayhawkins
Thanks @galipremsagar for the report and thanks @dsaxton for the answer.
> After taking a look I'm not sure if this is explicit enough in the documentation.
We would take documentation improvements, so I will leave this issue open.
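For a list-like `others` such as the one in the original report, the same workaround can be applied element by element. The following is a hedged sketch using a shortened version of the reported input; the `positional_others` name is made up for this example.

```python
import numpy as np
import pandas as pd

i = pd.Index(["AbC", "de", "FGHI", "j", "kLm"])
other = [
    pd.Series(["f", "g", "h", "i", "j"]),
    np.array(["f", "a", "b", "f", "a"]),
    pd.Index(["1", "2", "3", "4", "5"]),
]

# Strip the labels from every pandas object so that all items are
# concatenated by position; plain arrays are passed through unchanged.
positional_others = [
    o.to_numpy() if isinstance(o, (pd.Series, pd.Index)) else o
    for o in other
]

result = i.str.cat(positional_others)
print(list(result))
```

Each item must have the same length as the calling Index, since without labels there is no other way to pair the elements.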
Comment From: souris-dev
Hello, I would like to work on this issue.
Comment From: simonjayhawkins
Thanks @souris-dev. If you need any help, just ask.
Comment From: souris-dev
take
Comment From: Praveenk8051
Is this solved already? It looks like #35784 is not closed yet. Can I take it? I'm a first-time contributor.
Comment From: simonjayhawkins
@Praveenk8051 There is an open PR addressing this https://github.com/pandas-dev/pandas/pull/35784#issuecomment-720567827
Comment From: kerrf
take