Pandas BUG: dtype keyword in Series constructor ignored for Series input in certain cases

Pandas version checks

[X] I have checked that this issue has not already been reported.
[X] I have confirmed this bug exists on the latest version of pandas.
[X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd

try:
    s = pd.Series([1, 2, 'x', 4, 5], dtype=int)
    print("1. No exception")
except Exception as ex:
    print("1. Exception thrown", ex)

s = pd.Series([1, 2, 'x', 4, 5])
try:
    s2 = pd.Series(s, dtype=int)
    print("2. No exception")
    print("s2 dtype", s2.dtype)
except Exception as ex:
    print("2. Exception thrown", ex)

Issue Description

The behavior of the above code doesn't seem right. If I try to create a Series with dtype=int using a list that contains a string, it throws a ValueError, as expected.

But if the input to the Series constructor is another Series that contains a string, there is no exception and it quietly returns a new Series with dtype=object.

Is this expected? If so, it's very counterintuitive. Perhaps the recommended way is s.astype(int), but it seems like both approaches should produce the same behavior.

Expected Behavior

In both of the cases above, a ValueError should be raised. Instead, only the first case raises; the actual output is:

1. Exception thrown invalid literal for int() with base 10: 'x'
2. No exception
s2 dtype object

Installed Versions

INSTALLED VERSIONS
------------------
commit : d9cdd2ee5a58015ef6f4d15c7226110c9aab8140
python : 3.11.6.final.0
python-bits : 64
OS : Darwin
OS-release : 23.1.0
Version : Darwin Kernel Version 23.1.0: Mon Oct 9 21:27:24 PDT 2023; root:xnu-10002.41.9~6/RELEASE_ARM64_T6000
machine : arm64
processor : arm
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8

pandas : 2.2.2
numpy : 1.23.4
pytz : 2022.5
dateutil : 2.8.2
setuptools : 65.5.0
pip : 24.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.1.2
IPython : 8.12.0
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : 4.12.2
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : None
numba : None
numexpr : None
odfpy : None
openpyxl : 3.1.2
pandas_gbq : None
pyarrow : None
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.13.1
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2023.3
qtpy : None
pyqt5 : None

Comment From: chaoyihu

@jonmooser Thanks for raising this issue! If I understand it correctly, you are trying to change the dtype of a Series that is already initialized.

I agree it is counter-intuitive that s2 = pd.Series(s, dtype=int) ignores the customized dtype silently without raising.

However, I don't think the constructor should be used for casting. As you have mentioned, a better way of doing this might be Series.astype, in which case a ValueError is raised as expected:

>>> s = pd.Series([1, 2, 'x', 4, 5])  # dtype of s: object
>>> s3 = s.astype('int64')
ValueError: invalid literal for int() with base 10: 'x'

Comment From: jonmooser

Thanks for looking into this. Glad there is a workaround, but it would be great if future versions were more consistent and intuitive.

Comment From: taranarmo

The constructor doesn't use the dtype keyword if a Series is passed in (source code link):

        elif isinstance(data, Series):
            if index is None:
                index = data.index
                data = data._mgr.copy(deep=False)
            else:
                data = data.reindex(index)
                copy = False
                data = data._mgr

I guess a warning could be raised if data is a Series and dtype is not None. If so, I can make a PR. Personally, I'd cast data to the requested dtype so the code fails explicitly, but I assume that's a breaking change and could cause problems for (some) users.
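A minimal user-side sketch of the "cast explicitly" option, assuming that raising ValueError for uncastable values is the desired outcome. `series_strict` is a hypothetical helper, not a pandas API: when `data` is already a Series and a dtype is given, it builds the Series without the dtype and then calls `astype`, which raises for values such as 'x'.

```python
import pandas as pd


def series_strict(data, dtype=None, **kwargs):
    """Hypothetical helper (not part of pandas).

    If ``data`` is already a Series and a dtype is requested, construct the
    Series without the dtype and cast explicitly with ``astype`` so that
    values which cannot be converted raise instead of being kept as-is.
    """
    if isinstance(data, pd.Series) and dtype is not None:
        # astype raises ValueError for e.g. 'x' -> int
        return pd.Series(data, **kwargs).astype(dtype)
    return pd.Series(data, dtype=dtype, **kwargs)


# Usage: the Series input now fails the same way the list input does.
s = pd.Series([1, 2, 'x', 4, 5])
try:
    series_strict(s, dtype=int)
except ValueError as ex:
    print("raised:", ex)
```

Routing the cast through `astype` makes the Series path behave like `s.astype('int64')`, at the cost of an extra step compared with calling the constructor alone.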

Comment From: jorisvandenbossche

@taranarmo wrote: "The constructor doesn't use dtype keyword if Series passed in"

The dtype is still used later on in that case:

https://github.com/pandas-dev/pandas/blob/39bd3d38ac97177c22e68a9259bf4f09f7315277/pandas/core/series.py#L502

So while the dtype argument does seem to be ignored in certain cases, as the example in the top post shows, it is not always ignored. Quick example:

>>> ser_float = pd.Series([1.0], dtype=float)
>>> ser_float
0    1.0
dtype: float64
>>> pd.Series(ser_float, dtype="int64")
0    1
dtype: int64
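To make the inconsistency concrete, a small illustrative check compares the constructor and `astype` on the float Series from the example above, then on the object Series from the original report. On the float input the two paths agree. On the object input `astype` raises, while the constructor's outcome reportedly depends on the pandas version, so the sketch only exercises `astype` for that case.

```python
import pandas as pd

# Float input: the constructor's dtype argument is applied, matching astype.
ser_float = pd.Series([1.0], dtype=float)
via_constructor = pd.Series(ser_float, dtype="int64")
via_astype = ser_float.astype("int64")
print(via_constructor.dtype, via_astype.dtype)  # int64 int64
print(via_constructor.equals(via_astype))  # True

# Object input with a non-numeric string: astype raises ValueError.
ser_obj = pd.Series([1, 2, "x", 4, 5])
try:
    ser_obj.astype("int64")
except ValueError as ex:
    print("astype raised:", ex)
```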

@chaoyihu wrote: "I agree it is counter-intuitive that s2 = pd.Series(s, dtype=int) ignores the customized dtype silently without raising."

I fully agree this is counter-intuitive, and as shown above also inconsistent. So I think we should rather try to fix this. When passing a dtype argument, some level of casting being performed in the constructor is unavoidable I think (although I agree that for the explicit action of casting a Series to a different dtype, using ser.astype(..) is the better and clearer option).

Comment From: taranarmo

True, I missed that later use of dtype. However, inside sanitize_array, if data is a Series it is converted into a NumPy array (since hasattr(Series, "__array__") is true), sanitize_array is called again, and it then goes into _try_cast (link):

    if dtype == object:
        if not is_ndarray:
            subarr = construct_1d_object_array_from_listlike(arr)
            return subarr
        return ensure_wrapped_if_datetimelike(arr).astype(dtype, copy=copy)
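To narrow down where the strictness is lost, a NumPy-only check helps. It is an illustration, not the actual pandas code path. The underlying object to int64 cast does raise for 'x', which suggests that when pandas returns an object Series, the int cast was apparently never attempted, rather than attempted and tolerated.

```python
import numpy as np

# An object-dtype array like the one backing pd.Series([1, 2, 'x', 4, 5]).
arr = np.array([1, 2, "x", 4, 5], dtype=object)

# Casting object -> int64 converts each element with int(),
# so the string 'x' makes the cast fail with ValueError.
try:
    arr.astype(np.int64)
except ValueError as ex:
    print("cast failed:", ex)

# Casting object -> object leaves the values untouched, consistent with
# the constructor returning an object Series when no int cast happens.
print(arr.astype(object).dtype)  # object
```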

I agree that this should be considered a bug, as the current behavior is closer to "dtype might be ignored". I suppose my last change should be reverted.

Comment From: taranarmo

Maybe we should make dtype consistently ignored when data is a Series/DataFrame. That would be simple to fix and would steer users away from creating a Series from a Series while casting it to a different dtype.
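A caller-side sketch of what the "consistently ignore" alternative could look like, combined with the warning suggested earlier in the thread. `series_from` is a hypothetical wrapper, not a pandas API, and for simplicity it only handles Series input. It drops `dtype` for Series input, warns, and points to `astype` for an explicit cast.

```python
import warnings

import pandas as pd


def series_from(data, dtype=None, **kwargs):
    """Hypothetical sketch (not part of pandas): ignore ``dtype`` whenever
    ``data`` is already a Series, and warn so the caller can switch to an
    explicit ``astype`` call."""
    if isinstance(data, pd.Series) and dtype is not None:
        warnings.warn(
            "dtype is ignored when data is a Series; "
            "use .astype(dtype) to cast explicitly",
            UserWarning,
            stacklevel=2,
        )
        dtype = None
    return pd.Series(data, dtype=dtype, **kwargs)


# Usage: the object Series keeps its dtype and a UserWarning is emitted.
s = pd.Series([1, 2, 'x', 4, 5])
out = series_from(s, dtype=int)
print(out.dtype)  # object
```

The trade-off compared with the strict variant is that nothing raises: the caller is told the dtype was dropped, but existing code keeps running.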

Comment From: jbrockmendel

I'm seeing both cases (correctly) raise on main and 2.3, but the OP's behavior on 2.2. @jonmooser, can you confirm?

Comment From: jonmooser

Hmm... I just upgraded to pandas 2.3.0 (and numpy 2.3.1) and still see no exception in the second case. Any idea why we might be seeing different behavior on the same version?
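When comparing results across machines, a small diagnostic can help. The helper below is hypothetical, not from the thread. It prints the exact pandas and NumPy versions next to the outcome of the second case from the original report. For the full environment report, like the one in the original post, `pd.show_versions()` is the standard tool.

```python
import numpy as np
import pandas as pd


def constructor_raises_on_bad_cast() -> bool:
    """Hypothetical helper: return True if pd.Series(<object Series>, dtype=int)
    raises, i.e. the constructor rejects 'x' the way the list input does."""
    s = pd.Series([1, 2, "x", 4, 5])
    try:
        pd.Series(s, dtype=int)
    except (TypeError, ValueError):
        return True
    return False


print("pandas", pd.__version__, "numpy", np.__version__)
print("constructor raises:", constructor_raises_on_bad_cast())
```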