Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

# Example code to reproduce the issue
import pandas as pd

# Write a large single-column CSV: the integers 0..999999 with one
# string value ('abcd') inserted at row 599999.
values = [str(i) for i in range(1_000_000)]
values.insert(599_999, "abcd")  # A string value inserted here
with open("test.csv", "w") as f:
    f.write("key\n" + "\n".join(values) + "\n")

Issue Description

pandas appears to infer different types for the same kind of numeric values when a single string value is present in a large CSV file (about 1,000,000 rows). A column of numeric values is inferred as object dtype if a single string is inserted somewhere in it. This affects not only the row containing the string but also a large number of rows around it: roughly 200,000 rows before and after the string (the number differs on every run) are read as strings rather than integers. As a result, numeric values in different rows of the same column can have different Python types.

When read_csv is called with low_memory=False, the result is as expected: all values in the column have a consistent type.
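The low_memory=False observation can be checked with a small script. The pandas documentation for read_csv describes low_memory as processing the file in chunks internally, "resulting in lower memory use while parsing, but possibly mixed type inference", and suggests either setting it to False or passing dtype to avoid mixed types. The sketch below is only an illustration under those assumptions: the names N_ROWS, STRING_AT, and python_types are made up for this example, and the CSV is built in memory with io.StringIO instead of being read from test.csv.

```python
import io

import pandas as pd

# Build the CSV in memory: the integers 0..999999 with one string row
# ('abcd') inserted at position 599999, as in the report.
N_ROWS = 1_000_000      # illustrative constant, not from pandas
STRING_AT = 599_999     # illustrative constant, not from pandas
values = [str(i) for i in range(N_ROWS)]
values.insert(STRING_AT, "abcd")
csv_text = "key\n" + "\n".join(values) + "\n"


def python_types(frame):
    """Return the set of Python type names found in the 'key' column."""
    return {type(v).__name__ for v in frame["key"]}


# low_memory=False parses the whole column at once, so type inference
# sees 'abcd' together with every other value.
whole = pd.read_csv(io.StringIO(csv_text), low_memory=False)

# Passing an explicit dtype skips inference for that column entirely.
as_str = pd.read_csv(io.StringIO(csv_text), dtype={"key": str})

print(python_types(whole))
print(python_types(as_str))
```

Both reads should report a single Python type for the column. Running the same check on a read with the default low_memory setting is the quickest way to see whether a given pandas version still produces the int/str mix shown in this report.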

Expected Behavior

raw = pd.read_csv('test.csv')
raw['type'] = raw['key'].map(lambda x: str(type(x)))
print(raw[raw.key=='abcd'])
          key              type
599999  abcd  <class 'str'>

print(raw.type.value_counts())
type
<class 'int'>    524288
<class 'str'>    475713

print(raw[raw.type=="<class 'int'>"])
           key              type
0            0  <class 'int'>
1            1  <class 'int'>
2            2  <class 'int'>
3            3  <class 'int'>
4            4  <class 'int'>
...        ...            ...
524283  524283  <class 'int'>
524284  524284  <class 'int'>
524285  524285  <class 'int'>
524286  524286  <class 'int'>
524287  524287  <class 'int'>

print(raw[raw.type=="<class 'str'>"])
            key              type
524288   524288  <class 'str'>
524289   524289  <class 'str'>
524290   524290  <class 'str'>
524291   524291  <class 'str'>
524292   524292  <class 'str'>
...         ...            ...
999996   999995  <class 'str'>
999997   999996  <class 'str'>
999998   999997  <class 'str'>
999999   999998  <class 'str'>
1000000  999999  <class 'str'>
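To find which rows keep the column from being parsed as integers, one possible approach is to read the column as strings and then coerce it with pd.to_numeric. This is a minimal sketch on a tiny in-memory CSV, not a fix for the behavior reported here; the data and the variable names (csv_text, bad_rows, clean) are invented for illustration.

```python
import io

import pandas as pd

# Tiny stand-in for the real file: one non-numeric value among integers.
csv_text = "key\n0\n1\nabcd\n2\n3\n"

# Read everything as strings so no type inference takes place.
raw = pd.read_csv(io.StringIO(csv_text), dtype={"key": str})

# errors="coerce" turns values that cannot be parsed as numbers into NaN.
numeric = pd.to_numeric(raw["key"], errors="coerce")

# Rows whose value could not be converted.
bad_rows = raw[numeric.isna()]
print(bad_rows)

# Remaining rows, converted to a proper integer column.
clean = raw[numeric.notna()].astype({"key": "int64"})
print(clean.dtypes)
```

On the real file, the same steps would be applied to the frame returned by pd.read_csv('test.csv', dtype={'key': str}).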

Installed Versions

pandas version: 2.1.4
System: Ubuntu 22.04

Comment From: anirudh-hegde

Hi @yinzhedfs, have you tried using StringIO to create a file-like object for csv_content?