Pandas version checks
- [X] I have checked that this issue has not already been reported.
- [X] I have confirmed this bug exists on the latest version of pandas.
- [X] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
pip install requests
pip install --upgrade --pre pandas==1.5.0rc0
import requests
import pandas as pd
r = requests.get("https://elcinema.com/en/index/work/country/eg?page=1")
df_one = pd.read_html(r.text)[0]
row_a = df_one["Release Year"].iloc[1]
print(row_a) # `2023`
df_two = pd.read_html(r.text, extract_links="body")[0]
row_b = df_two["Release Year"].iloc[1]
print(row_b) # `('2023', None)`
Issue Description
The first print statement prints 2023 (an integer).
The second print statement prints ('2023', None): the first item in the tuple is a string, which is the wrong type.
Passing extract_links changes how the type is determined.
Expected Behavior
The first element of the tuple should have the same type as it would if extract_links were not set.
In this case the value should be (2023, None), i.e.
assert row_a == row_b[0]
should hold, but it does not.
Installed Versions
Comment From: mroeschke
Thanks for the report. This may be tricky to address.
A row of the data with extract_links=None is just strings that the TextFileReader will infer types of:
... ['1)', 'Already Happened', 'Series', '2023', '0 Rating disabledRating disabled']...
A row of the data with extract_links="anything" is a sequence of tuples of strings/None, and TextFileReader does not recursively infer types:
... ('1)', None), ('Already Happened', '/en/work/2075121/'), ('Series', None), ('2023', None), ('0 Rating disabledRating disabled', None)]...
So I guess fundamentally the question is whether TextFileReader should recursively infer types of container data types.
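For anyone who needs the expected behavior in the meantime, here is a minimal user-side sketch of what inferring types inside the tuples could look like. It is not how pandas or TextFileReader works internally. The frame df_links is a hand-built stand-in for the output of read_html with extract_links set, and infer_text_types is a hypothetical helper name. It only attempts numeric conversion via pd.to_numeric, which is narrower than the inference the parser normally performs.

```python
import pandas as pd

# Hand-built stand-in for a frame produced with extract_links="body":
# every cell is a (text, href) tuple, and the text is always a string.
df_links = pd.DataFrame(
    {
        "Name": [("Already Happened", "/en/work/2075121/")],
        "Release Year": [("2023", None)],
    }
)


def infer_text_types(df):
    """Hypothetical helper: re-infer the type of the text half of each cell."""
    out = {}
    for col in df.columns:
        text = df[col].map(lambda cell: cell[0])
        href = df[col].map(lambda cell: cell[1])
        try:
            # Convert the whole column only if every text value is numeric.
            text = pd.to_numeric(text)
        except (ValueError, TypeError):
            pass  # non-numeric column: keep the original strings
        out[col] = pd.Series(list(zip(text, href)), index=df.index, dtype=object)
    return pd.DataFrame(out)


fixed = infer_text_types(df_links)
print(fixed["Release Year"].iloc[0])
```

With this post-processing step the year is numeric inside the tuple, while the non-numeric Name column is left unchanged.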
Comment From: MarcoGorelli
moving off the 2.0 milestone as it's a regression from 1.5
Comment From: jbrockmendel
Is this actionable?