Pandas version checks

  • [X] I have checked that this issue has not already been reported.

  • [X] I have confirmed this bug exists on the latest version of pandas.

  • [X] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import requests

# URL for data from University of California Irvine
url = 'https://archive.ics.uci.edu/ml/machine-learning-databases/00492/Metro_Interstate_Traffic_Volume.csv.gz'

file = 'Metro_Interstate_Traffic_Volume.csv.gz'

content = requests.get(url).content
with open(file, 'wb') as f:
    f.write(content)

df = pd.read_csv(file, compression='gzip')
  holiday    temp  rain_1h  snow_1h  clouds_all weather_main weather_description            date_time  traffic_volume
0     NaN  288.28      0.0      0.0          40       Clouds    scattered clouds  2012-10-02 09:00:00            5545
1     NaN  289.36      0.0      0.0          75       Clouds       broken clouds  2012-10-02 10:00:00            4516
2     NaN  289.58      0.0      0.0          90       Clouds     overcast clouds  2012-10-02 11:00:00            4767
3     NaN  290.13      0.0      0.0          90       Clouds     overcast clouds  2012-10-02 12:00:00            5026
4     NaN  291.14      0.0      0.0          75       Clouds       broken clouds  2012-10-02 13:00:00            4918
               temp       rain_1h       snow_1h    clouds_all  traffic_volume
count  48204.000000  48204.000000  48204.000000  48204.000000    48204.000000
mean     281.205870      0.334264      0.000222     49.362231     3259.818355
std       13.338232     44.789133      0.008168     39.015750     1986.860670
min        0.000000      0.000000      0.000000      0.000000        0.000000
25%      272.160000      0.000000      0.000000      1.000000     1193.000000
50%      282.450000      0.000000      0.000000     64.000000     3380.000000
75%      291.806000      0.000000      0.000000     90.000000     4933.000000
max      310.070000   9831.300000      0.510000    100.000000     7280.000000

Plotting

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(15, 6))

# pandas.DataFrame.plot / pandas.DataFrame.plot.hist
df.plot(kind='hist', column='traffic_volume', legend=False, ax=ax1)
ax1.bar_label(ax1.containers[0])
ax1.set_title('pandas.DataFrame.plot does not\nseem to correctly determine df.traffic_volume.max()')

# pandas.Series.plot / pandas.Series.plot.hist
df.traffic_volume.plot(kind='hist', ax=ax2)
_ = ax2.bar_label(ax2.containers[0])
ax2.set_title('pandas.Series.plot produces the expected result')

Issue Description

Pandas BUG: DataFrame.plot.hist and Series.plot.hist do not produce the same histogram for the same column

The plot produced by pandas.DataFrame.plot does not match the plot produced by pandas.Series.plot. kind='hist' for both.

Also see Different results when plotting histogram using DataFrame.plot.hist and Series.plot.hist

df[['traffic_volume', 'clouds_all']].plot(kind='hist', column='traffic_volume') produces the expected plot df[['traffic_volume']].plot(kind='hist', column='traffic_volume') produces the expected plot df[['traffic_volume', 'rain_1h']].plot(kind='hist', column='traffic_volume') produces the incorrect plot.

It seems def _calculate_bins applies to the entire numeric portion of the DataFrame, even if column='traffic_volume' is specified.

Expected Behavior

The expected behavior is the plot produced by pandas.Series.plot or matplotlib.axes.Axes.hist

pandas.DataFrame.hist and pandas.Series.hist also both produce the expected plot.

fig, ax = plt.subplots()
ax.hist(df.traffic_volume, bins=10)
_ = ax.bar_label(ax.containers[0])

Pandas BUG: DataFrame.plot.hist and Series.plot.hist do not produce the same histogram for the same column

I expect def _calculate_bins to calculate only on the specified columns, if column= is used. The column parameter here states If passed, will be used to limit data to a subset of columns.

Installed Versions

INSTALLED VERSIONS ------------------ commit : 478d340667831908b5b4bf09a2787a11a14560c9 python : 3.11.3.final.0 python-bits : 64 OS : Windows OS-release : 10 Version : 10.0.22621 machine : AMD64 processor : Intel64 Family 6 Model 167 Stepping 1, GenuineIntel byteorder : little LC_ALL : None LANG : None LOCALE : English_United States.1252 pandas : 2.0.0 numpy : 1.24.3 pytz : 2022.7 dateutil : 2.8.2 setuptools : 66.0.0 pip : 23.0.1 Cython : None pytest : None hypothesis : None sphinx : None blosc : None feather : None xlsxwriter : None lxml.etree : 4.9.2 html5lib : 1.1 pymysql : None psycopg2 : None jinja2 : None IPython : 8.12.0 pandas_datareader: 0.10.0 bs4 : 4.12.2 bottleneck : None brotli : fastparquet : None fsspec : None gcsfs : None matplotlib : 3.7.1 numba : None numexpr : None odfpy : None openpyxl : None pandas_gbq : None pyarrow : None pyreadstat : None pyxlsb : None s3fs : None scipy : 1.10.1 snappy : None sqlalchemy : None tables : None tabulate : None xarray : None xlrd : 2.0.1 zstandard : None tzdata : 2023.3 qtpy : None pyqt5 : None

Comment From: mittal-aakriti

I tried to reproduce this issue but it seems like it's working correctly now, the plots using all the different methods mentioned above look correct. Is there anything else that needs to be done to close the issue?