Pandas version checks
-
[X] I have checked that this issue has not already been reported.
-
[X] I have confirmed this bug exists on the latest version of pandas.
-
[ ] I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import numpy as np
from pathlib import Path
# Create a simple df with a RangeIndex for the columns. N.B. The RangeIndex
# start and stop have the same sign.
range_idx = pd.RangeIndex(start=-10, stop=-5, step=1)
df = pd.DataFrame(np.random.randn(3, len(range_idx)), columns=range_idx)
csv_file = Path("./data/test_range_index.csv")
csv_file.parent.mkdir(exist_ok=True)
# Write df to a CSV along with row+col indices
df.to_csv(
csv_file,
header=True,
index=True,
)
# Create a dictionary to specify the dtypes
dtype_spec = {}
for v in range_idx:
dtype_spec[v] = "float64"
# dtype_spec[f"{v}"] = "float64" # * Alternatively, convert to str, which works
# Reload the CSV
df_reload = pd.read_csv(
csv_file,
header=0,
index_col=0,
float_precision='round_trip',
dtype=dtype_spec,
)
Issue Description
This may in part be my misunderstanding of how dtypes should be specified and the implicit conversion of the column index to str.
Notwithstanding that, the problem is when reading a CSV using read_csv()
with a header row that consists of integers of the same sign and specifying the dtype with a dictionary that has integer keys: it fails with an IndexError: list index out of range
.
Traceback
IndexError Traceback (most recent call last)
Cell In[106], [line 1](vscode-notebook-cell:?execution_count=106&line=1)
----> [1](vscode-notebook-cell:?execution_count=106&line=1) df_reload = pd.read_csv(
[2](vscode-notebook-cell:?execution_count=106&line=2) csv_file,
[3](vscode-notebook-cell:?execution_count=106&line=3) header=0,
[4](vscode-notebook-cell:?execution_count=106&line=4) index_col=0,
[5](vscode-notebook-cell:?execution_count=106&line=5) float_precision='round_trip',
[6](vscode-notebook-cell:?execution_count=106&line=6) dtype=dtype_spec,
[7](vscode-notebook-cell:?execution_count=106&line=7) )
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026), in read_csv(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, skipfooter, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, date_format, dayfirst, cache_dates, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, doublequote, escapechar, comment, encoding, encoding_errors, dialect, on_bad_lines, delim_whitespace, low_memory, memory_map, float_precision, storage_options, dtype_backend)
[1013](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1013) kwds_defaults = _refine_defaults_read(
[1014](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1014) dialect,
[1015](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1015) delimiter,
(...)
[1022](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1022) dtype_backend=dtype_backend,
[1023](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1023) )
[1024](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1024) kwds.update(kwds_defaults)
-> [1026](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1026) return _read(filepath_or_buffer, kwds)
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:626](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:626), in _read(filepath_or_buffer, kwds)
[623](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:623) return parser
[625](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:625) with parser:
--> [626](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:626) return parser.read(nrows)
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1923](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1923), in TextFileReader.read(self, nrows)
[1916](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1916) nrows = validate_integer("nrows", nrows)
[1917](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1917) try:
[1918](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1918) # error: "ParserBase" has no attribute "read"
[1919](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1919) (
[1920](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1920) index,
[1921](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1921) columns,
[1922](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1922) col_dict,
-> [1923](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1923) ) = self._engine.read( # type: ignore[attr-defined]
[1924](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1924) nrows
[1925](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1925) )
[1926](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1926) except Exception:
[1927](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/readers.py:1927) self.close()
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:333](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:333), in CParserWrapper.read(self, nrows)
[330](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:330) data = {k: v for k, (i, v) in zip(names, data_tups)}
[332](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:332) names, date_data = self._do_date_conversions(names, data)
--> [333](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:333) index, column_names = self._make_index(date_data, alldata, names)
[335](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/c_parser_wrapper.py:335) return index, column_names, date_data
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:372](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:372), in ParserBase._make_index(self, data, alldata, columns, indexnamerow)
[370](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:370) elif not self._has_complex_date_col:
[371](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:371) simple_index = self._get_simple_index(alldata, columns)
--> [372](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:372) index = self._agg_index(simple_index)
[373](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:373) elif self._has_complex_date_col:
[374](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:374) if not self._name_processed:
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:489](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:489), in ParserBase._agg_index(self, index, try_parse_dates)
[484](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:484) if col_name is not None:
[485](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:485) col_na_values, col_na_fvalues = _get_na_values(
[486](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:486) col_name, self.na_values, self.na_fvalues, self.keep_default_na
[487](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:487) )
--> [489](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:489) clean_dtypes = self._clean_mapping(self.dtype)
[491](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:491) cast_type = None
[492](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:492) index_converter = False
File [~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:455](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:455), in ParserBase._clean_mapping(self, mapping)
[453](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:453) for col, v in mapping.items():
[454](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:454) if isinstance(col, int) and col not in self.orig_names:
--> [455](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:455) col = self.orig_names[col]
[456](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:456) clean[col] = v
[457](https://file+.vscode-resource.vscode-cdn.net/home/peterma/exio_dev/repo/df_file_interchange/pandas_github_issues/~/miniconda3/envs/exio-datamanager/lib/python3.11/site-packages/pandas/io/parsers/base_parser.py:457) if isinstance(mapping, defaultdict):
IndexError: list index out of range
Further discussion
The IndexError
does not get raised if the original RangeIndex
has a start and stop of different signs.
Neither does it occur if the dtype=
is not specified in read_csv()
's arguments or if the keys in the dictionary of the dtypes are converted to str.
In all successful cases, the columns are an Index
that contain the str representation of the integers. Does this mean they should always be integers for the purposes of read_csv()
and that any conversion back to integer has to happen later? If so, the docs might benefit from being a little bit clearer.
Expected Behavior
That read_csv()
would succeed when specifying the dtypes with a dict with integer keys (of the same sign).
Installed Versions
Comment From: gb0808
take