• [ ] With free-threading, could _convert_column_data be called in parallel for each column?
  • [ ] (free-threading) For large files, split into chunks and parse in parallel, then concat?
  • [ ] In a pyarrow-always-available world, could _string_box_utf8 allocate a buffer+mask rather than ndarray[object]?
  • [ ] #17743
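The chunk-and-parse-in-parallel idea could be sketched roughly as below. This is a single-process illustration only, assuming pandas; `read_csv_chunked` and its line-aligned splitting helper are hypothetical, not existing pandas API, and a real implementation would need to handle quoted newlines, dtype unification across chunks, etc.

```python
import io
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def read_csv_chunked(text, n_chunks=2):
    # Hypothetical sketch: split the raw CSV into line-aligned chunks
    # (re-attaching the header to each), parse the chunks independently,
    # then concatenate the resulting frames.
    header, _, body = text.partition("\n")
    lines = body.splitlines()
    size = max(1, len(lines) // n_chunks)
    chunks = ["\n".join([header] + lines[i:i + size])
              for i in range(0, len(lines), size)]
    with ThreadPoolExecutor() as pool:
        frames = list(pool.map(lambda c: pd.read_csv(io.StringIO(c)), chunks))
    return pd.concat(frames, ignore_index=True)

csv_text = "a,b\n1,2\n3,4\n5,6\n7,8\n"
df = read_csv_chunked(csv_text, n_chunks=2)
```

Under the current GIL the thread pool buys little for the pure-Python parts, which is exactly why the free-threading build makes this interesting.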

Anyone else have more ideas?

Comment From: samukweku

Is it possible to prefilter? @jbrockmendel

Comment From: jbrockmendel

Can you describe what you have in mind?

Comment From: samukweku

@jbrockmendel I mean filtering the rows while reading the CSV, instead of returning the entire file. I'm thinking of what happens with Parquet, where a user can return a subset of the data, filtered based on partitioning. CSV doesn't have partitioning; maybe there is a different way to make that happen? Polars has something similar with scan_csv, and I believe DuckDB's CSV reader has comparable capabilities.
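For concreteness, a minimal pure-Python sketch of what row-level prefiltering during parsing could look like; `read_csv_filtered` and its `predicate` argument are hypothetical illustrations, not pandas (or Polars/DuckDB) API. The point is that rows failing the filter are never materialized in the result.

```python
import csv
import io

def read_csv_filtered(text, predicate):
    # Hypothetical sketch: apply a row-level predicate while streaming
    # through the CSV, keeping only the rows that pass the filter.
    reader = csv.DictReader(io.StringIO(text))
    return [row for row in reader if predicate(row)]

csv_text = "year,value\n2020,1\n2021,2\n2022,3\n"
rows = read_csv_filtered(csv_text, lambda r: int(r["year"]) >= 2021)
```

A real reader would push the predicate into the tokenizer so skipped rows also avoid type conversion, which is where most of the savings would come from.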