Location of the documentation

pandas.read_csv

Documentation problem

I've just noticed that s3fs is required when you read a URL from S3. While it is documented that you can read from S3, the fact that you need to install an extra package is not.

Also, it would be nice if this were available as a pandas extra in setup.py (e.g. s3).
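To make the requirement concrete, here is a minimal sketch that checks, before calling pandas.read_csv, whether the optional packages needed for a remote URL are importable. The missing_packages helper and the REQUIRED_PACKAGES mapping are hypothetical (not part of pandas). The mapping covers only the S3 and Google Cloud Storage cases discussed in this thread (fsspec plus s3fs or gcsfs) and uses only the standard library.

```python
import importlib.util
from urllib.parse import urlparse

# Hypothetical, non-exhaustive mapping from URL scheme to the optional
# packages a remote read needs: fsspec plus the filesystem package.
REQUIRED_PACKAGES = {
    "s3": ("fsspec", "s3fs"),
    "gs": ("fsspec", "gcsfs"),
    "gcs": ("fsspec", "gcsfs"),
}


def missing_packages(url):
    """Return the optional packages needed for ``url`` that are not installed."""
    scheme = urlparse(url).scheme
    # Local paths and unknown schemes need nothing from this mapping.
    needed = REQUIRED_PACKAGES.get(scheme, ())
    # find_spec returns None when a top-level module cannot be found.
    return [name for name in needed if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages("s3://my-bucket/data.csv")
    if missing:
        print("Install before calling pandas.read_csv:", ", ".join(missing))
    else:
        print("S3 dependencies found")
```

Running it prints which packages are still missing for an s3:// URL, if any; local paths and plain HTTP(S) URLs always return an empty list.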

Comment From: jorisvandenbossche

The user guide mentions it: https://pandas.pydata.org/docs/user_guide/io.html#reading-remote-files, and the install guide as well: https://pandas.pydata.org/docs/getting_started/install.html#optional-dependencies

I think listing all optional dependencies in the read_csv docstring as well would probably be too much (S3 is one, but e.g. Azure or Google Cloud need other optional deps). Maybe we should mention in general that additional deps might be needed and link to one of the other places where this is explained?

Comment From: MartinThoma

I wasn't aware that there are even more 😱

we should maybe mention it in general that additional deps might be needed

Sounds good! Should I make a PR?

Comment From: jorisvandenbossche

Yes, PR very welcome!

Comment From: abdoulayegk

Hello, can I make a PR, since nobody has made one yet?

Comment From: MartinThoma

@abdoulayegk Oops, sorry, I forgot. Please go ahead if you want to take care of that :-)

Comment From: alecglassford

  1. Perhaps this has already been noted, but it looks like fsspec also needs to be installed in addition to s3fs or gcsfs (related PR: #34266). This is reflected in the optional dependencies list, but it's not necessarily obvious at first glance. It might be nice if the relevant rows noted this requirement, for example (my addition is the parenthetical in the Notes column):

     Dependency   Minimum Version   Notes
     gcsfs        0.6.0             Google Cloud Storage access (must be used with fsspec)
     s3fs         0.4.0             Amazon S3 access (must be used with fsspec)

  2. If you're adding a link to the optional dependencies list in the read_csv docstring, it likely makes sense to add the same link to the docstrings of the other pandas.read_{format} methods. I'm not sure it applies to all of them, but it applies at least to pandas.read_json and pandas.read_excel.

  3. I couldn't find a list of all the supported filesystems anywhere; the most comprehensive listing I found is this release note. Given that fsspec supports many filesystems, it may not be feasible to list them all (and keep up with a potentially growing list). However, the "Reading remote files" section of the IO docs could link to the fsspec documentation so users can learn about additional compatible filesystems. (Unfortunately, I couldn't find a more concise list of supported filesystems in the fsspec documentation than the source code I just linked to.)
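As a rough stand-in for the missing list, the protocols fsspec knows about can be enumerated at runtime. This is a minimal sketch, assuming fsspec's registry dictionary fsspec.registry.known_implementations, which maps protocol names (such as "s3" or "file") to the implementing class and, for many entries, an install hint stored under "err". It is an internal detail that may change between fsspec versions, and the protocol_hints helper is hypothetical.

```python
def protocol_hints():
    """Map each protocol fsspec knows about to its install hint ("" if none).

    Returns an empty dict when fsspec itself is not installed.
    """
    try:
        # Assumption: fsspec exposes its protocol registry as this dict.
        from fsspec.registry import known_implementations
    except ImportError:  # fsspec is not installed
        return {}
    return {
        name: spec.get("err", "")
        for name, spec in sorted(known_implementations.items())
    }


if __name__ == "__main__":
    for name, hint in protocol_hints().items():
        print(f"{name}: {hint}" if hint else name)
```

Because the registry lists protocols whose backing packages may not be installed, the "err" hints are what point users to the extra package (for example s3fs) a given protocol requires.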

Sorry if these are beyond the scope of this issue! They seemed closely related, so I thought that I would note these gaps here rather than create a new issue.