Pandas version checks

  • [x] I have checked that this issue has not already been reported.

  • [x] I have confirmed this bug exists on the latest version of pandas.

  • [x] I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

N/A

Issue Description

The Dockerfile doesn't contain any instruction that installs pandas directly. The CI logs show that it is installed as a dependency of fastparquet:

Collecting pandas>=1.5.0 (from fastparquet>=2024.11.0->-r /tmp/requirements-dev.txt (line 22))
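
To see the same thing from inside an environment rather than from the pip log, a rough sketch like the following lists the installed distributions that declare a given package as a requirement. The helper names `requirement_name` and `find_dependents` are made up for illustration and are not part of pandas or its CI; the sketch only relies on the standard-library `importlib.metadata` module. It also counts requirements that are conditional on an extra or an environment marker, so treat the result as a hint rather than a definitive answer.

```python
import re
from importlib import metadata


def requirement_name(requirement: str) -> str:
    """Extract a normalized project name from a requirement string.

    Hypothetical helper: "pandas>=1.5.0" -> "pandas".
    """
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", requirement)
    name = match.group(1) if match else ""
    # Normalize runs of "-", "_" and "." to "-" and lowercase the name.
    return re.sub(r"[-_.]+", "-", name).lower()


def find_dependents(target: str) -> list[str]:
    """Return installed distributions that list `target` as a requirement.

    Requirements guarded by extras or markers are included as well.
    """
    wanted = requirement_name(target)
    dependents = set()
    for dist in metadata.distributions():
        for req in dist.requires or []:
            if requirement_name(req) == wanted:
                name = dist.metadata["Name"]
                if name:
                    dependents.add(name)
    return sorted(dependents)


if __name__ == "__main__":
    # In the Docker image described here, fastparquet would be expected
    # to appear in this list.
    print(find_dependents("pandas"))
```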

Expected Behavior

pandas should either be installed explicitly from PyPI during the Docker build or be built from source. The latter seems to be the preferred option, as the purpose of the Dockerfile is to support contributing to pandas.

Edit: Another option is to extend the documentation, adding instructions to build it from source.

Installed Versions

N/A

Comment From: WillAyd

I think this is intentional - if you install pandas as part of the image then you are snapshotting the version that was used at the time the image was built.

Comment From: Alvaro-Kothe

if you install pandas as part of the image then you are snapshotting the version that was used at the time the image was built.

I agree, considering that the goal is to provide a development environment. I think that it's best if we don't build pandas during docker build since the goal is to mount the repository. But the current behavior is flaky for the following reasons:

  1. If we drop every dependency in environment.yml that depends on pandas, the "Build Docker Dev Environment" CI job will fail, because pandas will no longer be installed in the Docker environment when the job runs import pandas as pd.
  2. The "Option 3: using Docker" section of the documentation doesn't make it clear that the pandas you will be using is the one from PyPI, so it would be useful to state that you may need to build it from source if you want to modify the source code.
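
Both points can be checked from inside the container. The following is a minimal sketch, assuming only the standard library; `describe_install` is a hypothetical helper, not something the pandas repository provides. It reports whether a package is importable, which version is recorded in its metadata, and whether it would load from a site-packages directory (which suggests a pip-installed copy rather than a build of the mounted repository). The "site-packages" substring check is a heuristic and may not hold on every distribution.

```python
import importlib.metadata
import importlib.util


def describe_install(package: str) -> dict:
    """Report whether `package` is importable and where it would load from."""
    # find_spec locates a top-level package without importing it.
    spec = importlib.util.find_spec(package)
    if spec is None:
        return {
            "importable": False,
            "origin": None,
            "version": None,
            "from_site_packages": False,
        }
    try:
        version = importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        # Importable, but no installed distribution metadata was found.
        version = None
    origin = spec.origin
    return {
        "importable": True,
        "origin": origin,
        "version": version,
        # Heuristic: pip-installed packages normally live under site-packages.
        "from_site_packages": origin is not None and "site-packages" in origin,
    }


if __name__ == "__main__":
    print(describe_install("pandas"))
```

If `importable` is false, the CI job's import would fail as described in point 1; if `from_site_packages` is true, edits to the mounted source tree would most likely not be reflected until pandas is built from source, as described in point 2.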

Comment From: WillAyd

Recently we've been moving away from Docker - not many contributors to pandas have used it in all the years we have documented it, and it seems to cause more confusion than benefit. That is not meant as a verdict on Docker itself, but in our niche it appears that not many developers use it enough to understand how it is supposed to work.

That CI job seems fundamentally flawed - if it wants to run that type of test, then it should install pandas into the container, not into the image. Regarding the documentation, I think others have been looking to remove that section, but I'm also OK with expanding it if we want to clarify further.

Comment From: mroeschke

I agree with @WillAyd's sentiment of removing the Docker development setup (and therefore Gitpod/devcontainer setup). I think we should just remove all of it since it's not really maintained in lock-step with our development dependencies.

I would be supportive of making a note in the documentation like "if you're trying to install these dependencies in a container, make sure you install the following system packages ..."