Feature Type

  • [ ] Adding new functionality to pandas

  • [x] Changing existing functionality in pandas

  • [ ] Removing existing functionality in pandas

Problem Description

When using a functionality that requires a ~~performance dependency~~ optional dependency that is not installed, the error message points out a specific library instead of the multiple options that the user has.

See the following report from an earlier, very similar, already closed issue:

Hi, I have just started learning about pandas with the "Getting Started tutorials". In the "How do I read and write tabular data?" tutorial, when I ran the command below I got an unexpected error, and no suggestion was provided for solving it. The command and error are as follows:

Command: titanic.to_excel('titanic.xlsx', sheet_name='Passengers', index=False)

Error: ModuleNotFoundError: No module named 'openpyxl'

I solved the issue by installing openpyxl using pip with pip install openpyxl.

[...]

Feature Description

The error message should be changed to something along the lines of:

Missing optional dependency. To use this functionality, you need to install xlrd, xlsxwriter, openpyxl, pyxlsb or python-calamine.

Missing optional dependency. To use this functionality, you need to install xlsxwriter or openpyxl.

Similar error messages should be emitted when trying to use any of the other ~~performance~~ optional dependencies (plots, computation, HTML, XML, SQL, etc.)

Alternative Solutions

If you are good at searching the web or know pandas well, you can figure out that you have multiple options; otherwise, you just install the module mentioned in the current error message.

Additional Context

No response

Comment From: wilocu

take

Comment From: rhshadrach

> When using a functionality that requires a performance dependency that is not installed, the error message points out a specific library instead of the multiple options that the user has.

Can you give a reproducer here? Is it really the case that installing any one of them will resolve the issue?

Comment From: joooeey

> Can you give a reproducer here? Is it really the case that installing any one of them will resolve the issue?

Per the docs linked already twice in this thread, to_excel should only work with openpyxl and xlsxwriter. I did mamba install xlsxwriter.

I'm not sure if it's best to link to the documentation or list the options in each case. If we list the options, it would be good to pull the options dynamically from the surrounding code. I'll edit my question and scratch the too-long list of options.
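To make "pull the options dynamically" concrete, here is a toy sketch in which each writer backend registers itself along with the file extensions it supports, so the list of options for an error message is derived from the registry rather than hard-coded. Every name here (`WRITERS`, `register_writer`, the backend classes, `engines_for`) is hypothetical, and the extension tuples are illustrative values; this is not a description of pandas' actual internals.

```python
# Hypothetical registry: engine name -> backend class.
WRITERS = {}


def register_writer(cls):
    """Class decorator that records a backend under its engine name."""
    WRITERS[cls.engine] = cls
    return cls


@register_writer
class XlsxWriterBackend:
    engine = "xlsxwriter"
    supported_extensions = (".xlsx",)  # illustrative


@register_writer
class OpenpyxlBackend:
    engine = "openpyxl"
    supported_extensions = (".xlsx", ".xlsm")  # illustrative


@register_writer
class OdfBackend:
    engine = "odf"
    supported_extensions = (".ods",)  # illustrative


def engines_for(ext):
    """Return the sorted engine names whose backend supports `ext`."""
    return sorted(
        name for name, cls in WRITERS.items() if ext in cls.supported_extensions
    )
```

With a registry like this, adding a new backend automatically updates any message built from `engines_for`, with no separate list to keep in sync.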

Comment From: rhshadrach

> Per the docs linked already twice in this thread, to_excel should only work with openpyxl and xlsxwriter. I did mamba install xlsxwriter.

I see no links to documentation on Excel in this thread, nor is this statement correct: odf can also be used as a writer. In any case, I do not see what this has to do with my request for a reproducer of the performance dependency error.

Comment From: joooeey

> I see no links to documentation on Excel in this thread, nor is this statement correct: odf can also be used as a writer. In any case, I do not see what this has to do with my request for a reproducer of the performance dependency error.

This is the section about Excel in the pandas performance dependency docs.

What do you mean by "odf"? There is no Python library with that name that writes tables.

The reproducer of the error has been in the issue all along. Here's a more focused reproducer:

$ mamba create -n test pandas
$ mamba activate test
$ python
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3]]).to_excel("test.xlsx")
Traceback (most recent call last):
  File "<python-input-1>", line 1, in <module>
    pd.DataFrame([[1, 2, 3]]).to_excel("test.xlsx")
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^
  File "/home/lukas/miniforge3/envs/test/lib/python3.13/site-packages/pandas/util/_decorators.py", line 333, in wrapper
    return func(*args, **kwargs)
  File "/home/lukas/miniforge3/envs/test/lib/python3.13/site-packages/pandas/core/generic.py", line 2436, in to_excel
    formatter.write(
    ~~~~~~~~~~~~~~~^
        excel_writer,
        ^^^^^^^^^^^^^
    ...<6 lines>...
        engine_kwargs=engine_kwargs,
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    )
    ^
  File "/home/lukas/miniforge3/envs/test/lib/python3.13/site-packages/pandas/io/formats/excel.py", line 943, in write
    writer = ExcelWriter(
        writer,
    ...<2 lines>...
        engine_kwargs=engine_kwargs,
    )
  File "/home/lukas/miniforge3/envs/test/lib/python3.13/site-packages/pandas/io/excel/_openpyxl.py", line 57, in __init__
    from openpyxl.workbook import Workbook
ModuleNotFoundError: No module named 'openpyxl'
>>>  quit()
$ mamba install xlsxwriter
$ python
>>> import pandas as pd
>>> pd.DataFrame([[1, 2, 3]]).to_excel("test.xlsx")
>>>

It succeeds after installing xlsxwriter.

Comment From: rhshadrach

Ahhh, I think I finally understand the confusion! The docs you have been linking to are Optional dependencies. In that, there are two subsections (among others): Performance dependencies and Excel files. These are separate subsections. I believe you mean this issue to be about the Excel file dependencies (e.g. openpyxl, xlsxwriter) and not performance dependencies (e.g. numba, bottleneck).

> What do you mean with "odf"? There is no Python library with that name that writes tables.

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
df.to_excel("test.ods", engine="odf")  # requires the odfpy package, imported as odf

I am in favor of changing the error message here to be more general, as long as it (a) is accurate for the file type specified by the user, and for the engine if one is provided, and (b) does not introduce more metadata that pandas must maintain (e.g. a list of engines for each file type, above and beyond what we already have). If this is not possible, I think the error message is okay as-is.

Likewise, this applies to other types of optional dependencies as well.
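A minimal sketch of what condition (a) could look like: the message names only the requested engine when one was passed explicitly, and otherwise lists the engines known for the file extension. The function `missing_dependency_message` and the mapping `CANDIDATES_BY_EXT` are hypothetical; the mapping is a stand-in for whatever engine information the library already holds, and its contents are illustrative only.

```python
# Hypothetical stand-in for engine information the library already has.
CANDIDATES_BY_EXT = {
    ".xlsx": ("xlsxwriter", "openpyxl"),
    ".ods": ("odf",),
}


def missing_dependency_message(ext, engine=None, candidates_by_ext=CANDIDATES_BY_EXT):
    """Build an error message that respects an explicitly requested engine."""
    if engine is not None:
        # The user asked for one engine, so suggesting others would mislead.
        return (
            f"Missing optional dependency '{engine}'. "
            f"Install it to use engine={engine!r}."
        )
    candidates = candidates_by_ext.get(ext, ())
    if not candidates:
        return f"No writer engine is known for '{ext}' files."
    return (
        f"Missing optional dependency. To write '{ext}' files, "
        f"install one of: {', '.join(candidates)}."
    )
```

For example, `missing_dependency_message(".xlsx")` lists both candidates, while `missing_dependency_message(".xlsx", engine="openpyxl")` mentions only openpyxl.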

Comment From: joooeey

> Ahhh, I think I finally understand the confusion! The docs you have been linking to are Optional dependencies. In that, there are two subsections (among others): Performance dependencies and Excel files. These are separate subsections. I believe you mean this issue to be about the Excel file dependencies (e.g. openpyxl, xlsxwriter) and not performance dependencies (e.g. numba, bottleneck).

My issue was meant to be about all Optional dependencies. However, I only have a reproducer for Excel. I assume this applies to the others as well but haven't checked. I'll edit the OP accordingly.

Comment From: rhshadrach

> I assume this applies to the others as well but haven't checked.

Perhaps in some, but certainly not all. E.g. bottleneck and numexpr are not used for the same things. Also, even in the Excel case, if a user does df.to_excel(..., engine="openpyxl"), I do not think we should suggest installing xlsxwriter.

I'd be okay with looking to improve the error message here, but only if it doesn't add significant complexity to the code. Otherwise, I think this is okay as-is.