Pandas version checks

  • [x] I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_feather.html

Documentation problem

Since #35408, the method docs say:

path : str, path object, file-like object
    String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.

And indeed it does work with a BytesIO buffer or an open file:

import pandas as pd
df = pd.DataFrame({"a": [1, 2, 3]})  # placeholder data; any DataFrame will do
with open('/tmp/foo', 'wb') as handle:
    df.to_feather(handle)

But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():

with self.client.write(self._path(name)) as writer:
    df.to_feather(writer)

Traceback (most recent call last):
  File "/home/chris/ram-system/.venv/lib/python3.10/site-packages/pyarrow/feather.py", line 186, in write_feather
    _feather.write_feather(table, dest, compression=compression,
AttributeError: 'AsyncWriter' object has no attribute 'closed'
ValueError: I/O operation on closed file

I note that it's not actually supposed to work: pyarrow.feather.write_feather says:

dest : str
    Local destination path.

This says nothing about file-like objects being acceptable. write_feather does seem to have some special cases for handling buffers specifically, but this is undocumented and could change at any time.

I think that write_feather insists on checking the closed attribute of the passed handle, which this one doesn't have. It seems to work if I poke such an attribute onto the object, but it could easily stop working.
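A slightly more robust alternative to poking a closed attribute onto the object is to wrap it in an adapter that derives from io.RawIOBase, which provides closed along with the rest of the IOBase interface. The following is a minimal sketch using only the standard library. WriteOnlySink and IOBaseAdapter are hypothetical names invented for illustration, and whether pyarrow accepts such an adapter for every writer is an assumption that has not been verified here.

```python
import io


class WriteOnlySink:
    """Hypothetical stand-in for a writer that only offers write()."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(bytes(data))
        return len(data)


class IOBaseAdapter(io.RawIOBase):
    """Wrap a write()-only object so that it gains the io.IOBase interface."""

    def __init__(self, sink):
        super().__init__()
        self._sink = sink

    def writable(self):
        return True

    def write(self, data):
        # Copy to bytes so memoryview arguments are handled uniformly,
        # forward to the wrapped object, and report the bytes consumed.
        payload = bytes(data)
        self._sink.write(payload)
        return len(payload)


sink = WriteOnlySink()
adapter = IOBaseAdapter(sink)
adapter.write(b"feather bytes")
```

With this in place, adapter.closed exists and is False until adapter.close() is called, while the wrapped sink is left untouched.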

I also know about hdfs.ext.dataframe.write_dataframe for this particular use case, but it only supports Avro, which is not a great file format for DataFrames. There are also likely to be other file-like objects that people might try to pass to to_feather().

Similarly, read_feather claims to accept:

path : str, path object, or file-like object
    String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function.

But read() is not enough:

  File "pyarrow/_feather.pyx", line 79, in pyarrow._feather.FeatherReader.__cinit__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
io.UnsupportedOperation: seek
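One possible workaround for a stream that can read() but not seek() is to copy it into an io.BytesIO buffer first, since BytesIO is seekable. The sketch below uses only the standard library. ForwardOnlyReader and to_seekable are hypothetical names invented for illustration, and the approach assumes the whole file fits in memory.

```python
import io


class ForwardOnlyReader(io.RawIOBase):
    """Hypothetical stand-in for a stream that supports read() but not seek()."""

    def __init__(self, payload):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        # RawIOBase.read() is implemented in terms of readinto().
        return self._inner.readinto(buffer)


def to_seekable(stream):
    """Copy the whole stream into memory so that seek() becomes available."""
    return io.BytesIO(stream.read())


source = ForwardOnlyReader(b"example payload")
buffered = to_seekable(source)
```

Passing the result of to_seekable(stream) to read_feather instead of the stream itself should avoid the seek error above, at the cost of holding the file contents in memory.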

Suggested fix for documentation

I think it's better to describe these functions as officially taking only strings (URLs and paths) and mmap objects. File-like objects currently work but this is not guaranteed.

Comment From: rhshadrach

Thanks for the report. You mention:

I note that it's not actually supposed to work: pyarrow.feather.write_feather says: ...

I do not follow this. What does this have to do with whether pd.write_feather supports file-like objects?

But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():

If the object does not have a closed attribute, then it is not implementing IOBase, and therefore is not file-like.
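The distinction drawn here can be checked directly with isinstance. The following minimal sketch uses only the standard library. DuckWriter and is_file_like are hypothetical names invented for illustration, and treating io.IOBase ancestry as the definition of "file-like" is the reading given in this comment, not something the to_feather docs state explicitly.

```python
import io


class DuckWriter:
    """Hypothetical object with a binary write() but no io.IOBase ancestry."""

    def write(self, data):
        return len(data)


def is_file_like(obj):
    """Return True if obj derives from io.IOBase."""
    return isinstance(obj, io.IOBase)


checks = {
    "BytesIO": is_file_like(io.BytesIO()),
    "DuckWriter": is_file_like(DuckWriter()),
}
```

An object like DuckWriter satisfies "implementing a binary write() function" when read literally, yet it has no closed attribute and fails this check.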

Comment From: qris

Thanks for the quick reply!

I do not follow this. What does this have to do with whether pd.write_feather supports file-like objects?

I mean that Pandas delegates to_feather() to feather.write_feather:

    from pyarrow import feather
    ...
    with get_handle(
        path, "wb", storage_options=storage_options, is_text=False
    ) as handles:
        feather.write_feather(df, handles.handle, **kwargs)

It calls get_handle first, but when it is given a handle, that function only wraps it and does not change it. Pandas then calls write_feather with that same handle. But write_feather is not documented to accept a handle, so we have no right to expect this to work. The fact that it works at all is an undocumented feature of write_feather.

If the object does not have a closed attribute, then it is not implementing IOBase, and therefore is not file-like.

Fair point. I wasn't aware of that connection; I read it as "[any] object implementing a binary write() function."

Comment From: rhshadrach

Is it possible to provide a reproducer that fails and implements IOBase? If not, I do not think we should change the documentation here.

Comment From: qris

I expect it works with any object implementing IOBase. If you don't want to change the docs then there's nothing to do here.