Pandas version checks
- [x] I have checked that the issue still exists on the latest versions of the docs on main here
Location of the documentation
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_feather.html
Documentation problem
Since #35408, the method docs say:
path : str, path object, file-like object
    String, path object (implementing os.PathLike[str]), or file-like object implementing a binary write() function.
And indeed it does work with a BytesIO buffer or an open file:
with open('/tmp/foo', 'wb') as handle:
    df.to_feather(handle)
But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():
with self.client.write(self._path(name)) as writer:
    df.to_feather(writer)
Traceback (most recent call last):
  File "/home/chris/ram-system/.venv/lib/python3.10/site-packages/pyarrow/feather.py", line 186, in write_feather
    _feather.write_feather(table, dest, compression=compression,
AttributeError: 'AsyncWriter' object has no attribute 'closed'
ValueError: I/O operation on closed file
I note that it's not actually supposed to work: pyarrow.feather.write_feather says:
dest : str
    Local destination path.
Which says nothing about file-like objects being acceptable. It does seem to have some special cases for handling buffers specifically, but this is undocumented and could change at any time.
I think that write_feather insists on checking the closed attribute of the passed handle, which this one doesn't have. It seems to work if I poke such an attribute onto the object, but it could easily stop working.
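Rather than poking a `closed` attribute onto the writer, a somewhat sturdier workaround might be to wrap it in a genuine `io.RawIOBase` subclass, which supplies `closed`, `writable()`, `flush()` and the rest of the IOBase protocol. This is a minimal sketch, assuming only that the wrapped object has a `write(bytes)` method; `WriteOnlyAdapter` and `BareWriter` are hypothetical names, and whether pyarrow writes through such an adapter without needing further methods (for example `tell()`) has not been verified here.

```python
import io


class WriteOnlyAdapter(io.RawIOBase):
    """Present an object that only has write() as a writable IOBase stream.

    Hypothetical helper: the wrapped object is assumed to accept bytes in
    write(); closing the adapter does not close the wrapped object.
    """

    def __init__(self, raw_writer):
        super().__init__()
        self._raw_writer = raw_writer

    def writable(self):
        return True

    def write(self, b):
        if self.closed:
            raise ValueError("I/O operation on closed file")
        # Copy bytearray/memoryview input to bytes before handing it on.
        data = bytes(b)
        self._raw_writer.write(data)
        # RawIOBase.write must report how many bytes it consumed.
        return len(data)


class BareWriter:
    """Hypothetical stand-in for a writer that, like AsyncWriter, has no `closed`."""

    def __init__(self):
        self.chunks = []

    def write(self, data):
        self.chunks.append(data)


bare = BareWriter()
stream = WriteOnlyAdapter(bare)
stream.write(b"feather bytes")
print(hasattr(bare, "closed"), stream.closed)  # False False
```

Inside the `with self.client.write(...) as writer:` block, the call would then become `df.to_feather(WriteOnlyAdapter(writer))`; this is an untested suggestion, not a documented pandas or hdfs pattern.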
Also, I know about hdfs.ext.dataframe.write_dataframe for this particular use case, but it only supports Avro, which is not a great file format for DataFrames, and there are likely to be other file-like objects that people might try to pass to to_feather().
Similarly, read_feather claims to accept:
path : str, path object, or file-like object
    String, path object (implementing os.PathLike[str]), or file-like object implementing a binary read() function.
But read() is not enough:
  File "pyarrow/_feather.pyx", line 79, in pyarrow._feather.FeatherReader.__cinit__
  File "pyarrow/error.pxi", line 155, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 89, in pyarrow.lib.check_status
io.UnsupportedOperation: seek
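For a stream that can `read()` but not `seek()`, one common workaround is to buffer it into an `io.BytesIO`, which is seekable, before handing it to `read_feather`. A minimal sketch; `ensure_seekable` is a hypothetical helper, and because it reads the whole stream into memory it only suits files that fit in RAM.

```python
import io


def ensure_seekable(stream):
    """Return `stream` if it reports seekable(), else an in-memory copy.

    Hypothetical helper: assumes `stream.read()` returns all remaining bytes.
    """
    seekable = getattr(stream, "seekable", None)
    if callable(seekable) and seekable():
        return stream
    # Non-seekable (or not IOBase at all): slurp into a seekable buffer.
    return io.BytesIO(stream.read())
```

Usage would look like `pd.read_feather(ensure_seekable(reader))`, where `reader` is whatever non-seekable stream is at hand.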
Suggested fix for documentation
I think it's better to describe these functions as officially taking only strings (URLs and paths) and mmap objects. File-like objects currently work, but this is not guaranteed.
Comment From: rhshadrach
Thanks for the report. You mention:
I note that it's not actually supposed to work: pyarrow.feather.write_feather says: ...
I do not follow this. What does this have to do with whether pd.write_feather supports file-like objects?
But not with other file-like objects, such as an AsyncWriter from hdfs.InsecureClient.write():
If the object does not have a closed attribute, then it is not implementing IOBase, and therefore is not file-like.
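The distinction can be checked directly: standard streams such as `io.BytesIO` and files returned by `open(..., 'wb')` are `io.IOBase` instances and therefore expose `closed`, while an object that merely defines `write()` is neither. A small sketch; `DuckWriter` is a hypothetical stand-in for a writer like `AsyncWriter`.

```python
import io
import os
import tempfile


class DuckWriter:
    """Hypothetical duck-typed writer: has write() and nothing else."""

    def write(self, data):
        return len(data)


def describe(obj):
    # Report (is an io.IOBase instance, exposes a `closed` attribute).
    return isinstance(obj, io.IOBase), hasattr(obj, "closed")


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "foo"), "wb") as handle:
        print(describe(handle))    # (True, True)
print(describe(io.BytesIO()))      # (True, True)
print(describe(DuckWriter()))      # (False, False)
```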
Comment From: qris
Thanks for the quick reply!
I do not follow this. What does this have to do with whether pd.write_feather supports file-like objects?
I mean that Pandas delegates to_feather() to feather.write_feather:
from pyarrow import feather
...
with get_handle(
    path, "wb", storage_options=storage_options, is_text=False
) as handles:
    feather.write_feather(df, handles.handle, **kwargs)
It calls get_handle first, but that doesn't change handles; it just wraps them. Then it calls write_feather with that same handle. But write_feather does not accept a handle, so we have no right to expect this to work. The fact that it does at all is an undocumented feature of write_feather.
If the object does not have a closed attribute, then it is not implementing IOBase, and therefore is not file-like.
Fair point; I wasn't aware of that connection. I read it as "[any] object implementing a binary write() function."
Comment From: rhshadrach
Is it possible to provide a reproducer that fails and implements IOBase? If not, I do not think we should change the documentation here.
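One candidate worth checking, sketched here but not run against pandas: the read_feather traceback above ends in `io.UnsupportedOperation: seek`, which is exactly what a genuine `io.RawIOBase` subclass raises when it is readable but not seekable. `NonSeekableReader` is a hypothetical name; it serves bytes from memory the way a network stream might.

```python
import io


class NonSeekableReader(io.RawIOBase):
    """A real IOBase stream that can read() but cannot seek()."""

    def __init__(self, data):
        super().__init__()
        self._data = memoryview(data)
        self._pos = 0

    def readable(self):
        return True

    def readinto(self, b):
        # Copy as many remaining bytes as fit into the caller's buffer.
        chunk = self._data[self._pos:self._pos + len(b)]
        n = len(chunk)
        b[:n] = chunk
        self._pos += n
        return n


reader = NonSeekableReader(b"not really feather data")
print(reader.read(3))  # b'not'
try:
    reader.seek(0)  # IOBase's default seek() is unsupported
except io.UnsupportedOperation as exc:
    print("UnsupportedOperation:", exc)
```

Wrapping the bytes of a real Feather file this way and passing it to `pd.read_feather` would show whether an IOBase-conforming object hits the same `seek` error; that outcome is inferred from the traceback, not confirmed.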
Comment From: qris
I expect it works with any object implementing IOBase. If you don't want to change the docs, then there's nothing to do here.