As of Go 1.21, the syscall.Sendfile function has no documentation.

For many functions in the syscall package, we assume POSIX semantics in the absence of explicit documentation. However, sendfile is not defined by POSIX, and its semantics vary significantly among platforms.

Notably:

On Linux, “sendfile() will transfer at most 0x7ffff000 (2,147,479,552) bytes, returning the number of bytes actually transferred”. FreeBSD, macOS, and Solaris do not document any such restriction.
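
Because a single Linux call moves at most 0x7ffff000 bytes, a caller with a larger file has to issue several calls. Below is a minimal sketch of how the per-call counts might be planned under the Linux cap. `chunkSizes` and `maxLinuxSendfile` are hypothetical names and are not part of package syscall:

```go
package main

// maxLinuxSendfile is the per-call limit quoted from the Linux sendfile(2)
// man page: 0x7ffff000 (2,147,479,552) bytes. Hypothetical name.
const maxLinuxSendfile = 0x7ffff000

// chunkSizes splits a transfer of total bytes into the count arguments that
// successive sendfile calls would receive if each call were capped at
// maxLinuxSendfile. It returns nil for total <= 0, so it never plans a call
// with a zero count.
func chunkSizes(total int64) []int {
	var sizes []int
	for total > 0 {
		n := int64(maxLinuxSendfile)
		if total < n {
			n = total
		}
		sizes = append(sizes, int(n))
		total -= n
	}
	return sizes
}
```

This only bounds the requests. Each real call can still move fewer bytes than requested, so a real loop has to advance by the count each call actually returns.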

The reporting of the actual number of bytes transferred varies by platform:
- On FreeBSD and macOS, “sendfile() may send fewer bytes than requested” only “[w]hen using a socket marked for non-blocking I/O”. In that case, it sets the sbytes out-parameter to indicate the number of bytes written, returns -1, and sets errno to EAGAIN.
- Linux documents that “a successful call to sendfile() may write fewer bytes than requested”, but does not specify what happens to the offset parameter or the input file's offset on error.
- Illumos documents that “In some error cases sendfile() may still write some data before encountering an error and returning -1. When that occurs, off is updated to point to the byte that follows the last byte copied and should be compared with its value before calling sendfile() to determine how much data was sent.”

It appears that the return value of Go's syscall.Sendfile on FreeBSD and macOS always reports the *sbytes (a.k.a. len) out-parameter, which is always nonnegative. On Linux and Solaris, it reports the return value of the underlying call, which is -1 on error.

The effect on the offset of the input file varies by platform:
- Solaris and Linux document that if the offset parameter is null, “data will be read from in_fd starting at the file offset, and the file offset will be updated by the call.”
- Illumos documents that “[t]he sendfile() function does not modify the current file pointer of in_fd, but does modify the file pointer for out_fd if it is a regular file.” It does not document any particular behavior if the off argument is null, but its error behavior seems to imply that a non-null offset pointer should always be used.
- FreeBSD does not document whether the file offset of fd is modified by the call. (I'm guessing that it's not, though.)

The allowed output descriptors vary by platform:
- FreeBSD and macOS require a socket.
- Linux 2.6.33 and above allows any file.
- Solaris and Illumos allow “a file descriptor to a regular file opened for writing or to a connected AF_INET or AF_INET6 socket of SOCK_STREAM type”.

In addition, on Solaris and Illumos it appears that EAGAIN can be returned for reasons other than full send buffers — it can also occur due to file or record locking on the input or output file.

Given these variations, it seems to me that the semantics and usage of the Go syscall wrapper should be documented — especially given that the signature of Go's syscall.Sendfile on FreeBSD and macOS doesn't match the signature of the corresponding system C function.

References:
- Linux
- macOS
- FreeBSD
- Illumos
- Solaris

Comment From: bcmills

(CC @panjf2000)

Comment From: paulzhol

> FreeBSD does not document whether the file offset of fd is modified by the call. (I'm guessing that it's not, though.)

I also don't think it modifies fd's offset. I didn't find any fo_seek() calls in vn_sendfile(); however, linux_sendfile_common() does make them. That function is part of the Linuxulator (the Linux emulation layer that provides Linux binary compatibility), and its code also carries the following comment:

    /*
     * Differences between FreeBSD and Linux sendfile:
     * - Linux doesn't send anything when count is 0 (FreeBSD uses 0 to
     *   mean send the whole file.)  In linux_sendfile given fds are still
     *   checked for validity when the count is 0.
     * - Linux can send to any fd whereas FreeBSD only supports sockets.
     *   The same restriction follows for linux_sendfile.
     * - Linux doesn't have an equivalent for FreeBSD's flags and sf_hdtr.
     * - Linux takes an offset pointer and updates it to the read location.
     *   FreeBSD takes in an offset and a 'bytes read' parameter which is
     *   only filled if it isn't NULL.  We use this parameter to update the
     *   offset pointer if it exists.
     * - Linux sendfile returns bytes read on success while FreeBSD
     *   returns 0.  We use the 'bytes read' parameter to get this value.
     */

Comment From: panjf2000

Thank you for bringing this up, @bcmills.

As the Linux man page states, sendfile(2) on Linux is implemented differently from other UNIX systems.

As for partial writes: on BSD-like OSes, sendfile() may send fewer bytes than requested and fail with EAGAIN or EINTR. On Linux, by contrast, a successful but incomplete call returns no error, because EAGAIN from sendfile should only occur when zero bytes were sent, as with other read/write-like system calls.

Another implementation detail worth mentioning is that sendfile(2) on Linux has used splice(2) under the hood to do the zero-copy work since kernel v2.6.23, which might help us better understand its behavior.

Comment From: gopherbot

Change https://go.dev/cl/546295 mentions this issue: syscall: document Sendfile with semantics and usage

Comment From: gopherbot

Change https://go.dev/cl/537275 mentions this issue: internal/poll: revise the determination about [handled] and improve the code readability for SendFile