#!watchflakes
post <- goos == "windows" && pkg == "os" && ((log ~ `panic: test timed out` && log ~ `^\s*syscall\.CancelIoEx`) || status == "ABORT" )

Issue created automatically to collect these failures.

Example (log):

panic: test timed out after 3m0s
running tests:
    TestPipeIOCloseRace (2m59s)

goroutine 879 gp=0x4000174340 m=10 mp=0x4000180708 [running]:
panic({0x7ff7b437a3a0?, 0x4000400040?})
    C:/workdir/go/src/runtime/panic.go:779 +0x140 fp=0x400017bf00 sp=0x400017be50 pc=0x7ff7b4229d10
testing.(*M).startAlarm.func1()
    C:/workdir/go/src/testing/testing.go:2366 +0x30c fp=0x400017bfd0 sp=0x400017bf00 pc=0x7ff7b42f001c
runtime.goexit({})
...

goroutine 273 gp=0x4000443860 m=nil [GC worker (idle), 2 minutes]:
runtime.gopark(0x8501acdde81b4?, 0x0?, 0x0?, 0x0?, 0x0?)
    C:/workdir/go/src/runtime/proc.go:402 +0xc8 fp=0x400034bf30 sp=0x400034bf10 pc=0x7ff7b422d398
runtime.gcBgMarkWorker()
    C:/workdir/go/src/runtime/mgc.go:1310 +0xd8 fp=0x400034bfd0 sp=0x400034bf30 pc=0x7ff7b420c1c8
runtime.goexit({})
    C:/workdir/go/src/runtime/asm_arm64.s:1222 +0x4 fp=0x400034bfd0 sp=0x400034bfd0 pc=0x7ff7b4263734
created by runtime.gcBgMarkStartWorkers in goroutine 308
    C:/workdir/go/src/runtime/mgc.go:1234 +0x28

watchflakes

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
default <- pkg == "os" && test == ""
2023-11-30 18:35 windows-arm64-11 go@bda1ef13 os (log) panic: test timed out after 3m0s running tests: TestPipeIOCloseRace (2m59s) goroutine 879 gp=0x4000174340 m=10 mp=0x4000180708 [running]: panic({0x7ff7b437a3a0?, 0x4000400040?}) C:/workdir/go/src/runtime/panic.go:779 +0x140 fp=0x400017bf00 sp=0x400017be50 pc=0x7ff7b4229d10 testing.(*M).startAlarm.func1() C:/workdir/go/src/testing/testing.go:2366 +0x30c fp=0x400017bfd0 sp=0x400017bf00 pc=0x7ff7b42f001c runtime.goexit({}) ... goroutine 273 gp=0x4000443860 m=nil [GC worker (idle), 2 minutes]: runtime.gopark(0x8501acdde81b4?, 0x0?, 0x0?, 0x0?, 0x0?) C:/workdir/go/src/runtime/proc.go:402 +0xc8 fp=0x400034bf30 sp=0x400034bf10 pc=0x7ff7b422d398 runtime.gcBgMarkWorker() C:/workdir/go/src/runtime/mgc.go:1310 +0xd8 fp=0x400034bfd0 sp=0x400034bf30 pc=0x7ff7b420c1c8 runtime.goexit({}) C:/workdir/go/src/runtime/asm_arm64.s:1222 +0x4 fp=0x400034bfd0 sp=0x400034bfd0 pc=0x7ff7b4263734 created by runtime.gcBgMarkStartWorkers in goroutine 308 C:/workdir/go/src/runtime/mgc.go:1234 +0x28

watchflakes

Comment From: bcmills

@golang/windows, any idea why CancelIoEx would block for over 2 minutes?

Comment From: alexbrainman

@golang/windows, any idea why CancelIoEx would block for over 2 minutes?

No idea. According to https://learn.microsoft.com/en-us/windows/win32/fileio/cancelioex-func

The CancelIoEx function does not wait for all canceled operations to complete.

Alex

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && pkg == "os" && ((log ~ `panic: test timed out` && log ~ `^\s*syscall\.CancelIoEx`) || status == "ABORT" )
2024-03-07 05:26 gotip-windows-386 go@a0624701 os.TestPipeIOCloseRace [ABORT] (log) === RUN TestPipeIOCloseRace === PAUSE TestPipeIOCloseRace === CONT TestPipeIOCloseRace

watchflakes

Comment From: gopherbot

Found new dashboard test flakes for:

#!watchflakes
post <- goos == "windows" && pkg == "os" && ((log ~ `panic: test timed out` && log ~ `^\s*syscall\.CancelIoEx`) || status == "ABORT" )
2025-02-22 21:25 gotip-windows-arm64 go@38825ef3 os.TestPipeIOCloseRace [ABORT] (log) === RUN TestPipeIOCloseRace === PAUSE TestPipeIOCloseRace === CONT TestPipeIOCloseRace

watchflakes

Comment From: wiresock

Hello!

I extracted the test from this issue and turned it into a standalone application. It consistently hangs under load, and after digging into the cause, I believe the root issue is in how Go handles pipe cleanup on Windows.

Specifically, os.Pipe() uses CreatePipe, which creates synchronous handles (i.e., not created with FILE_FLAG_OVERLAPPED). However, in (*FD).Close() Go still calls:

if fd.kind == kindPipe {
    syscall.CancelIoEx(fd.Sysfd, nil)
}

This is unsafe because CancelIoEx is designed for canceling overlapped I/O, not synchronous I/O. Calling it on a synchronous pipe handle can lead to hangs.

After commenting out that call and rebuilding Go from source, the hang completely disappeared.

So this is an issue in the runtime: Go calls CancelIoEx on handles where it's neither needed nor safe. A fix could be to avoid calling CancelIoEx unless the handle was explicitly created with FILE_FLAG_OVERLAPPED.

I actually stepped into this issue while investigating another CancelIoEx-related hang, which occurs at this line. Any ideas on that one?

Comment From: wiresock

Hello @qmuntal,

To add more technical detail: the hang happens when a thread issues two I/Os—first an async IO, then a sync IO (which blocks with normal kernel APCs disabled). If another thread then tries to cancel the async IO, the IO manager does not call KeAlertThread because the thread is blocked in a state where normal kernel APCs can’t be delivered. As a result, the cancel APC is never processed, and the thread is stuck indefinitely.

In general, a single thread should not mix synchronous and asynchronous I/O, especially on Windows, to avoid these low-level kernel issues. It would be great if the Go runtime could be audited for any such cases.

Just to clarify: in the case of os.Pipe(), there isn’t an actual asynchronous I/O in play, but calling CancelIoEx on a synchronous pipe handle still triggers the same underlying scenario.

Internally, CancelIoEx queues a kernel APC to the target thread. But if that thread is currently blocked on a synchronous I/O operation (with normal kernel APCs disabled), the APC can’t be delivered, and the IO manager doesn’t invoke KeAlertThread. As a result, the thread gets stuck in an uninterruptible wait.

So even without async IO, calling CancelIoEx on a synchronous pipe handle is unsafe and can lead to a hang.

Comment From: qmuntal

Thanks for this detailed information @wiresock, what you say makes sense and it is really useful.

On the other hand, that CancelIoEx call was put there to in https://github.com/golang/go/commit/d227a0811b76791fad04eeba35cf2794a719d610 to fix several other issues, so we can't just remove it.

I'll try to fix this for Go 1.26 (although any help is welcomed).