Detecting Goroutine Leaks in Go with synctest and pprof
Explore how to identify and prevent goroutine leaks in Go applications using Go 1.24+'s `synctest` package and the experimental `goroutineleak` profile in Go 1.26, along with common leak patterns and their solutions.
Goroutine leaks, along with deadlocks and race conditions, are prevalent issues in concurrent Go programming. A deadlock that stops every goroutine makes the runtime abort with a fatal error, which makes it comparatively easy to diagnose, and the race detector helps find data races. Goroutine leaks, however, have historically lacked dedicated tooling within Go.
A goroutine leak occurs when one or more goroutines become indefinitely blocked on synchronization primitives (like channels) while the rest of the program continues to execute. This article will demonstrate several common leak scenarios and how to detect them.
Go 1.24 introduced the synctest package, and Go 1.26 takes leak detection further with an experimental goroutineleak profile. Let's look at both tools.
A Simple Leak Scenario
Consider a function Gather that runs provided functions concurrently and sends their results to an output channel:
// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
	out := make(chan int)
	for _, f := range funcs {
		go func() {
			out <- f()
		}()
	}
	return out
}
A basic test might look like this:
func Test(t *testing.T) {
	out := Gather(
		func() int { return 11 },
		func() int { return 22 },
		func() int { return 33 },
	)
	total := 0
	for range 3 {
		total += <-out
	}
	if total != 66 {
		t.Errorf("got %v, want 66", total)
	}
}
This test passes, which suggests the function works correctly. However, let's call Gather without consuming its results and then check the number of active goroutines:
func main() {
	Gather(
		func() int { return 11 },
		func() int { return 22 },
		func() int { return 33 },
	)
	time.Sleep(50 * time.Millisecond)
	nGoro := runtime.NumGoroutine() - 1 // minus the main goroutine
	fmt.Println("nGoro =", nGoro)
}
nGoro = 3
After 50 milliseconds, three goroutines remain stuck. This happens because the out channel is unbuffered. If the client doesn't read all results, the goroutines launched by Gather block indefinitely when attempting to send their f() results to out.
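Since the leak comes from sends that can never complete, one possible remedy is to give the channel enough buffer space for every result. The sketch below is an assumption for illustration: GatherBuffered is a hypothetical name that does not appear in the original code.

```go
package main

import "fmt"

// GatherBuffered is a hypothetical variant of Gather. The channel has
// one buffer slot per function, so every goroutine can send its result
// and exit even if the caller never receives.
func GatherBuffered(funcs ...func() int) <-chan int {
	out := make(chan int, len(funcs))
	for _, f := range funcs {
		f := f // per-iteration copy; only needed before Go 1.22
		go func() {
			out <- f() // never blocks: capacity equals len(funcs)
		}()
	}
	return out
}

func main() {
	out := GatherBuffered(
		func() int { return 11 },
		func() int { return 22 },
		func() int { return 33 },
	)
	total := 0
	for i := 0; i < 3; i++ {
		total += <-out
	}
	fmt.Println("total =", total) // total = 66
}
```

The trade-off is that unread results stay in the buffer until the channel itself becomes garbage, which is usually acceptable for small result sets.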
Let's explore robust methods to detect such leaks.
Detecting Leaks with goleak
Relying on runtime.NumGoroutine is fragile for testing. A common third-party solution is the goleak package:
// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
out := make(chan int)
for _, f := range funcs {
go func() {
out <- f()
}()
}
return out
}
func Test(t *testing.T) {
defer goleak.VerifyNone(t)
Gather(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
}
The test output clearly pinpoints the leak:
--- FAIL: Test (0.44s)
goleak_test.go:28: found unexpected goroutines:
Goroutine 8 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
/tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
/tmp/sandbox4216740326/prog_test.go:15 +0x45
Goroutine 9 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
/tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
/tmp/sandbox4216740326/prog_test.go:15 +0x45
Goroutine 10 in state chan send, with play.Gather.func1 on top of the stack:
play.Gather.func1()
/tmp/sandbox4216740326/prog_test.go:16 +0x37
created by play.Gather in goroutine 7
/tmp/sandbox4216740326/prog_test.go:15 +0x45
goleak checks for unexpected goroutines by repeatedly inspecting the goroutine stacks, waiting between attempts with exponentially increasing delays (from 1 microsecond to 100 milliseconds). While effective, it's a third-party dependency and relies on time.Sleep internally.
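To make the retry idea concrete, here is a simplified sketch that polls runtime.NumGoroutine with a doubling delay capped at 100 milliseconds. It is not goleak's actual code: the names settled and leaks and the limit of 20 attempts are assumptions chosen for illustration, and the real package inspects stacks rather than just counting goroutines.

```go
package main

import (
	"fmt"
	"runtime"
	"time"
)

// settled polls the goroutine count until it drops back to baseline or
// the attempts run out. The delay starts at 1µs, doubles after every
// attempt, and is capped at 100ms.
func settled(baseline int) bool {
	delay := time.Microsecond
	for attempt := 0; attempt < 20; attempt++ {
		if runtime.NumGoroutine() <= baseline {
			return true
		}
		time.Sleep(delay)
		delay *= 2
		if delay > 100*time.Millisecond {
			delay = 100 * time.Millisecond
		}
	}
	return runtime.NumGoroutine() <= baseline
}

// leaks reports whether f leaves extra goroutines behind.
func leaks(f func()) bool {
	baseline := runtime.NumGoroutine()
	f()
	return !settled(baseline)
}

func main() {
	block := make(chan struct{})
	// A goroutine that returns right away should not count as a leak.
	fmt.Println("short-lived:", leaks(func() { go func() {}() }))
	// A goroutine stuck on a receive should be reported.
	fmt.Println("blocked:", leaks(func() { go func() { <-block }() }))
	close(block) // release the stuck goroutine
}
```

A counting approach like this cannot say which goroutine leaked, which is one reason to prefer tools that report stack traces.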
Detecting Leaks with synctest
Go 1.24 introduced the synctest package as an experiment, and Go 1.25 made it production-ready. It offers a way to detect leaks without external packages or time.Sleep:
// Gather runs the given functions concurrently
// and collects the results.
func Gather(funcs ...func() int) <-chan int {
out := make(chan int)
for _, f := range funcs {
go func() {
out <- f()
}()
}
return out
}
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
Gather(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
synctest.Wait()
})
}
The output reveals a deadlock:
--- FAIL: Test (0.00s)
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain [recovered, repanicked]
goroutine 10 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
/tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
/tmp/sandbox/main_test.go:33 +0x45
goroutine 11 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
/tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
/tmp/sandbox/main_test.go:33 +0x45
goroutine 12 [chan send (durable), synctest bubble 1]:
sandbox.Gather.func1()
/tmp/sandbox/main_test.go:34 +0x37
created by sandbox.Gather in goroutine 9
/tmp/sandbox/main_test.go:33 +0x45
Here's a breakdown of how synctest identifies the leak:
- synctest.Test initiates a testing "bubble" in a separate goroutine.
- Gather launches three goroutines.
- synctest.Wait blocks the root bubble goroutine.
- Each Gather goroutine attempts to write to out but gets blocked because no receiver is active.
- synctest.Wait detects that all child goroutines within the bubble are durably blocked and unblocks the root goroutine.
- The inner test function completes.
- synctest.Test then waits for all child goroutines to finish. Upon discovering durably blocked goroutines, it panics with "main bubble goroutine has exited but blocked goroutines remain," effectively reporting the leak.
synctest proves highly useful for leak detection during testing without relying on time.Sleep or external packages.
Detecting Leaks with pprof (Go 1.26 goroutineleak profile)
Go 1.26 introduces an experimental goroutineleak profile in pprof, making it possible to detect leaks even in running programs. First, we'll use a helper function to run profiled code and print results:
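func printLeaks(f func()) {
	prof := pprof.Lookup("goroutineleak")
	if prof == nil {
		// Lookup returns nil for an unknown profile, which is presumably
		// what happens when the experiment is not enabled.
		fmt.Println("goroutineleak profile is not available")
		f()
		return
	}
	defer func() {
		time.Sleep(50 * time.Millisecond) // Give the runtime a moment to settle.
		var content strings.Builder
		prof.WriteTo(&content, 2)
		// Print only the leaked goroutines.
		goros := strings.Split(content.String(), "\n\n")
		for _, goro := range goros {
			if strings.Contains(goro, "(leaked)") {
				fmt.Println(goro + "\n")
			}
		}
	}()
	f()
}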
(Note: When trying this locally, remember to set the environment variable GOEXPERIMENT=goroutineleakprofile.)
Now, let's call Gather with three functions and observe the leaks:
func main() {
printLeaks(func() {
Gather(
func() int { return 11 },
func() int { return 22 },
func() int { return 33 },
)
})
}
goroutine 5 [chan send (leaked)]:
main.Gather.func1()
/tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
/tmp/sandbox/main.go:34 +0x45
goroutine 6 [chan send (leaked)]:
main.Gather.func1()
/tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
/tmp/sandbox/main.go:34 +0x45
goroutine 7 [chan send (leaked)]:
main.Gather.func1()
/tmp/sandbox/main.go:35 +0x37
created by main.Gather in goroutine 1
/tmp/sandbox/main.go:34 +0x45
The output provides clear stack traces for the leaked goroutines. While this method requires time.Sleep in tests (unless combined with synctest's fake clock), its ability to collect profiles from running programs makes it invaluable for diagnosing leaks in production environments.
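On Go versions without the goroutineleak profile, a rough approximation is to read the standard "goroutine" profile and filter by wait state. The sketch below assumes that approach. The helper names stacksInState and waitForState are hypothetical, and unlike goroutineleak this cannot tell a leaked goroutine from one that is merely waiting.

```go
package main

import (
	"fmt"
	"runtime/pprof"
	"strings"
	"time"
)

// stacksInState returns the stack traces of goroutines whose header
// line mentions the given wait state, such as "chan send".
func stacksInState(state string) []string {
	var buf strings.Builder
	// debug=2 prints one text block per goroutine, separated by blank lines.
	pprof.Lookup("goroutine").WriteTo(&buf, 2)
	var found []string
	for _, block := range strings.Split(buf.String(), "\n\n") {
		header, _, _ := strings.Cut(block, "\n")
		if strings.Contains(header, "["+state) {
			found = append(found, block)
		}
	}
	return found
}

// waitForState polls until at least n goroutines are in the given state.
func waitForState(state string, n int) bool {
	for i := 0; i < 100; i++ {
		if len(stacksInState(state)) >= n {
			return true
		}
		time.Sleep(time.Millisecond)
	}
	return false
}

func main() {
	ch := make(chan int)
	go func() { ch <- 1 }() // blocks until somebody receives
	if waitForState("chan send", 1) {
		fmt.Println(stacksInState("chan send")[0])
	}
	<-ch // unblock the sender
}
```

Because the filter only looks at the current state, it is best used as a hint for where to look, with goroutineleak or synctest providing the actual verdict.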
Leak Detection Algorithm
The goroutineleak profile utilizes the garbage collector's marking phase to identify permanently blocked (leaked) goroutines. This approach is detailed in the proposal and a related paper.
Here's a simplified overview of the algorithm:
[ Start: GC mark phase ]
            │
            │ 1. Collect live goroutines
            v
┌───────────────────────┐
│     Initial roots     │ <────────────────┐
│ (runnable goroutines) │                  │
└───────────────────────┘                  │
            │                              │
            │ 2. Mark reachable memory     │
            v                              │
┌───────────────────────┐                  │
│   Reachable objects   │                  │
│  (channels, mutexes)  │                  │
└───────────────────────┘                  │
            │                              │
            │ 3a. Check blocked goroutines │
            v                              │
┌───────────────────────┐      (Yes)       │
│ Is blocked G waiting  │ ─────────────────┘
│ on a reachable obj?   │ 3b. Add G to roots
└───────────────────────┘
            │
            │ (No - repeat until no new Gs found)
            v
┌───────────────────────┐
│   Remaining blocked   │
│      goroutines       │
└───────────────────────┘
            │
            │ 5. Report the leaks
            v
       [ LEAKED! ]
 (Blocked on unreachable
 synchronization objects)
1. Collect Live Goroutines: Start by identifying active (runnable or running) goroutines as initial "roots," temporarily ignoring blocked ones.
2. Mark Reachable Memory: Trace pointers from these roots to determine which memory objects (like channels or mutexes) are currently accessible.
3. Resurrect Blocked Goroutines: Examine all currently blocked goroutines. If a blocked goroutine is waiting for a synchronization resource that was just marked as reachable, add that goroutine to the set of roots.
4. Iterate: Repeat steps 2 and 3 until no new goroutines are found to be blocked on reachable objects.
5. Report the Leaks: Any goroutines remaining in a blocked state are considered leaked because they are waiting for resources that no active part of the program can access.
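The steps above can be sketched as a small fixed-point computation over a toy model. This is only an illustration of the idea, not the runtime's implementation: the G type, its fields, and the use of string names for objects are assumptions, and real roots such as global variables are left out.

```go
package main

import "fmt"

// G is a toy model of a goroutine.
type G struct {
	Name    string
	Blocked bool     // false means runnable or running
	WaitsOn string   // object the goroutine is blocked on, if Blocked
	Refs    []string // objects reachable from the goroutine's stack
}

// findLeaks returns the names of blocked goroutines whose wait object
// is never reached from a live goroutine.
func findLeaks(gs []G) []string {
	live := make([]bool, len(gs))
	reachable := map[string]bool{}
	mark := func(i int) {
		live[i] = true
		for _, obj := range gs[i].Refs {
			reachable[obj] = true
		}
	}
	// Steps 1 and 2: runnable goroutines are the initial roots.
	for i, g := range gs {
		if !g.Blocked {
			mark(i)
		}
	}
	// Steps 3 and 4: resurrect blocked goroutines until nothing changes.
	for changed := true; changed; {
		changed = false
		for i, g := range gs {
			if !live[i] && reachable[g.WaitsOn] {
				mark(i)
				changed = true
			}
		}
	}
	// Step 5: everything that is still not live is reported as leaked.
	var leaked []string
	for i, g := range gs {
		if !live[i] {
			leaked = append(leaked, g.Name)
		}
	}
	return leaked
}

// sample builds a small scenario: helper becomes live only after worker
// has been resurrected, while both senders wait on an unreachable channel.
func sample() []G {
	return []G{
		{Name: "main", Refs: []string{"ch1"}},
		{Name: "helper", Blocked: true, WaitsOn: "ch2", Refs: []string{"ch2"}},
		{Name: "worker", Blocked: true, WaitsOn: "ch1", Refs: []string{"ch1", "ch2"}},
		{Name: "sender1", Blocked: true, WaitsOn: "out", Refs: []string{"out"}},
		{Name: "sender2", Blocked: true, WaitsOn: "out", Refs: []string{"out"}},
	}
}

func main() {
	fmt.Println(findLeaks(sample())) // [sender1 sender2]
}
```

In this scenario helper is only found on the second pass, which is why the loop has to repeat until no new goroutine is added.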
The following sections will review common goroutine leak patterns, demonstrating how synctest and goroutineleak effectively detect each one. These examples are based on code from the common-goroutine-leak-patterns repository, licensed under Apache-2.0.
Common Goroutine Leak Patterns
Range Over Channel
One or more goroutines receive from a channel using range, but the sender never closes the channel, leading to all receivers leaking indefinitely.
func RangeOverChan(list []any, workers int) {
	ch := make(chan any)
	// Launch workers.
	for range workers {
		go func() {
			// Each worker processes items one by one.
			// The channel is never closed, so every worker leaks
			// once there are no more items left to process.
			for item := range ch {
				_ = item
			}
		}()
	}
	// Send items for processing.
	for _, item := range list {
		ch <- item
	}
	// close(ch) // FIX: Uncomment to resolve by closing the channel after sending.
}
Using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
RangeOverChan([]any{11, 22, 33, 44}, 2)
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 10 [chan receive (durable), synctest bubble 1]:
sandbox.RangeOverChan.func1()
/tmp/sandbox/main_test.go:36 +0x34
created by sandbox.RangeOverChan in goroutine 9
/tmp/sandbox/main_test.go:34 +0x45
goroutine 11 [chan receive (durable), synctest bubble 1]:
sandbox.RangeOverChan.func1()
/tmp/sandbox/main_test.go:36 +0x34
created by sandbox.RangeOverChan in goroutine 9
/tmp/sandbox/main_test.go:34 +0x45
Using goroutineleak:
func main() {
printLeaks(func() {
RangeOverChan([]any{11, 22, 33, 44}, 2)
})
}
goroutine 19 [chan receive (leaked)]:
main.RangeOverChan.func1()
/tmp/sandbox/main.go:36 +0x34
created by main.RangeOverChan in goroutine 1
/tmp/sandbox/main.go:34 +0x45
goroutine 20 [chan receive (leaked)]:
main.RangeOverChan.func1()
/tmp/sandbox/main.go:36 +0x34
created by main.RangeOverChan in goroutine 1
/tmp/sandbox/main.go:34 +0x45
Both tools provide clear stack traces. The fix involves ensuring the sender closes the channel after all items have been sent.
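Here is a sketch of the fixed version. The name RangeOverChanFixed, the WaitGroup, and the returned item count are assumptions added so the effect is observable. The essential change is just the close call.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// RangeOverChanFixed is a hypothetical variant that closes the channel
// and waits for the workers. It returns the number of processed items.
func RangeOverChanFixed(list []any, workers int) int {
	ch := make(chan any)
	var wg sync.WaitGroup
	var processed atomic.Int64
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			// The loop ends once ch is closed and drained.
			for item := range ch {
				_ = item
				processed.Add(1)
			}
		}()
	}
	for _, item := range list {
		ch <- item
	}
	close(ch) // tells every worker that no more items will arrive
	wg.Wait() // all workers have returned by now, so none can leak
	return int(processed.Load())
}

func main() {
	fmt.Println(RangeOverChanFixed([]any{11, 22, 33, 44}, 2)) // 4
}
```

Waiting on the WaitGroup is optional for the fix itself, but it makes the shutdown explicit and keeps the function from returning while workers are still running.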
Double Send
A sender accidentally writes more values to a channel than intended, leading to a leak if the receiver isn't ready for the extra sends.
func DoubleSend() <-chan any {
	ch := make(chan any)
	go func() {
		res, err := work(13)
		if err != nil {
			// In case of an error, send nil.
			ch <- nil
			// return // FIX: Uncomment to prevent sending twice on error.
		}
		// Otherwise, continue with normal behavior.
		// This leaks if err != nil, as 'ch <- res' will block.
		ch <- res
	}()
	return ch
}
Using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
<-DoubleSend()
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.DoubleSend.func1()
/tmp/sandbox/main_test.go:42 +0x4c
created by sandbox.DoubleSend in goroutine 21
/tmp/sandbox/main_test.go:32 +0x5f
Using goroutineleak:
func main() {
printLeaks(func() {
<-DoubleSend()
})
}
goroutine 19 [chan send (leaked)]:
main.DoubleSend.func1()
/tmp/sandbox/main.go:42 +0x4c
created by main.DoubleSend in goroutine 1
/tmp/sandbox/main.go:32 +0x67
The solution is to ensure each code path sends to the channel no more times than the receiver expects, or to use a buffered channel large enough to accommodate all potential sends.
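The following sketch shows the first option, one send per code path. Since work is not defined in the snippets above, the version here is a hypothetical stand-in. DoubleSendFixed and its extra done channel are also assumptions, added only so the caller can observe that the goroutine has exited.

```go
package main

import (
	"errors"
	"fmt"
)

// work is a hypothetical stand-in: it doubles n and fails for negative input.
func work(n int) (int, error) {
	if n < 0 {
		return 0, errors.New("negative input")
	}
	return 2 * n, nil
}

// DoubleSendFixed sends exactly one value on every code path.
// The done channel is closed when the goroutine exits.
func DoubleSendFixed(n int) (result <-chan any, done <-chan struct{}) {
	ch := make(chan any)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		res, err := work(n)
		if err != nil {
			ch <- nil
			return // the fix: stop here instead of falling through
		}
		ch <- res
	}()
	return ch, exited
}

func main() {
	for _, n := range []int{13, -1} {
		result, done := DoubleSendFixed(n)
		v := <-result
		<-done // completes because the goroutine has finished
		fmt.Println(v)
	}
}
```

Running main prints 26 for the successful call and <nil> for the failing one, and in both cases the receive from done completes.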
Early Return
The parent goroutine exits without receiving a value from its child goroutine, causing the child to leak.
func EarlyReturn() {
	ch := make(chan any) // FIX: Should be buffered.
	go func() {
		res, _ := work(42)
		// Leaks if the parent goroutine terminates early.
		ch <- res
	}()
	_, err := work(13)
	if err != nil {
		// Early return in case of error.
		// The child goroutine leaks.
		return
	}
	// Only receive if there is no error.
	<-ch
}
Using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
EarlyReturn()
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.EarlyReturn.func1()
/tmp/sandbox/main_test.go:35 +0x45
created by sandbox.EarlyReturn in goroutine 21
/tmp/sandbox/main_test.go:32 +0x5f
Using goroutineleak:
func main() {
printLeaks(func() {
EarlyReturn()
})
}
goroutine 7 [chan send (leaked)]:
main.EarlyReturn.func1()
/tmp/sandbox/main.go:35 +0x45
created by main.EarlyReturn in goroutine 1
/tmp/sandbox/main.go:32 +0x67
The remedy is to make the channel buffered, allowing the child goroutine to send its value without blocking, even if the parent doesn't immediately receive it.
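A sketch of the buffered variant might look like this. As before, work is a hypothetical stand-in. EarlyReturnFixed, its parameter, and the returned done channel are assumptions introduced so the early-return path can be triggered and the child's exit observed.

```go
package main

import (
	"errors"
	"fmt"
)

// work is a hypothetical stand-in: it doubles n and fails for negative input.
func work(n int) (int, error) {
	if n < 0 {
		return 0, errors.New("negative input")
	}
	return 2 * n, nil
}

// EarlyReturnFixed uses a channel with a buffer of one, so the child's
// single send succeeds even when the parent returns early.
// The returned channel is closed when the child goroutine exits.
func EarlyReturnFixed(n int) (<-chan struct{}, error) {
	ch := make(chan any, 1)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		res, _ := work(42)
		ch <- res // never blocks thanks to the buffer
	}()
	if _, err := work(n); err != nil {
		return exited, err // early return; the child can still finish
	}
	<-ch
	return exited, nil
}

func main() {
	done, err := EarlyReturnFixed(-1) // forces the early return
	<-done                            // the child has exited anyway
	fmt.Println("err:", err)
}
```

A buffer of one is enough here because the child sends exactly once.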
Cancel/Timeout
Similar to "early return," if the parent's context is canceled before receiving a value from a child goroutine, the child may leak.
func Canceled(ctx context.Context) {
	ch := make(chan any) // FIX: Should be buffered.
	go func() {
		res, _ := work(100)
		// Leaks if the parent goroutine gets canceled.
		ch <- res
	}()
	// Wait for the result or for cancellation.
	select {
	case <-ctx.Done():
		// The child goroutine leaks.
		return
	case res := <-ch:
		// Process the result.
		_ = res
	}
}
Using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
ctx, cancel := context.WithCancel(t.Context())
cancel() // Immediately cancel the context
Canceled(ctx)
time.Sleep(time.Second) // Give time for operations to settle
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 22 [chan send (durable), synctest bubble 1]:
sandbox.Canceled.func1()
/tmp/sandbox/main_test.go:35 +0x45
created by sandbox.Canceled in goroutine 21
/tmp/sandbox/main_test.go:32 +0x76
Using goroutineleak:
func main() {
printLeaks(func() {
ctx, cancel := context.WithCancel(context.Background())
cancel() // Immediately cancel the context
Canceled(ctx)
})
}
goroutine 19 [chan send (leaked)]:
main.Canceled.func1()
/tmp/sandbox/main.go:35 +0x45
created by main.Canceled in goroutine 1
/tmp/sandbox/main.go:32 +0x7b
Again, a buffered channel is the fix, allowing the child goroutine to send its result even if the parent's context is canceled before a read occurs.
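As an alternative to buffering, the child can watch the same context and give up on the send once the parent is gone. The sketch below shows that different technique, not the buffered fix described above. CanceledAlt, canceledContext, and the placeholder result are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
)

// CanceledAlt keeps the channel unbuffered but lets the child select
// between sending and cancellation. It reports whether a result was
// received and returns a channel that is closed when the child exits.
func CanceledAlt(ctx context.Context) (bool, <-chan struct{}) {
	ch := make(chan any)
	exited := make(chan struct{})
	go func() {
		defer close(exited)
		res := 42 // placeholder for the real work
		select {
		case ch <- res: // the parent received the result
		case <-ctx.Done(): // the parent gave up; exit instead of blocking
		}
	}()
	select {
	case <-ctx.Done():
		return false, exited
	case <-ch:
		return true, exited
	}
}

// canceledContext returns a context that is already canceled.
func canceledContext() context.Context {
	ctx, cancel := context.WithCancel(context.Background())
	cancel()
	return ctx
}

func main() {
	ok, done := CanceledAlt(canceledContext())
	<-done // the child notices the cancellation and exits
	fmt.Println("received:", ok) // received: false
}
```

This variant also lets long-running work stop early if it checks the context, whereas the buffered fix only prevents the final send from blocking.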
Take First
A parent goroutine launches multiple child goroutines but is only interested in the first result, causing the remaining children to leak. A leak also occurs if the input is empty and the parent waits indefinitely.
func TakeFirst(items []any) {
	ch := make(chan any)
	// Iterate over every item.
	for _, item := range items {
		go func() {
			ch <- process(item)
		}()
	}
	// Retrieve the first result. All other children leak.
	// Also, the parent leaks if len(items) == 0 because it waits for a send that never comes.
	<-ch
}
Using synctest (zero items, parent leaks):
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
go TakeFirst(nil) // Call in a goroutine to not block the test bubble directly
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 22 [chan receive (durable), synctest bubble 1]:
sandbox.TakeFirst({0x0, 0x0, 0x0?})
/tmp/sandbox/main_test.go:40 +0xdd
created by sandbox.Test.func1 in goroutine 21
/tmp/sandbox/main_test.go:44 +0x1a
Using synctest (multiple items, children leak):
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
go TakeFirst([]any{11, 22, 33}) // Call in a goroutine
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 10 [chan send (durable), synctest bubble 1]:
sandbox.TakeFirst.func1()
/tmp/sandbox/main_test.go:35 +0x2e
created by sandbox.TakeFirst in goroutine 9
/tmp/sandbox/main_test.go:34 +0x51
goroutine 11 [chan send (durable), synctest bubble 1]:
sandbox.TakeFirst.func1()
/tmp/sandbox/main_test.go:35 +0x2e
created by sandbox.TakeFirst in goroutine 9
/tmp/sandbox/main_test.go:34 +0x51
Using goroutineleak (zero items, parent leaks):
func main() {
printLeaks(func() {
go TakeFirst(nil)
})
}
goroutine 19 [chan receive (leaked)]:
main.TakeFirst({0x0, 0x0, 0x0?})
/tmp/sandbox/main.go:40 +0xeb
created by main.main.func1 in goroutine 1
/tmp/sandbox/main.go:44 +0x1a
Using goroutineleak (multiple items, children leak):
func main() {
printLeaks(func() {
go TakeFirst([]any{11, 22, 33})
})
}
goroutine 20 [chan send (leaked)]:
main.TakeFirst.func1()
/tmp/sandbox/main.go:35 +0x2e
created by main.TakeFirst in goroutine 19
/tmp/sandbox/main.go:34 +0x51
goroutine 21 [chan send (leaked)]:
main.TakeFirst.func1()
/tmp/sandbox/main.go:35 +0x2e
created by main.TakeFirst in goroutine 19
/tmp/sandbox/main.go:34 +0x51
The fix involves handling empty input slices by returning early and making the channel buffered to accommodate all potential results.
func TakeFirst(items []any) {
	if len(items) == 0 {
		// Return early if the source collection is empty.
		return
	}
	// Make the channel's buffer large enough.
	ch := make(chan any, len(items))
	// Iterate over every item.
	for _, item := range items {
		go func() {
			ch <- process(item)
		}()
	}
	// Retrieve first result.
	<-ch
}
Orphans
Inner goroutines leak because a client fails to adhere to a type's interface contract, often by not properly stopping a background worker.
Consider a Worker type with the following contract:
// A worker processes a queue of items one by one in the background.
// A started worker must eventually be stopped.
// Failing to stop a worker results in a goroutine leak.
type Worker struct {
// ...
}
// NewWorker creates a new worker.
func NewWorker() *Worker
// Start starts the processing.
func (w *Worker) Start()
// Stop stops the processing.
func (w *Worker) Stop()
// Push adds an item to the processing queue.
func (w *Worker) Push(item any)
If a client neglects to call Stop on the worker:
func Orphans() {
	w := NewWorker()
	w.Start()
	// defer w.Stop() // FIX: Uncomment to resolve by stopping the worker.
	items := make([]any, 10)
	for _, item := range items {
		w.Push(item)
	}
}
The worker's goroutines will leak as documented.
Using synctest:
func Test(t *testing.T) {
synctest.Test(t, func(t *testing.T) {
Orphans()
synctest.Wait()
})
}
panic: deadlock: main bubble goroutine has exited but blocked goroutines remain
goroutine 10 [select (durable), synctest bubble 1]:
sandbox.(*Worker).run(0xc00009c190)
/tmp/sandbox/main_test.go:113 +0xcc
created by sandbox.(*Worker).Start.func1 in goroutine 9
/tmp/sandbox/main_test.go:89 +0xb6
goroutine 11 [select (durable), synctest bubble 1]:
sandbox.(*Worker).run(0xc00009c190)
/tmp/sandbox/main_test.go:113 +0xcc
created by sandbox.(*Worker).Start.func1 in goroutine 9
/tmp/sandbox/main_test.go:90 +0xf6
Using goroutineleak:
func main() {
printLeaks(func() {
Orphans()
})
}
goroutine 19 [select (leaked)]:
main.(*Worker).run(0x147fe4630000)
/tmp/sandbox/main.go:112 +0xce
created by main.(*Worker).Start.func1 in goroutine 1
/tmp/sandbox/main.go:88 +0xba
goroutine 20 [select (leaked)]:
main.(*Worker).run(0x147fe4630000)
/tmp/sandbox/main.go:112 +0xce
created by main.(*Worker).Start.func1 in goroutine 1
/tmp/sandbox/main.go:89 +0x105
The resolution is to ensure the Worker's Stop method is always called, typically using defer, to gracefully shut down its internal goroutines.
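The Worker internals are not shown above, so the following is only one way such a type could be written, assuming two background goroutines that select on a queue and a done channel. All field names, the processed counter, and the runAndStop helper are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Worker is a hypothetical implementation of the contract described above.
type Worker struct {
	queue     chan any
	done      chan struct{}
	wg        sync.WaitGroup
	processed atomic.Int64
}

// NewWorker creates a new worker.
func NewWorker() *Worker {
	return &Worker{queue: make(chan any), done: make(chan struct{})}
}

// Start launches two background goroutines.
func (w *Worker) Start() {
	for i := 0; i < 2; i++ {
		w.wg.Add(1)
		go w.run()
	}
}

// Stop signals the goroutines to exit and waits until they have returned.
func (w *Worker) Stop() {
	close(w.done)
	w.wg.Wait()
}

// Push hands an item to one of the background goroutines.
func (w *Worker) Push(item any) { w.queue <- item }

func (w *Worker) run() {
	defer w.wg.Done()
	for {
		select {
		case <-w.queue:
			w.processed.Add(1)
		case <-w.done:
			return // without Stop this case never fires and run leaks
		}
	}
}

// runAndStop pushes n items, stops the worker, and returns the item count.
func runAndStop(n int) int64 {
	w := NewWorker()
	w.Start()
	for i := 0; i < n; i++ {
		w.Push(i)
	}
	w.Stop()
	return w.processed.Load()
}

func main() {
	fmt.Println(runAndStop(10)) // 10
}
```

Because Stop waits for the goroutines, the caller knows that no background work is still running once it returns.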
Final Thoughts
Thanks to continuous improvements in Go versions 1.24 through 1.26, detecting goroutine leaks has become significantly easier, whether during development testing or in production environments.
The synctest package, introduced as experimental in Go 1.24 and made production-ready in Go 1.25+, offers a powerful way to test for concurrency issues without relying on external libraries or arbitrary delays.
The goroutineleak profile, an experimental feature in Go 1.26, provides a robust mechanism for identifying leaked goroutines using insights from the garbage collector. While currently marked experimental to gather API feedback, its underlying implementation is considered production-ready.
These tools represent a crucial step forward for Go developers in building more reliable and efficient concurrent applications.