Go's concurrency primitives—goroutines and channels—are often praised for their simplicity, but that simplicity can be deceptive. Many teams, especially those new to the language, encounter subtle bugs that are hard to reproduce and debug. This guide identifies the most common concurrency pitfalls in Go and offers practical, field-tested advice for avoiding them. We focus on the why behind each pitfall, not just the what, and provide concrete steps you can apply today.
This overview reflects widely shared professional practices as of May 2026; verify critical details against current official guidance where applicable.
The Hidden Costs of Goroutine Leaks
One of the most frequent issues in Go concurrency is the goroutine leak: a goroutine that never exits, so its stack and everything it references stay in memory indefinitely. This usually happens when a goroutine is blocked on a channel send or receive, or is waiting for an event that never arrives, after the code that started it has moved on without signaling it to stop.
How Leaks Occur
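Consider a typical pattern: a function starts a goroutine to perform background work, but if the function returns early due to an error or timeout, the goroutine may remain blocked. For example, a goroutine sending to an unbuffered channel that no one reads from will block forever. A related, smaller leak involves timers rather than goroutines: before Go 1.23, a timer created by time.After was not garbage collected until it fired, so calling time.After in a hot loop or a frequently executed select could pile up pending timers even when the select exited through another case. In modules that declare go 1.23 or later, unreferenced timers can be collected without an explicit Stop.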
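The unbuffered-channel leak is easy to reproduce. The following sketch, in which fetchLeaky, fetchFixed, slowWork, and extraGoroutines are hypothetical names chosen for illustration, runs a slow operation under a short timeout and counts the goroutines left behind. Giving the result channel a buffer of one is a common fix when exactly one value is sent; passing a context into the worker is another.

```go
package main

import (
	"context"
	"fmt"
	"runtime"
	"time"
)

// fetchLeaky runs work in a goroutine and sends the result on an
// unbuffered channel. If ctx expires first, fetchLeaky returns and nobody
// ever receives, so the worker blocks on its send forever: a leak.
func fetchLeaky(ctx context.Context, work func() int) (int, error) {
	ch := make(chan int) // unbuffered
	go func() {
		ch <- work() // blocks forever once the caller has given up
	}()
	select {
	case v := <-ch:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// fetchFixed gives the channel room for one value, so the worker's single
// send always completes and the goroutine exits whether or not anyone reads.
func fetchFixed(ctx context.Context, work func() int) (int, error) {
	ch := make(chan int, 1) // buffered: the one send never blocks
	go func() {
		ch <- work()
	}()
	select {
	case v := <-ch:
		return v, nil
	case <-ctx.Done():
		return 0, ctx.Err()
	}
}

// slowWork is a hypothetical operation that outlives the caller's timeout.
func slowWork() int {
	time.Sleep(50 * time.Millisecond)
	return 42
}

// extraGoroutines calls fetch ten times with a 5ms timeout, waits for all
// slowWork calls to finish, and reports how many goroutines are left over.
func extraGoroutines(fetch func(context.Context, func() int) (int, error)) int {
	before := runtime.NumGoroutine()
	for i := 0; i < 10; i++ {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Millisecond)
		fetch(ctx, slowWork) // times out before slowWork returns
		cancel()
	}
	time.Sleep(200 * time.Millisecond)
	return runtime.NumGoroutine() - before
}

func main() {
	fmt.Println("left over after fetchFixed:", extraGoroutines(fetchFixed))
	fmt.Println("left over after fetchLeaky:", extraGoroutines(fetchLeaky))
}
```

With the leaky version, each timed-out call strands one goroutine on its send; with the buffered channel the count should return to zero once the slow work finishes.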
Detection and Prevention
By default, the Go runtime does not report leaked goroutines. In tests, you can call runtime.NumGoroutine() to assert that the goroutine count returns to its baseline after a function call, and goroutine profiles (via runtime/pprof or net/http/pprof) show where stray goroutines are blocked. The race detector (-race) is built to find data races, not leaks. The most effective prevention is to give every goroutine a clear exit path: use context cancellation, close channels to signal completion, and use a sync.WaitGroup to track lifetimes. A common pattern is to pass a context.Context to goroutines and select on ctx.Done() alongside the main work.
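Another approach is to bound goroutine lifetimes with timeouts. For example, wrapping a blocking operation in a select with a timer or a context deadline lets the goroutine exit if the operation takes too long. Timers do not run in goroutines of their own, so this does not create a goroutine leak; on Go versions before 1.23, though, stop a time.NewTimer after the select (or use a context deadline) so that pending timers are not retained until they fire.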
In a typical project, teams often discover leaks only under load when memory usage climbs. Adding a goroutine leak check to integration tests can catch these issues early. For instance, you might write a test that starts and stops a service, then asserts that the goroutine count returns to the starting value within a few seconds.
Race Conditions and the Data Race Detector
Data races occur when two goroutines access the same memory location concurrently and at least one of the accesses is a write. Go makes few guarantees about racy programs: a read may observe a stale value, a race on a multiword value such as a slice, string, or interface can produce a corrupted value and crash the program, and results can differ from run to run. The race detector is a powerful tool, but it only finds races that actually occur during execution, so a clean run is not a proof of correctness.
Common Patterns Leading to Races
Shared mutable state is the primary culprit. A classic example is incrementing a counter from multiple goroutines without synchronization. Another is reading a map while another goroutine writes to it; maps in Go are not safe for concurrent access. Even simple operations like i++ are not atomic—they involve a read, modify, and write, which can interleave.
Using the Race Detector Effectively
Enable the race detector with go run -race or go test -race during development and in CI. It adds significant overhead (the Go documentation cites roughly 5-10x memory and 2-20x execution time), so it is generally not used in production builds, but it belongs in your regular test runs. The detector reports the exact source lines and goroutine stacks involved, which makes races much easier to fix. However, it only catches races that happen at runtime, so you need good test coverage, especially of concurrent scenarios.
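Beyond the detector, the best defense is to minimize shared state. Follow the Go proverb "Don't communicate by sharing memory; share memory by communicating" and prefer passing data between goroutines over channels. When shared state is unavoidable, use synchronization primitives like sync.Mutex or sync.RWMutex. For simple counters, consider sync/atomic operations. For maps, the usual choice is an ordinary map protected by a mutex; sync.Map is optimized for specific cases, such as keys that are written once and read many times, or goroutines working on disjoint sets of keys.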
Example: Safe Counter
Instead of a plain int incremented with i++, use atomic.AddInt64(&counter, 1) (or, since Go 1.19, the atomic.Int64 type) or protect the counter with a mutex. For read-heavy workloads, a sync.RWMutex allows multiple concurrent readers while still providing exclusive write access.
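The sketch below shows both options: an atomic counter and a map guarded by a sync.RWMutex. atomicCount and hitCounter are illustrative names, and atomic.Int64 requires Go 1.19 or later (atomic.AddInt64 on a plain int64 works on older versions). Running it with -race should report no races.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// atomicCount increments a shared counter from n goroutines using
// sync/atomic, so no increment is lost.
func atomicCount(n int) int64 {
	var counter atomic.Int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			counter.Add(1) // a single atomic read-modify-write
		}()
	}
	wg.Wait()
	return counter.Load()
}

// hitCounter guards a map with a sync.RWMutex: many readers may hold the
// read lock at once, while writers get exclusive access.
type hitCounter struct {
	mu   sync.RWMutex
	hits map[string]int
}

func newHitCounter() *hitCounter {
	return &hitCounter{hits: make(map[string]int)}
}

func (h *hitCounter) Inc(key string) {
	h.mu.Lock() // exclusive: writes must not overlap reads or other writes
	defer h.mu.Unlock()
	h.hits[key]++
}

func (h *hitCounter) Get(key string) int {
	h.mu.RLock() // shared: concurrent Get calls do not block each other
	defer h.mu.RUnlock()
	return h.hits[key]
}

func main() {
	fmt.Println("atomic count:", atomicCount(1000)) // atomic count: 1000

	h := newHitCounter()
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			h.Inc("home")
			_ = h.Get("home")
		}()
	}
	wg.Wait()
	fmt.Println("hits:", h.Get("home")) // hits: 100
}
```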
Deadlocks and Livelocks: When Goroutines Get Stuck
Deadlocks occur when two or more goroutines wait on each other indefinitely. Livelocks are similar but the goroutines are actively trying to resolve the conflict yet make no progress. Both are serious issues that can freeze your program.
Classic Deadlock Scenarios
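The most common deadlock in Go occurs when goroutines are blocked on channel operations that can never complete: for example, a program that sends on an unbuffered channel that nobody receives from, or receives from a channel that nobody sends to. If every goroutine is blocked, the runtime aborts with "fatal error: all goroutines are asleep - deadlock!", but if only some goroutines are stuck, nothing is reported and they simply hang. Another classic is circular lock ordering: goroutine A holds lock L1 and waits for L2, while goroutine B holds L2 and waits for L1.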
Prevention Strategies
To avoid deadlocks, acquire locks in a consistent order across your codebase. Document the order and, where possible, check it in code review or with tooling. Buffered channels reduce how often a send blocks, but they can still deadlock once the buffer is full and no one is reading. A select with a default case makes a channel operation non-blocking; use it sparingly, because retrying such a select in a loop turns into a busy wait that burns CPU.
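One common way to get a consistent lock order is to sort by a stable key before locking. In this sketch the account type and its id field are hypothetical; the point is that every caller of transfer acquires the lower-id lock first, so the circular wait described earlier cannot form.

```go
package main

import (
	"fmt"
	"sync"
)

// account is a hypothetical type used to illustrate lock ordering.
type account struct {
	id      int
	mu      sync.Mutex
	balance int
}

// transfer always locks the account with the smaller id first. Because
// every goroutine acquires the two locks in the same global order, two
// transfers in opposite directions cannot each hold one lock while
// waiting for the other.
func transfer(from, to *account, amount int) {
	if from == to {
		return // sync.Mutex is not reentrant: locking it twice would deadlock
	}
	first, second := from, to
	if second.id < first.id {
		first, second = second, first
	}
	first.mu.Lock()
	defer first.mu.Unlock()
	second.mu.Lock()
	defer second.mu.Unlock()

	from.balance -= amount
	to.balance += amount
}

func main() {
	a := &account{id: 1, balance: 100}
	b := &account{id: 2, balance: 100}
	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); transfer(a, b, 1) }()
		go func() { defer wg.Done(); transfer(b, a, 1) }()
	}
	wg.Wait()
	fmt.Println(a.balance, b.balance) // 100 100
}
```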
Another effective technique is to bound blocking operations with context.WithTimeout or a select on a timer. A goroutine that selects on ctx.Done() (or the timer's channel) alongside its blocking operation eventually gives up instead of hanging indefinitely; a timeout only helps if the blocked code actually waits on it. In practice, many teams adopt a rule that every blocking operation has a timeout, especially in network services.
Livelocks are harder to detect because the program appears to be running. They often arise from poorly designed retry loops or backoff algorithms that cause goroutines to repeatedly yield to each other without making progress. Using randomized backoff and limiting the number of retries can help.
Improper Use of sync.WaitGroup
sync.WaitGroup is a simple and effective way to wait for a collection of goroutines to finish. However, misuse is common and can lead to panics or hangs.
Common Mistakes
The most frequent mistake is calling Add inside the goroutine being tracked. Because that goroutine may not have started yet when Wait runs, Wait can see a zero counter and return early. The correct pattern is to call Add in the launching goroutine, before the go statement, and pass the WaitGroup to the new goroutine by pointer (copying a WaitGroup by value is a separate bug, which go vet reports). Another error is forgetting to call Done on every exit path of the goroutine, especially when errors occur; this causes Wait to block forever. Calling Done more times than Add panics with a negative counter.
Best Practices
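Always call wg.Add(1) before spawning the goroutine, not inside it. Put defer wg.Done() at the very start of the goroutine so it runs on every return path, including early returns on error. (The deferred call also runs while a panic unwinds, but an unrecovered panic terminates the whole program anyway.) Done must be called exactly once per Add(1). Consider a helper function that encapsulates the goroutine lifecycle; since Go 1.25, sync.WaitGroup also has a Go method that performs the Add, the go statement, and the Done for you.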
For example:
func doWork(wg *sync.WaitGroup, data Data) {
	defer wg.Done() // runs on every return path
	// process data
}
Then call wg.Add(1); go doWork(&wg, data). This pattern is clear and reduces the chance of mistakes.
In scenarios where you need to wait for a dynamic number of goroutines, such as processing items from a channel, call Add for each item before launching the goroutine that handles it. A common pattern is a producer loop that reads from a source and launches a worker goroutine per item, incrementing the WaitGroup for each worker.
Channel Misuse: Buffering and Closing
Channels are a powerful synchronization primitive, but they are often misused, leading to deadlocks, panics, or data races.
Buffered vs. Unbuffered Channels
Unbuffered channels synchronize the sender and receiver—a send blocks until a receive occurs. This is useful for signaling and ensuring that work is handed off directly. However, if you use an unbuffered channel for a fan-out pattern, you may inadvertently block the sender if no receiver is ready. Buffered channels decouple send and receive up to the buffer size, but they can mask deadlocks and make it harder to reason about ordering.
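A common pitfall is giving a channel a large buffer just to avoid blocking. The buffer's capacity is allocated up front, every queued element (and anything it references) stays alive until it is received, and a fast sender can run far ahead of a slow receiver with no backpressure. More subtly, a large buffer only postpones a deadlock: once it fills, the sender blocks, and if the receiver is blocked elsewhere, the program still hangs.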
Closing Channels
Closing a channel signals that no more values will be sent. Sending on a closed channel panics, and so does closing an already-closed channel. Receiving from a closed channel first returns any values still in the buffer and then returns the zero value immediately; the two-value form v, ok := <-ch reports ok == false once the channel is closed and drained. A common mistake is closing a channel more than once, or closing it while other goroutines are still sending. The rule is: only the sender should close a channel, and it should close it only once.
To avoid panics, use a sync.Once or a dedicated goroutine to close the channel. Another pattern is to use a context.Context to signal cancellation instead of closing a channel, which is safer when multiple goroutines may need to signal.
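A sketch of the sync.Once approach for a signal-only channel that several goroutines may want to close. stopSignal is a hypothetical helper type; because nobody sends on the channel, closing it works purely as a broadcast to every receiver.

```go
package main

import (
	"fmt"
	"sync"
)

// stopSignal wraps a channel that several goroutines may try to close.
// sync.Once guarantees close runs exactly once, so repeated Stop calls
// cannot panic with "close of closed channel".
type stopSignal struct {
	once sync.Once
	ch   chan struct{}
}

func newStopSignal() *stopSignal {
	return &stopSignal{ch: make(chan struct{})}
}

// Stop closes the channel on the first call and does nothing afterwards.
func (s *stopSignal) Stop() { s.once.Do(func() { close(s.ch) }) }

// Done returns a channel that is closed once Stop has been called.
func (s *stopSignal) Done() <-chan struct{} { return s.ch }

func main() {
	s := newStopSignal()
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			s.Stop() // safe to call from many goroutines
		}()
	}
	wg.Wait()
	<-s.Done() // receiving from a closed channel never blocks
	fmt.Println("stopped")
}
```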
When using range over a channel, the loop exits when the channel is closed. Ensure that all senders have finished before closing, otherwise a sender may panic. In a producer-consumer pattern, the producer should close the channel after all items are sent, and consumers should only receive.
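When there are several producers, no single one of them can safely close the shared channel. One common arrangement, sketched here with a hypothetical fanIn function, is to track the producers with a sync.WaitGroup and let a separate goroutine close the channel after Wait returns.

```go
package main

import (
	"fmt"
	"sync"
)

// fanIn runs one producer goroutine per slice and merges their values into
// a single channel. Producers only send; a separate goroutine closes the
// channel after wg.Wait confirms every producer has finished, so no send
// can ever hit a closed channel.
func fanIn(sources ...[]int) <-chan int {
	out := make(chan int)
	var wg sync.WaitGroup
	for _, src := range sources {
		wg.Add(1)
		go func(vals []int) {
			defer wg.Done()
			for _, v := range vals {
				out <- v
			}
		}(src)
	}
	go func() {
		wg.Wait()
		close(out) // the only close, after all senders are done
	}()
	return out
}

func main() {
	sum := 0
	for v := range fanIn([]int{1, 2, 3}, []int{10, 20}) { // exits when out is closed
		sum += v
	}
	fmt.Println(sum) // 36
}
```

The consumer must keep receiving until the channel is closed; if it stops early, the producers block on their sends and leak, which is where a context-based cancellation path becomes useful.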
Context Cancellation and Graceful Shutdown
Go's context package is the standard way to propagate cancellation signals across API boundaries and goroutines. However, it is often used incorrectly, leading to leaked goroutines or incomplete shutdowns.
Ignoring Context Cancellation
Many functions accept a context.Context but never check whether it has been cancelled. Even if the caller cancels the context, such a function keeps running, wasting resources. The fix is to respect ctx in every long-running operation: pass it to calls that accept it, such as database/sql's QueryContext or an HTTP request built with http.NewRequestWithContext, and select on ctx.Done() in your own loops and blocking waits.
For example, in a loop that processes items, add a select:
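for {
	select {
	case <-ctx.Done():
		return ctx.Err()
	default:
		// process item
	}
}
This checks for cancellation before each item, so the loop exits at the next iteration after the context is cancelled. Work inside a single item is not interrupted; pass ctx into that work as well.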
Graceful Shutdown of Services
In HTTP servers and other long-running services, you need to shut down gracefully, allowing in-flight requests to complete. The standard approach is to use signal.NotifyContext to catch OS signals and then call server.Shutdown with a timeout context. However, a common pitfall is not propagating the shutdown context to all goroutines. For instance, background goroutines that are not part of the server's Shutdown will continue running, potentially holding onto resources.
To handle this, maintain a sync.WaitGroup of all goroutines and wait for them after the server shuts down. Use a root context that is cancelled on signal, and pass derived contexts to all goroutines. This ensures that every goroutine eventually exits when the service is shutting down.
Another mistake is not setting a deadline on the shutdown context. Without a deadline, the server may wait indefinitely for connections to drain. Always use context.WithTimeout to limit the shutdown duration.
Testing Concurrent Code
Testing concurrent code is notoriously difficult because non-determinism makes bugs hard to reproduce. Many teams rely on unit tests that only test sequential logic, leaving concurrency bugs to be found in production.
Writing Deterministic Tests
One approach is to abstract concurrency behind interfaces. For example, instead of directly using channels, define an interface for the communication primitive. In tests, you can inject a mock that records interactions or simulates specific orderings. This allows you to test the logic without the full concurrency.
Another technique is to use the race detector in all tests, as mentioned earlier. Additionally, write tests that run the concurrent code under stress, such as using go test -race -count=100 to run the test multiple times. This increases the chance of hitting a race condition.
Using the Go Testing Library
The testing package provides t.Parallel() to run tests in parallel, which can help surface concurrency issues. However, be careful with shared state between parallel tests—each test should have its own copy of data.
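For more advanced testing, consider using the goleak library (from Uber) to detect goroutine leaks. At the end of a test (goleak.VerifyNone) or of a package's whole test run (goleak.VerifyTestMain), it checks for goroutines that are still running, retrying briefly so that goroutines in the middle of shutting down can finish, and reports any that remain along with their stacks. This is especially useful for integration tests that start and stop services.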
Finally, consider using property-based testing (e.g., with testing/quick or rapid) to generate random sequences of operations and verify invariants. This can uncover concurrency bugs that are hard to think of manually.
FAQ: Common Concurrency Questions
Should I use channels or mutexes?
Channels are preferred for communicating between goroutines, especially when you need to signal completion or pass ownership of data. Mutexes are better for protecting shared state, such as a map or a struct field. In general, prefer channels for coordination and mutexes for data protection. However, there is overlap, and the choice often depends on the specific problem.
How do I limit the number of goroutines?
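Use a buffered channel as a counting semaphore. Create a channel with a capacity equal to the maximum number of goroutines allowed to run at once. Before launching a goroutine, send a token to the channel (this blocks while all slots are taken); when the goroutine finishes, receive from the channel to release its slot. Combine this with a sync.WaitGroup to wait for everything to finish; the WaitGroup tracks completion but does not limit concurrency on its own. Alternatives include a fixed pool of N worker goroutines reading tasks from a shared channel, or errgroup.Group's SetLimit and semaphore.Weighted from golang.org/x/sync.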
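A sketch of the buffered-channel semaphore; runLimited and the peakTracker demo type are hypothetical names, and limit must be at least 1 (a zero-capacity channel would block the first acquire forever). Acquiring the slot before the go statement also keeps the number of live goroutines bounded, not just the number doing work.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// runLimited runs every task with at most limit tasks executing at once.
// The buffered channel is a counting semaphore: sending a token acquires a
// slot (blocking while all slots are taken) and receiving releases it. The
// WaitGroup only waits for completion; it does not limit anything itself.
func runLimited(limit int, tasks []func()) {
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for _, task := range tasks {
		sem <- struct{}{} // acquire before starting the goroutine
		wg.Add(1)
		go func(f func()) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot when f returns
			f()
		}(task)
	}
	wg.Wait()
}

// peakTracker is demo scaffolding: it records the highest number of tasks
// observed running at the same time.
type peakTracker struct {
	running atomic.Int64
	peak    atomic.Int64
}

func (p *peakTracker) task() {
	n := p.running.Add(1)
	for {
		old := p.peak.Load()
		if n <= old || p.peak.CompareAndSwap(old, n) {
			break
		}
	}
	time.Sleep(5 * time.Millisecond) // simulated work
	p.running.Add(-1)
}

func main() {
	var pt peakTracker
	tasks := make([]func(), 20)
	for i := range tasks {
		tasks[i] = pt.task
	}
	runLimited(3, tasks)
	fmt.Println("peak concurrency:", pt.peak.Load()) // never more than 3
}
```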
What is the best way to handle timeouts?
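Use context.WithTimeout or context.WithDeadline for most cases. For a single operation, a select with a timer also works. Before Go 1.23, a time.After timer is retained until it fires, so in hot paths prefer time.NewTimer and call Stop after the select; in modules that declare go 1.23 or later, unreferenced timers are garbage collected even without Stop. For HTTP servers, set http.Server's ReadTimeout and WriteTimeout fields (and ReadHeaderTimeout and IdleTimeout as needed).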
How do I detect goroutine leaks in production?
You can expose runtime.NumGoroutine() via expvar or a custom HTTP endpoint, feed it into your metrics system, and alert when the count keeps growing or exceeds a threshold. Additionally, use pprof (for example, the /debug/pprof/goroutine endpoint registered by net/http/pprof) to capture goroutine profiles and see where goroutines are blocked.
Conclusion and Next Steps
Concurrency in Go is a powerful tool, but it requires discipline to use correctly. The pitfalls covered—goroutine leaks, data races, deadlocks, improper use of sync primitives, channel misuse, ignoring context cancellation, and inadequate testing—are among the most common that teams encounter. By understanding the underlying mechanisms and applying the preventive strategies outlined here, you can write concurrent Go code that is both efficient and reliable.
To get started, review your existing codebase for these patterns. Add the race detector to your CI pipeline. Write tests that exercise concurrent behavior, and use tools like goleak to catch leaks. Finally, adopt a culture of code review that specifically looks for concurrency issues.
Remember that concurrency bugs are often subtle and may only appear under load. Invest in good testing and monitoring from the start. The Go community has developed many best practices over the years—this guide summarizes the most critical ones. For further reading, consult the official Go blog posts on concurrency and the sync package documentation.