Go Networking Performance Guide
Connection Management
HTTP Transport Tuning
The default http.Transport is conservative. For high-throughput services, tune it:
transport := &http.Transport{
    MaxIdleConns:          1000,
    MaxConnsPerHost:       100,
    IdleConnTimeout:       90 * time.Second,
    ExpectContinueTimeout: 0, // skip 100-continue wait
    DialContext: (&net.Dialer{
        Timeout:   5 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
}
client := &http.Client{Transport: transport}
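If most traffic targets a small number of hosts, also raise MaxIdleConnsPerHost: the package default (http.DefaultMaxIdleConnsPerHost) is only 2, which caps connection reuse regardless of MaxIdleConns. For example:

```go
transport.MaxIdleConnsPerHost = 100 // default is 2; align with MaxConnsPerHost
```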
Critical: Always drain response bodies before closing, or Go won't reuse connections:
defer resp.Body.Close()
io.Copy(io.Discard, resp.Body)
Connection Pooling with bufio
Pool bufio.Reader/bufio.Writer to avoid per-connection allocations:
var readerPool = sync.Pool{
    New: func() interface{} {
        return bufio.NewReaderSize(nil, 4096)
    },
}

func getReader(conn net.Conn) *bufio.Reader {
    r := readerPool.Get().(*bufio.Reader)
    r.Reset(conn)
    return r
}
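The counterpart when the connection is done (a minimal sketch): reset the reader before returning it so the pool does not retain a reference to the closed connection.

```go
func putReader(r *bufio.Reader) {
    r.Reset(nil) // drop the reference to the old connection
    readerPool.Put(r)
}
```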
Handling 10K+ Concurrent Connections
OS-Level Tuning (Linux)
ulimit -n 200000
sysctl -w net.core.somaxconn=65535 # pending connection queue
sysctl -w net.ipv4.ip_local_port_range="10000 65535" # ephemeral port range
sysctl -w net.ipv4.tcp_tw_reuse=1 # reuse TIME_WAIT sockets
sysctl -w net.ipv4.tcp_fin_timeout=15 # reduce FIN_WAIT2 duration
Concurrency Limiting with Semaphore
var connLimiter = make(chan struct{}, 10000)
for {
    conn, err := ln.Accept()
    if err != nil {
        continue // transient accept error; return on net.ErrClosed in real code
    }
    connLimiter <- struct{}{} // acquire slot (blocks once 10,000 handlers are active)
    go func(c net.Conn) {
        defer func() {
            c.Close()
            <-connLimiter // release slot
        }()
        handle(c)
    }(conn)
}
Real-World Benchmarks (c5.2xlarge, 8 CPU)
| Configuration | Connections | Aggregate Throughput |
|---|---|---|
| No buffering | 10,000 | 29 Mbps |
| Buffered writes | 10,000 | 232 Mbps (8x) |
| Buffered writes | 30,000 | 360 Mbps |
| Buffered + SHA256 | 30,000 | 149 Mbps (CPU-bound) |
Key insight: Buffered writes with periodic flushing improved throughput 8x. CPU-bound work (SHA256) cuts throughput by ~60%.
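A rough sketch of the "buffered writes with periodic flushing" pattern (the channel, buffer size, and flush interval here are illustrative, not the benchmarked configuration):

```go
func writeLoop(conn net.Conn, out <-chan []byte) {
    bw := bufio.NewWriterSize(conn, 64*1024)
    ticker := time.NewTicker(10 * time.Millisecond)
    defer ticker.Stop()
    for {
        select {
        case msg, ok := <-out:
            if !ok {
                bw.Flush() // flush whatever is pending before exiting
                return
            }
            if _, err := bw.Write(msg); err != nil {
                return
            }
        case <-ticker.C:
            if err := bw.Flush(); err != nil { // bound the latency of small writes
                return
            }
        }
    }
}
```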
TLS Optimization
Session Resumption (Skip Full Handshake)
tlsConfig := &tls.Config{
    SessionTicketsDisabled: false,
    SessionTicketKey:       [32]byte{...}, // persist and rotate; if left zero, Go auto-rotates (or use SetSessionTicketKeys)
}
Benefit: Eliminates at least one RTT and asymmetric crypto operations.
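On the client side, resumption also requires a session cache; a minimal sketch:

```go
clientTLS := &tls.Config{
    ClientSessionCache: tls.NewLRUClientSessionCache(1024), // keep up to 1024 cached sessions
}
```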
Optimized Cipher Selection
tlsConfig := &tls.Config{
    CipherSuites: []uint16{
        tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
        tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    },
    PreferServerCipherSuites: true, // legacy field: ignored since Go 1.17, kept for older toolchains
    CurvePreferences:         []tls.CurveID{tls.CurveP256, tls.X25519},
    MinVersion:               tls.VersionTLS12,
    NextProtos:               []string{"h2", "http/1.1"}, // ALPN
}
Why these choices:
- ECDHE for forward secrecy and performance
- AES-GCM for hardware acceleration (AES-NI)
- ECDSA: shorter signatures and lower CPU cost than RSA
- ALPN order matters: the server picks the first protocol it shares with the client
Certificate Verification Caching
var verificationCache sync.Map

func cachedCertVerifier(rawCerts [][]byte, verifiedChains [][]*x509.Certificate) error {
    fingerprint := sha256.Sum256(rawCerts[0])
    if _, exists := verificationCache.Load(fingerprint); exists {
        return nil
    }
    // full verification...
    verificationCache.Store(fingerprint, struct{}{})
    return nil
}
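One way to wire this in (note: VerifyPeerCertificate runs in addition to, not instead of, the standard chain verification unless InsecureSkipVerify is set and you perform the full check yourself):

```go
tlsConfig.VerifyPeerCertificate = cachedCertVerifier
```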
Version Note
Go 1.25 improved TLS handshake performance by ~58% since Go 1.23 (TLS 1.3 fast path optimization).
DNS Performance
Resolver Selection
Go has two DNS resolvers:
- Pure-Go (GODEBUG=netdns=go): self-contained, no cgo, produces a static binary
- cgo-based (GODEBUG=netdns=cgo): uses libc, better LDAP/mDNS compatibility, adds cgo overhead
Force pure-Go for static builds and performance. Debug with GODEBUG=netdns=2.
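Besides the GODEBUG variable, the pure-Go resolver can be preferred from code:

```go
net.DefaultResolver = &net.Resolver{PreferGo: true}
```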
DNS Caching
Go does NOT cache DNS by default. For latency-sensitive services:
// assumes a TTL cache such as github.com/patrickmn/go-cache
var dnsCache = cache.New(5*time.Minute, 10*time.Minute)

func LookupWithCache(host string) ([]net.IP, error) {
    if cached, found := dnsCache.Get(host); found {
        return cached.([]net.IP), nil
    }
    ips, err := net.LookupIP(host)
    if err != nil {
        return nil, err
    }
    dnsCache.Set(host, ips, cache.DefaultExpiration)
    return ips, nil
}
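To make an http.Client use the cache, resolve inside a custom DialContext (a sketch that assumes the LookupWithCache helper above and naively dials only the first returned IP):

```go
transport := &http.Transport{
    DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
        host, port, err := net.SplitHostPort(addr)
        if err != nil {
            return nil, err
        }
        ips, err := LookupWithCache(host)
        if err != nil {
            return nil, err
        }
        if len(ips) == 0 {
            return nil, fmt.Errorf("no addresses for %s", host)
        }
        var d net.Dialer
        return d.DialContext(ctx, network, net.JoinHostPort(ips[0].String(), port))
    },
}
```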
Custom DNS Server
var dialer = &net.Dialer{
    Timeout:   5 * time.Second,
    KeepAlive: 30 * time.Second,
    Resolver: &net.Resolver{
        PreferGo: true,
        Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
            var d net.Dialer
            return d.DialContext(ctx, network, "8.8.8.8:53") // honor the caller's context
        },
    },
}
Protocol Selection: TCP vs HTTP/2 vs gRPC
| Property | Raw TCP | HTTP/2 | gRPC |
|---|---|---|---|
| Latency | Lowest | Low (framing overhead) | Low (+ protobuf encode) |
| Multiplexing | No | Yes (streams) | Yes (HTTP/2) |
| Schema/Types | No | No | Yes (protobuf) |
| Best for | Trading, gaming | Web APIs | Internal microservices |
Raw TCP with Length-Prefix Framing
func writeFrame(conn net.Conn, payload []byte) error {
    buf := make([]byte, 4+len(payload))
    binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
    copy(buf[4:], payload)
    _, err := conn.Write(buf)
    return err
}

const maxFrameSize = 1 << 20 // illustrative limit; reject absurd lengths from untrusted peers

func readFrame(conn net.Conn) ([]byte, error) {
    lenBuf := make([]byte, 4)
    if _, err := io.ReadFull(conn, lenBuf); err != nil {
        return nil, err
    }
    n := binary.BigEndian.Uint32(lenBuf)
    if n > maxFrameSize {
        return nil, fmt.Errorf("frame too large: %d bytes", n)
    }
    payload := make([]byte, n)
    _, err := io.ReadFull(conn, payload)
    return payload, err
}
QUIC (UDP-based transport)
QUIC advantages over TCP:
- No head-of-line blocking: Independent streams per connection
- Integrated TLS 1.3: Encryption built into transport
- 0-RTT connection resumption: Send data immediately on reconnect
- Connection migration: Connection IDs persist across network changes
// Basic QUIC server (quic-go)
listener, err := quic.ListenAddr("localhost:4242", tlsConfig, nil)
if err != nil {
    log.Fatal(err)
}
for {
    conn, err := listener.Accept(context.Background())
    if err != nil {
        return
    }
    go func(c quic.Connection) {
        for {
            stream, err := c.AcceptStream(context.Background())
            if err != nil {
                return
            }
            go handleStream(stream)
        }
    }(conn)
}
Status (quic-go v0.52.0): NAT rebinding works. Active interface switching (PATH_CHALLENGE/PATH_RESPONSE) not yet supported.
Low-Level Socket Optimizations
TCP_NODELAY (Disable Nagle's Algorithm)
conn.SetNoDelay(true) // for latency-critical apps (note: Go already enables TCP_NODELAY by default on TCP connections)
SO_REUSEPORT (Multi-Process Binding)
// SO_REUSEPORT is not exposed by the frozen syscall package on all platforms; golang.org/x/sys/unix provides it
listenerConfig := &net.ListenConfig{
    Control: func(network, address string, c syscall.RawConn) error {
        var sockErr error
        if err := c.Control(func(fd uintptr) {
            sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
        }); err != nil {
            return err
        }
        return sockErr
    },
}
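Each listener created this way can bind the same address; the kernel then distributes incoming connections across them:

```go
ln, err := listenerConfig.Listen(context.Background(), "tcp", ":8080")
if err != nil {
    log.Fatal(err)
}
```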
Socket Buffer Tuning
Rule of thumb: buffer size = bandwidth × RTT (bandwidth-delay product).
conn.SetReadBuffer(recvBuf)
conn.SetWriteBuffer(sendBuf)
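A worked example of the rule of thumb (values are illustrative; tcpConn stands for a *net.TCPConn): at 1 Gbit/s with a 40 ms RTT, the bandwidth-delay product is about 5 MB.

```go
const bandwidthBytesPerSec = 125_000_000 // 1 Gbit/s expressed in bytes per second
rtt := 40 * time.Millisecond
bdp := int(float64(bandwidthBytesPerSec) * rtt.Seconds()) // ≈ 5,000,000 bytes
tcpConn.SetReadBuffer(bdp)
tcpConn.SetWriteBuffer(bdp)
```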
TCP Keepalive
conn.SetKeepAlive(true)
conn.SetKeepAlivePeriod(30 * time.Second) // controls the idle time before probes start
// For explicit control of probe interval and count, use net.KeepAliveConfig (Go 1.23+) or raw syscalls
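With Go 1.23 or newer, net.KeepAliveConfig gives per-connection control without raw syscalls (a sketch; conn must be a *net.TCPConn):

```go
if err := conn.SetKeepAliveConfig(net.KeepAliveConfig{
    Enable:   true,
    Idle:     30 * time.Second, // time before the first probe (TCP_KEEPIDLE)
    Interval: 10 * time.Second, // time between probes (TCP_KEEPINTVL)
    Count:    3,                // unanswered probes before the connection is dropped (TCP_KEEPCNT)
}); err != nil {
    log.Printf("keepalive config: %v", err)
}
```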
Resilience Patterns
Circuit Breaker
Three states: Closed (normal) → Open (fail fast) → Half-Open (test recovery).
Track failures with a sliding window. Open the circuit after N failures in a time window. Periodically allow trial requests to test if the service recovered.
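A minimal sketch of these states (for brevity it uses a plain consecutive-failure counter rather than the sliding window described above; names and thresholds are illustrative):

```go
type CircuitBreaker struct {
    mu        sync.Mutex
    state     int // 0 = closed, 1 = open, 2 = half-open
    failures  int
    threshold int           // consecutive failures before tripping
    cooldown  time.Duration // how long to fail fast before probing
    openedAt  time.Time
}

// Allow reports whether a request may proceed.
func (cb *CircuitBreaker) Allow() bool {
    cb.mu.Lock()
    defer cb.mu.Unlock()
    if cb.state == 1 { // open: fail fast until the cooldown elapses
        if time.Since(cb.openedAt) < cb.cooldown {
            return false
        }
        cb.state = 2 // half-open: let a trial request through
    }
    return true
}

// Report records the outcome of a request.
func (cb *CircuitBreaker) Report(err error) {
    cb.mu.Lock()
    defer cb.mu.Unlock()
    if err == nil {
        cb.failures = 0
        cb.state = 0 // success closes the circuit
        return
    }
    cb.failures++
    if cb.state == 2 || cb.failures >= cb.threshold {
        cb.state = 1 // trip (or re-trip after a failed trial)
        cb.openedAt = time.Now()
        cb.failures = 0
    }
}
```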
Load Shedding
Passive (bounded channel):
requests := make(chan *Request, 1000)

select {
case requests <- req: // accepted
default:
    conn.Close() // channel full: drop
}
Active (CPU-based):
if getCPULoad() > 0.85 {
    w.Header().Set("Retry-After", "5")
    w.WriteHeader(http.StatusServiceUnavailable)
    return
}
Connection Lifecycle Best Practices
- Set read/write deadlines to prevent indefinite blocking:
  conn.SetReadDeadline(time.Now().Add(30 * time.Second))
  conn.SetWriteDeadline(time.Now().Add(30 * time.Second))
- Use context cancellation for goroutine cleanup
- Copy data from buffers to avoid retaining large backing arrays:
  // BAD: retains the full 4 KB backing array
  data := buf[:n]
  go process(data)
  // GOOD: copy only the needed bytes
  data := make([]byte, n)
  copy(data, buf[:n])
  go process(data)
Connection Observability
Instrument each connection lifecycle phase:
- DNS resolution — measure lookup latency
- Dialing — measure connection establishment time
- TLS handshake — measure crypto negotiation time
- Request/response — measure per-request latency
Use structured logging with sampling (e.g., Zap with rate limits) to control log volume. Promote phase durations to Prometheus metrics; log only threshold breaches.
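net/http/httptrace exposes client-side hooks for exactly these phases; a sketch (logging shown for brevity where real code would record metrics):

```go
var dnsStart, connStart, tlsStart time.Time
trace := &httptrace.ClientTrace{
    DNSStart:          func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
    DNSDone:           func(httptrace.DNSDoneInfo) { log.Printf("dns: %v", time.Since(dnsStart)) },
    ConnectStart:      func(network, addr string) { connStart = time.Now() },
    ConnectDone:       func(network, addr string, err error) { log.Printf("dial: %v", time.Since(connStart)) },
    TLSHandshakeStart: func() { tlsStart = time.Now() },
    TLSHandshakeDone:  func(tls.ConnectionState, error) { log.Printf("tls: %v", time.Since(tlsStart)) },
}
req, _ := http.NewRequest("GET", "https://example.com/", nil)
req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
resp, err := http.DefaultClient.Do(req)
if err != nil {
    log.Fatal(err)
}
resp.Body.Close()
```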
Scheduler and Netpoller
Go's runtime uses the netpoller (epoll on Linux, kqueue on macOS) for non-blocking I/O:
- A goroutine calls conn.Read()
- If the FD is not ready: the goroutine is parked and the FD is registered with the poller
- The OS thread is released to run other goroutines
- When the FD becomes ready: the poller wakes the goroutine
GOMAXPROCS sets the number of OS threads executing user code (default = CPU count). Tune only after measuring — the default is correct for most workloads.
Thread pinning (runtime.LockOSThread()) rarely helps on cloud infrastructure. Only beneficial on bare metal with isolated CPUs and taskset.
Alternative Libraries
For extreme performance beyond net/http:
- cloudwego/netpoll: epoll-based, event-driven, minimal GC overhead
- tidwall/evio: Non-blocking event loop, reactor pattern
Benchmarking and Load Testing
| Tool | Focus | Best For |
|---|---|---|
| vegeta | Constant-rate attack | Latency percentiles, CI benchmarking |
| wrk | Max throughput | Raw capacity, concurrency limits |
| k6 | Scenario-based | Real-world user workflows |
# Vegeta: constant 100 req/s for 30s
echo "GET http://localhost:8080/api" | vegeta attack -rate=100 -duration=30s | vegeta report
# wrk: 4 threads, 100 connections, 30s
wrk -t4 -c100 -d30s http://localhost:8080/api
# pprof during load test
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30