| name | description | allowed-tools |
|---|---|---|
| pprof-profiling | Quick reference for Go pprof profiling. Use when you need to profile CPU, memory, goroutines, or contention in a Go project. | |
CPU Profiling
# Via benchmarks (profile flags work with one package per go test run)
go test -bench=. -cpuprofile=cpu.prof -benchtime=5s ./path/to/pkg
# Via tests
go test -cpuprofile=cpu.prof -run TestTarget ./path/to/pkg
# Analyze
go tool pprof -top -cum cpu.prof # ranked by cumulative time
go tool pprof -top -flat cpu.prof # ranked by self time
go tool pprof -list=FuncName cpu.prof # source annotation
Memory Profiling
# Allocation profile (profile flags require a single package)
go test -bench=. -memprofile=mem.prof -benchmem -count=5 ./path/to/pkg
# Analyze
go tool pprof -top -alloc_space mem.prof # total bytes allocated
go tool pprof -top -alloc_objects mem.prof # allocation count (GC pressure)
go tool pprof -top -inuse_space mem.prof # currently live
go tool pprof -list=FuncName mem.prof # source annotation
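To make the allocation-count view more concrete, the sketch below compares two hypothetical functions (fillGrow and fillPrealloc, invented for this example) with testing.AllocsPerRun from the standard library. It is only an illustration of the kind of difference that -alloc_objects tends to surface; exact counts depend on the Go version, so none are claimed here.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps results reachable so the slices must be heap-allocated.
var sink []int

// fillGrow appends into a nil slice, so the backing array is
// reallocated repeatedly as the slice grows.
func fillGrow(n int) []int {
	var s []int
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

// fillPrealloc sizes the backing array once, up front.
func fillPrealloc(n int) []int {
	s := make([]int, 0, n)
	for i := 0; i < n; i++ {
		s = append(s, i)
	}
	return s
}

func main() {
	grow := testing.AllocsPerRun(100, func() { sink = fillGrow(1024) })
	pre := testing.AllocsPerRun(100, func() { sink = fillPrealloc(1024) })
	fmt.Printf("allocs per call: grow=%.0f prealloc=%.0f\n", grow, pre)
}
```

In a real project the same comparison would normally come from -benchmem output or the memory profile rather than a standalone program.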
Escape Analysis
go build -gcflags='-m' ./... # basic
go build -gcflags='-m -m' ./... # detailed reasons
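As a small illustration of what the -m output describes, the sketch below contrasts a local variable that escapes with one that does not. The function names are hypothetical, and the compiler messages mentioned in the comments are what is typically reported; the wording can vary between Go versions.

```go
package main

import "fmt"

// newCounter returns a pointer to a local variable, so the variable must
// outlive the call. With -gcflags='-m' the compiler typically reports
// "moved to heap: n" for this function.
func newCounter() *int {
	n := 42
	return &n
}

// localSum only uses the pointer inside the function, so total can stay
// on the stack and no heap move is reported for it.
func localSum(values []int) int {
	total := 0
	p := &total
	for _, v := range values {
		*p += v
	}
	return total
}

func main() {
	c := newCounter()
	fmt.Println(*c, localSum([]int{1, 2, 3})) // prints: 42 6
}
```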
GC Trace
GODEBUG=gctrace=1 go test -bench=BenchmarkTarget -benchtime=5s ./... 2>&1 | grep '^gc'
Concurrency Profiling
# Block profile (where goroutines wait); profile and trace flags take a single package
go test -bench=. -blockprofile=block.prof ./path/to/pkg
go tool pprof -top block.prof
# Mutex contention
go test -bench=. -mutexprofile=mutex.prof ./path/to/pkg
go tool pprof -top mutex.prof
# Runtime trace (per-goroutine timeline)
go test -trace=trace.out ./path/to/pkg
go tool trace trace.out
Comparing Benchmarks with benchstat
# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest
# Run before
go test -bench=. -benchmem -count=10 ./... > old.txt
# Make changes, then run after
go test -bench=. -benchmem -count=10 ./... > new.txt
# Compare
benchstat old.txt new.txt
Output: one table per metric (sec/op, B/op, allocs/op) with a column for each input file and a "vs base" column giving the percentage delta and its p-value.
Compiler Insights
# What gets inlined
go build -gcflags='-m' ./... 2>&1 | grep 'inlining'
# Bounds check elimination (each 'Found' line is a check the compiler could not remove)
go build -gcflags='-d=ssa/check_bce/debug=1' ./... 2>&1 | grep 'Found'
From a Running Server
Add import _ "net/http/pprof" to the server, serve it on a debug port (localhost:6060 in the commands below), then fetch profiles:
# CPU profile (30 seconds); quote the URL so the shell does not treat ? as a glob
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap
# Goroutine dump
go tool pprof http://localhost:6060/debug/pprof/goroutine
# Mutex contention (empty unless the server calls runtime.SetMutexProfileFraction)
go tool pprof http://localhost:6060/debug/pprof/mutex
Load Testing During Profiling
# vegeta: constant rate attack with latency distribution
echo "GET http://localhost:8080/api" | vegeta attack -rate=100 -duration=30s | vegeta report
# wrk: max throughput
wrk -t4 -c100 -d30s http://localhost:8080/api
# Profile during load test (run from a second terminal while the load is active)
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
go tool pprof http://localhost:6060/debug/pprof/heap
GC Tuning Quick Reference
GOGC=100 # Default: GC when heap doubles
GOGC=off # Disable GC (batch jobs only)
GOMEMLIMIT=1GiB # Soft memory limit, GC adapts (Go 1.19+)
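The same knobs can also be set from inside a program through the standard runtime/debug package, which may be convenient when environment variables are awkward to control. This is a sketch under that assumption; applyGCSettings is a hypothetical helper, and the values simply mirror the ones listed above.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyGCSettings sets the GC percent (like GOGC) and the soft memory
// limit in bytes (like GOMEMLIMIT, Go 1.19+). It returns the previous
// values so a caller can restore them later.
func applyGCSettings(gcPercent int, memLimit int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)
	prevLimit = debug.SetMemoryLimit(memLimit)
	return prevPercent, prevLimit
}

func main() {
	// Equivalent of GOGC=100 and GOMEMLIMIT=1GiB.
	prevPercent, prevLimit := applyGCSettings(100, 1<<30)
	fmt.Printf("previous GC percent=%d, previous memory limit=%d bytes\n",
		prevPercent, prevLimit)

	// A negative percent disables the collector, like GOGC=off:
	// debug.SetGCPercent(-1)
}
```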
Key Rules
- Always use -count=6 or higher so benchstat has enough samples for a 95% confidence interval; the benchstat example above uses -count=10
- Always use -benchmem to see allocation metrics alongside timing
- -benchtime=5s for stable CPU profiles (default 1s may be noisy)
- Race detector (go test -race) after any concurrency change: non-negotiable
- Suppress benchmark variance: pin to cores (taskset -c 2-3), set CPU governor to performance, disable Turbo Boost
- CV > 15% means the benchmark is unreliable: re-run with more iterations or fix the noise source