codeflash-agent/languages/go/plugin/skills/pprof-profiling/SKILL.md
m-ali-24 044b2f190a
2026-04-14 18:55:36 -05:00

name: pprof-profiling
description: Quick reference for Go pprof profiling. Use when you need to profile CPU, memory, goroutines, or contention in a Go project.
allowed-tools: Bash, Read, Write, Grep, Glob

CPU Profiling

# Via benchmarks (go test rejects profile flags when the pattern matches several packages, so name one package)
go test -bench=. -cpuprofile=cpu.prof -benchtime=5s ./path/to/pkg

# Via tests
go test -cpuprofile=cpu.prof -run TestTarget ./path/to/pkg

# Analyze
go tool pprof -top -cum cpu.prof          # ranked by cumulative time
go tool pprof -top -flat cpu.prof         # ranked by self time
go tool pprof -list=FuncName cpu.prof     # source annotation
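
The benchmark commands above need a benchmark function to drive. A minimal sketch, assuming a hypothetical hot function sumSquares as the code under test: in a real package the body of benchSumSquares would live in a _test.go file as BenchmarkSumSquares and be run by go test -bench, while here testing.Benchmark runs it from main so the file can be run on its own.

```go
package main

import (
	"fmt"
	"testing"
)

// sink keeps the compiler from discarding the benchmarked call.
var sink int

// sumSquares is a hypothetical function standing in for the code under test.
func sumSquares(xs []int) int {
	total := 0
	for _, x := range xs {
		total += x * x
	}
	return total
}

// benchSumSquares has the same body a BenchmarkSumSquares function would have.
func benchSumSquares(b *testing.B) {
	xs := make([]int, 4096)
	for i := range xs {
		xs[i] = i
	}
	b.ReportAllocs()
	b.ResetTimer() // exclude the setup above from the measurement
	for i := 0; i < b.N; i++ {
		sink = sumSquares(xs)
	}
}

func main() {
	// testing.Benchmark runs the function outside of "go test".
	res := testing.Benchmark(benchSumSquares)
	fmt.Println(res.String(), res.MemString())
}
```

With the function in a _test.go file, -list=sumSquares on the resulting cpu.prof would annotate the loop line by line.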

Memory Profiling

# Allocation profile (one package per run, as with -cpuprofile)
go test -bench=. -memprofile=mem.prof -benchmem -count=5 ./path/to/pkg

# Analyze
go tool pprof -top -alloc_space mem.prof     # total bytes allocated
go tool pprof -top -alloc_objects mem.prof   # allocation count (GC pressure)
go tool pprof -top -inuse_space mem.prof     # bytes still live when the profile was written
go tool pprof -list=FuncName mem.prof        # source annotation
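
To make the alloc_objects view concrete, here is a small sketch of the kind of difference it surfaces. The functions joinConcat and joinBuilder are hypothetical examples, and testing.AllocsPerRun is used only as a quick way to count allocations without writing a profile.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink forces results to escape so the allocations are not optimized away.
var sink string

// joinConcat builds the result with +=, allocating a new string on most iterations.
func joinConcat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder sizes a strings.Builder once and appends into it.
func joinBuilder(parts []string) string {
	n := 0
	for _, p := range parts {
		n += len(p)
	}
	var b strings.Builder
	b.Grow(n)
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := make([]string, 100)
	for i := range parts {
		parts[i] = "x"
	}
	concat := testing.AllocsPerRun(100, func() { sink = joinConcat(parts) })
	builder := testing.AllocsPerRun(100, func() { sink = joinBuilder(parts) })
	fmt.Printf("concat: %.0f allocs/op, builder: %.0f allocs/op\n", concat, builder)
}
```

In a memory profile, a function like joinConcat would rank high under -alloc_objects even when its -alloc_space share is small.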

Escape Analysis

go build -gcflags='-m' ./...          # basic
go build -gcflags='-m -m' ./...       # detailed reasons
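
As an illustration of what the -m output refers to, the sketch below contrasts a local that escapes with one that does not. The function names are hypothetical, and the diagnostics quoted in the comments are what the gc compiler typically prints; exact wording can vary between Go versions.

```go
package main

import "fmt"

// escapes returns a pointer to a local, so x must outlive the call.
// With -gcflags='-m' the compiler typically reports "moved to heap: x".
//
//go:noinline
func escapes() *int {
	x := 42
	return &x
}

// staysOnStack only dereferences the pointer locally, so x can stay on
// the stack and -m reports no heap move for it.
//
//go:noinline
func staysOnStack() int {
	x := 42
	p := &x
	return *p
}

func main() {
	fmt.Println(*escapes(), staysOnStack())
}
```

The //go:noinline directives are there only to keep the two cases separate in the compiler output; without them inlining can change what escapes at the call site.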

GC Trace

GODEBUG=gctrace=1 go test -bench=BenchmarkTarget -benchtime=5s ./... 2>&1 | grep '^gc'

Concurrency Profiling

# Block profile (where goroutines wait); profile flags take one package per run
go test -bench=. -blockprofile=block.prof ./path/to/pkg
go tool pprof -top block.prof

# Mutex contention
go test -bench=. -mutexprofile=mutex.prof ./path/to/pkg
go tool pprof -top mutex.prof

# Runtime trace (per-goroutine timeline)
go test -trace=trace.out ./path/to/pkg
go tool trace trace.out

Comparing Benchmarks with benchstat

# Install benchstat
go install golang.org/x/perf/cmd/benchstat@latest

# Run before
go test -bench=. -benchmem -count=10 ./... > old.txt

# Make changes, then run after
go test -bench=. -benchmem -count=10 ./... > new.txt

# Compare
benchstat old.txt new.txt

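Output: one row per benchmark and metric (sec/op, B/op, allocs/op) showing the old value, the new value, and the delta with its p-value. A ~ in the delta column means the difference is not statistically significant.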

Compiler Insights

# What gets inlined
go build -gcflags='-m' ./... 2>&1 | grep 'inlining'

# Bounds check elimination: each 'Found' line is a bounds check the compiler could NOT eliminate
go build -gcflags='-d=ssa/check_bce/debug=1' ./... 2>&1 | grep 'Found'

From a Running Server

Add import _ "net/http/pprof" and expose on a debug port:

# CPU profile (30 seconds); the URL is quoted so the shell does not treat ? as a glob
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'

# Heap profile
go tool pprof http://localhost:6060/debug/pprof/heap

# Goroutine dump
go tool pprof http://localhost:6060/debug/pprof/goroutine

# Mutex contention (empty unless the server calls runtime.SetMutexProfileFraction)
go tool pprof http://localhost:6060/debug/pprof/mutex

Load Testing During Profiling

# vegeta: constant rate attack with latency distribution
echo "GET http://localhost:8080/api" | vegeta attack -rate=100 -duration=30s | vegeta report

# wrk: max throughput
wrk -t4 -c100 -d30s http://localhost:8080/api

# Profile during load test (quote the URL so the shell does not treat ? as a glob)
go tool pprof 'http://localhost:6060/debug/pprof/profile?seconds=30'
go tool pprof http://localhost:6060/debug/pprof/heap

GC Tuning Quick Reference

GOGC=100          # Default: GC when heap doubles
GOGC=off          # Disable GC (batch jobs only)
GOMEMLIMIT=1GiB   # Soft memory limit, GC adapts (Go 1.19+)
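
The same two knobs can also be set from code through runtime/debug, which can help when the environment is not under your control. A minimal sketch, where applyGCSettings is a hypothetical helper and the values mirror the environment settings above:

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// applyGCSettings sets the GC target percentage and the soft memory limit,
// returning the values that were in effect before the call.
func applyGCSettings(gcPercent int, memLimitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(gcPercent)     // same knob as GOGC
	prevLimit = debug.SetMemoryLimit(memLimitBytes) // same knob as GOMEMLIMIT (Go 1.19+)
	return prevPercent, prevLimit
}

func main() {
	// Equivalent to GOGC=100 GOMEMLIMIT=1GiB.
	prevPercent, prevLimit := applyGCSettings(100, 1<<30)
	fmt.Println("previous GC percent:", prevPercent, "previous memory limit:", prevLimit)
}
```

Passing a negative percentage to debug.SetGCPercent disables the collector, which corresponds to GOGC=off.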

Key Rules

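  1. Always use -count=10 or higher when comparing runs with benchstat; recent benchstat releases need at least 6 samples per benchmark to compute a 95% confidence interval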
  2. Always use -benchmem to see allocation metrics alongside timing
  3. -benchtime=5s for stable CPU profiles (default 1s may be noisy)
  4. Run the race detector (go test -race) after any concurrency change; this is non-negotiable
  5. Suppress benchmark variance (Linux): pin to cores (taskset -c 2-3), set the CPU governor to performance, disable Turbo Boost
  6. A coefficient of variation (CV = standard deviation / mean of the repeated measurements) above 15% means the benchmark is unreliable; re-run with more iterations or fix the noise source