Go Networking Performance Guide

Connection Management

HTTP Transport Tuning

The default http.Transport is conservative. For high-throughput services, tune it:

transport := &http.Transport{
    MaxIdleConns:          1000,
    MaxConnsPerHost:       100,
    IdleConnTimeout:       90 * time.Second,
    ExpectContinueTimeout: 0, // skip 100-continue wait
    DialContext: (&net.Dialer{
        Timeout:   5 * time.Second,
        KeepAlive: 30 * time.Second,
    }).DialContext,
}
client := &http.Client{Transport: transport}

Critical: Always drain response bodies before closing, or Go won't reuse connections:

defer resp.Body.Close()
io.Copy(io.Discard, resp.Body)
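
A typical call-site pattern wraps drain-and-close in a small helper (drainAndClose is an illustrative name, not part of net/http):

// Drain to EOF, then close, so the transport can reuse the connection.
func drainAndClose(body io.ReadCloser) {
    io.Copy(io.Discard, body)
    body.Close()
}

resp, err := client.Get("https://example.com/api")
if err != nil {
    return err
}
defer drainAndClose(resp.Body)
// read resp.Body as needed before the deferred cleanup runs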

Connection Pooling with bufio

Pool bufio.Reader/bufio.Writer to avoid per-connection allocations:

var readerPool = sync.Pool{
    New: func() interface{} {
        return bufio.NewReaderSize(nil, 4096)
    },
}

func getReader(conn net.Conn) *bufio.Reader {
    r := readerPool.Get().(*bufio.Reader)
    r.Reset(conn)
    return r
}
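
A matching put helper should clear the connection reference so a pooled reader does not pin a finished connection:

func putReader(r *bufio.Reader) {
    r.Reset(nil) // drop the reference to the old connection
    readerPool.Put(r)
}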

Handling 10K+ Concurrent Connections

OS-Level Tuning (Linux)

ulimit -n 200000
sysctl -w net.core.somaxconn=65535          # pending connection queue
sysctl -w net.ipv4.ip_local_port_range="10000 65535"  # ephemeral port range
sysctl -w net.ipv4.tcp_tw_reuse=1           # reuse TIME_WAIT sockets
sysctl -w net.ipv4.tcp_fin_timeout=15       # reduce FIN_WAIT2 duration

Concurrency Limiting with Semaphore

var connLimiter = make(chan struct{}, 10000)

for {
    conn, err := ln.Accept()
    if err != nil {
        continue // log and back off on persistent errors in real code
    }
    connLimiter <- struct{}{} // acquire slot
    go func(c net.Conn) {
        defer func() {
            c.Close()
            <-connLimiter // release slot
        }()
        handle(c)
    }(conn)
}

Real-World Benchmarks (c5.2xlarge, 8 CPU)

Configuration       Connections   Aggregate Throughput
No buffering        10,000        29 Mbps
Buffered writes     10,000        232 Mbps (8x)
Buffered writes     30,000        360 Mbps
Buffered + SHA256   30,000        149 Mbps (CPU-bound)

Key insight: Buffered writes with periodic flushing improved throughput 8x. CPU-bound work (SHA256) cuts throughput by ~60%.
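
A sketch of the buffered-write loop behind these numbers; the buffer size, flush interval, and the outbound channel are illustrative, not taken from the benchmark code:

w := bufio.NewWriterSize(conn, 64*1024)
flushTicker := time.NewTicker(10 * time.Millisecond)
defer flushTicker.Stop()

for {
    select {
    case msg := <-outbound: // per-connection queue of messages to send
        if _, err := w.Write(msg); err != nil {
            return
        }
    case <-flushTicker.C:
        if err := w.Flush(); err != nil { // periodic flush bounds the latency added by buffering
            return
        }
    }
}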

TLS Optimization

Session Resumption (Skip Full Handshake)

tlsConfig := &tls.Config{
    SessionTicketsDisabled: false,
    SessionTicketKey:       [32]byte{...}, // persist and rotate
}

Benefit: Resumption skips the certificate exchange and most asymmetric crypto; with TLS 1.2 it also saves a full handshake round trip (TLS 1.3 takes one RTT either way).
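
On the client side, crypto/tls resumes sessions only when a session cache is configured; a minimal sketch:

clientTLS := &tls.Config{
    ClientSessionCache: tls.NewLRUClientSessionCache(256), // caches tickets for resumption
}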

Optimized Cipher Selection

tlsConfig := &tls.Config{
    CipherSuites: []uint16{
        tls.TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,
        tls.TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,
    },
    PreferServerCipherSuites: true,
    CurvePreferences:         []tls.CurveID{tls.CurveP256, tls.X25519},
    MinVersion:               tls.VersionTLS12,
    NextProtos:               []string{"h2", "http/1.1"}, // ALPN
}

Why these choices:

  • ECDHE for forward secrecy and performance
  • AES-GCM for hardware acceleration (AES-NI)
  • ECDSA for shorter signatures and lower CPU cost than RSA
  • ALPN order matters: the server picks the first of its NextProtos that the client also offers

Note: CipherSuites and PreferServerCipherSuites apply only to TLS 1.2 and below; Go selects TLS 1.3 suites itself, and PreferServerCipherSuites has been ignored since Go 1.17.

Certificate Verification Caching

var verificationCache sync.Map

func cachedCertVerifier(rawCerts [][]byte, verifiedChains [][]*x509.Certificate) error {
    fingerprint := sha256.Sum256(rawCerts[0])
    if _, exists := verificationCache.Load(fingerprint); exists {
        return nil
    }
    // full verification...
    verificationCache.Store(fingerprint, struct{}{})
    return nil
}
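
This callback plugs into tls.Config.VerifyPeerCertificate. Note that the standard chain verification still runs first unless InsecureSkipVerify is set, so the cache only saves the extra work done inside the callback itself:

tlsConfig := &tls.Config{
    VerifyPeerCertificate: cachedCertVerifier,
}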

Version Note

Go 1.25 improved TLS handshake performance by ~58% since Go 1.23 (TLS 1.3 fast path optimization).

DNS Performance

Resolver Selection

Go has two DNS resolvers:

  • Pure-Go (GODEBUG=netdns=go): Self-contained, no cgo, produces static binary
  • cgo-based (GODEBUG=netdns=cgo): Uses libc, better LDAP/mDNS compat, adds cgo overhead

Force pure-Go for static builds and performance. Debug with GODEBUG=netdns=2.

DNS Caching

Go does NOT cache DNS results by default. For latency-sensitive services, add an application-level cache (the snippet below assumes a TTL cache such as patrickmn/go-cache):

var dnsCache = cache.New(5*time.Minute, 10*time.Minute)

func LookupWithCache(host string) ([]net.IP, error) {
    if cached, found := dnsCache.Get(host); found {
        return cached.([]net.IP), nil
    }
    ips, err := net.LookupIP(host)
    if err != nil {
        return nil, err
    }
    dnsCache.Set(host, ips, cache.DefaultExpiration)
    return ips, nil
}
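
To make an HTTP client use the cache, resolve in a custom DialContext and dial the literal IP (this sketch reuses LookupWithCache from above):

transport := &http.Transport{
    DialContext: func(ctx context.Context, network, addr string) (net.Conn, error) {
        host, port, err := net.SplitHostPort(addr)
        if err != nil {
            return nil, err
        }
        ips, err := LookupWithCache(host)
        if err != nil {
            return nil, err
        }
        if len(ips) == 0 {
            return nil, fmt.Errorf("no addresses for %s", host)
        }
        d := &net.Dialer{Timeout: 5 * time.Second}
        return d.DialContext(ctx, network, net.JoinHostPort(ips[0].String(), port))
    },
}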

Custom DNS Server

var dialer = &net.Dialer{
    Timeout:   5 * time.Second,
    KeepAlive: 30 * time.Second,
    Resolver: &net.Resolver{
        PreferGo: true,
        Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
            d := net.Dialer{Timeout: 2 * time.Second}
            return d.DialContext(ctx, network, "8.8.8.8:53") // always query this server
        },
    },
}

Protocol Selection: TCP vs HTTP/2 vs gRPC

Property       Raw TCP           HTTP/2                   gRPC
Latency        Lowest            Low (framing overhead)   Low (+ protobuf encode)
Multiplexing   No                Yes (streams)            Yes (HTTP/2)
Schema/Types   No                No                       Yes (protobuf)
Best for       Trading, gaming   Web APIs                 Internal microservices

Raw TCP with Length-Prefix Framing

func writeFrame(conn net.Conn, payload []byte) error {
    buf := make([]byte, 4+len(payload))
    binary.BigEndian.PutUint32(buf[:4], uint32(len(payload)))
    copy(buf[4:], payload)
    _, err := conn.Write(buf)
    return err
}

func readFrame(conn net.Conn) ([]byte, error) {
    lenBuf := make([]byte, 4)
    if _, err := io.ReadFull(conn, lenBuf); err != nil {
        return nil, err
    }
    payload := make([]byte, binary.BigEndian.Uint32(lenBuf))
    _, err := io.ReadFull(conn, payload)
    return payload, err
}
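
readFrame allocates whatever length the peer claims, so cap it before allocating in a real server; this replaces the payload allocation inside readFrame, and maxFrameSize is an illustrative constant:

const maxFrameSize = 1 << 20 // 1 MiB, pick a limit that fits your protocol

n := binary.BigEndian.Uint32(lenBuf)
if n > maxFrameSize {
    return nil, fmt.Errorf("frame too large: %d bytes", n)
}
payload := make([]byte, n)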

QUIC (UDP-based transport)

QUIC advantages over TCP:

  • No head-of-line blocking: Independent streams per connection
  • Integrated TLS 1.3: Encryption built into transport
  • 0-RTT connection resumption: Send data immediately on reconnect
  • Connection migration: Connection IDs persist across network changes

// Basic QUIC server (quic-go)
listener, err := quic.ListenAddr("localhost:4242", tlsConfig, nil)
if err != nil {
    log.Fatal(err)
}
for {
    conn, err := listener.Accept(context.Background())
    if err != nil {
        return
    }
    go func(c quic.Connection) {
        for {
            stream, err := c.AcceptStream(context.Background())
            if err != nil {
                return
            }
            go handleStream(stream)
        }
    }(conn)
}

Status (quic-go v0.52.0): NAT rebinding works. Active interface switching (PATH_CHALLENGE/PATH_RESPONSE) not yet supported.

Low-Level Socket Optimizations

TCP_NODELAY (Disable Nagle's Algorithm)

Go already enables TCP_NODELAY on new TCP connections, so small writes go out immediately by default; call SetNoDelay(false) only to bring Nagle's batching back for throughput-oriented streams of tiny writes:

conn.SetNoDelay(true)  // the default: send small writes immediately
conn.SetNoDelay(false) // re-enable Nagle to coalesce small writes (adds latency)

SO_REUSEPORT (Multi-Process Binding)

// golang.org/x/sys/unix provides the SO_REUSEPORT constant
listenerConfig := &net.ListenConfig{
    Control: func(network, address string, c syscall.RawConn) error {
        var sockErr error
        if err := c.Control(func(fd uintptr) {
            sockErr = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
        }); err != nil {
            return err
        }
        return sockErr
    },
}
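
Each process can then bind the same address and let the kernel spread incoming connections across them:

ln, err := listenerConfig.Listen(context.Background(), "tcp", ":8080")
if err != nil {
    log.Fatal(err)
}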

Socket Buffer Tuning

Rule of thumb: buffer size = bandwidth × RTT (bandwidth-delay product).

conn.SetReadBuffer(recvBuf)
conn.SetWriteBuffer(sendBuf)
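
For example, a 1 Gbps path with 40 ms RTT gives 125 MB/s × 0.04 s ≈ 5 MB, so buffers much smaller than that cap throughput on such a path.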

TCP Keepalive

conn.SetKeepAlive(true)
conn.SetKeepAlivePeriod(30 * time.Second) // idle time before probes start
// Controlling the probe interval and count needs Go 1.23's SetKeepAliveConfig (below) or raw setsockopt calls
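
On Go 1.23 and later, net.KeepAliveConfig exposes all three knobs without syscalls; a minimal sketch:

err := conn.SetKeepAliveConfig(net.KeepAliveConfig{
    Enable:   true,
    Idle:     30 * time.Second, // TCP_KEEPIDLE
    Interval: 10 * time.Second, // TCP_KEEPINTVL
    Count:    3,                // TCP_KEEPCNT
})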

Resilience Patterns

Circuit Breaker

Three states: Closed (normal) → Open (fail fast) → Half-Open (test recovery).

Track failures with a sliding window. Open the circuit after N failures in a time window. Periodically allow trial requests to test if the service recovered.
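
A minimal sketch of the state machine (a fixed failure threshold and cooldown stand in for a true sliding window; all names are illustrative):

type CircuitBreaker struct {
    mu          sync.Mutex
    failures    int
    maxFailures int
    cooldown    time.Duration
    openedAt    time.Time
    open        bool
}

func (cb *CircuitBreaker) Call(fn func() error) error {
    cb.mu.Lock()
    if cb.open && time.Since(cb.openedAt) < cb.cooldown {
        cb.mu.Unlock()
        return errors.New("circuit open") // fail fast
    }
    // either closed, or cooldown elapsed: allow a trial request (half-open)
    cb.mu.Unlock()

    err := fn()

    cb.mu.Lock()
    defer cb.mu.Unlock()
    if err != nil {
        cb.failures++
        if cb.failures >= cb.maxFailures {
            cb.open = true
            cb.openedAt = time.Now()
        }
        return err
    }
    cb.failures = 0 // success: close the circuit and reset the counter
    cb.open = false
    return nil
}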

Load Shedding

Passive (bounded channel):

requests := make(chan *Request, 1000)
select {
case requests <- req:  // accepted
default:
    conn.Close()       // channel full: drop
}

Active (CPU-based):

if getCPULoad() > 0.85 {
    w.Header().Set("Retry-After", "5")
    w.WriteHeader(http.StatusServiceUnavailable)
    return
}

Connection Lifecycle Best Practices

  1. Set read/write deadlines — prevent indefinite blocking:
    conn.SetReadDeadline(time.Now().Add(30 * time.Second))
    conn.SetWriteDeadline(time.Now().Add(30 * time.Second))
    
  2. Use context cancellation for goroutine cleanup
  3. Copy data from buffers to avoid retaining large backing arrays:
    // BAD: retains full 4KB backing array
    data := buf[:n]
    go process(data)
    
    // GOOD: copy only needed bytes
    data := make([]byte, n)
    copy(data, buf[:n])
    go process(data)
    

Connection Observability

Instrument each connection lifecycle phase:

  1. DNS resolution — measure lookup latency
  2. Dialing — measure connection establishment time
  3. TLS handshake — measure crypto negotiation time
  4. Request/response — measure per-request latency

Use structured logging with sampling (e.g., Zap with rate limits) to control log volume. Promote phase durations to Prometheus metrics; log only threshold breaches.
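
For HTTP clients, net/http/httptrace exposes hooks for each of these phases; a minimal sketch (the record* functions stand in for whatever metrics sink you use):

var dnsStart, connStart, tlsStart time.Time
trace := &httptrace.ClientTrace{
    DNSStart:          func(httptrace.DNSStartInfo) { dnsStart = time.Now() },
    DNSDone:           func(httptrace.DNSDoneInfo) { recordDNS(time.Since(dnsStart)) },
    ConnectStart:      func(network, addr string) { connStart = time.Now() },
    ConnectDone:       func(network, addr string, err error) { recordDial(time.Since(connStart)) },
    TLSHandshakeStart: func() { tlsStart = time.Now() },
    TLSHandshakeDone:  func(tls.ConnectionState, error) { recordTLS(time.Since(tlsStart)) },
}
req = req.WithContext(httptrace.WithClientTrace(req.Context(), trace))
resp, err := client.Do(req)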

Scheduler and Netpoller

Go's runtime uses the netpoller (epoll on Linux, kqueue on macOS) for non-blocking I/O:

  1. Goroutine calls conn.Read()
  2. If FD not ready: goroutine is parked, FD registered with poller
  3. OS thread released to run other goroutines
  4. When FD ready: poller wakes goroutine

GOMAXPROCS sets the number of OS threads executing user code (default = CPU count). Tune only after measuring — the default is correct for most workloads.

Thread pinning (runtime.LockOSThread()) rarely helps on cloud infrastructure. Only beneficial on bare metal with isolated CPUs and taskset.

Alternative Libraries

For extreme performance beyond net/http:

  • cloudwego/netpoll: epoll-based, event-driven, minimal GC overhead
  • tidwall/evio: Non-blocking event loop, reactor pattern

Benchmarking and Load Testing

Tool     Focus                  Best For
vegeta   Constant-rate attack   Latency percentiles, CI benchmarking
wrk      Max throughput         Raw capacity, concurrency limits
k6       Scenario-based         Real-world user workflows

# Vegeta: constant 100 req/s for 30s
echo "GET http://localhost:8080/api" | vegeta attack -rate=100 -duration=30s | vegeta report

# wrk: 4 threads, 100 connections, 30s
wrk -t4 -c100 -d30s http://localhost:8080/api

# pprof during load test
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30