Go Performance Load Testing in Practice: A Complete Guide to the wrk, vegeta, and go-bench Toolchain

Lao Zhang, 2026/4/30, about 7 min read

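Audience: Go developers, backend engineers | Reading time: about 14 minutes | Key takeaway: learn how to use wrk, vegeta, and go-bench, the load testing tools most commonly used around Go services, and how to write a custom load testing program in Go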

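How wrk uncovered a performance problem in a Node.js service

In 2022 we had a legacy Node.js API gateway service that forwarded requests to backend microservices. As traffic grew, the service timed out intermittently, but load tests with JMeter could not reproduce the problem: JMeter reported 2000 TPS and a P99 of 180 ms, which looked healthy.

One of our Go engineers, Xiao Lin, said: "Let me try wrk."

He ran this from a single machine: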

wrk -t12 -c400 -d30s http://api-gateway.example.com/api/health

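It immediately reported a large number of connection errors, with only 400 TPS and a P99 of 1800 ms.

We were surprised: for the same endpoint, JMeter reported 2000 TPS and wrk reported 400, a fivefold difference.

We traced the cause to connection handling. In the JMeter test, each of the 400 threads kept one persistent (Keep-Alive) TCP connection to the server, so only 400 TCP connections were established during the whole run. wrk started 12 threads, each driving about 33 connections (400 / 12) asynchronously from an event loop. wrk also sends HTTP/1.1 keep-alive requests by default, but whenever a connection was closed by the server or hit an error, wrk reconnected immediately, which produced a large volume of TCP connection setups. The Node.js event loop became the bottleneck when it had to handle that many concurrent connection attempts.

This mirrors a real production pattern: much real traffic arrives over freshly established connections rather than over a fixed pool of long-lived ones, so a test that exercises connection setup is closer to reality than one that never reconnects.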
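The difference between reusing connections and opening a new one per request can be observed directly. The following is a minimal Go sketch, not the setup from the incident: it starts a local test server, counts accepted TCP connections through the server's `ConnState` hook, and sends the same number of requests with keep-alive enabled and disabled. The helper name `countConnections` and the request count are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"net/http"
	"net/http/httptest"
	"sync/atomic"
)

// countConnections sends n sequential GET requests to a local test server
// and returns how many TCP connections the server accepted.
func countConnections(n int, disableKeepAlives bool) int64 {
	var accepted int64

	srv := httptest.NewUnstartedServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprint(w, "ok")
	}))
	// ConnState is called with StateNew once for every accepted connection.
	srv.Config.ConnState = func(c net.Conn, state http.ConnState) {
		if state == http.StateNew {
			atomic.AddInt64(&accepted, 1)
		}
	}
	srv.Start()
	defer srv.Close()

	client := &http.Client{
		Transport: &http.Transport{DisableKeepAlives: disableKeepAlives},
	}
	for i := 0; i < n; i++ {
		resp, err := client.Get(srv.URL)
		if err != nil {
			panic(err)
		}
		// Drain and close the body so the connection can be reused.
		io.Copy(io.Discard, resp.Body)
		resp.Body.Close()
	}
	return atomic.LoadInt64(&accepted)
}

func main() {
	fmt.Println("connections with keep-alive:   ", countConnections(50, false))
	fmt.Println("connections without keep-alive:", countConnections(50, true))
}
```

With keep-alive the sequential requests should share a single connection, while disabling it forces one TCP handshake per request. That connection setup cost is the part a keep-alive-only load test never exercises.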

wrk: a high-performance HTTP load testing tool

Installation

# macOS
brew install wrk

# Ubuntu
sudo apt-get install build-essential libssl-dev git -y
git clone https://github.com/wg/wrk.git
cd wrk && make
sudo cp wrk /usr/local/bin

# Verify the installation
wrk --version

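Basic usage

# Basic syntax
wrk [options] <url>

# Options:
# -t  number of threads (usually the number of CPU cores on the load generator)
# -c  total number of open connections (concurrency), shared across all threads
# -d  test duration
# -s  Lua script (for custom requests)
# --latency  print the latency distribution (percentiles)
# --timeout  per-request timeout (wrk defaults to 2s)

# Example: 12 threads, 400 connections, 30 seconds, with latency percentiles
wrk -t12 -c400 -d30s --latency https://api.example.com/api/products/list

# With an explicit timeout
wrk -t12 -c400 -d30s --timeout 5s --latency https://api.example.com/api/products/list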

Reading wrk output

Running 30s test @ https://api.example.com/api/products/list
  12 threads and 400 connections

  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency   145.23ms   48.92ms  892.34ms   68.20%
    Req/Sec   231.45     42.18    345.00     72.00%

  Latency Distribution
     50%  132.45ms   ← P50
     75%  165.23ms
     90%  198.45ms
     99%  389.12ms   ← P99

  83142 requests in 30.06s, 48.32MB read
  Socket errors: connect 0, read 0, write 0, timeout 12
  Requests/sec:   2765.89    ← TPS
  Transfer/sec:      1.61MB

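Watch the Socket errors line: any non-zero timeout or connect count deserves investigation, because it means some requests timed out or some connections could not be established.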
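The numbers in a wrk report can be cross-checked against each other. As a rough sketch, the Go snippet below recomputes throughput from the request count and duration, compares it with the per-thread Req/Sec average multiplied by the thread count, and expresses socket errors as a share of completed requests. The `wrkSummary` type is hypothetical and its values are copied by hand from the sample report above.

```go
package main

import "fmt"

// wrkSummary holds a few values copied by hand from a wrk report.
// It is a hypothetical helper type, not something wrk itself produces.
type wrkSummary struct {
	Requests     int     // total completed requests
	DurationSec  float64 // measured duration in seconds
	Threads      int     // value passed to -t
	PerThreadRPS float64 // "Req/Sec" average, which is reported per thread
	SocketErrors int     // connect + read + write + timeout
}

// throughput recomputes Requests/sec from the totals.
func (s wrkSummary) throughput() float64 {
	return float64(s.Requests) / s.DurationSec
}

// estimatedFromThreads scales the per-thread average to all threads.
func (s wrkSummary) estimatedFromThreads() float64 {
	return s.PerThreadRPS * float64(s.Threads)
}

// socketErrorRate returns socket errors as a percentage of completed requests.
func (s wrkSummary) socketErrorRate() float64 {
	if s.Requests == 0 {
		return 0
	}
	return float64(s.SocketErrors) / float64(s.Requests) * 100
}

func main() {
	s := wrkSummary{
		Requests:     83142,
		DurationSec:  30.06,
		Threads:      12,
		PerThreadRPS: 231.45,
		SocketErrors: 12,
	}
	fmt.Printf("throughput from totals:  %.2f req/s\n", s.throughput())
	fmt.Printf("throughput from threads: %.2f req/s\n", s.estimatedFromThreads())
	fmt.Printf("socket error rate:       %.4f%%\n", s.socketErrorRate())
}
```

If the two throughput figures disagree badly, the per-thread statistics were probably skewed by stalls or errors, and the latency figures deserve the same suspicion.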

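Lua scripts: custom requests

From the command line alone, wrk sends plain GET requests; to send a POST request with a body you need a Lua script: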

-- post_order.lua

-- Per-thread initialization: seed the random number generator
function init(args)
    math.randomseed(os.time())
end

-- Runs before every request and builds the request to send
function request()
    -- Random product ID and user ID
    local product_id = math.random(1001, 10000)
    local user_id = math.random(100001, 200000)
    
    local body = string.format(
        '{"userId":%d,"productId":%d,"quantity":1}',
        user_id,
        product_id
    )
    
    return wrk.format("POST", "/api/order/create", {
        ["Content-Type"] = "application/json",
        ["Authorization"] = "Bearer test-token-123",
        ["Content-Length"] = tostring(#body)
    }, body)
end

-- Runs after every response; useful for statistics or assertions
function response(status, headers, body)
    if status ~= 200 then
        print("Error: status=" .. status .. ", body=" .. body)
    end
end

-- Runs once after the test finishes; prints custom statistics
function done(summary, latency, requests)
    io.write("-----Custom Stats-----\n")
    io.write(string.format("  Success: %d (%.2f%%)\n",
        summary.requests - summary.errors.status,
        (1 - summary.errors.status / summary.requests) * 100
    ))
    io.write(string.format("  P99: %.2fms\n", latency:percentile(99) / 1000))
    io.write(string.format("  P999: %.2fms\n", latency:percentile(99.9) / 1000))
end

Usage:

wrk -t12 -c400 -d60s -s post_order.lua https://api.example.com

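vegeta: a declarative load testing tool

vegeta's defining feature is precise control of the request rate (the arrival rate, in RPS) rather than of the number of concurrent connections. This is the constant-arrival-rate model, which makes vegeta a good tool for measuring how a system behaves at a specific QPS.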

Installation

# macOS
brew install vegeta

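# Install with Go
go install github.com/tsenart/vegeta@latest

# Download a prebuilt binary
# (release asset names may include the version number; check the releases page if this URL returns 404)
wget https://github.com/tsenart/vegeta/releases/latest/download/vegeta_linux_amd64.tar.gz
tar -xzf vegeta_linux_amd64.tar.gz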

Basic usage

# 100 RPS for 60 seconds
echo "GET https://api.example.com/api/products/1001" | \
  vegeta attack -rate=100 -duration=60s | \
  vegeta report

# Sample output:
# Requests      [total, rate, throughput]         6000, 100.02, 99.98/s
# Duration      [total, attack, wait]             1m0.006s, 59.99s, 15.788ms
# Latencies     [min, mean, 50, 90, 95, 99, max]  12.453ms, 45.234ms, 38.12ms, 89.23ms, 115.45ms, 234.56ms, 1.234s
# Bytes In      [total, mean]                     1234567, 205.76
# Bytes Out     [total, mean]                     0, 0
# Success       [ratio]                           99.97%
# Status Codes  [code:count]                      200:5998  503:2
# Error Set:
# 503 Service Unavailable

Stepped load

# Stepped load: run vegeta attack once per rate in the list
for rate in 100 200 300 400 500; do
    echo "Testing at ${rate} RPS..."
    echo "GET https://api.example.com/api/order/create" | \
        vegeta attack -rate=${rate} -duration=30s | \
        vegeta report -type=text | \
        grep -E "Requests|Latencies|Success" | \
        sed "s/^/  [$rate RPS] /"
    sleep 5  # pause 5 seconds so the service can recover
done

POST requests

# Define the request in a targets file
cat > targets.txt << 'EOF'
POST https://api.example.com/api/order/create
Content-Type: application/json
Authorization: Bearer test-token

{"userId": 10001, "productId": 1001, "quantity": 1}
EOF

vegeta attack -targets=targets.txt -rate=200 -duration=60s | vegeta report

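Every entry in a targets file is static. For dynamic parameters, either generate the targets yourself and feed them to vegeta through the -targets option, or write a custom attack program in Go.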

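go-bench: writing a custom load test in Go

When wrk and vegeta are not enough, write the load test directly in Go and take full control of the test logic.

A complete Go load testing program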

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "io"
    "math/rand"
    "net/http"
    "os"
    "sort"
    "sync"
    "sync/atomic"
    "time"
)

type BenchmarkConfig struct {
    TargetURL   string
    Concurrency int
    Duration    time.Duration
    RateLimit   int // total requests per second across all workers; 0 means unlimited
}

type BenchmarkResult struct {
    TotalRequests   int64
    SuccessRequests int64
    FailedRequests  int64
    Latencies       []time.Duration
    StartTime       time.Time
    EndTime         time.Time
}

type OrderRequest struct {
    UserID    int64 `json:"userId"`
    ProductID int64 `json:"productId"`
    Quantity  int   `json:"quantity"`
}

func benchmark(cfg BenchmarkConfig) *BenchmarkResult {
    result := &BenchmarkResult{
        Latencies: make([]time.Duration, 0, 100000),
        StartTime: time.Now(),
    }
    
    var mu sync.Mutex
    var wg sync.WaitGroup
    
    ctx, cancel := context.WithTimeout(context.Background(), cfg.Duration)
    defer cancel()
    
    // Rate limiter: a single ticker shared by all workers, so ticks arrive
    // at cfg.RateLimit per second in total
    var rateLimiter <-chan time.Time
    if cfg.RateLimit > 0 {
        ticker := time.NewTicker(time.Second / time.Duration(cfg.RateLimit))
        defer ticker.Stop()
        rateLimiter = ticker.C
    }
    
    // Shared HTTP client; its transport keeps a pool of reusable connections
    client := &http.Client{
        Timeout: 10 * time.Second,
        Transport: &http.Transport{
            MaxIdleConnsPerHost: cfg.Concurrency,
            MaxConnsPerHost:     cfg.Concurrency,
            IdleConnTimeout:     90 * time.Second,
        },
    }
    
    // Start the concurrent workers
    for i := 0; i < cfg.Concurrency; i++ {
        wg.Add(1)
        go func(workerID int) {
            defer wg.Done()
            
            for {
                // Stop once the test duration has elapsed
                if ctx.Err() != nil {
                    return
                }
                // When rate limiting, wait for a tick but stay cancellable
                if rateLimiter != nil {
                    select {
                    case <-ctx.Done():
                        return
                    case <-rateLimiter:
                    }
                }
                
                // Build the request
                orderReq := OrderRequest{
                    UserID:    int64(rand.Intn(100000) + 10001),
                    ProductID: int64(rand.Intn(10000) + 1001),
                    Quantity:  rand.Intn(3) + 1,
                }
                
                body, _ := json.Marshal(orderReq)
                req, err := http.NewRequestWithContext(ctx, "POST",
                    cfg.TargetURL+"/api/order/create",
                    bytes.NewReader(body))
                if err != nil {
                    // An invalid target URL cannot be fixed by retrying
                    fmt.Fprintln(os.Stderr, "build request:", err)
                    return
                }
                req.Header.Set("Content-Type", "application/json")
                req.Header.Set("Authorization", "Bearer test-token")
                
                // Send the request and time it (until the response headers arrive)
                start := time.Now()
                resp, err := client.Do(req)
                latency := time.Since(start)
                
                // A request cut off by the end of the test is not a real failure
                if err != nil && ctx.Err() != nil {
                    return
                }
                
                atomic.AddInt64(&result.TotalRequests, 1)
                
                if err != nil || resp.StatusCode >= 400 {
                    atomic.AddInt64(&result.FailedRequests, 1)
                    if resp != nil {
                        io.Copy(io.Discard, resp.Body)
                        resp.Body.Close()
                    }
                } else {
                    atomic.AddInt64(&result.SuccessRequests, 1)
                    io.Copy(io.Discard, resp.Body)
                    resp.Body.Close()
                }
                
                mu.Lock()
                result.Latencies = append(result.Latencies, latency)
                mu.Unlock()
            }
        }(i)
    }
    
    wg.Wait()
    result.EndTime = time.Now()
    return result
}

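func (r *BenchmarkResult) Print() {
    total := len(r.Latencies)
    if total == 0 {
        // Avoid indexing an empty slice when no request completed
        fmt.Println("No requests completed; nothing to report.")
        return
    }
    
    duration := r.EndTime.Sub(r.StartTime).Seconds()
    sort.Slice(r.Latencies, func(i, j int) bool {
        return r.Latencies[i] < r.Latencies[j]
    })
    
    // percentile returns the latency at percentile p of the sorted samples
    percentile := func(p float64) time.Duration {
        idx := int(float64(total) * p / 100)
        if idx >= total {
            idx = total - 1
        }
        return r.Latencies[idx]
    }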
    
    tps := float64(r.TotalRequests) / duration
    errorRate := float64(r.FailedRequests) / float64(r.TotalRequests) * 100
    
    fmt.Printf("\n========== Benchmark Results ==========\n")
    fmt.Printf("Duration:          %.2fs\n", duration)
    fmt.Printf("Total Requests:    %d\n", r.TotalRequests)
    fmt.Printf("Successful:        %d\n", r.SuccessRequests)
    fmt.Printf("Failed:            %d\n", r.FailedRequests)
    fmt.Printf("Error Rate:        %.3f%%\n", errorRate)
    fmt.Printf("TPS:               %.1f req/s\n", tps)
    fmt.Printf("\nLatency Distribution:\n")
    fmt.Printf("  P50:  %v\n", percentile(50))
    fmt.Printf("  P75:  %v\n", percentile(75))
    fmt.Printf("  P90:  %v\n", percentile(90))
    fmt.Printf("  P95:  %v\n", percentile(95))
    fmt.Printf("  P99:  %v\n", percentile(99))
    fmt.Printf("  P999: %v\n", percentile(99.9))
    fmt.Printf("  Max:  %v\n", r.Latencies[total-1])
    fmt.Printf("======================================\n")
}

func main() {
    targetURL := "https://api.example.com"
    if len(os.Args) > 1 {
        targetURL = os.Args[1]
    }
    
    cfg := BenchmarkConfig{
        TargetURL:   targetURL,
        Concurrency: 200,
        Duration:    10 * time.Minute,
        RateLimit:   0,  // no rate limit
    }
    
    fmt.Printf("Starting benchmark: %d concurrent, %v duration\n",
        cfg.Concurrency, cfg.Duration)
    
    result := benchmark(cfg)
    result.Print()
}