Distributed Caching in Go in Practice: Using and Choosing Between groupcache and ristretto

Lao Zhang · 2026/4/30 · about 6 min read

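Audience: engineers who need local or distributed caching in a Go service | Reading time: about 15 minutes | Key takeaway: understand how to choose among the caching libraries in the Go ecosystem, and avoid cache breakdown and cache avalanche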

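The incident where a wrong caching choice took down the database

Two years ago we optimized a product-detail endpoint with caching. Its database query was slow (about 50 ms) and peak traffic was heavy, so I put a local in-memory cache in front of it.

I went with the simplest thing that could work: sync.Map plus a hand-written TTL check.

The problem was in how expiry was handled. My logic was "when a cached key has expired, query the database for fresh data". At peak, the cache entry for one hot product expired, dozens of concurrent requests found it missing at the same moment, and all of them queried the database at once. Database QPS jumped roughly tenfold in an instant, and the connection pool started timing out...

This is the classic cache breakdown, also known as a cache stampede or thundering herd.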
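The failure mode can be reproduced in a few lines. What follows is a minimal sketch rather than the original service code: naiveCache, countLoads and the fake query that blocks on a channel are hypothetical names invented for illustration, and a plain map with a mutex stands in for sync.Map. The fake query stays open until every request has reached it, which models a slow query under a burst of traffic.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type entry struct {
	value     string
	expiresAt time.Time
}

// naiveCache checks the TTL and reloads on a miss, with no deduplication.
type naiveCache struct {
	mu   sync.RWMutex
	data map[string]entry
}

func (c *naiveCache) get(key string, ttl time.Duration, load func(string) string) string {
	c.mu.RLock()
	e, ok := c.data[key]
	c.mu.RUnlock()
	if ok && time.Now().Before(e.expiresAt) {
		return e.value
	}
	// Every goroutine that observed the miss runs its own load.
	v := load(key)
	c.mu.Lock()
	c.data[key] = entry{value: v, expiresAt: time.Now().Add(ttl)}
	c.mu.Unlock()
	return v
}

// countLoads fires n concurrent requests at one missing key and returns
// how many times the fake database query ran.
func countLoads(n int) int64 {
	c := &naiveCache{data: make(map[string]entry)}
	var loads int64
	release := make(chan struct{})
	load := func(key string) string {
		atomic.AddInt64(&loads, 1)
		<-release // stand-in for a slow query that is still running
		return "row-" + key
	}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.get("42", time.Minute, load)
		}()
	}
	// Hold the queries open until every request has reached the database.
	for atomic.LoadInt64(&loads) < int64(n) {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt64(&loads)
}

func main() {
	// prints "database queries for 50 concurrent requests: 50"
	fmt.Println("database queries for 50 concurrent requests:", countLoads(50))
}
```

Every request that sees the miss issues its own query, so the database load grows with the number of concurrent requests for the hot key.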

That incident made me study the caching libraries in the Go ecosystem properly. This post collects what I learned.

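A selection map of Go caching libraries

Library	Type	Characteristics	Best for
groupcache	Distributed	Open-sourced by Google, built-in singleflight	Sharing a cache across instances, stampede protection
ristretto	Local, high performance	TinyLFU-based, high hit ratio	High-performance local cache on a single node
bigcache	Local, large capacity	GC-friendly, handles huge numbers of keys	Storing large numbers of small objects
go-cache	Local, simple	Simple and easy to use	Local cache for small and medium projects
freecache	Local, fixed memory	Fixed memory footprint	When memory use must be strictly bounded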

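ristretto: a high-performance local cache

ristretto is a local cache library developed by the Dgraph team. It is built on TinyLFU, which on skewed workloads usually gives a noticeably higher hit ratio than plain LRU, and it is designed for high throughput under concurrent access.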

package cache

import (
    "fmt"
    "time"

    "github.com/dgraph-io/ristretto"
)

// ProductCache caches products
type ProductCache struct {
    cache *ristretto.Cache
}

func NewProductCache() (*ProductCache, error) {
    cache, err := ristretto.NewCache(&ristretto.Config{
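        // NumCounters: number of frequency counters, roughly 10x the expected
        // maximum number of keys. For up to 1 million keys, use 10 million.
        NumCounters: 1e7,

        // MaxCost: total cost budget of the cache. It is only a byte limit
        // (1 GB here) if each Set passes the entry's size in bytes as its cost.
        MaxCost: 1 << 30,

        // BufferItems: number of keys per Get buffer; 64 is usually enough
        BufferItems: 64,

        // Metrics: enable statistics collection (slight performance overhead)
        Metrics: true,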
    })
    if err != nil {
        return nil, err
    }
    return &ProductCache{cache: cache}, nil
}

type Product struct {
    ID    string
    Name  string
    Price float64
    Stock int
}

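func (c *ProductCache) Set(product *Product, ttl time.Duration) {
    // cost: how many "resource units" this entry consumes out of MaxCost.
    // A cost of 1 means "one entry", so MaxCost then caps the entry count
    // (1<<30 entries here), not bytes. To enforce the 1 GB byte budget,
    // pass an estimated size in bytes instead.
    c.cache.SetWithTTL(product.ID, product, 1, ttl)
}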

func (c *ProductCache) Get(id string) (*Product, bool) {
    value, found := c.cache.Get(id)
    if !found {
        return nil, false
    }
    product, ok := value.(*Product)
    return product, ok
}

func (c *ProductCache) Delete(id string) {
    c.cache.Del(id)
}

// PrintStats prints cache statistics (requires Metrics: true in the config)
func (c *ProductCache) PrintStats() {
    m := c.cache.Metrics
    fmt.Printf("cache hit ratio: %.2f%%\n", m.Ratio()*100)
    fmt.Printf("hits: %d, misses: %d\n", m.Hits(), m.Misses())
    fmt.Printf("keys evicted: %d\n", m.KeysEvicted())
}
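If MaxCost is meant as a byte budget, each entry's cost has to be a byte estimate rather than 1. The sketch below shows one rough way to compute it. estimateCost is a hypothetical helper and not part of ristretto; it counts the fixed struct size plus the bytes behind the string fields and ignores allocator and bookkeeping overhead, so the result is an approximation.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Product mirrors the struct used in the cache example above.
type Product struct {
	ID    string
	Name  string
	Price float64
	Stock int
}

// estimateCost returns a rough byte count for one cached product:
// the fixed size of the struct plus the bytes held by its strings.
// It is an approximation, not an exact memory measurement.
func estimateCost(p *Product) int64 {
	if p == nil {
		return 0
	}
	return int64(unsafe.Sizeof(*p)) + int64(len(p.ID)) + int64(len(p.Name))
}

func main() {
	p := &Product{ID: "sku-1001", Name: "mechanical keyboard", Price: 99.9, Stock: 12}
	fmt.Printf("estimated cost: %d bytes\n", estimateCost(p))
}
```

The value returned here could be passed as the cost argument of SetWithTTL in place of 1, so that MaxCost behaves approximately as a limit in bytes.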

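groupcache: a distributed cache with stampede protection

To my mind groupcache is the most elegant stampede-protection solution in the Go ecosystem. It uses singleflight to make sure that at any moment only one load is in flight for a given key; the other requests for that key wait for the first one's result.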

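package cache

import (
    "context"
    "encoding/json"
    "fmt"
    "time"

    "github.com/golang/groupcache"
)

// The groupcache HTTP pool. With multiple nodes, it is configured with every node's address.
var pool *groupcache.HTTPPool

// InitGroupCache registers this node and its peers. selfAddr is "host:port";
// each entry in peers must be a full base URL such as "http://10.0.0.2:8000".
func InitGroupCache(selfAddr string, peers []string) {
    // NewHTTPPool also registers the pool as a handler on http.DefaultServeMux,
    // so the process still has to serve HTTP on selfAddr.
    pool = groupcache.NewHTTPPool("http://" + selfAddr)
    pool.Set(peers...) // set the addresses of all nodes
}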

// Define a group: product details
var productGroup = groupcache.NewGroup(
    "products",
    64<<20, // 64MB
    groupcache.GetterFunc(func(ctx context.Context, key string, dest groupcache.Sink) error {
        // Called only on a cache miss.
        // groupcache deduplicates concurrent loads of the same key: even with
        // 1000 concurrent requests in this process, this function runs once.
        fmt.Printf("loading product from database: %s\n", key)

        product, err := loadProductFromDB(ctx, key)
        if err != nil {
            return err
        }

        data, err := json.Marshal(product)
        if err != nil {
            return err
        }

        // Store the value. groupcache has no TTL and no explicit update or
        // delete: a loaded value stays until LRU eviction pushes it out.
        return dest.SetBytes(data)
    }),
)

// GetProduct is the public lookup API
func GetProduct(ctx context.Context, id string) (*Product, error) {
    var data []byte
    err := productGroup.Get(ctx, id, groupcache.AllocatingByteSliceSink(&data))
    if err != nil {
        return nil, err
    }

    var product Product
    if err := json.Unmarshal(data, &product); err != nil {
        return nil, err
    }
    return &product, nil
}

func loadProductFromDB(ctx context.Context, id string) (*Product, error) {
    // Simulate a database query
    time.Sleep(50 * time.Millisecond)
    return &Product{ID: id, Name: "product " + id, Price: 99.9}, nil
}
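With several peers configured, each key is owned by exactly one node, and the owner is chosen by consistent hashing. The following is a sketch of that general technique, not groupcache's own code: the ring type, the CRC32 hash, the 50 virtual nodes per peer and the peer URLs are all assumptions chosen for illustration.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
	"strconv"
)

// ring is a minimal consistent-hash ring with virtual nodes.
type ring struct {
	hashes []uint32          // sorted virtual-node hashes
	owner  map[uint32]string // virtual-node hash -> peer
}

// newRing places `replicas` virtual nodes on the ring for every peer.
func newRing(replicas int, peers ...string) *ring {
	r := &ring{owner: make(map[uint32]string)}
	for _, p := range peers {
		for i := 0; i < replicas; i++ {
			h := crc32.ChecksumIEEE([]byte(strconv.Itoa(i) + p))
			r.hashes = append(r.hashes, h)
			r.owner[h] = p
		}
	}
	sort.Slice(r.hashes, func(i, j int) bool { return r.hashes[i] < r.hashes[j] })
	return r
}

// pick returns the peer that owns key: the first virtual node at or
// after the key's hash, wrapping around to the start of the ring.
func (r *ring) pick(key string) string {
	if len(r.hashes) == 0 {
		return ""
	}
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.hashes), func(i int) bool { return r.hashes[i] >= h })
	if i == len(r.hashes) {
		i = 0
	}
	return r.owner[r.hashes[i]]
}

func main() {
	r := newRing(50, "http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000")
	for _, k := range []string{"1001", "1002", "1003"} {
		fmt.Printf("product %s -> %s\n", k, r.pick(k))
	}
}
```

The property that matters for a cache cluster is stability: removing a peer only moves the keys that peer owned, so the other nodes keep their cached entries.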

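Rolling your own with singleflight: minimal stampede protection

If you would rather not bring in groupcache, the singleflight package gives you stampede protection with very little code. Note that it lives in golang.org/x/sync, an official Go sub-repository, not in the standard library: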

package cache

import (
    "context"
    "sync"
    "time"

    "golang.org/x/sync/singleflight"
)

// SmartCache is a TTL cache whose loads are deduplicated per key.
type SmartCache struct {
    mu     sync.RWMutex
    data   map[string]*cacheEntry
    sfg    singleflight.Group // stampede protection
    loader func(ctx context.Context, key string) (interface{}, error)
}

type cacheEntry struct {
    value     interface{}
    expiresAt time.Time
}

func NewSmartCache(loader func(ctx context.Context, key string) (interface{}, error)) *SmartCache {
    c := &SmartCache{
        data:   make(map[string]*cacheEntry),
        loader: loader,
    }

    // Periodically purge expired keys in the background
    go c.cleanup()
    return c
}

func (c *SmartCache) Get(ctx context.Context, key string, ttl time.Duration) (interface{}, error) {
    // Try the cache first
    c.mu.RLock()
    entry, ok := c.data[key]
    c.mu.RUnlock()

    if ok && time.Now().Before(entry.expiresAt) {
        return entry.value, nil
    }

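    // Missing or expired: singleflight ensures only one request loads from the source.
    // singleflight.Do blocks duplicate callers until the first call finishes, then all share its result.
    // Note: the loader runs with the first caller's ctx, so if that caller is
    // cancelled, every waiting caller receives the same error.
    val, err, _ := c.sfg.Do(key, func() (interface{}, error) {
        value, err := c.loader(ctx, key)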
        if err != nil {
            return nil, err
        }

        c.mu.Lock()
        c.data[key] = &cacheEntry{
            value:     value,
            expiresAt: time.Now().Add(ttl),
        }
        c.mu.Unlock()

        return value, nil
    })

    return val, err
}

func (c *SmartCache) Invalidate(key string) {
    c.mu.Lock()
    delete(c.data, key)
    c.mu.Unlock()
}

func (c *SmartCache) cleanup() {
    ticker := time.NewTicker(1 * time.Minute)
    for range ticker.C {
        now := time.Now()
        c.mu.Lock()
        for k, v := range c.data {
            if now.After(v.expiresAt) {
                delete(c.data, k)
            }
        }
        c.mu.Unlock()
    }
}

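Three pitfalls from the field

Pitfall 1: cache breakdown (solved by singleflight)

Symptom: the moment a hot key expires, a large number of concurrent requests hit the database at once.

Fix: use singleflight.Group.Do so that, for a given key, only one goroutine loads from the source at a time while the others wait and share its result. This is the standard defense against stampedes.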
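To make the mechanism concrete, here is a minimal standard-library-only sketch of the deduplication idea. It is not the golang.org/x/sync/singleflight implementation: flightGroup, call, waiting and runDemo are hypothetical names, values are plain strings, and the waiters counter exists only so the demo can hold the fake query open until all other callers have joined it.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call is one in-flight load for a key.
type call struct {
	wg      sync.WaitGroup
	val     string
	err     error
	waiters int // followers blocked on this call (guarded by flightGroup.mu)
}

// flightGroup deduplicates concurrent loads of the same key.
type flightGroup struct {
	mu    sync.Mutex
	calls map[string]*call
}

func (g *flightGroup) Do(key string, fn func() (string, error)) (string, error) {
	g.mu.Lock()
	if g.calls == nil {
		g.calls = make(map[string]*call)
	}
	if c, ok := g.calls[key]; ok {
		c.waiters++ // a load is already running: wait for it
		g.mu.Unlock()
		c.wg.Wait()
		return c.val, c.err
	}
	c := new(call)
	c.wg.Add(1)
	g.calls[key] = c
	g.mu.Unlock()

	c.val, c.err = fn() // only the first caller runs the load
	c.wg.Done()

	g.mu.Lock()
	delete(g.calls, key)
	g.mu.Unlock()
	return c.val, c.err
}

// waiting reports how many followers are blocked on key's in-flight call.
func (g *flightGroup) waiting(key string) int {
	g.mu.Lock()
	defer g.mu.Unlock()
	if c, ok := g.calls[key]; ok {
		return c.waiters
	}
	return 0
}

// runDemo sends n concurrent requests for one key and returns the number
// of loads that actually ran plus every caller's result.
func runDemo(n int) (int64, []string) {
	var g flightGroup
	var loads int64
	release := make(chan struct{})
	results := make([]string, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			v, _ := g.Do("product:42", func() (string, error) {
				atomic.AddInt64(&loads, 1)
				<-release // stand-in for a slow query
				return "row-42", nil
			})
			results[i] = v
		}(i)
	}
	// Keep the query open until the other n-1 callers are waiting on it.
	for g.waiting("product:42") < n-1 {
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt64(&loads), results
}

func main() {
	loads, results := runDemo(50)
	fmt.Printf("loads=%d callers=%d\n", loads, len(results)) // prints "loads=1 callers=50"
}
```

The first caller for a key becomes the leader and runs the load; everyone who arrives while it is in flight blocks on the same WaitGroup and reads the leader's result, so the database sees one query instead of fifty.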

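Pitfall 2: ristretto's Set does not take effect immediately

Symptom: calling cache.SetWithTTL(key, value, 1, ttl) and then immediately cache.Get(key) reports that the key does not exist.

Cause: for performance, ristretto applies Set asynchronously. The item first goes into a buffer, and a background goroutine writes it into the cache afterwards. A Set can also be dropped, in which case it returns false. The documentation says so, but it is easy to overlook.

Fix: where a Get must follow a Set right away, call cache.Wait() after SetWithTTL; it blocks until the buffered writes have been applied. Sleeping for an arbitrary interval (for example time.Sleep(10 * time.Millisecond)) is unreliable. Even after Wait, the admission policy may have rejected the item, so if read-your-writes is a hard requirement, switch to a synchronous cache library such as go-cache.

In normal business flows (no Get immediately after a Set), the asynchronous behavior is harmless and is part of what makes ristretto fast.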
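The behavior is easier to reason about with a toy model. The sketch below is not ristretto's implementation; asyncCache and its methods are hypothetical and only imitate the idea of buffered writes applied by a background goroutine, plus a Wait that flushes them.

```go
package main

import (
	"fmt"
	"sync"
)

// op is one buffered operation: a write, or a marker used by Wait.
type op struct {
	key, value string
	done       chan struct{} // non-nil means "Wait marker"
}

// asyncCache is a toy cache whose writes are applied in the background.
type asyncCache struct {
	mu   sync.RWMutex
	data map[string]string
	buf  chan op
}

func newAsyncCache(bufSize int) *asyncCache {
	c := &asyncCache{data: make(map[string]string), buf: make(chan op, bufSize)}
	go c.apply()
	return c
}

// apply drains the buffer in order and applies each write.
func (c *asyncCache) apply() {
	for o := range c.buf {
		if o.done != nil {
			close(o.done) // everything queued before the marker is applied
			continue
		}
		c.mu.Lock()
		c.data[o.key] = o.value
		c.mu.Unlock()
	}
}

// Set queues a write and returns at once. It returns false when the
// buffer is full and the write is dropped.
func (c *asyncCache) Set(key, value string) bool {
	select {
	case c.buf <- op{key: key, value: value}:
		return true
	default:
		return false
	}
}

// Wait blocks until every write queued before it has been applied.
func (c *asyncCache) Wait() {
	done := make(chan struct{})
	c.buf <- op{done: done}
	<-done
}

func (c *asyncCache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

func main() {
	c := newAsyncCache(64)
	c.Set("sku-1001", "keyboard")
	_, ok := c.Get("sku-1001")
	fmt.Println("right after Set, found:", ok) // may be true or false
	c.Wait()
	v, ok := c.Get("sku-1001")
	fmt.Println("after Wait:", v, ok) // prints "after Wait: keyboard true"
}
```

In this model, a Get issued right after Set races with the background goroutine, which is why the result is unpredictable, while a Get issued after Wait always sees the write.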

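Pitfall 3: cache avalanche, when many keys expire together

Symptom: at some point the whole cache becomes invalid at once, and the database is instantly saturated with requests.

Cause: when the cache was warmed up, every key got the same TTL (say 1 hour), so they all expired together an hour later.

Fix: add random jitter to the TTL: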

// Bad: every key expires at the same moment
cache.Set(key, value, 1*time.Hour)

// Good: add ±10% random jitter
baseTTL := 1 * time.Hour
jitter := time.Duration(rand.Int63n(int64(baseTTL/5))) - baseTTL/10 // -6 to +6 minutes
cache.Set(key, value, baseTTL+jitter)
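The same idea can be wrapped in a small helper so that every call site gets jitter by default. This is a sketch under assumptions: jitteredTTL is a hypothetical name, the spread is expressed as a fraction of the base TTL, and it relies on the top-level math/rand functions, which are safe for concurrent use.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// jitteredTTL returns base adjusted by a random offset in
// [-fraction*base, +fraction*base]. With fraction = 0.1 and base = 1h,
// the result lies between 54 and 66 minutes.
func jitteredTTL(base time.Duration, fraction float64) time.Duration {
	if base <= 0 || fraction <= 0 {
		return base
	}
	spread := time.Duration(float64(base) * fraction) // largest allowed deviation
	if spread <= 0 {
		return base
	}
	// rand.Int63n(n) returns a value in [0, n); shift it to [-spread, +spread].
	offset := time.Duration(rand.Int63n(int64(2*spread)+1)) - spread
	return base + offset
}

func main() {
	for i := 0; i < 3; i++ {
		fmt.Println(jitteredTTL(time.Hour, 0.1))
	}
}
```

A larger fraction spreads expirations over a wider window at the cost of less predictable freshness, so the right value depends on how much staleness the data can tolerate.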

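Comparison with Java

In Java, the most common local cache is Caffeine (based on W-TinyLFU, similar in spirit to ristretto), and distributed caching usually means Redis accessed through Jedis or Lettuce.

ristretto's algorithm resembles Caffeine's, and the feature sets are fairly close. What Go lacks is a library with Caffeine's maturity: Caffeine has a very long production track record in the Java ecosystem.

groupcache is distinctive to the Go ecosystem, and I know of no direct counterpart in Java. It combines stampede protection and consistent-hash key distribution in a single library, which suits clusters of stateless Go services very well.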

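Recommendations

Single-node service, cache smaller than 1 GB: go-cache or ristretto, both simple to use
Single-node service with high performance requirements: ristretto (TinyLFU, high hit ratio)
Multi-instance service that needs stampede protection: groupcache
Very large data sets (hundreds of millions of keys): bigcache or freecache (GC-friendly)
Cache shared across services: Redis (which is beyond the scope of Go local caching)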