Lock-Free Programming in Java in Practice: Using the Atomic Classes and CAS Correctly

Lao Zhang · 2026/4/30 · about 8 minutes


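Audience: engineers with a grounding in Java concurrency who want to rely less on locks and squeeze out performance | Reading time: about 17 minutes | Key takeaway: understand how CAS works and what the Atomic classes offer, and pick the right tool where lock-free genuinely fits

The time I switched a counter from synchronized to AtomicLong

In 2021 we built a global request counter to track call volume per endpoint; every incoming request incremented it once. The first version used synchronized: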

private long count = 0;
public synchronized void increment() { count++; }
public synchronized long get() { return count; }

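At roughly 500,000 requests a day, the counter saw almost no contention and ran fine.

Then the business grew: daily traffic rose to 8 million requests and peak QPS passed 8,000. The counter's synchronized methods started to contend, and Arthas showed a large number of threads BLOCKED there. I switched it to AtomicLong, and the average cost of an increment dropped from 12μs to 0.8μs, a 15x improvement.

But AtomicLong is not a drop-in replacement for synchronized in every scenario. This article works through when it fits and when it does not.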

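CAS: the foundation of lock-free programming

CAS (Compare-And-Swap) is an atomic operation provided by the CPU. What it does:

Atomically:
if the value at memory address addr == expectedValue
    set the value at that address to newValue and return true
else
    do nothing and return false

In Java, this operation is exposed through the compareAndSwapXxx methods of sun.misc.Unsafe (since JDK 9 the Atomic classes use compareAndSetXxx in jdk.internal.misc.Unsafe). The JIT compiles these calls down to LOCK CMPXCHG on x86, or to an LDREX/STREX (load-exclusive/store-exclusive) pair on 32-bit ARM.

The three operands of CAS:

Memory location (V): where the variable to be modified lives
Expected value (E): what I believe the current value is
New value (N): what I want to change it to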
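To make the three operands concrete, here is a minimal sketch that runs a CAS against an ordinary volatile field through VarHandle, the public API available since JDK 9. The class name CasOperandsDemo and its methods are made up for illustration; the field plays the role of V, and the two arguments are E and N.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;

public class CasOperandsDemo {
    // V: the memory location, here an ordinary volatile field
    private volatile int value = 5;

    private static final VarHandle VALUE;
    static {
        try {
            VALUE = MethodHandles.lookup()
                    .findVarHandle(CasOperandsDemo.class, "value", int.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }

    /** Tries to change value from expected (E) to newValue (N); true if it did. */
    boolean cas(int expected, int newValue) {
        return VALUE.compareAndSet(this, expected, newValue);
    }

    int current() {
        return value;
    }

    public static void main(String[] args) {
        CasOperandsDemo d = new CasOperandsDemo();
        System.out.println(d.cas(5, 6)); // true:  E matched V, value is now 6
        System.out.println(d.cas(5, 7)); // false: E is stale, nothing changed
        System.out.println(d.current()); // 6
    }
}
```

In everyday code the Atomic classes covered below are usually the more convenient entry point; the sketch only shows what they do underneath.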

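The typical CAS usage pattern (spinning):

// AtomicInteger.incrementAndGet as an example of the CAS spin loop
// (this is the JDK 7 form; JDK 8+ delegates to Unsafe.getAndAddInt, same idea)
public final int incrementAndGet() {
    for (;;) { // spin
        int current = get();         // read the current value
        int next = current + 1;      // compute the new value
        if (compareAndSet(current, next)) // try to publish it with CAS
            return next;             // success: return
        // failure: someone else changed the value first, so re-read and retry
    }
}

This is the essence of "optimistic locking": go ahead optimistically, and if someone else got there first, start over.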
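The same read, compute, CAS, retry loop works for updates that are not a plain increment. As a sketch, here is a "high-water mark" that only ever moves upward; the class HighWaterMark and its update method are hypothetical names for this example.

```java
import java.util.concurrent.atomic.AtomicLong;

public class HighWaterMark {
    private final AtomicLong max = new AtomicLong(Long.MIN_VALUE);

    /** Raises the stored maximum to candidate if candidate is larger. */
    public void update(long candidate) {
        for (;;) {
            long current = max.get();
            if (candidate <= current) {
                return; // already high enough, no write needed
            }
            if (max.compareAndSet(current, candidate)) {
                return; // we published the new maximum
            }
            // lost the race: another thread changed max, re-read and retry
        }
    }

    public long get() {
        return max.get();
    }

    public static void main(String[] args) throws InterruptedException {
        HighWaterMark hwm = new HighWaterMark();
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            final int base = t * 1000;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    hwm.update(base + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(hwm.get()); // 3999
    }
}
```

On JDK 8+, max.accumulateAndGet(candidate, Math::max) runs an equivalent loop for you; the hand-written version is shown only to expose the pattern.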

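The Atomic class family at a glance

The java.util.concurrent.atomic package provides a full set of atomic classes:

Primitive types:

AtomicBoolean: atomic boolean
AtomicInteger: atomic int
AtomicLong: atomic long (the first choice in most scenarios)

Reference types:

AtomicReference<V>: atomic object reference
AtomicStampedReference<V>: atomic reference with a version stamp (solves the ABA problem)
AtomicMarkableReference<V>: atomic reference with a boolean mark

Array types:

AtomicIntegerArray
AtomicLongArray
AtomicReferenceArray<E>

Field updaters:

AtomicIntegerFieldUpdater<T>
AtomicLongFieldUpdater<T>
AtomicReferenceFieldUpdater<T,V>

High-concurrency accumulation (JDK 8+):

LongAdder: the best choice for high-concurrency counting
LongAccumulator: a more general accumulator
DoubleAdder
DoubleAccumulator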
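LongAccumulator is the one entry in this list that the main demo does not exercise, so here is a brief sketch. It tracks a running maximum rather than a sum. MaxLatencyTracker is a hypothetical class, and the identity value 0 assumes latencies are never negative.

```java
import java.util.concurrent.atomic.LongAccumulator;

public class MaxLatencyTracker {
    // Long::max is the accumulator function; 0 is the identity
    // (assumption: observed latencies are non-negative)
    private final LongAccumulator maxMicros = new LongAccumulator(Long::max, 0L);

    public void observe(long micros) {
        maxMicros.accumulate(micros);
    }

    public long max() {
        return maxMicros.get();
    }

    public static void main(String[] args) {
        MaxLatencyTracker tracker = new MaxLatencyTracker();
        tracker.observe(12);
        tracker.observe(800);
        tracker.observe(45);
        System.out.println(tracker.max()); // 800
    }
}
```

LongAdder behaves like the special case of this with addition as the function and 0 as the identity.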

Full code: using the Atomic classes correctly

import java.util.concurrent.CountDownLatch; // used by runConcurrent
import java.util.concurrent.atomic.*;

/**
 * Usage examples for the Atomic class family.
 * Covers: primitive types, reference types, the ABA fix, LongAdder vs AtomicLong.
 */
public class AtomicUsageDemo {

    // ============ AtomicInteger: common operations ============

    static void basicAtomicOps() {
        AtomicInteger ai = new AtomicInteger(10);

        System.out.println(ai.get());                      // 10 - read
        System.out.println(ai.getAndSet(20));              // 10 - return old value, then set new one
        System.out.println(ai.incrementAndGet());          // 21 - increment, then read
        System.out.println(ai.getAndIncrement());          // 21 - read, then increment (now 22)
        System.out.println(ai.addAndGet(5));               // 27 - add 5, then read
        System.out.println(ai.compareAndSet(27, 100));     // true - CAS succeeds
        System.out.println(ai.get());                      // 100

        // JDK 8+ supports lambda-based updates
        ai.updateAndGet(v -> v * 2);                       // 200
        ai.accumulateAndGet(10, Integer::max);             // max(200, 10) = 200
    }
    
    // ============ AtomicReference: atomically replacing an object reference ============

    record Config(String host, int port, long version) {}

    static class ConfigManager {
        private final AtomicReference<Config> currentConfig =
            new AtomicReference<>(new Config("localhost", 8080, 1L));

        /**
         * Atomically replaces the config so that concurrent updates cannot lose a version.
         */
        public boolean updateConfig(String newHost, int newPort) {
            Config current;
            Config newConfig;
            do {
                current = currentConfig.get();
                // the new config's version must be greater than the current one
                newConfig = new Config(newHost, newPort, current.version() + 1);
            } while (!currentConfig.compareAndSet(current, newConfig));
            // CAS guarantees that version increases by exactly 1 per update, with no gaps or duplicates
            return true;
        }

        public Config getConfig() {
            return currentConfig.get();
        }
    }
    
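    // ============ The ABA problem and AtomicStampedReference ============

    /**
     * The ABA problem:
     * thread 1 reads A; thread 2 changes A to B and back to A; thread 1's CAS then succeeds.
     * The data did change (A → B → A), but CAS cannot tell.
     *
     * Classic danger zone: pop on a lock-free stack.
     * Note: with immutable nodes freshly allocated on every push (as with this record),
     * garbage collection keeps this pop safe in practice; the hazard becomes real
     * once node objects are mutable and reused, e.g. taken from an object pool.
     */
    static class ABAProblem {
        // node of a simulated lock-free stack
        record Node(int val, Node next) {}

        private AtomicReference<Node> head = new AtomicReference<>();

        // Dangerous: this pop pattern is exposed to ABA when nodes are reused
        public Integer unsafePop() {
            Node h;
            do {
                h = head.get();
                if (h == null) return null;
                // Problem: in this window head may go A→B→A.
                // CAS sees no change, yet next may belong to a different chain of nodes.
            } while (!head.compareAndSet(h, h.next()));
            return h.val();
        }

        // Safe: AtomicStampedReference solves the ABA problem
        private final AtomicStampedReference<Node> stampedHead =
            new AtomicStampedReference<>(null, 0);

        public void safePush(int val) {
            int[] stamp = new int[1];
            Node current;
            Node newHead;
            do {
                current = stampedHead.get(stamp); // reads reference and stamp together
                newHead = new Node(val, current);
            } while (!stampedHead.compareAndSet(current, newHead, stamp[0], stamp[0] + 1));
            // Every operation bumps the stamp, so even if the reference returns to A,
            // the stamp differs and a stale CAS fails.
        }

        public Integer safePop() {
            int[] stamp = new int[1];
            Node current;
            do {
                current = stampedHead.get(stamp);
                if (current == null) return null;
            } while (!stampedHead.compareAndSet(
                current, current.next(), stamp[0], stamp[0] + 1));
            return current.val();
        }
    }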
    
    // ============ LongAdder vs AtomicLong: performance comparison ============

    /**
     * High-concurrency counting: under heavy contention LongAdder clearly outperforms AtomicLong.
     *
     * How it works: LongAdder spreads the count over a base value plus several Cells.
     * Under contention each thread updates the Cell it hashes to instead of one shared variable.
     * A read adds up base + all Cells.
     *
     * This is a rough wall-clock comparison without warm-up, not a rigorous benchmark.
     */
    static void longAdderVsAtomicLong() throws InterruptedException {
        int threadCount = 32;
        int opsPerThread = 1_000_000;

        // AtomicLong run
        AtomicLong atomicLong = new AtomicLong(0);
        long start1 = System.currentTimeMillis();
        runConcurrent(threadCount, () -> {
            for (int i = 0; i < opsPerThread; i++) atomicLong.incrementAndGet();
        });
        System.out.printf("AtomicLong: %dms, result=%d%n",
            System.currentTimeMillis() - start1, atomicLong.get());

        // LongAdder run
        LongAdder longAdder = new LongAdder();
        long start2 = System.currentTimeMillis();
        runConcurrent(threadCount, () -> {
            for (int i = 0; i < opsPerThread; i++) longAdder.increment();
        });
        System.out.printf("LongAdder:  %dms, result=%d%n",
            System.currentTimeMillis() - start2, longAdder.sum());

        // With 32 contending threads, LongAdder is typically 3-6x faster than AtomicLong
    }

    // Starts `threads` threads that each run the task once, then waits for all of them
    static void runConcurrent(int threads, Runnable task) throws InterruptedException {
        CountDownLatch latch = new CountDownLatch(threads);
        for (int i = 0; i < threads; i++) {
            new Thread(() -> {
                task.run();
                latch.countDown();
            }).start();
        }
        latch.await();
    }
    
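    // ============ Atomic field updaters: lock-free updates that save memory ============

    /**
     * If a class has a huge number of instances, giving each one its own AtomicLong
     * object costs noticeable memory. An AtomicLongFieldUpdater operates on a plain
     * volatile long field instead, and one static updater is shared by all instances.
     */
    static class Counter {
        // Must be volatile and non-static. It only has to be accessible to the code that
        // creates the updater, so private would also work here (the updater lives in Counter).
        volatile long count = 0;

        private static final AtomicLongFieldUpdater<Counter> COUNT_UPDATER =
            AtomicLongFieldUpdater.newUpdater(Counter.class, "count");

        public void increment() {
            COUNT_UPDATER.incrementAndGet(this); // `this` is the object being updated
        }

        public long get() { return count; }
    }

    public static void main(String[] args) throws Exception {
        basicAtomicOps();
        longAdderVsAtomicLong();
        System.out.println("Tests finished");
    }
}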

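Three pitfalls from real life

Pitfall 1: CAS spinning drives CPU up under heavy contention

Symptom: after a frequently updated counter was changed to AtomicLong, CPU usage on the machine jumped from 40% to 85% and the endpoint actually got slower.

Cause: concurrency was very high (32 threads all updating the same AtomicLong), so a large share of CAS attempts failed and threads kept spinning and retrying. The CPU was busy doing wasted work (spin-wait).

32 threads calling incrementAndGet at once:
    Thread 1: CAS(0, 1) succeeds
    Thread 2: CAS(0, 1) fails, re-reads 1, CAS(1, 2) succeeds
    Thread 3: CAS(0, 1) fails, re-reads 2, CAS(2, 3) succeeds
    ...
    The 32 threads are in effect running serially, just in a CPU-intensive way

Fix: for high-concurrency counting, switch to LongAdder. It spreads contention over multiple Cells and sharply reduces CAS failures.

// switch to LongAdder
LongAdder counter = new LongAdder();
counter.increment(); // spread across Cells internally, less contention
long total = counter.sum(); // totals are computed on read

In our measurements, after switching to LongAdder the CPU dropped to 45% and throughput rose by about 2.8x.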
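For the per-endpoint call counter from the opening story, one common arrangement is a ConcurrentHashMap of LongAdder values. The following is only a sketch under that assumption; EndpointCounters and the endpoint paths are invented for the example and are not the original production code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class EndpointCounters {
    private final ConcurrentHashMap<String, LongAdder> counters = new ConcurrentHashMap<>();

    /** Records one call; the LongAdder for an endpoint is created on first use. */
    public void hit(String endpoint) {
        counters.computeIfAbsent(endpoint, k -> new LongAdder()).increment();
    }

    /** Returns the calls recorded so far (0 for an endpoint never seen). */
    public long count(String endpoint) {
        LongAdder adder = counters.get(endpoint);
        return adder == null ? 0L : adder.sum();
    }

    public static void main(String[] args) throws InterruptedException {
        EndpointCounters stats = new EndpointCounters();
        Thread[] workers = new Thread[8];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 10_000; i++) {
                    stats.hit("/orders");
                    if (i % 2 == 0) {
                        stats.hit("/users");
                    }
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        System.out.println(stats.count("/orders")); // 80000
        System.out.println(stats.count("/users"));  // 40000
    }
}
```

One caveat: sum() is not an atomic snapshot while updates are still in flight. That is fine for monitoring counters, but not when the value must be read and acted on atomically.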

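Pitfall 2: forgetting about ABA and corrupting a linked structure

Symptom: a lock-free stack occasionally returned corrupted data. Elements came out of pop in the wrong order, and sometimes a null showed up.

Cause: pop on the lock-free stack was exposed to the ABA problem. Thread A reads head node H1 and is suspended. Thread B pops H1, pushes H2, then pushes H1 again (the node object is reused). Thread A resumes and CAS(H1, H1.next) succeeds because the head is H1 again, but the successor A read from H1 before it was suspended is no longer H1's real successor, so the stack ends up pointing past H2.

Fix: use AtomicStampedReference to attach a version stamp to every operation. Even if the reference returns to A, the stamp is different and the stale CAS fails.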
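The difference can be reproduced deterministically on a single thread by replaying the interleaving by hand. This is a sketch; AbaDemo and its method names are hypothetical, and the "threads" are just labels in the comments.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.concurrent.atomic.AtomicStampedReference;

public class AbaDemo {
    /** Plain AtomicReference: the stale CAS goes through after A -> B -> A. */
    static boolean plainCasAfterAba() {
        String a = "A";
        String b = "B";
        AtomicReference<String> ref = new AtomicReference<>(a);

        String seen = ref.get();   // "thread 1" reads A, then stalls
        ref.compareAndSet(a, b);   // "thread 2": A -> B
        ref.compareAndSet(b, a);   // "thread 2": B -> A
        return ref.compareAndSet(seen, "C"); // "thread 1" resumes: succeeds
    }

    /** AtomicStampedReference: the same stale CAS is rejected. */
    static boolean stampedCasAfterAba() {
        String a = "A";
        String b = "B";
        AtomicStampedReference<String> ref = new AtomicStampedReference<>(a, 0);

        int[] stampHolder = new int[1];
        String seen = ref.get(stampHolder); // "thread 1" reads (A, stamp 0)
        int seenStamp = stampHolder[0];

        ref.compareAndSet(a, b, 0, 1);      // "thread 2": (A,0) -> (B,1)
        ref.compareAndSet(b, a, 1, 2);      // "thread 2": (B,1) -> (A,2)

        // "thread 1" resumes with stamp 0, but the current stamp is 2
        return ref.compareAndSet(seen, "C", seenStamp, seenStamp + 1);
    }

    public static void main(String[] args) {
        System.out.println(plainCasAfterAba());   // true  (change went unnoticed)
        System.out.println(stampedCasAfterAba()); // false (stamp mismatch detected)
    }
}
```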

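Pitfall 3: CAS on AtomicReference compares references, not contents

Symptom: a CAS on an AtomicReference<String> returned false from compareAndSet even though the values were clearly equal.

Cause: compareAndSet compares object references (identity, ==), not equals(). If two Strings have the same content but are different objects, the CAS fails.

AtomicReference<String> ref = new AtomicReference<>("hello");

String expected = new String("hello"); // created with new: a different object
boolean result = ref.compareAndSet(expected, "world");
System.out.println(result); // false! same content, different reference

// Correct: make sure you pass the very same reference
String current = ref.get(); // get the current reference
boolean result2 = ref.compareAndSet(current, "world");
System.out.println(result2); // true

String literals with the same content share one reference (they come from the constant pool), so a CAS against a literal succeeds. For Strings created with new or built at runtime, always obtain the current reference through get().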
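When the intent really is "replace the value if its content equals X", one option is a small helper that checks with equals() and then runs the CAS on the reference it actually read. This is a sketch; compareContentAndSet is a hypothetical helper, not a JDK method.

```java
import java.util.Objects;
import java.util.concurrent.atomic.AtomicReference;

public class ContentCas {
    /**
     * Sets ref to newValue if the current value equals(expected).
     * The CAS itself is still done on the reference read by get().
     */
    static <T> boolean compareContentAndSet(AtomicReference<T> ref, T expected, T newValue) {
        for (;;) {
            T current = ref.get();
            if (!Objects.equals(current, expected)) {
                return false; // content differs: give up
            }
            if (ref.compareAndSet(current, newValue)) {
                return true;  // swapped against the exact reference we read
            }
            // reference changed between get() and CAS: re-check the content
        }
    }

    public static void main(String[] args) {
        AtomicReference<String> ref = new AtomicReference<>("hello");
        String expected = new String("hello"); // equal content, different object

        System.out.println(ref.compareAndSet(expected, "world"));            // false
        System.out.println(compareContentAndSet(ref, expected, "world"));    // true
        System.out.println(ref.get());                                       // world
    }
}
```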

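Lock-free vs locks: how to choose

Scenario	Recommended approach	Reason
Simple counting, low concurrency	AtomicLong	Simple and sufficient
Simple counting, high concurrency	LongAdder	Reduces CAS contention
State machine (a few state transitions)	AtomicInteger + CAS	Lock-free state transitions
Updates to complex data structures	synchronized/ReentrantLock	CAS cannot guarantee atomicity of compound updates
Transactional operations	synchronized/ReentrantLock	Not achievable lock-free

Keep this rule in mind: CAS suits "simple updates to a small amount of state", not "compound operations across several fields". When several fields must be updated together, just use a lock.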
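The state-machine row of the table can be sketched as follows: each transition is a single compareAndSet, so when several threads race, exactly one of them performs a given transition. ServiceLifecycle and its three states are hypothetical names chosen for this example.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class ServiceLifecycle {
    static final int NEW = 0;
    static final int RUNNING = 1;
    static final int STOPPED = 2;

    private final AtomicInteger state = new AtomicInteger(NEW);

    /** NEW -> RUNNING; returns true only for the one caller that made the transition. */
    public boolean start() {
        return state.compareAndSet(NEW, RUNNING);
    }

    /** RUNNING -> STOPPED; returns true only for the one caller that made the transition. */
    public boolean stop() {
        return state.compareAndSet(RUNNING, STOPPED);
    }

    public int state() {
        return state.get();
    }

    public static void main(String[] args) throws InterruptedException {
        ServiceLifecycle svc = new ServiceLifecycle();
        AtomicInteger winners = new AtomicInteger();

        Thread[] threads = new Thread[8];
        for (int i = 0; i < threads.length; i++) {
            threads[i] = new Thread(() -> {
                if (svc.start()) {
                    winners.incrementAndGet(); // only one thread gets here
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }

        System.out.println(winners.get()); // 1
        System.out.println(svc.stop());    // true
        System.out.println(svc.stop());    // false (already STOPPED)
    }
}
```

The caller that gets true from start() can safely run one-time initialization; if that initialization itself spans several fields that must be published together, the rule above applies and a lock is the safer choice.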

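Summary

The core of lock-free programming is CAS, an optimistic atomic operation. The Atomic classes wrap CAS in a friendly API so that we never have to touch Unsafe directly.

Choosing between Atomic classes and locks is essentially a trade-off between optimistic locking (efficient under low contention, spinning under high contention) and pessimistic locking (threads are suspended on contention, at the cost of context switches).

My rule of thumb: Atomic classes for low contention, LongAdder for simple counting under high contention, locks for complex logic. Don't buy into "lock-free is always faster": under heavy contention the CPU cost of spinning can exceed the cost of a lock.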