tips:
以下测试如无特殊说明,隔离级别均为 MySQL 默认的 REPEATABLE READ 级;
测试使用的 MySQL 版本为 8.0.43;
docker exec -it mysql /bin/bash
mysql -h127.0.0.1 -P3306 -uroot -p123456
一、分析前的一些准备工作
如何查看一个连接持有哪些锁
你可以使用下面的 SQL 语句来查询指定的连接持有哪些锁
-- 获取当前连接的 connection id
SELECT CONNECTION_ID() INTO @connection_id;
-- 根据 connection id 查询指定连接持有的锁
SELECT t2.*
FROM information_schema.innodb_trx t1
JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
WHERE t1.trx_mysql_thread_id = @connection_id
;
在分析死锁 Case 时候,你可能会需要关闭死锁检测、延长锁超时时限来观察锁的情况:
SET GLOBAL innodb_deadlock_detect = OFF;
SET GLOBAL innodb_lock_wait_timeout = 9999999999;
-- 实验完成后记得改回去
SET GLOBAL innodb_deadlock_detect = ON;
SET GLOBAL innodb_lock_wait_timeout = 50;测试数据表、模拟数据
如无特殊说明,这篇文章中的所有 Test Case 都基于下面这个表 t,每次测试完成后都需要重新执行一次下面的 SQL 以重置实验环境。
建表和初始化语句如下:
DROP TABLE IF EXISTS t;
CREATE TABLE t(
id int(11) NOT NULL,
a int(11) DEFAULT NULL,
c int(11) DEFAULT NULL,
d int(11) DEFAULT NULL,
PRIMARY KEY (id),
unique index a(a),
index c (c)
) ENGINE = InnoDB;
insert into t
values (0, 0, 0, 0)
, (5, 5, 5, 5)
, (10, 10, 10, 10)
, (15, 15, 15, 15)
, (20, 20, 20, 20)
, (25, 25, 25, 25);
a 字段:唯一索引
c 字段:普通索引
二、MySQL 的加锁机制分析
分析前的一些前置知识
锁的类型
排他锁(Exclusive Lock,X 锁):又称写锁。当事务持有 X 锁时,可以读取和修改数据,但会阻塞其他事务获取相同数据的 S 锁和 X 锁。INSERT, UPDATE, DELETE 等数据修改操作会自动获取 X 锁。可以通过 SELECT ... FOR UPDATE 显式加锁;
共享锁(Shared Lock,S 锁):又称读锁。当事务持有 S 锁时,可以读取数据,但不能修改数据。它允许其他事务同时获取相同数据的 S 锁,但会阻止其他事务获取 X 锁。可以通过 SELECT ... LOCK IN SHARE MODE 显式加锁;
即 X 锁与 S 锁之间的关系如下:
常见的锁
Intention Locks
Intention Locks 即意向锁,这个锁是表级别的锁,用于标记事务后续对表中某行所需的锁类型(共享锁或独占锁),不会阻塞任何操作,仅对全表请求(例如 LOCK TABLES ... WRITE )例外。
意向锁分为两类:
IS(Intention Shared,意向共享锁):事务计划在该表上获取一个或多个行级共享锁(S) → 典型来源:SELECT ... LOCK IN SHARE MODE、某些读锁场景;
IX(Intention Exclusive):事务计划在该表上获取一个或多个行级排他锁(X) → 典型来源:UPDATE/DELETE/INSERT、SELECT ... FOR UPDATE 等;
意向锁的主要目的是表明有人正在锁定某行,或即将锁定表中的某行。这样数据库可以在表级别快速判断是否可以授予某些表级锁,而不用遍历所有行锁。
想象一下,在高并发下有成千上万条行锁分散在表内:如果某个语句请求对整张表做一个表级 LOCK TABLES t WRITE 或者某个 DDL/表级 X 锁,InnoDB 要快速判断是否能授予这个表级锁,但遍历所有行锁代价太高,于是引入意向锁。
每当事务要在某个表上获得行级 S(共享)或 X(排他)锁时,事务会先在表上上相应的意向锁(IS 或 IX),再在对应的行上上具体行锁。这样查看表上有没有意向锁即可快速判定是否能安全地授予表级锁(而不用检查每一行)。
表级锁类型兼容性如下所示:
Record Lock
记录锁(Record Lock)是一种行级锁,用于锁定索引中的一条具体记录,即使表没有定义索引,InnoDB 也会创建一个隐式的聚簇索引并使用它加锁。
举例说明,对于 update t set d=d+1 where id=10 这个语句,锁分析的结果是:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 916;
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| INNODB | 281473192037592:10808:1206:281473097716832 | 52383 | 955 | 23 | sugarless-admin | t | NULL | NULL | NULL | 281473097716832 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473192037592:10808:140:4:4:281473097713840 | 52383 | 955 | 23 | sugarless-admin | t | NULL | NULL | PRIMARY | 281473097713840 | RECORD | X,REC_NOT_GAP | GRANTED | 10 |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
2 rows in set (0.02 sec)
表级意向锁 IX
Record Lock
INDEX_NAME = PRIMARY,LOCK_DATA = 10:表示这把锁是加在主键索引上 id=10 的记录;
LOCK_TYPE = RECORD,LOCK_MODE = X,REC_NOT_GAP:表示这是是一个行级写锁,并且只锁定记录本身,不包含记录之前的 GAP(间隙);
Gap Lock
间隙锁(Gap Lock):锁定索引记录之间的间隙(一个区间),而不是记录本身。它的主要作用是防止其他事务在间隙中插入新的数据,从而在 REPEATABLE READ 隔离级别下有效避免“幻读”现象。
比如文章开头的测试数据表 t,初始化插入了 6 条数据,这就产生了 7 个间隙:
Gap Lock 的例子放到下面 Next-Key Lock 一起讲。
Next-Key Lock
Next-Key Lock 即临键锁,是记录锁(Record Lock)和间隙锁(Gap Lock)的合称。它锁定一个索引记录以及该记录之前的间隙,锁定区间是“左开右闭”。
比如文章开头的测试数据表 t,初始化插入了 6 条数据,如果用 select * from t for update 把整个表所有记录锁起来,就形成了 7 个 Next-Key Lock,分别是 (-∞,0]、(0,5]、(5,10]、(10,15]、(15,20]、(20, 25]、(25, +supremum]。
举例说明,对于 update t set d=d+1 where c=10 这个语句,锁分析的结果是:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 916;
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| INNODB | 281473192037592:10810:1206:281473097716832 | 52487 | 955 | 27 | sugarless-admin | t | NULL | NULL | NULL | 281473097716832 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473192037592:10810:140:6:5:281473097714528 | 52487 | 955 | 27 | sugarless-admin | t | NULL | NULL | c | 281473097714528 | RECORD | X,GAP | GRANTED | 15, 15 |
| INNODB | 281473192037592:10810:140:6:4:281473097713840 | 52487 | 955 | 27 | sugarless-admin | t | NULL | NULL | c | 281473097713840 | RECORD | X | GRANTED | 10, 10 |
| INNODB | 281473192037592:10810:140:4:4:281473097714184 | 52487 | 955 | 27 | sugarless-admin | t | NULL | NULL | PRIMARY | 281473097714184 | RECORD | X,REC_NOT_GAP | GRANTED | 10 |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
4 rows in set (0.00 sec)
表级意向锁 IX
间隙锁
LOCK_TYPE = RECORD,LOCK_MODE = X,GAP:说明这是一个间隙锁;
INDEX_NAME = c,LOCK_DATA = 15, 15:说明锁的范围是索引 c 上(10, 15)这个间隙;
Next-Key Lock
LOCK_TYPE = RECORD,LOCK_MODE = X,说明这是一个 Next-Key Lock;
INDEX_NAME = c,LOCK_DATA = 10, 10:说明锁的范围是索引 c 上 (5,10] 这个区间,即 c=10 这一行和前面的间隙;
Record Lock
LOCK_TYPE = RECORD,LOCK_MODE = X,REC_NOT_GAP:说明这是一个行锁;
INDEX_NAME = PRIMARY,LOCK_DATA = 10:说明锁的范围是主键索引上 id=10 的这一行;
Insert Intention Locks
插入意向锁(Insert Intention Lock):一种特殊的间隙锁(Gap Lock)。它是由 INSERT 操作执行前设置的一种特殊间隙锁,用于表明事务打算在某个间隙中插入新记录。
多个事务只要插入的位置不冲突(即不在间隙中的完全相同的位置),就可以同时持有对相同间隙的插入意向锁,不会相互阻塞。插入意向锁与普通的间隙锁是互斥的,如果一个事务已经持有了某个间隙的间隙锁,另一个事务尝试在该间隙获取插入意向锁时会被阻塞。
举例说明:
对于 Session B 在 t2 时刻执行的 insert 语句,锁分析的结果是:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 917;
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+------------------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+------------------------+-------------+-----------+
| INNODB | 281473192038400:64250:1206:281473097722960 | 52492 | 956 | 27 | sugarless-admin | t | NULL | NULL | NULL | 281473097722960 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473192038400:64250:140:6:5:281473097720048 | 52492 | 956 | 27 | sugarless-admin | t | NULL | NULL | c | 281473097720048 | RECORD | X,GAP,INSERT_INTENTION | WAITING | 15, 15 |
+--------+-----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+------------------------+-------------+-----------+
2 rows in set (0.01 sec)
表级意向锁 IX
插入意向锁
LOCK_TYPE = RECORD,LOCK_MODE = X,GAP,INSERT_INTENTION:表示这是一个插入意向锁;
INDEX_NAME = c,LOCK_DATA = 15,15:说明要插入的位置是索引 c 上 (10,15) 这个间隙;
LOCK_STATUS = WAITING:说明当前事务还在等待插入意向锁,阻塞中;
select ... lock in share mode 和 select ... for update 在加锁时的区别
这两种 select 语句在事务中都能读到最新的值(即读取到已提交事务的变更,即使在可重复读隔离级别中),并且这两种语句都会加锁,lock in share mode 加的是读锁(S 锁,共享锁),for update 加的是写锁(X 锁,排他锁)。需要注意的是,它们必须在事务中(begin / start transaction / autocommit = 0)才会生效。
下面来看一个例子
在默认的可重复读隔离级别下,Session A 在 t3 时刻读到的值依然是 5,在 t4 和 t5 时刻分别使用 lock in share mode 和 for update 读的是当前的值 6,即 Session B 已经提交的变更,这也是 lock in share mode 和 for update 被称为“当前读(current read)”的原因。
实际上更新数据都是先读后写的,而这个读,只能是“当前读(current read)”,否则其他其他已经提交事务的变更就丢失了。因此 Session A 在 t6 时刻的 update 操作是在 d=6 的基础上 + 1 的。
Case 1:锁是加在索引上的
SELECT ... LOCK IN SHARE MODE 的主要作用是对查询到的数据行设置一个共享锁(S 锁),确保在事务执行期间,阻塞其他事务对这些数据的修改操作。
在这个例子中,lock in share mode 只锁覆盖索引,但是如果是 select ... for update 就不一样了。系统会认为你接下来要更新数据,因此会顺便给主键索引上满足条件的行加上行锁。这个例子说明,锁是加在索引上的。
同时,它给我们的指导是,如果你要用 lock in share mode 来给行加读锁避免数据被更新的话,就必须得绕过覆盖索引的优化,在查询字段中加入索引中不存在的字段。比如,将 session A 的查询语句改成 select d from t where d=5 lock in share mode。
Case 2:lock in share mode 和 for update 加锁的区别
这个例子与 Case 1 中的区别是 for update 和 lock in share mode,这两个例子的区别是 lock in share mode 只锁覆盖索引,但是如果是 select ... for update 就不一样了。系统会认为你接下来要更新数据,因此会顺便给主键索引上满足条件的行加上行锁。
所以这个例子中的 Session B 在 t2 时刻的 update 操作被阻塞了。
Case 3:等值条件下,非索引字段、普通索引、唯一索引、主键索引对加锁的影响
非索引字段:
通过这个 Case (在等值条件下,update by 非索引字段)可以得出的结论是:update t set d=d+1 where d=10; 这个语句会进行通过主键索引进行全表扫描,主键索引上的每一行都会加 Next-Key Lock(行锁 + 间隙锁),此时全表的 update、insert 操作都会被阻塞;
普通索引字段:
通过这个 Case(在等值条件下,update by 普通索引字段)可以得出的结论是:
普通索引 c 上的(c=10, id=10)这一行会上 Next-Key Lock(行锁 + 间隙锁),所以 Session B 在 t2 时刻执行的 update t set d=d+1 where c=10;、t4 时刻执行的 insert into t values(6,6,6,6); insert into t values(14,14,14,14); 都被阻塞了。
主键索引上的(id=10, d=10)这一行会上行锁,所以 Session B 在 t3 时刻执行 update t set d=d+1 where id=10; 会被阻塞;
update t set d=d+1 where c=10; 这个语句会走普通索引 c,不会走全表扫描。
唯一索引字段:
通过这个 Case(在等值条件下,update by 唯一索引字段)可以得出的结论是:
唯一索引 a 上的(a=10, id=10)这一行只会上行锁(Record Lock),不会上间隙锁。所以 Session B 在 t2 时刻执行的 update t set d=d+1 where a=10; 会被阻塞,t4 时刻执行的 insert into t values(6,6,6,6); insert into t values(14,14,14,14); 不会被阻塞;
主键索引上的的(id=10, a=10)这一行会上行锁,所以 Session B 在 t3 时刻执行 update t set d=d+1 where id=10; 会被阻塞;
update t set d=d+1 where a=10; 这个语句会走唯一索引 a,不会走全表扫描。
Case 4:范围条件下,非索引字段、普通索引、唯一索引、主键索引对加锁的影响
非索引字段:
通过这个 Case (在范围条件下,update by 非索引字段)可以得出的结论是:update t set d=d+1 where d>=10 and d<11; 这个语句会进行通过主键索引进行全表扫描,主键索引上的每一行都会加 Next-Key Lock(行锁 + 间隙锁),此时全表的 update、insert 操作都会被阻塞;
普通索引字段:
通过这个 Case(在等值条件下,update by 普通索引字段)可以得出的结论是:
普通索引 c 上的(c=10, id=10)、(c=15, id=15)这两行会上 Next-Key Lock(行锁 + 间隙锁),所以 Session B 在 t2 时刻执行的 update t set d=d+1 where c=10; update t set d=d+1 where c=15; ,t4 时刻执行的 insert into t values(6,6,6,6); insert into t values(14,14,14,14); 都被阻塞了;
主键索引上的(id=10, d=10)、(id=15, d=15)这两行会上行锁,所以 Session B 在 t3 时刻执行 update t set d=d+1 where id=10; update t set d=d+1 where id=15; 会被阻塞;
不符合条件的这一行也被锁住是因为 InnoDB 要扫到 c=15,才知道不需要继续往后找了。
唯一索引字段:
通过这个 Case(在等值条件下,update by 唯一索引字段)可以得出的结论是:唯一索引和普通索引在加锁的范围上没有区别。
唯一索引 a 上的(a=10, id=10)、(a=15, id=15)这两行会上 Next-Key Lock(行锁 + 间隙锁),所以 Session B 在 t2 时刻执行的 update t set d=d+1 where a=10; update t set d=d+1 where a=15; ,t4 时刻执行的 insert into t values(6,6,6,6); insert into t values(14,14,14,14); 都被阻塞了;
主键索引上的(id=10, a=10)、(id=15, a=15)这两行会上行锁,所以 Session B 在 t3 时刻执行 update t set d=d+1 where id=10; update t set d=d+1 where id=15; 会被阻塞;
不符合条件的这一行也被锁住是因为 InnoDB 要扫到 a=15,才知道不需要继续往后找了。
Case 5:limit 对加锁的影响
表 t 里 c=10 的记录其实只有 1 条,因此加不加 limit 1,删除的效果都是一样的,但是加锁的效果却不同。可以看到,Session B 在 t3 时刻执行的 insert 语句执行通过了,跟 Case 3 中的结果不同。
这是因为,这个例子里的 delete 语句明确加了 limit 1 的限制,因此在遍历到 (c=10, id=10) 这一行之后,满足条件的语句已经有 1 条,循环就结束了。
因此,索引 c 上的加锁范围就变成了从(c=5,id=5) 到(c=10,id=10) 这个前开后闭区间,如下图所示,
可以看到,(c=10,id=10)之后的这个间隙并没有在加锁范围里,因此 insert 语句插入 c=14 是可以执行成功的。
这个例子对我们实践的指导意义是,在删除数据的时候尽量加 limit。这样不仅可以控制删除数据的条数,让操作更安全,还可以减小加锁的范围。
Case 6:insert 唯一键冲突时加了哪些锁
在 T1 时刻,Session A 执行完 insert 操作提示唯一键冲突后,对 Session A 进行锁分析:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 33;
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+------------------------+
| INNODB | 281473389169880:2114:1215:281473299043424 | 53957 | 72 | 40 | sugarless-admin | t | NULL | NULL | NULL | 281473299043424 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389169880:2114:149:5:6:281473299040432 | 53957 | 72 | 40 | sugarless-admin | t | NULL | NULL | a | 281473299040432 | RECORD | S | GRANTED | 20, 20 |
| INNODB | 281473389169880:2114:149:4:1:281473299041120 | 53957 | 72 | 40 | sugarless-admin | t | NULL | NULL | PRIMARY | 281473299041120 | RECORD | X | GRANTED | supremum pseudo-record |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+------------------------+
3 rows in set (0.01 sec)
从这个例子可以看到,当 Session A 执行的 insert 语句发生唯一键冲突(a=20)的时候,并不只是简单地报错返回,还加了两个锁。
索引 a 上 (5,10] 这个区间的 Next-Key Lock(S 锁)
主键索引上 (25, supremum] 这个区间的 Next-Key Lock(X 锁)
在 InnoDB 中,每个索引页都有两条虚拟的记录(Infimum 和 Supremum)来界定边界。Supremum 代表一个比该页中任何实际记录都大的值,可以理解为“正无穷”
关键在于,这种锁定 supremum 的 X 型锁,虽然看起来是 Next-Key Lock,但其本质还是间隙锁(Gap Lock),由于没有具体的记录存在,这个 Next-Key Lock 会退化为间隙锁。因此理论上你可能会看到多个 Session 同时持有一个 Lock Mode = X,Lock Data = supremum pseudo-record 的锁,这并不冲突,因为间隙锁本身是不冲突的。
不过社区/文档中也讨论过:在处理二级唯一索引的 duplicate-key 情况时,InnoDB 有时会在主键索引上留下对 supremum 的锁,这会阻塞对该索引末端的其它插入操作,这个行为曾被报告为 bug/副作用,但并未被采纳。https://bugs.mysql.com/bug.php?id=116815
Case 7:Next-Key Lock 是由间隙锁和行锁组成的
下面看一个死锁的案例,目的是说明:Next-Key Lock 实际上是间隙锁和行锁加起来的结果。
在 t3 时刻,我对 Session A 和 Session B 做了锁分析:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 210;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
| INNODB | 281473389170688:892:1224:281473299049552 | 562948365881344 | 249 | 33 | sugarless-admin | t | NULL | NULL | NULL | 281473299049552 | TABLE | IS | GRANTED | NULL |
| INNODB | 281473389170688:892:158:6:5:281473299046984 | 562948365881344 | 249 | 33 | sugarless-admin | t | NULL | NULL | c | 281473299046984 | RECORD | S,GAP | GRANTED | 15, 15 |
| INNODB | 281473389170688:892:158:6:4:281473299046640 | 562948365881344 | 249 | 33 | sugarless-admin | t | NULL | NULL | c | 281473299046640 | RECORD | S | GRANTED | 10, 10 |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
3 rows in set (0.01 sec)
mysql>
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 211;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
| INNODB | 281473389171496:614:1224:281473299055568 | 54470 | 250 | 24 | sugarless-admin | t | NULL | NULL | NULL | 281473299055568 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389171496:614:158:6:4:281473299052656 | 54470 | 250 | 24 | sugarless-admin | t | NULL | NULL | c | 281473299052656 | RECORD | X | WAITING | 10, 10 |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+-----------+-------------+-----------+
2 rows in set (0.00 sec)
这个死锁的逻辑是这样的:
Session A 启动事务后执行查询语句加 lock in share mode,在索引 c 上的(5,10] 这个区间加了 Next-Key Lock(读锁),在(10,15)这个区间加了 Gap Lock;
Session B 的 update 语句也要在索引 c 上的(5,10] 这个区间加 Next-Key Lock,被 Session A 阻塞,因此进入锁等待;
然后 Session A 要再插入 (8,8,8,8) 这一行,被 Session B 的 Gap Lock 锁住;
此时出现了死锁,InnoDB 让 Session B 回滚;
你可能会问,Session B 的 Next-Key Lock 不是还没申请成功吗?其实是这样的,Session B 的“加 Next-Key Lock(5,10] ” 操作,实际上分成了两步,先是加 (5,10) 的间隙锁,加锁成功;然后加 c=10 的行锁,这时候才被锁住的。也就是说,我们在分析加锁规则的时候可以用 Next-Key Lock 来分析。但是要知道,具体执行的时候,是要分成间隙锁和行锁两段来执行的。
三、什么是死锁
死锁是指两个或多个事务在执行过程中,因操作需要加锁而相互等待的现象。这种情况下,MySQL 的 InnoDB 存储引擎会自动检测到死锁并回滚其中一个事务,以打破僵局。
举个例子,比如事务 A 持有事务 B 需要的锁,且事务 B 持有事务 A 需要的锁时,事务 A、B 双方都需要等待对方释放所需的锁,导致事务无法继续执行,形成一种"僵局"(即循环依赖)。
四、如何判断是否发生了死锁
服务端报错日志
MySQL 发生死锁时,服务端应用的报错日志里会带有 Deadlock found when trying to get lock; try restarting transaction 信息。
日志示例:
2025-08-28 22:57:01.806 ERROR [Thread-3] com.alibaba.druid.filter.logging.Slf4jLogFilter: {conn-10008, pstmt-20001} execute error. UPDATE media_accounts SET a=? WHERE id=?
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:115)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:938)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:359)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3462)
at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(WallFilter.java:641)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3460)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:434)
MySQL InnoDB 状态信息
在出现死锁后,执行 show engine innodb status 命令得到的部分输出如下所示,这个命令会输出很多信息,有一节 LATEST DETECTED DEADLOCK,就是记录的最后一次死锁信息。
------------------------
LATEST DETECTED DEADLOCK
------------------------
2025-08-28 22:57:01 281472275640064
*** (1) TRANSACTION:
TRANSACTION 2848, ACTIVE 2 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
MySQL thread id 218, OS thread handle 281473203281664, query id 18934 172.18.0.1 root updating
UPDATE media_accounts SET a=40 WHERE id=4
*** (1) HOLDS THE LOCK(S):
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2848 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000005; asc ;;
1: len 6; hex 000000000b20; asc ;;
2: len 7; hex 01000001900151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2848 lock_mode X locks rec but not gap waiting
Record lock, heap no 13 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000004; asc ;;
1: len 6; hex 000000000b0d; asc ;;
2: len 7; hex 010000018b0151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
*** (2) TRANSACTION:
TRANSACTION 2853, ACTIVE 2 sec starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s)
MySQL thread id 219, OS thread handle 281473212337920, query id 18936 172.18.0.1 root updating
UPDATE media_accounts SET a=40 WHERE id=5
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2853 lock_mode X locks rec but not gap
Record lock, heap no 13 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000004; asc ;;
1: len 6; hex 000000000b0d; asc ;;
2: len 7; hex 010000018b0151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2853 lock_mode X locks rec but not gap waiting
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000005; asc ;;
1: len 6; hex 000000000b20; asc ;;
2: len 7; hex 01000001900151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
*** WE ROLL BACK TRANSACTION (2)
下面我对第一个事务的信息加了注释说明,你可以参考一下:
------------------------
LATEST DETECTED DEADLOCK
------------------------
## 死锁发生的时间:
2025-08-28 22:57:01 281472275640064
## 第一个事务
*** (1) TRANSACTION:
## InnoDB 内部的事务编号 2848,这个事务已活动 2 秒,当前处于“索引读取”阶段(说明语句在走索引定位记录)。
TRANSACTION 2848, ACTIVE 2 sec starting index read
## 该会话使用了 1 张表,并锁住了 1 张表
mysql tables in use 1, locked 1
## 当前事务内部用了 3 个锁结构,总计持/等待 2 个行级锁(其中有些是已持有,有些在等待)。
LOCK WAIT 3 lock struct(s), heap size 1128, 2 row lock(s), undo log entries 1
## thread id = 218,query id = 18934
MySQL thread id 218, OS thread handle 281473203281664, query id 18934 172.18.0.1 root updating
## SQL 原文
UPDATE media_accounts SET a=40 WHERE id=4
## 第一个事务持有的锁
*** (1) HOLDS THE LOCK(S):
## 在主键索引上的 X(排他)记录锁(record lock),并且是 不包含 gap(REC_NOT_GAP),意思是仅锁定该具体记录,不锁它周围的“空隙”。
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2848 lock_mode X locks rec but not gap
Record lock, heap no 12 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000005; asc ;;
1: len 6; hex 000000000b20; asc ;;
2: len 7; hex 01000001900151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
## 第一个事务还在等待获取的锁
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
## 等待锁:在主键索引上的 X(排他)记录锁(record lock),并且是 不包含 gap(REC_NOT_GAP),意思是仅锁定该具体记录,不锁它周围的“空隙”
RECORD LOCKS space id 11 page no 4 n bits 80 index PRIMARY of table `sugarless-admin`.`media_accounts` trx id 2848 lock_mode X locks rec but not gap waiting
Record lock, heap no 13 PHYSICAL RECORD: n_fields 6; compact format; info bits 0
0: len 8; hex 8000000000000004; asc ;;
1: len 6; hex 000000000b0d; asc ;;
2: len 7; hex 010000018b0151; asc Q;;
3: len 8; hex 800000000000001e; asc ;;
4: SQL NULL;
5: len 1; hex 81; asc ;;
五、生产环境死锁案例
Case 1:update 顺序不一致导致的死锁
背景:在营销业务场景中,需要定期拉取大量用户用于短信推送、邮件推送、电销等。因此,系统中存在大量定时任务,涉及数据的拉取、存储、发送等环节。
定时任务的样例代码如下:
/**
* 读取文件写入数据明细表,最后通过主键更新 campaign_job_instance
*/
class SaveUserJob implements SimpleJob {
@Autowired
CampaignJobInstanceService campaignJobInstanceService
@Transactional
@Override
void execute(ShardingContext shardingContext) {
log.info("save user:${JsonUtil.toString(shardingContext)}")
List<CampaignJobInstance> jobInstances = campaignJobInstanceService.findByStatus(CampaignJobInstanceStatusEnum.SAVING)
jobInstances.each { jobInstance ->
...
updateById(jobInstance)
...
}
}
}
/**
* 发送SaveUserJob写入的数据明细,最后主键更新campaign_job_instance
*/
class SendJob implements DataflowJob<CampaignUserLog> {
...
@Override
@Transactional
void processData(ShardingContext shardingContext, List<CampaignUserLog> logs) {
connectorExecuteService.execute(logs.findAll { it.status == CampaignUserJobStatusEnum.SENDING })
logs.groupBy { it.jobInstanceId }.each {
CampaignJobInstance jobInstance = findByInstanceId(it.key)
...
updateById(jobInstance)
}
}
...
}
SaveUserJob、SendJob 中均涉及到遍历主键更新 campaign_job_instance,且均开启事务。造成死锁的原因是由于两个事务中更新顺序不同带来的加锁顺序不同,造成相互持有对方需要的锁,从而死锁。
举例说明:
在 t3 时刻,Session A 执行 update 被阻塞时,锁分析的结果如下:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 33;
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
| INNODB | 281473389169880:2108:1207:281473299043424 | 53671 | 72 | 24 | sugarless-admin | t | NULL | NULL | NULL | 281473299043424 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389169880:2108:141:4:3:281473299040432 | 53671 | 72 | 24 | sugarless-admin | t | NULL | NULL | PRIMARY | 281473299040432 | RECORD | X,REC_NOT_GAP | GRANTED | 5 |
| INNODB | 281473389169880:2108:141:4:4:281473299041120 | 53671 | 72 | 26 | sugarless-admin | t | NULL | NULL | PRIMARY | 281473299041120 | RECORD | X,REC_NOT_GAP | WAITING | 10 |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------+-----------------------+-----------+---------------+-------------+-----------+
3 rows in set (0.01 sec)
现象:在 t4 时刻,Session A 执行 update 语句后,几乎立即发现死锁并返回。
这个死锁产生的逻辑是这样的:
T1 时刻:Session A 执行 update 语句,此时在主键索引(id=5)这条记录上加了 Record Lock(Lock Mode = X,REC_NOT_GAP)。(如果是普通索引,则为 Next-Key Lock + Gap Lock)
T2 时刻:Session B 执行 update 语句,此时在主键索引(id=10)这条记录上加了 Record Lock(Lock Mode = X,REC_NOT_GAP)。(如果是普通索引,则为 Next-Key Lock + Gap Lock)
T3 时刻:Session A 执行 update 语句,此时主键索引(id=10)这条记录已经被 Session B 上了 Record Lock(Lock Mode = X,REC_NOT_GAP),因此被阻塞;
T4 时刻:Session B 执行 update 语句,此时主键索引(id=5)这条记录已经被 Session A 上了 Record Lock(Lock Mode = X,REC_NOT_GAP),因此被阻塞;
此时形成循环依赖,即死锁;
Case 2:唯一索引冲突导致的死锁
背景:在分布式系统中,分布式锁已经的使用越发常见,我们系统由于较为老旧,使用数据库实现分布式锁,方案为:使用 lock_key, lock_biz 组成唯一索引,利用数据库对一条记录 insert 和 delete 操作的事务性来实现不可重入的 db 分布式锁。
建表 SQL 及测试数据:
DROP TABLE IF EXISTS `test_lock`;
CREATE TABLE IF NOT EXISTS `test_lock` (
`id` bigint(20) NOT NULL COMMENT '主键',
`gmt_create` datetime NOT NULL COMMENT '创建时间',
`gmt_modified` datetime NOT NULL COMMENT '修改时间',
`lock_key` varchar(128) NOT NULL COMMENT '锁名称',
`lock_context` varchar(512) DEFAULT NULL COMMENT '锁上下文',
`lock_biz` varchar(64) NOT NULL COMMENT '锁类型',
PRIMARY KEY (`id`),
UNIQUE KEY `idx_uk_lock_name` (`lock_key`,`lock_biz`)
) ENGINE=InnoDB AUTO_INCREMENT=77 DEFAULT CHARSET=utf8mb4 COMMENT='分布式锁表'
;
INSERT INTO `test_lock` VALUES
(1,'2025-09-03 14:08:02','2025-09-03 14:08:02','123','0','accountUser')
,(2,'2025-09-03 14:08:02','2025-09-03 14:08:02','150','0','accountUser')
,(3,'2025-09-03 14:08:02','2025-09-03 14:08:02','200','0','accountUser');
复现流程:
在 t2 时刻,Session B、Session C 执行 insert 被阻塞时,锁分析的结果如下:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 34;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
| INNODB | 281473389170688:658:1209:281473299049552 | 53725 | 73 | 30 | sugarless-admin | test_lock | NULL | NULL | NULL | 281473299049552 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389170688:658:143:5:5:281473299046640 | 53725 | 73 | 30 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299046640 | RECORD | S | WAITING | '210', 'AccountUser', 4 |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
2 rows in set (0.01 sec)
mysql>
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 35;
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
| INNODB | 281473389171496:8:1209:281473299055568 | 53726 | 74 | 26 | sugarless-admin | test_lock | NULL | NULL | NULL | 281473299055568 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389171496:8:143:5:5:281473299052656 | 53726 | 74 | 26 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299052656 | RECORD | S | WAITING | '210', 'AccountUser', 4 |
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+-----------+-------------+-------------------------+
2 rows in set (0.00 sec)
在 t3 时刻,Session A 回滚事务后,Session B、Session C 仍然被阻塞,此时锁分析的结果如下(这里我事先关闭了死锁检测,并调高了锁等待超时时限,以便观察死锁时的锁情况):
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 34;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| INNODB | 281473389170688:658:1209:281473299049552 | 53725 | 73 | 30 | sugarless-admin | test_lock | NULL | NULL | NULL | 281473299049552 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389170688:658:143:5:1:281473299046984 | 53725 | 72 | 33 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299046984 | RECORD | S | GRANTED | supremum pseudo-record |
| INNODB | 281473389170688:658:143:5:1:281473299047328 | 53725 | 73 | 30 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299047328 | RECORD | X,INSERT_INTENTION | WAITING | supremum pseudo-record |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
3 rows in set (0.00 sec)
mysql>
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 35;
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| INNODB | 281473389171496:8:1209:281473299055568 | 53726 | 74 | 26 | sugarless-admin | test_lock | NULL | NULL | NULL | 281473299055568 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389171496:8:143:5:1:281473299053000 | 53726 | 72 | 33 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299053000 | RECORD | S | GRANTED | supremum pseudo-record |
| INNODB | 281473389171496:8:143:5:1:281473299053344 | 53726 | 74 | 26 | sugarless-admin | test_lock | NULL | NULL | idx_uk_lock_name | 281473299053344 | RECORD | X,INSERT_INTENTION | WAITING | supremum pseudo-record |
+--------+-------------------------------------------+-----------------------+-----------+----------+-----------------+-------------+----------------+-------------------+------------------+-----------------------+-----------+--------------------+-------------+------------------------+
3 rows in set (0.00 sec)
现象:在 Session A 执行 rollback 语句回滚的时候,Session C 几乎同时发现死锁并返回。
这个死锁产生的逻辑是这样的:
T1 时刻:Session A 执行 insert 语句,此时在唯一索引(lock_key=210, lock_biz='accountUser')上加了 Record Lock(Lock Mode = X)。(如果是普通索引,则为 Next-Key Lock + Gap Lock)
T2 时刻:Session B 和 Session C 要执行相同的 insert 语句,在加 Next-Key Lock(Lock Mode = S)时被 Session A 持有的 Record Lock(Lock Mode = X)阻塞;
T3 时刻:Session A 回滚事务,此时 Session B 和 Session C 都在唯一索引(lock_key=210, lock_biz='accountUser')这条记录成功加上了 Next-Key Lock(S Mode),随后试图继续执行插入操作,在加插入意向锁(Lock Mode = X,INSERT_INTENTION)时,彼此都被对方持有的 Next-Key Lock(S Mode)阻塞;
此时形成循环依赖,即死锁;
下面这种情况也会导致死锁,而且看起来可能比较奇怪的是因为不同的唯一索引值导致死锁。
上面这两个场景出现死锁的原因是一致的,本质上都是因为 Next-Key Lock(S Mode)和插入意向锁(Lock Mode = X,INSERT_INTENTION)冲突导致死锁。这个 Case 就不再具体分析了,有兴趣你可以自行分析验证一下。
Case 3:并发 insert into ... on duplicate key update 语句导致的死锁
insert into … on duplicate key update 这个语义的逻辑是,插入一行数据,如果碰到唯一键约束,就执行后面的更新语句。如果有多个列违反了唯一性约束,就会按照索引的顺序,修改跟第一个索引冲突的行。
在业务实现中,若数据存在则修改,不存在则插入,通常大家会选用 INSERT INTO... ON DUPLICATE KEY UPDATE 语句让数据库实现此功能。
背景:在进行开发批量取消预约物流的场景中,由于一个订单对应多个物流,同时取消多个物流时,接口可能会变慢,故采用了多线程,每个物流取消任务对应一个线程。然而,在测试过程中出现了意外问题,当同时取消两个物流单时发生了数据库死锁。
测试数据表 SQL:
DROP TABLE IF EXISTS recycle_order_extend;
CREATE TABLE IF NOT EXISTS recycle_order_extend (
id BIGINT AUTO_INCREMENT PRIMARY KEY,
recycle_order_id varchar(64), -- 回收订单 id
logistics_order_id varchar(64), -- 物流订单 id
value INT,
UNIQUE KEY unique_idx_recycleOrderId_logisticsOrderId(recycle_order_id, logistics_order_id)
);
INSERT INTO recycle_order_extend (recycle_order_id, logistics_order_id, value)
VALUES
("1","11",1),
("1","12",2),
("1","13",3),
("1","14",4),
("2","21",1),
("2","22",2),
("2","23",3),
("2","24",4);
在解释死锁是怎么产生的之前,先看一下这个 Case:
在 T1 时刻,Session A 执行完 insert 操作后,对 Session A 进行锁分析:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 157;
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+---------------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+---------------+-------------+------------------------+
| INNODB | 281473389169880:4040:1222:281473299043424 | 54297 | 196 | 29 | sugarless-admin | recycle_order_extend | NULL | NULL | NULL | 281473299043424 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389169880:4040:156:5:2:281473299040432 | 54297 | 196 | 29 | sugarless-admin | recycle_order_extend | NULL | NULL | unique_idx_recycleOrderId_logisticsOrderId | 281473299040432 | RECORD | X | GRANTED | '1', '11', 1 |
| INNODB | 281473389169880:4040:156:4:1:281473299041120 | 54297 | 196 | 29 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299041120 | RECORD | X | GRANTED | supremum pseudo-record |
| INNODB | 281473389169880:4040:156:4:2:281473299040776 | 54297 | 196 | 29 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299040776 | RECORD | X,REC_NOT_GAP | GRANTED | 1 |
+--------+----------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+---------------+-------------+------------------------+
4 rows in set (0.00 sec)
可以看到 Session A 此时已经持有主键索引上 supremum 伪记录的间隙锁,这意味着所有 insert 插入新记录的操作都会被阻塞(因为拿不到插入意向锁),这也是 Session B 在 t2 时刻 insert 操作被阻塞的原因。
上面的加锁机制分析 Case 6 提到过,当 insert 语句发生唯一键冲突的时候,会加两个锁;
下面来看死锁的案例,这里我暂时关闭了死锁检测,并延长了锁等待超时时限,复现逻辑流程:
这个案例并不一定能 100% 复现,因为需要 Session A 和 Session B 同时执行,这里我先给出依靠代码执行复现死锁时的锁分析情况:
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 185;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| INNODB | 281473389175536:160:1222:281473299086000 | 54300 | 224 | 11 | sugarless-admin | recycle_order_extend | NULL | NULL | NULL | 281473299086000 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389175536:160:156:5:2:281473299083088 | 54300 | 224 | 11 | sugarless-admin | recycle_order_extend | NULL | NULL | unique_idx_recycleOrderId_logisticsOrderId | 281473299083088 | RECORD | X | GRANTED | '1', '11', 1 |
| INNODB | 281473389175536:160:156:4:1:281473299083776 | 54300 | 224 | 11 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299083776 | RECORD | X | GRANTED | supremum pseudo-record |
| INNODB | 281473389175536:160:156:4:2:281473299083432 | 54300 | 224 | 11 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299083432 | RECORD | X,REC_NOT_GAP | GRANTED | 1 |
| INNODB | 281473389175536:160:156:4:1:281473299084120 | 54300 | 224 | 11 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299084120 | RECORD | X,INSERT_INTENTION | WAITING | supremum pseudo-record |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
5 rows in set (0.01 sec)
mysql>
mysql>
mysql> SELECT t2.*
-> FROM information_schema.innodb_trx t1
-> JOIN performance_schema.data_locks t2 ON t1.trx_id = t2.ENGINE_TRANSACTION_ID
-> WHERE t1.trx_mysql_thread_id = 186;
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| ENGINE | ENGINE_LOCK_ID | ENGINE_TRANSACTION_ID | THREAD_ID | EVENT_ID | OBJECT_SCHEMA | OBJECT_NAME | PARTITION_NAME | SUBPARTITION_NAME | INDEX_NAME | OBJECT_INSTANCE_BEGIN | LOCK_TYPE | LOCK_MODE | LOCK_STATUS | LOCK_DATA |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
| INNODB | 281473389174728:714:1222:281473299079984 | 54299 | 225 | 16 | sugarless-admin | recycle_order_extend | NULL | NULL | NULL | 281473299079984 | TABLE | IX | GRANTED | NULL |
| INNODB | 281473389174728:714:156:5:6:281473299077072 | 54299 | 225 | 16 | sugarless-admin | recycle_order_extend | NULL | NULL | unique_idx_recycleOrderId_logisticsOrderId | 281473299077072 | RECORD | X | GRANTED | '2', '21', 5 |
| INNODB | 281473389174728:714:156:4:1:281473299077760 | 54299 | 225 | 16 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299077760 | RECORD | X | GRANTED | supremum pseudo-record |
| INNODB | 281473389174728:714:156:4:6:281473299077416 | 54299 | 225 | 16 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299077416 | RECORD | X,REC_NOT_GAP | GRANTED | 5 |
| INNODB | 281473389174728:714:156:4:1:281473299078104 | 54299 | 225 | 16 | sugarless-admin | recycle_order_extend | NULL | NULL | PRIMARY | 281473299078104 | RECORD | X,INSERT_INTENTION | WAITING | supremum pseudo-record |
+--------+---------------------------------------------+-----------------------+-----------+----------+-----------------+----------------------+----------------+-------------------+--------------------------------------------+-----------------------+-----------+--------------------+-------------+------------------------+
5 rows in set (0.00 sec)
这个死锁的逻辑是这样的:
Session A 开始执行批量 insert,先处理第一行("1","11"),发现唯一键冲突(已存在该记录),对唯一索引("1","11")加 Next-Key Lock(写锁),对主键索引(id=1)加 Record Lock(写锁),同时获取 supremum 伪记录的间隙锁;
Session B 几乎同时开始执行,处理第一行("2","21"),同样发现唯一键冲突,对唯一索引("2","21")加 Next-Key Lock(写锁),对主键索引(id=5)加 Record Lock(写锁),同时获取 supremum 伪记录的间隙锁;
当 Session A 尝试插入后续记录时,需要获取插入意向锁(插入位置是主键索引上的最后一个 GAP),但 Session B 已持有 supremum 伪记录的间隙锁,这个 GAP 已经被锁住,因此 Session A 被 Session B 阻塞;
Session B 同理,也被 Session A 持有的 supremum 记录的间隙锁阻塞;
最终形成循环依赖,即死锁;
由于多个 Session 同时触发 insert 唯一键冲突,同时拿到 supremum 伪记录间隙锁的时间差要求比较高,手工操作难以复现,所以需要借助下面的复现代码来演示。
下面是复现代码:
1、SQL 语句
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN" "http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="com.sugarless.sbapiservice.mapper.RecycleOrderExtendMapper">
<insert id="insertOnDuplicateBatch" parameterType="java.util.List">
INSERT INTO recycle_order_extend (recycle_order_id, logistics_order_id, value)
VALUES
<foreach collection="list" item="item" separator=",">
(#{item.recycleOrderId}, #{item.logisticsOrderId}, #{item.value})
</foreach>
ON DUPLICATE KEY UPDATE
value = values(value)
;
</insert>
</mapper>
2、测试方法
@GetMapping("deadlock-test")
public Response<String> deadlockTest(){
// 每次测试都创建一个固定线程数=2的线程池
ExecutorService executor = Executors.newFixedThreadPool(2);
try {
// 循环测试
for (long i = 1; i <= 10000; i++) {
log.info("第 {} 轮测试开始", i);
Future<?> future1 = executor.submit(() -> recycleOrderExtendService.batchUpsertTest(Arrays.asList(
RecycleOrderExtend.builder().recycleOrderId("1").logisticsOrderId("11").value(1).build(),
RecycleOrderExtend.builder().recycleOrderId("1").logisticsOrderId("12").value(2).build(),
RecycleOrderExtend.builder().recycleOrderId("1").logisticsOrderId("13").value(3).build(),
RecycleOrderExtend.builder().recycleOrderId("1").logisticsOrderId("14").value(4).build()
)));
Future<?> future2 = executor.submit(() -> recycleOrderExtendService.batchUpsertTest(Arrays.asList(
RecycleOrderExtend.builder().recycleOrderId("2").logisticsOrderId("21").value(1).build(),
RecycleOrderExtend.builder().recycleOrderId("2").logisticsOrderId("22").value(2).build(),
RecycleOrderExtend.builder().recycleOrderId("2").logisticsOrderId("23").value(3).build(),
RecycleOrderExtend.builder().recycleOrderId("2").logisticsOrderId("24").value(4).build()
)));
// // 等待完成,设置超时时间
future1.get(10, TimeUnit.SECONDS);
future2.get(10, TimeUnit.SECONDS);
}
return Response.success("ok");
} catch (Exception e) {
log.error("死锁测试异常", e);
return Response.error("出现异常:" + e.getMessage());
} finally {
executor.shutdown();
}
}
3、Spring Boot 应用日志:
2025-08-30 12:24:17.730 INFO [http-nio-12345-exec-4] com.sugarless.sbapiservice.controller.api.TestRestController: 第 1 轮测试开始
2025-08-30 12:24:17.758 ERROR [pool-6-thread-2] com.alibaba.druid.filter.logging.Slf4jLogFilter: {conn-10007, pstmt-20000} execute error. INSERT INTO recycle_order_extend (recycle_order_id, logistics_order_id, value)
VALUES
(?, ?, ?)
,
(?, ?, ?)
,
(?, ?, ?)
,
(?, ?, ?)
ON DUPLICATE KEY UPDATE
value = values(value)
;
com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Deadlock found when trying to get lock; try restarting transaction
at com.mysql.cj.jdbc.exceptions.SQLError.createSQLException(SQLError.java:115)
at com.mysql.cj.jdbc.exceptions.SQLExceptionsMapping.translateException(SQLExceptionsMapping.java:113)
at com.mysql.cj.jdbc.ClientPreparedStatement.executeInternal(ClientPreparedStatement.java:938)
at com.mysql.cj.jdbc.ClientPreparedStatement.execute(ClientPreparedStatement.java:359)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3462)
at com.alibaba.druid.wall.WallFilter.preparedStatement_execute(WallFilter.java:641)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3460)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:434)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3460)
at com.alibaba.druid.filter.FilterEventAdapter.preparedStatement_execute(FilterEventAdapter.java:434)
at com.alibaba.druid.filter.FilterChainImpl.preparedStatement_execute(FilterChainImpl.java:3460)
at com.alibaba.druid.proxy.jdbc.PreparedStatementProxyImpl.execute(PreparedStatementProxyImpl.java:158)
at com.alibaba.druid.pool.DruidPooledPreparedStatement.execute(DruidPooledPreparedStatement.java:483)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.ibatis.logging.jdbc.PreparedStatementLogger.invoke(PreparedStatementLogger.java:58)
at com.sun.proxy.$Proxy193.execute(Unknown Source)
at org.apache.ibatis.executor.statement.PreparedStatementHandler.update(PreparedStatementHandler.java:48)
at org.apache.ibatis.executor.statement.RoutingStatementHandler.update(RoutingStatementHandler.java:75)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:61)
at com.sun.proxy.$Proxy191.update(Unknown Source)
at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:50)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:117)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:76)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.apache.ibatis.plugin.Invocation.proceed(Invocation.java:61)
at com.baomidou.mybatisplus.extension.plugins.MybatisPlusInterceptor.intercept(MybatisPlusInterceptor.java:106)
at org.apache.ibatis.plugin.Plugin.invoke(Plugin.java:59)
at com.sun.proxy.$Proxy190.update(Unknown Source)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:197)
at org.apache.ibatis.session.defaults.DefaultSqlSession.insert(DefaultSqlSession.java:184)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:425)
at com.sun.proxy.$Proxy119.insert(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.insert(SqlSessionTemplate.java:272)
at com.baomidou.mybatisplus.core.override.MybatisMapperMethod.execute(MybatisMapperMethod.java:59)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy$PlainMethodInvoker.invoke(MybatisMapperProxy.java:152)
at com.baomidou.mybatisplus.core.override.MybatisMapperProxy.invoke(MybatisMapperProxy.java:89)
at com.sun.proxy.$Proxy128.insertOnDuplicateBatch(Unknown Source)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.base/java.lang.reflect.Method.invoke(Method.java:566)
at org.springframework.aop.support.AopUtils.invokeJoinpointUsingReflection(AopUtils.java:344)
at org.springframework.aop.framework.ReflectiveMethodInvocation.invokeJoinpoint(ReflectiveMethodInvocation.java:198)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at com.alibaba.druid.support.spring.stat.DruidStatInterceptor.invoke(DruidStatInterceptor.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241)
at com.sun.proxy.$Proxy130.insertOnDuplicateBatch(Unknown Source)
at com.sugarless.sbapiservice.service.impl.RecycleOrderExtendServiceImpl.batchUpsertTest(RecycleOrderExtendServiceImpl.java:27)
at com.sugarless.sbapiservice.service.impl.RecycleOrderExtendServiceImpl$$FastClassBySpringCGLIB$$917fed63.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:218)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.invokeJoinpoint(CglibAopProxy.java:792)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:163)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762)
at com.alibaba.druid.support.spring.stat.DruidStatInterceptor.invoke(DruidStatInterceptor.java:70)
at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186)
at org.springframework.aop.framework.CglibAopProxy$CglibMethodInvocation.proceed(CglibAopProxy.java:762)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:707)
at com.sugarless.sbapiservice.service.impl.RecycleOrderExtendServiceImpl$$EnhancerBySpringCGLIB$$3737e549.batchUpsertTest(<generated>)
at com.sugarless.sbapiservice.controller.api.TestRestController.lambda$deadlockTest$1(TestRestController.java:50)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Case 4:并发 lock in share mode + update 操作导致死锁
背景:上游系统发了重复消息,由于消费方做去重,有时会收到多条重复消息。在消费逻辑中使用 lock in share mode 来给数据加读锁,update 再更新字段。
这个死锁的逻辑是这样的:
T1 时刻:Session A 给索引 c 上的(c=10)这条记录加了读锁;
T2 时刻:Session B 也给索引 c 上的(c=10)这条记录加了读锁;
T3 时刻:Session A 的 update 操作试图给索引 c 上的(c=10)这条记录加写锁,被 Session B 持有的读锁阻塞了;
T4 时刻:Session B 的 update 操作也试图给索引 c 上的(c=10)这条记录加写锁,被 Session A 持有的读锁阻塞了;
此时形成循环依赖,即死锁;
六、如何预防死锁
数据库设计
选择合适的事务隔离级别:在 MySQL 默认的 REPEATABLE READ 隔离级别下,MySQL 会使用 Gap Lock 来解决幻读的问题,可以考虑修改为 READ COMMITTED 级别(不同业务语义可能受影响,修改前要评估影响范围),降低死锁概率;
合理设计索引:精确的唯一/主键检索(WHERE id = ?)通常只会加记录锁,锁范围更小。
应用开发
分布式锁:通过分布式锁消除在数据库事务中多个线程交叉执行 select ... lock in share mode / for update 并随后尝试插入或更新场景下,因并发或加锁顺序不一致导致的死锁;
统一事务操作顺序:多表更新时,统一先更新表 A 再更新表 B,可以避免交叉锁定形成循环依赖导致的死锁;单表更新时,可以考虑先根据唯一键对数据做排序,也可以避免交叉锁定形成循环依赖导致的死锁;
监控
监控死锁:定期抓取 show engine innodb status 的信息,判断有无死锁发生,通过告警的方式通知开发;
REF
极客时间 - MySQL 实战 45 讲:08 | 事务到底是隔离的还是不隔离的?
极客时间 - MySQL 实战 45 讲:21 | 为什么我只改一行的语句,锁这么多?
微信公众号 - 转转技术:深入剖析 SQL 死锁-两条 SQL 之间的死锁原因
https://github.com/aneasystone/mysql-deadlocks
评论