Article #2459: AI-Assisted Database Optimization: Automating Slow Query Diagnosis and Index Recommendation
Audience: backend engineers, DBAs, tech leads | Reading time: ~17 minutes | Key takeaway: use an LLM to automatically analyze slow queries and recommend indexes, turning DBA experience into an engineered system
Here's a true story. A core reporting page at our company was drawing user complaints about slow loading: 2 seconds on a good day, up to 10 seconds on a bad one.
A DBA colleague dug through the slow query log and found the culprit: a complex query joining 5 tables. His recommendation: add a composite index on the order_date column, covering user_id and status.
With the index in place, the query dropped from 8 seconds to 0.3 seconds.
That episode kept me thinking: what was the core logic behind my colleague's diagnosis? How did he go from the slow SQL and its execution plan to knowing which columns to index, and how? Can that logic be expressed in code? Can an LLM do it?
The answer: yes. And much better than a rules engine can.
The Essence of Slow Query Diagnosis
When a database query is slow, the causes fall into just a few categories:
- Full table scan: no index exists, or one exists but the optimizer chose not to use it
- Index invalidation: functions applied to columns, implicit type conversions, leading-wildcard LIKE patterns, and similar constructs that prevent index use
- Expensive row lookups: low index coverage forces many lookups back to the base table
- Suboptimal join order: a poor choice of driving table in a multi-table JOIN
- Data skew: an indexed column with very low selectivity (e.g. a status column holding only 0 and 1)
- Stale statistics: outdated optimizer statistics lead to a bad execution plan
Traditional slow-query analysis tools can tell you "this query is slow" and hand you an execution plan, but they cannot produce a natural-language diagnosis report with concrete optimization advice.
An LLM can.
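In fact, several of the index-invalidation patterns above can be caught with plain static checks before any LLM is involved. A minimal sketch in Java (the class name, the regexes, and the issue labels are all illustrative, not from our production pipeline; real screening should use a proper SQL parser):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Heuristic pre-screening for SQL constructs that commonly defeat B-tree indexes.
// The regexes are deliberately rough; they illustrate the idea, nothing more.
public class IndexKillerDetector {

    // LIKE with a leading wildcard cannot use an index range scan
    private static final Pattern LEADING_WILDCARD =
        Pattern.compile("(?i)LIKE\\s+'%");

    // A function wrapped around a column in WHERE prevents index use,
    // e.g. WHERE DATE(create_time) = '2024-01-01'
    private static final Pattern FUNCTION_ON_COLUMN =
        Pattern.compile("(?i)WHERE\\s+\\w+\\s*\\(");

    public static List<String> findIssues(String sql) {
        List<String> issues = new ArrayList<>();
        if (LEADING_WILDCARD.matcher(sql).find()) {
            issues.add("leading-wildcard LIKE");
        }
        if (FUNCTION_ON_COLUMN.matcher(sql).find()) {
            issues.add("function on indexed column");
        }
        return issues;
    }
}
```

Checks like these are cheap to run on every query; the LLM's value is in the cases that rules like these cannot cover.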
System Architecture Design
Core Implementation
1. Slow Query Log Collection
MySQL's slow query log format is a little idiosyncratic and needs dedicated parsing:
@Component
public class SlowQueryLogParser {
// Fixed header-line formats of the MySQL slow query log
private static final Pattern QUERY_TIME_PATTERN =
Pattern.compile("# Query_time: ([\\d.]+)\\s+Lock_time: ([\\d.]+)\\s+Rows_sent: (\\d+)\\s+Rows_examined: (\\d+)");
private static final Pattern TIMESTAMP_PATTERN =
Pattern.compile("# Time: (\\d{6}\\s+\\d{1,2}:\\d{2}:\\d{2}|\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2})");
private static final Pattern USER_HOST_PATTERN =
Pattern.compile("# User@Host: (\\S+)\\[(\\S+)\\] @ (\\S*) \\[(\\S*)\\]");
public List<SlowQuery> parse(InputStream logStream) throws IOException {
List<SlowQuery> queries = new ArrayList<>();
BufferedReader reader = new BufferedReader(new InputStreamReader(logStream));
SlowQuery.Builder current = null;
StringBuilder sqlBuilder = new StringBuilder();
String line;
while ((line = reader.readLine()) != null) {
if (line.startsWith("# Time:")) {
if (current != null && sqlBuilder.length() > 0) {
current.sql(normalizeSql(sqlBuilder.toString()));
queries.add(current.build());
sqlBuilder.setLength(0);
}
current = SlowQuery.builder();
Matcher m = TIMESTAMP_PATTERN.matcher(line);
if (m.find()) current.timestamp(parseTimestamp(m.group(1)));
} else if (line.startsWith("# Query_time:") && current != null) {
Matcher m = QUERY_TIME_PATTERN.matcher(line);
if (m.find()) {
current.queryTime(Double.parseDouble(m.group(1)));
current.lockTime(Double.parseDouble(m.group(2)));
current.rowsSent(Long.parseLong(m.group(3)));
current.rowsExamined(Long.parseLong(m.group(4)));
}
} else if (!line.startsWith("#") && current != null && !line.startsWith("SET timestamp=")) {
sqlBuilder.append(line).append(" ");
}
}
// Flush the final entry: the loop above only emits a query when the next
// "# Time:" header appears, so the last one must be added here
if (current != null && sqlBuilder.length() > 0) {
current.sql(normalizeSql(sqlBuilder.toString()));
queries.add(current.build());
}
return queries.stream()
.filter(q -> q.getQueryTime() >= 1.0) // keep only queries slower than 1 second
.sorted(Comparator.comparingDouble(SlowQuery::getQueryTime).reversed())
.collect(toList());
}
private String normalizeSql(String sql) {
// Replace literal values with placeholders so similar SQL statements aggregate together
return sql.trim()
.replaceAll("'[^']*'", "?")
.replaceAll("\\b\\d+\\b", "?")
.replaceAll("\\s+", " ");
}
}
2. Execution Plan Analysis
Once we have the slow SQL, we fetch a structured execution plan via EXPLAIN FORMAT=JSON:
@Component
public class ExecutionPlanAnalyzer {
private final DataSource dataSource;
public ExecutionPlan analyze(String sql, Map<String, Object> sampleParams) {
// EXPLAIN FORMAT=JSON returns the execution plan as structured JSON
String explainSql = "EXPLAIN FORMAT=JSON " + substituteParams(sql, sampleParams);
try (Connection conn = dataSource.getConnection();
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(explainSql)) {
if (rs.next()) {
String jsonPlan = rs.getString(1);
return parseExecutionPlan(jsonPlan);
}
} catch (SQLException e) {
log.warn("Failed to fetch execution plan: {}", e.getMessage());
}
return ExecutionPlan.empty();
}
private ExecutionPlan parseExecutionPlan(String jsonPlan) {
try {
// objectMapper is a shared Jackson ObjectMapper field; readTree throws a
// checked JsonProcessingException, so the extraction is wrapped here
JsonNode planNode = objectMapper.readTree(jsonPlan);
JsonNode queryBlock = planNode.get("query_block");
ExecutionPlan.Builder builder = ExecutionPlan.builder();
builder.rawJson(jsonPlan);
// Pull out the key signals
extractTableScans(queryBlock, builder);
extractJoinInfo(queryBlock, builder);
extractKeyUsage(queryBlock, builder);
return builder.build();
} catch (JsonProcessingException e) {
log.warn("Failed to parse execution plan JSON: {}", e.getMessage());
return ExecutionPlan.empty();
}
}
private void extractTableScans(JsonNode node, ExecutionPlan.Builder builder) {
if (node.has("table")) {
JsonNode table = node.get("table");
String tableName = table.path("table_name").asText();
String accessType = table.path("access_type").asText();
double rowsExamined = table.path("rows_examined_per_scan").asDouble();
boolean usingFilesort = table.path("using_filesort").asBoolean(false);
boolean usingTemporary = table.path("using_temporary_table").asBoolean(false);
if ("ALL".equals(accessType)) {
builder.addFullScan(TableScan.of(tableName, rowsExamined));
}
if (usingFilesort) {
builder.addFilesort(tableName);
}
if (usingTemporary) {
builder.addTemporaryTable(tableName);
}
}
// Recurse into nested blocks; note that in MySQL's JSON plan the joined
// tables sit inside the "nested_loop" ARRAY, so arrays must be traversed too
node.fields().forEachRemaining(entry -> {
JsonNode value = entry.getValue();
if (value.isObject()) {
extractTableScans(value, builder);
} else if (value.isArray()) {
value.forEach(child -> extractTableScans(child, builder));
}
});
}
}
3. Table Metadata Collection
For its analysis, the LLM needs to know the table schemas, including the existing indexes:
@Component
public class TableMetadataCollector {
private final JdbcTemplate jdbcTemplate;
public TableMetadata collect(String tableName, String schema) {
return TableMetadata.builder()
.tableName(tableName)
.columns(getColumns(tableName, schema))
.indexes(getIndexes(tableName, schema))
.tableStats(getTableStats(tableName, schema))
.build();
}
private List<ColumnInfo> getColumns(String tableName, String schema) {
String sql = """
SELECT
COLUMN_NAME,
DATA_TYPE,
IS_NULLABLE,
COLUMN_DEFAULT,
COLUMN_COMMENT,
CHARACTER_MAXIMUM_LENGTH,
NUMERIC_PRECISION,
NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = ? AND TABLE_SCHEMA = ?
ORDER BY ORDINAL_POSITION
""";
return jdbcTemplate.query(sql,
(rs, row) -> ColumnInfo.builder()
.name(rs.getString("COLUMN_NAME"))
.dataType(rs.getString("DATA_TYPE"))
.nullable(rs.getString("IS_NULLABLE").equals("YES"))
.comment(rs.getString("COLUMN_COMMENT"))
.build(),
tableName, schema);
}
private List<IndexInfo> getIndexes(String tableName, String schema) {
String sql = """
SELECT
INDEX_NAME,
SEQ_IN_INDEX,
COLUMN_NAME,
NON_UNIQUE,
CARDINALITY,
INDEX_TYPE
FROM INFORMATION_SCHEMA.STATISTICS
WHERE TABLE_NAME = ? AND TABLE_SCHEMA = ?
ORDER BY INDEX_NAME, SEQ_IN_INDEX
""";
List<Map<String, Object>> rows = jdbcTemplate.queryForList(sql, tableName, schema);
// Group the per-column rows by index name
Map<String, IndexInfo.Builder> indexMap = new LinkedHashMap<>();
for (Map<String, Object> row : rows) {
String indexName = (String) row.get("INDEX_NAME");
// NON_UNIQUE and CARDINALITY come back as different numeric types across
// MySQL versions, so widen through Number instead of casting directly
Number nonUnique = (Number) row.get("NON_UNIQUE");
Number cardinality = (Number) row.get("CARDINALITY");
indexMap.computeIfAbsent(indexName, k -> IndexInfo.builder()
.name(k)
.unique(nonUnique.longValue() == 0L)
.type((String) row.get("INDEX_TYPE")))
.addColumn((String) row.get("COLUMN_NAME"),
cardinality == null ? null : cardinality.longValue());
}
return indexMap.values().stream()
.map(IndexInfo.Builder::build)
.collect(toList());
}
private TableStats getTableStats(String tableName, String schema) {
String sql = """
SELECT TABLE_ROWS, DATA_LENGTH, INDEX_LENGTH, CREATE_TIME, UPDATE_TIME
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_NAME = ? AND TABLE_SCHEMA = ?
""";
return jdbcTemplate.queryForObject(sql,
(rs, row) -> TableStats.builder()
.estimatedRows(rs.getLong("TABLE_ROWS"))
.dataBytes(rs.getLong("DATA_LENGTH"))
.indexBytes(rs.getLong("INDEX_LENGTH"))
.build(),
tableName, schema);
}
}
4. LLM Diagnosis Engine
With everything collected, we assemble the LLM analysis request:
@Service
public class LLMQueryOptimizer {
private final ChatClient chatClient;
private final TableMetadataCollector metadataCollector;
private final ExecutionPlanAnalyzer planAnalyzer;
public OptimizationReport analyze(SlowQuery query) {
// Collect metadata for every table referenced by the query
List<String> tables = SqlTableExtractor.extract(query.getSql());
Map<String, TableMetadata> tableMetadataMap = new HashMap<>();
for (String table : tables) {
tableMetadataMap.put(table, metadataCollector.collect(table, currentSchema));
}
// Fetch the execution plan
ExecutionPlan plan = planAnalyzer.analyze(query.getSql(), Collections.emptyMap());
// Build the analysis prompt
String analysisPrompt = buildAnalysisPrompt(query, tableMetadataMap, plan);
ChatResponse response = chatClient.call(new Prompt(
List.of(
new SystemMessage(getSystemPrompt()),
new UserMessage(analysisPrompt)
),
OpenAiChatOptions.builder()
.withModel("gpt-4o")
.withTemperature(0.1f)
.withResponseFormat(new ResponseFormat(ResponseFormat.Type.JSON_OBJECT))
.build()
));
return parseOptimizationReport(response.getResult().getOutput().getContent(), query);
}
private String buildAnalysisPrompt(
SlowQuery query,
Map<String, TableMetadata> tables,
ExecutionPlan plan) {
StringBuilder sb = new StringBuilder();
sb.append("## Slow Query\n");
sb.append("Execution time: ").append(query.getQueryTime()).append("s\n");
sb.append("Rows examined: ").append(query.getRowsExamined()).append("\n");
sb.append("Rows sent: ").append(query.getRowsSent()).append("\n");
sb.append("SQL:\n```sql\n").append(query.getSql()).append("\n```\n\n");
sb.append("## Execution Plan\n");
sb.append("Full table scans: ").append(plan.getFullScans()).append("\n");
sb.append("Using filesort: ").append(plan.getFilesortTables()).append("\n");
sb.append("Using temporary: ").append(plan.getTemporaryTables()).append("\n");
sb.append("```json\n").append(plan.getRawJson()).append("\n```\n\n");
sb.append("## Table Schemas\n");
for (Map.Entry<String, TableMetadata> entry : tables.entrySet()) {
TableMetadata meta = entry.getValue();
sb.append("### Table `").append(entry.getKey()).append("`\n");
sb.append("Estimated rows: ").append(meta.getStats().getEstimatedRows()).append("\n");
sb.append("**Existing indexes**:\n");
for (IndexInfo idx : meta.getIndexes()) {
sb.append("- ").append(idx.getName())
.append(idx.isUnique() ? " (UNIQUE)" : "")
.append(": ").append(idx.getColumns()).append("\n");
}
sb.append("**Columns**:\n");
for (ColumnInfo col : meta.getColumns()) {
sb.append("- ").append(col.getName())
.append(" ").append(col.getDataType())
.append(col.isNullable() ? "" : " NOT NULL");
if (col.getComment() != null && !col.getComment().isBlank()) {
sb.append(" -- ").append(col.getComment());
}
sb.append("\n");
}
sb.append("\n");
}
return sb.toString();
}
private String getSystemPrompt() {
return """
You are a professional MySQL DBA, expert in SQL performance tuning and index design.
When analyzing a slow query:
1. Identify the performance bottlenecks (full table scans, filesort, temporary tables, etc.)
2. Analyze the columns involved in WHERE, JOIN, ORDER BY, and GROUP BY
3. Check whether the existing indexes cover those columns
4. Recommend concrete indexes, with reasoning for the column order
5. Weigh the expected benefit against the maintenance cost
Return JSON in this shape:
{
"diagnosis": "description of the problem",
"bottlenecks": ["list of bottlenecks"],
"indexRecommendations": [
{
"table": "table name",
"columns": ["col1", "col2"],
"indexType": "BTREE/FULLTEXT",
"reason": "why this index",
"expectedImprovement": "expected effect",
"ddl": "CREATE INDEX ... ON ...",
"priority": "HIGH/MEDIUM/LOW"
}
],
"sqlRewriteSuggestion": "a rewrite suggestion if the SQL itself can be improved",
"estimatedSpeedup": "estimated speedup factor"
}
""";
}
}
Automatically Validating Recommendations
A recommendation alone isn't enough; we also need to verify that it actually helps. Borrowing the idea behind pt-query-advisor, we built a dry-run validator:
@Service
public class IndexRecommendationValidator {
private final ExecutionPlanAnalyzer planAnalyzer;
private final JdbcTemplate jdbcTemplate;
public ValidationResult validate(
IndexRecommendation recommendation,
SlowQuery originalQuery) {
// Apply the candidate index on a test database and compare execution plans
try {
// 1. Capture the execution plan before the index exists
ExecutionPlan beforePlan = planAnalyzer.analyze(originalQuery.getSql(), Collections.emptyMap());
// 2. Create the candidate index under a throwaway name, swapping out the
// recommended name so the DDL stays valid (note that MySQL itself has no
// CREATE INDEX IF NOT EXISTS; that clause is MariaDB syntax)
String testIndexDdl = recommendation.getDdl()
.replaceFirst("(?i)CREATE\\s+INDEX\\s+\\S+", "CREATE INDEX __test_index");
jdbcTemplate.execute(testIndexDdl);
// 3. Re-run EXPLAIN now that the candidate index exists
ExecutionPlan afterPlan = planAnalyzer.analyze(originalQuery.getSql(), Collections.emptyMap());
// 4. Compare the key metrics
double rowsBeforeAvg = beforePlan.getEstimatedRowsExamined();
double rowsAfterAvg = afterPlan.getEstimatedRowsExamined();
boolean eliminated = beforePlan.hasFullScan() && !afterPlan.hasFullScan();
double estimatedSpeedup = rowsBeforeAvg / Math.max(rowsAfterAvg, 1);
return ValidationResult.builder()
.valid(estimatedSpeedup > 2.0 || eliminated)
.estimatedSpeedup(estimatedSpeedup)
.fullScanEliminated(eliminated)
.beforePlan(beforePlan)
.afterPlan(afterPlan)
.build();
} finally {
// Clean up the test index; DROP INDEX IF EXISTS is MariaDB-only, so in
// MySQL just attempt the drop and swallow the error if it was never created
try {
jdbcTemplate.execute("DROP INDEX __test_index ON " + recommendation.getTable());
} catch (Exception ignored) {}
}
}
}
Engineering Practice Notes
On which queries deserve optimization: not every slow query is worth the effort. Our strategy is to rank by execution frequency times average latency and tackle the most frequently executed slow queries first, not the single slowest one. A 1-second query that runs 10,000 times a day far outranks a 10-second query that runs once a day.
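That prioritization rule is trivial to mechanize over the aggregated, fingerprinted queries. A sketch (the `QueryStat` record and its field names are hypothetical, for illustration only):

```java
import java.util.Comparator;
import java.util.List;

// Rank slow queries by total daily time cost (executions per day x average
// seconds per execution), not by single-execution latency.
public class QueryPrioritizer {

    public record QueryStat(String fingerprint, long dailyExecutions, double avgSeconds) {
        double dailyCostSeconds() {
            return dailyExecutions * avgSeconds;
        }
    }

    public static List<QueryStat> rank(List<QueryStat> stats) {
        return stats.stream()
            .sorted(Comparator.comparingDouble(QueryStat::dailyCostSeconds).reversed())
            .toList();
    }
}
```

Under this scoring, the 10,000-times-a-day 1-second query (10,000 seconds of daily cost) sorts far ahead of the once-a-day 10-second query.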
On trusting LLM recommendations: the LLM's index suggestions are usually right directionally, but the details need human review. We have seen it recommend an index on a column with very low selectivity (a status field with only 3 distinct values); such an index is nearly useless and only adds write overhead. Treat LLM suggestions as input for review, not as something to execute directly.
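One cheap guardrail to put in front of the LLM's output is an explicit selectivity check: distinct values divided by total rows. A sketch with an assumed cutoff (the 0.01 threshold is arbitrary and should be tuned per workload; the class and method names are illustrative):

```java
// Reject single-column index recommendations on low-selectivity columns
// before they ever reach human review. selectivity = distinct values / rows;
// a status column with 3 values over 1M rows scores 0.000003.
public class SelectivityGate {

    private static final double MIN_SELECTIVITY = 0.01; // illustrative threshold

    public static double selectivity(long distinctValues, long totalRows) {
        if (totalRows == 0) return 0.0;
        return (double) distinctValues / totalRows;
    }

    public static boolean worthIndexing(long distinctValues, long totalRows) {
        return selectivity(distinctValues, totalRows) >= MIN_SELECTIVITY;
    }
}
```

The inputs can come straight from the CARDINALITY and TABLE_ROWS figures already collected by TableMetadataCollector, so the gate costs nothing extra at runtime.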
On monitoring outcomes: once a recommended index ships, watch two metrics: the change in query P99 latency, and the change in write latency (INSERT/UPDATE/DELETE). Indexes speed up reads but slow down writes; you are looking for the balance point.
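That balance can even be made explicit as an acceptance rule for a newly shipped index. A sketch under an assumed policy (keep the index only if the read P99 gain is at least 3x the write P99 regression; the ratio, class, and method names are all made up for illustration, not a rule we claim is universal):

```java
// Decide whether to keep a new index from observed P99 latencies (in ms)
// measured before and after rollout. The 3x benefit ratio is an assumption.
public class IndexKeepDecision {

    private static final double MIN_BENEFIT_RATIO = 3.0;

    public static boolean shouldKeep(double readP99Before, double readP99After,
                                     double writeP99Before, double writeP99After) {
        double readGainMs = readP99Before - readP99After;    // positive = faster reads
        double writeCostMs = writeP99After - writeP99Before; // positive = slower writes
        if (readGainMs <= 0) return false;                   // no read benefit at all
        if (writeCostMs <= 0) return true;                   // writes did not regress
        return readGainMs / writeCostMs >= MIN_BENEFIT_RATIO;
    }
}
```

Whatever the exact policy, writing it down as code forces the team to agree on how much write regression a read improvement is worth.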
