Python Application Performance Monitoring in Practice: Wiring Up Prometheus + Grafana to Find the Real Bottleneck

Lao Zhang · 2026/4/30 · about 6 min read


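Audience: Python backend engineers and developers who want full observability for their services | Reading time: about 15 minutes | Takeaway: build a Prometheus + Grafana monitoring stack from scratch and let the data point you to the real performance bottleneck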

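For a while, our Python API service's average response time looked perfectly normal, about 78 ms, and the operations team had no complaints. But one technical director kept saying it "felt a bit slow". We dug around for a long time without finding anything and eventually decided his sense of it was just off.

Then I hooked up Prometheus and started looking at the P99 response time: 3.2 seconds.

Average 78 ms, P99 3200 ms. That means roughly 1 in every 100 requests waited 3.2 seconds or more for a response. To users it feels like "fast most of the time, with the occasional stall", which matches exactly what the director meant by "feels a bit slow".

Averages lie; percentiles tell the truth.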
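To see how a mean of about 78 ms can hide a 3.2 s P99, here is a small sketch with synthetic numbers: the dataset is made up to reproduce those two figures and is not data from our service. It uses the simple nearest-rank percentile definition; Prometheus itself estimates percentiles from histogram buckets, which is a different calculation.

```python
import statistics


def percentile_nearest_rank(samples, p):
    """Nearest-rank percentile; p is an integer percentage such as 50 or 99."""
    ordered = sorted(samples)
    rank = (p * len(ordered) + 99) // 100  # ceil(p * n / 100) with integer math
    return ordered[max(rank, 1) - 1]


# Hypothetical latencies in ms: 985 fast requests and 15 slow ones
latencies_ms = [30] * 985 + [3200] * 15

print(f"mean = {statistics.mean(latencies_ms):.2f} ms")          # 77.55 ms
print(f"P50  = {percentile_nearest_rank(latencies_ms, 50)} ms")  # 30 ms
print(f"P99  = {percentile_nearest_rank(latencies_ms, 99)} ms")  # 3200 ms
```

Only 1.5% of the requests are slow, so they barely move the mean, yet they own the whole tail above P98.5.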

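The three layers of monitoring

Before wiring anything up, be clear about what you want to monitor. I split monitoring into three layers:

Layer 1: basic metrics (must-have)

Request volume (QPS/RPS)
Response time (P50/P95/P99, not the average)
Error rate (share of 4xx/5xx responses)
Process resources (CPU, memory, file descriptors)

Layer 2: dependency health (important)

Database query time distribution
Redis operation latency
Success rate and latency of outbound HTTP calls

Layer 3: business metrics (as needed)

Business success rate of specific endpoints
Funnel conversion through core flows

This post focuses on wiring Prometheus into FastAPI/Flask and then visualizing the data in Grafana.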

Integrating Prometheus with FastAPI

# metrics.py
import os
import time

import psutil
from fastapi import FastAPI, Request, Response
from prometheus_client import (
    CONTENT_TYPE_LATEST,
    REGISTRY,
    Counter,
    Gauge,
    Histogram,
    generate_latest,
)

# ---- Metric definitions ----
HTTP_REQUESTS_TOTAL = Counter(
    'http_requests_total',
    'Total number of HTTP requests',
    ['method', 'endpoint', 'http_status']
)

HTTP_REQUEST_DURATION = Histogram(
    'http_request_duration_seconds',
    'HTTP request latency',
    ['method', 'endpoint'],
    # Bucket design: cover the range your real response times actually fall in
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1.0, 2.5, 5.0, 10.0]
)

HTTP_REQUESTS_IN_FLIGHT = Gauge(
    'http_requests_in_flight',
    'Current number of requests being processed'
)

PROCESS_MEMORY = Gauge(
    'process_memory_bytes',
    'Process memory usage in bytes',
    ['type']
)

DB_QUERY_DURATION = Histogram(
    'db_query_duration_seconds',
    'Database query duration',
    ['operation', 'table'],
    buckets=[0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0, 5.0]
)

# Create the psutil handle once instead of on every request
_PROCESS = psutil.Process(os.getpid())


# ---- Middleware ----
async def metrics_middleware(request: Request, call_next):
    # Skip /metrics, /health and /favicon.ico to avoid self-monitoring noise
    if request.url.path in ('/metrics', '/health', '/favicon.ico'):
        return await call_next(request)

    endpoint = request.url.path  # raw path: see pitfall 1 below for why this is a problem
    method = request.method

    HTTP_REQUESTS_IN_FLIGHT.inc()
    start_time = time.perf_counter()
    status_code = 500  # recorded if call_next raises; the exception still propagates

    try:
        response = await call_next(request)
        status_code = response.status_code
    finally:
        duration = time.perf_counter() - start_time
        HTTP_REQUESTS_IN_FLIGHT.dec()

        HTTP_REQUESTS_TOTAL.labels(
            method=method,
            endpoint=endpoint,
            http_status=str(status_code)
        ).inc()

        HTTP_REQUEST_DURATION.labels(
            method=method,
            endpoint=endpoint
        ).observe(duration)

        # Refresh memory gauges on every request. Note that prometheus_client's
        # default registry already exports process_resident_memory_bytes and
        # process_virtual_memory_bytes on Linux.
        mem = _PROCESS.memory_info()
        PROCESS_MEMORY.labels(type='rss').set(mem.rss)
        PROCESS_MEMORY.labels(type='vms').set(mem.vms)

    return response


# ---- /metrics endpoint ----
def setup_metrics(app: FastAPI):
    app.middleware('http')(metrics_middleware)

    @app.get('/metrics', include_in_schema=False)
    async def metrics():
        return Response(
            content=generate_latest(REGISTRY),
            media_type=CONTENT_TYPE_LATEST
        )

Use it in main.py:

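# main.py
import functools
import time

from fastapi import FastAPI
from metrics import setup_metrics, DB_QUERY_DURATION

app = FastAPI()
setup_metrics(app)


# Decorator for timing database queries, so every query is instrumented the same way
def track_db_query(operation: str, table: str):
    def decorator(func):
        @functools.wraps(func)  # keep the original name and signature
        async def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return await func(*args, **kwargs)
            finally:
                duration = time.perf_counter() - start
                DB_QUERY_DURATION.labels(
                    operation=operation,
                    table=table
                ).observe(duration)
        return wrapper
    return decorator


# Decorate the function that runs the query (fetch_user is an illustrative name),
# not the route handler: on the handler it would time the whole request
@track_db_query(operation="SELECT", table="users")
async def fetch_user(user_id: int):
    # actual query logic
    ...


@app.get("/api/users/{user_id}")
async def get_user(user_id: int):
    return await fetch_user(user_id)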
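The wrapper matters more than it looks: FastAPI works out which path and query parameters an endpoint takes by reading its signature with `inspect.signature`, and a bare `*args, **kwargs` wrapper hides that signature. `functools.wraps` sets `__wrapped__`, which `inspect.signature` follows. Below is a minimal, dependency-free sketch that also handles plain sync functions; the `OBSERVED` dict is a hypothetical stand-in for `DB_QUERY_DURATION.labels(...).observe(...)`, and `fetch_user` / `count_orders` are illustrative names.

```python
import asyncio
import functools
import inspect
import time
from collections import defaultdict

# Hypothetical stand-in for DB_QUERY_DURATION: durations in seconds per (operation, table)
OBSERVED = defaultdict(list)


def track_db_query(operation: str, table: str):
    """Time the decorated function (sync or async) and record its duration."""
    def record(start: float) -> None:
        OBSERVED[(operation, table)].append(time.perf_counter() - start)

    def decorator(func):
        if inspect.iscoroutinefunction(func):
            @functools.wraps(func)  # copies __name__ and sets __wrapped__ for inspect.signature
            async def async_wrapper(*args, **kwargs):
                start = time.perf_counter()
                try:
                    return await func(*args, **kwargs)
                finally:
                    record(start)
            return async_wrapper

        @functools.wraps(func)
        def sync_wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                record(start)
        return sync_wrapper
    return decorator


@track_db_query(operation="SELECT", table="users")
async def fetch_user(user_id: int) -> dict:
    await asyncio.sleep(0.01)  # pretend this is the real query
    return {"id": user_id}


@track_db_query(operation="SELECT", table="orders")
def count_orders(user_id: int) -> int:
    return 0  # pretend this is a synchronous query


print(asyncio.run(fetch_user(42)))     # {'id': 42}
print(count_orders(42))                # 0
print(inspect.signature(fetch_user))   # (user_id: int) -> dict
print({k: len(v) for k, v in OBSERVED.items()})
```

Branching on `inspect.iscoroutinefunction` keeps async functions awaitable after decoration, so the same decorator works for async drivers and for synchronous helpers.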

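Pitfall 1: the endpoint label blows up the number of series

Symptom: after the service had been running for a few days, Prometheus memory usage grew from 200 MB to 1.5 GB and tripped an alert.

Cause: our endpoint paths contain dynamic IDs, such as /api/users/12345 and /api/users/67890, and every user_id was stored as a separate endpoint label value. After a week, hundreds of thousands of distinct paths had turned into hundreds of thousands of distinct time series. This is called a cardinality explosion.

Fix: normalize dynamic path parameters to the route template: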

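async def metrics_middleware(request: Request, call_next):
    method = request.method
    start_time = time.perf_counter()

    response = await call_next(request)

    # Use the route template instead of the concrete path. FastAPI stores the
    # matched route in request.scope during routing, which happens inside
    # call_next, so read it afterwards: before call_next it is always None.
    route = request.scope.get('route')
    if route is not None and hasattr(route, 'path'):
        endpoint = route.path  # "/api/users/{user_id}" instead of "/api/users/12345"
    else:
        endpoint = 'unmatched'  # no route (e.g. 404 scans): one fixed label value

    # ... record HTTP_REQUESTS_TOTAL / HTTP_REQUEST_DURATION with this endpoint as before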

Now every request to /api/users/<any ID> is recorded under the single label /api/users/{user_id}, which fixes the cardinality explosion for every request that matches a route.
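The route template only exists for requests that matched a route. Requests that match nothing (404 scans of random URLs, for example) have no template, and using their raw path as the label reopens the same hole. If you want to keep some detail for those instead of collapsing them into one fixed value, a rough fallback is to normalize ID-like segments. `normalize_path` below is a hypothetical helper, not part of FastAPI or prometheus_client, and it assumes numeric IDs and UUIDs are the only dynamic segments; arbitrary junk paths still need a hard cap.

```python
import re

# Hypothetical rule: a path segment counts as an ID if it is all digits or a UUID
_ID_SEGMENT = re.compile(
    r"\d+|[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def normalize_path(path: str) -> str:
    """Replace ID-like path segments with the placeholder '{id}'."""
    return "/".join(
        "{id}" if _ID_SEGMENT.fullmatch(segment) else segment
        for segment in path.split("/")
    )


raw_paths = [f"/api/users/{i}" for i in range(10_000)] + ["/api/orders/7/items"]
print(len(set(raw_paths)))                          # 10001 distinct label values
print(len({normalize_path(p) for p in raw_paths}))  # 2
print(normalize_path("/api/orders/7/items"))        # /api/orders/{id}/items
```

The number of label values, and therefore time series, now grows with the number of route shapes rather than with the number of users.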

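Pitfall 2: badly chosen Histogram buckets make P99 inaccurate

Symptom: the P99 response time shown in Grafana never matched what we actually experienced; it was either too high or too low.

Cause: a Histogram's P99 is an estimate, and its precision depends on the buckets. Prometheus only knows how many observations fell into each bucket and interpolates linearly inside the bucket that contains the 99th percentile. With buckets of [0.1, 0.5, 1, 5, 10], a value that really lies between 100 ms and 500 ms is only known to be somewhere in that range, so the computed P99 can be far off.

Fix: the buckets must cover the real response-time distribution, with higher density in the range that matters: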

# If most of your endpoints respond between 10 ms and 500 ms, set the buckets like this:
buckets = [
    0.005,   # 5ms
    0.010,   # 10ms
    0.025,   # 25ms
    0.050,   # 50ms
    0.100,   # 100ms
    0.200,   # 200ms
    0.500,   # 500ms
    1.000,   # 1s
    2.500,   # 2.5s
    5.000,   # 5s
]

Put several buckets in the critical range (usually the span between P50 and P99) so the estimate is more precise.
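To make the interpolation concrete, here is a simplified plain-Python sketch of the estimate `histogram_quantile` computes from cumulative `le` buckets: find the first bucket whose cumulative count reaches `q * total`, then interpolate linearly between that bucket's lower and upper bounds (if the rank lands in `+Inf`, the highest finite bound is returned). Prometheus applies the same math to per-second bucket rates rather than raw counts. The sample data is synthetic: 980 requests at 40 ms and 20 at 150 ms, so the true P99 is 150 ms.

```python
import bisect
import math


def cumulative_buckets(samples, bounds):
    """Cumulative counts per upper bound (le), with +Inf appended, like a Prometheus histogram."""
    ordered = sorted(samples)
    return [(le, bisect.bisect_right(ordered, le)) for le in [*bounds, math.inf]]


def histogram_quantile(q, buckets):
    """Estimate the q-quantile by linear interpolation inside the target bucket."""
    total = buckets[-1][1]
    rank = q * total
    # First bucket whose cumulative count reaches the rank
    b = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if b == len(buckets) - 1:
        return buckets[-2][0]  # rank falls in +Inf: return the highest finite bound
    upper, count = buckets[b]
    lower, below = (0.0, 0) if b == 0 else buckets[b - 1]
    return lower + (upper - lower) * (rank - below) / (count - below)


# Synthetic data: 980 requests at 40 ms and 20 at 150 ms, so the true P99 is 150 ms
samples = [0.040] * 980 + [0.150] * 20
coarse = cumulative_buckets(samples, [0.1, 0.5, 1, 5, 10])
fine = cumulative_buckets(samples, [0.005, 0.01, 0.025, 0.05, 0.1, 0.2, 0.5, 1, 2.5, 5])

print(f"coarse P99 ~ {histogram_quantile(0.99, coarse) * 1000:.0f} ms")  # 300 ms
print(f"fine   P99 ~ {histogram_quantile(0.99, fine) * 1000:.0f} ms")    # 150 ms
```

With the coarse buckets, the 150 ms requests are only known to lie somewhere in (100 ms, 500 ms], and interpolation reports 300 ms. With the denser buckets the estimate is confined to (100 ms, 200 ms]. The fine estimate happens to hit 150 ms exactly here; in general the error is bounded by the width of the bucket the quantile falls in.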

Prometheus configuration

# prometheus.yml
global:
  scrape_interval: 15s     # scrape metrics every 15 seconds
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

rule_files:
  - "alert_rules.yml"

scrape_configs:
  - job_name: 'myapp'
    static_configs:
      - targets: ['myapp:8000']
    metrics_path: '/metrics'
    scrape_timeout: 10s

Example alert rules:

# alert_rules.yml
groups:
  - name: myapp_alerts
    rules:
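      # P99 response time above 1 second, per endpoint
      - alert: HighP99Latency
        expr: |
          histogram_quantile(0.99,
            sum by (le, endpoint) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "P99 latency {{ $value | humanizeDuration }} on {{ $labels.endpoint }}"

      # Error rate above 1%. Sum both sides by endpoint so http_status drops out
      # before dividing; otherwise each 5xx series is divided only by itself.
      - alert: HighErrorRate
        expr: |
          sum by (endpoint) (rate(http_requests_total{http_status=~"5.."}[5m]))
          / sum by (endpoint) (rate(http_requests_total[5m])) > 0.01
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Error rate {{ $value | humanizePercentage }} on {{ $labels.endpoint }}"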
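Why aggregation matters in an error-rate ratio: a PromQL division between two instant vectors uses one-to-one matching on all labels by default, and `rate()` keeps every label, including `http_status`. When both sides are raw `rate(...)` vectors, a 5xx series on the left can only pair with the identical 5xx series on the right, so the ratio is always 1. Summing both sides `by (endpoint)` first removes `http_status` and yields a real ratio. The toy simulation below mimics that matching in plain Python with made-up rates; it illustrates the semantics and is not how Prometheus evaluates queries internally.

```python
from collections import defaultdict

# Hypothetical per-second request rates for one endpoint, keyed by (endpoint, http_status)
rates = {
    ("/api/orders", "200"): 98.0,
    ("/api/orders", "500"): 2.0,
}

# Unaggregated division: one-to-one matching on *all* labels, so each 5xx
# series can only be divided by itself (the "200" series never pairs up).
naive = {k: v / rates[k] for k, v in rates.items() if k[1].startswith("5")}

# sum by (endpoint) on both sides: http_status is gone before dividing.
errors, totals = defaultdict(float), defaultdict(float)
for (endpoint, status), v in rates.items():
    totals[endpoint] += v
    if status.startswith("5"):
        errors[endpoint] += v
aggregated = {ep: errors[ep] / totals[ep] for ep in totals}

print(naive)       # {('/api/orders', '500'): 1.0}
print(aggregated)  # {'/api/orders': 0.02}
```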

Key Grafana dashboard panels

The panels I consider most important:

Panel 1: P50/P95/P99 response time trends

# PromQL
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint))
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint))
histogram_quantile(0.50, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, endpoint))

Panel 2: error-rate heatmap grouped by endpoint

# PromQL
sum(rate(http_requests_total{http_status=~"5.."}[5m])) by (endpoint)
/ sum(rate(http_requests_total[5m])) by (endpoint)

Panel 3: requests currently being processed (to spot request pile-ups)

http_requests_in_flight

Panel 4: database query latency distribution

histogram_quantile(0.99, sum(rate(db_query_duration_seconds_bucket[5m])) by (le, operation, table))

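Pitfall 3: tracking down the root cause of the 3-second P99

Back to the problem from the beginning: a P99 of 3.2 seconds.

With database query monitoring in place, I checked the P99 query time for each table:

users table: SELECT P99 = 12 ms, normal
orders table: SELECT P99 = 2800 ms, abnormal!

Found it. The slow query log then showed a query without an index that had started doing full table scans once the table grew to millions of rows. After adding the index, the orders SELECT P99 dropped to 18 ms, and the service's overall P99 fell from 3200 ms to 95 ms.

Without monitoring, we had no idea where to look. With it, locating the problem went from "no idea" to 5 minutes.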

Spinning up a local test environment

# docker-compose.yml
version: '3.8'
services:
  myapp:
    build: .
    ports:
      - "8000:8000"
  
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
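      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # prometheus.yml loads alert_rules.yml from the same directory, so mount it too
      # (Alertmanager itself is not part of this compose file)
      - ./alert_rules.yml:/etc/prometheus/alert_rules.yml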
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=7d'
  
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  grafana-data:

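Start everything with docker-compose up -d, open Grafana at http://localhost:3000 (user admin, password admin as set above), add a Prometheus data source pointing at http://prometheus:9090 (the service name inside the Compose network), then import or build a dashboard.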
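Before building dashboards, it can help to confirm that the app's /metrics endpoint exposes the expected metric families. Here is a small standard-library sketch, assuming the Prometheus text exposition format, in which each family is announced by a `# TYPE <name> <type>` line (prometheus_client writes counters there with their `_total` suffix, e.g. `http_requests_total`). `check_metrics` and its default URL are hypothetical; adjust the URL to wherever the app is reachable.

```python
import urllib.request


def metric_families(exposition_text: str) -> set:
    """Collect metric family names from '# TYPE <name> <type>' lines."""
    families = set()
    for line in exposition_text.splitlines():
        parts = line.split()
        if len(parts) >= 4 and parts[:2] == ["#", "TYPE"]:
            families.add(parts[2])
    return families


def check_metrics(url: str = "http://localhost:8000/metrics",
                  expected=("http_requests_total",
                            "http_request_duration_seconds",
                            "http_requests_in_flight")) -> list:
    """Fetch /metrics and return the expected families that are missing."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        text = resp.read().decode("utf-8")
    found = metric_families(text)
    return [name for name in expected if name not in found]
```

Call `check_metrics()` while the Compose stack is up; an empty list means all expected families are exported. Checking `# TYPE` lines rather than sample lines matters for labeled metrics: they only produce samples after the first observation, but their `TYPE` line is present from the start.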

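Monitoring isn't window dressing; it's your flashlight in the dark. Until you have data, every judgment you make about your service is a guess.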