K8s Ingress in Depth: Rate Limiting, Canary Releases, and SSL with the Nginx Ingress Controller
Audience: engineers responsible for managing traffic ingress on K8s clusters | Reading time: about 22 minutes | Applies to: K8s 1.22+, Nginx Ingress Controller 1.9+
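Opening story
Last year one of our API gateway services was hit by a scraper: a single IP hammered our product listing endpoint at 2,000 requests per second and dragged the average response time from 30 ms to more than 2 seconds. Our K8s cluster had no rate limiting of any kind; the Ingress was just a transparent proxy that passed all traffic straight through.
The stopgap was an IP blacklist on the Ingress, but the attacker switched to a VPN and kept going. What finally stopped the attack was configuring the Nginx Ingress Controller with per-IP rate limiting (at most 100 requests per second per IP) plus restrictions based on User-Agent.
After that incident we systematically put the advanced features of the Nginx Ingress Controller to use: rate limiting, canary releases, SSL termination, request rewriting, and security hardening. This article collects those configurations in one place.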
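1. Core problem analysis
What Ingress is for and where its limits are
In K8s, Ingress is the standard abstraction for cluster traffic entry. An Ingress Controller (such as the Nginx Ingress Controller) is the actual implementation: the Ingress resource defines routing rules, and the Ingress Controller translates those rules into real proxy configuration.
The Nginx Ingress Controller extends its capabilities through annotations. Core capabilities include: routing by path and Host; SSL/TLS termination; rate limiting (request rate and concurrent connections); canary releases; request/response header modification; authentication (Basic Auth, OAuth2).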
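As a concrete picture of how an annotation switches on one of these capabilities, here is a minimal sketch of Basic Auth on an Ingress. The Ingress name internal-admin, the host admin.company.com, the backend admin-service, and the Secret name basic-auth are assumptions for illustration. The Secret is expected to hold an htpasswd-format file under the key auth, and an IngressClass named nginx is assumed to exist.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: internal-admin              # hypothetical name
  namespace: production
  annotations:
    # Turn on HTTP Basic Auth for every path in this Ingress
    nginx.ingress.kubernetes.io/auth-type: "basic"
    # Secret in the same namespace containing the htpasswd data (hypothetical name)
    nginx.ingress.kubernetes.io/auth-secret: "basic-auth"
    # Realm text shown in the browser's credential prompt
    nginx.ingress.kubernetes.io/auth-realm: "Authentication Required"
spec:
  ingressClassName: nginx           # assumes an IngressClass called nginx
  rules:
    - host: admin.company.com       # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: admin-service # hypothetical backend
                port:
                  number: 8080
```

Requests without valid credentials are answered with 401 by the controller and never reach the backend.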
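Traffic path
2. Principles in depth
How the Nginx Ingress Controller works
The Nginx Ingress Controller is essentially an Nginx process plus a controller. The controller watches the K8s API Server for changes to Ingress, Service, and Endpoint (EndpointSlice in recent versions) objects, translates them into Nginx configuration, and hot-reloads Nginx (nginx -s reload) when the generated configuration changes.
During a hot reload, Nginx keeps the old worker processes serving existing connections and starts new workers for new connections, so the switch is essentially zero-downtime. Not every change triggers a reload: changes that only affect backend endpoints (for example Pods being replaced during a deployment) are pushed to the running Nginx through its Lua balancer without a reload. Changes to the Ingress rules themselves, such as a new host, a new path, or most annotations, do regenerate the configuration and reload Nginx, so updating Ingress resources at high frequency causes frequent reloads and a slight performance cost.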
3. Complete configuration
Basic Ingress configuration (HTTPS + routing)
# ingress-basic.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: production
  annotations:
    # Select the Nginx Ingress Controller
    # (this annotation is deprecated; spec.ingressClassName: nginx is the preferred form on K8s 1.22+)
    kubernetes.io/ingress.class: "nginx"
    # SSL redirect: HTTP is redirected to HTTPS automatically
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    # HTTP/2 server push for preload Link headers
    # (HTTP/2 itself is controlled by use-http2 in the controller ConfigMap and is on by default)
    nginx.ingress.kubernetes.io/http2-push-preload: "true"
    # Security headers
    # (snippet annotations are disabled by default from controller 1.9.0;
    #  they need allow-snippet-annotations: "true" in the controller ConfigMap)
    nginx.ingress.kubernetes.io/configuration-snippet: |
      add_header X-Frame-Options "SAMEORIGIN" always;
      add_header X-Content-Type-Options "nosniff" always;
      add_header X-XSS-Protection "1; mode=block" always;
      add_header Referrer-Policy "strict-origin-when-cross-origin" always;
      add_header Permissions-Policy "camera=(), microphone=(), geolocation=()" always;
    # CORS
    nginx.ingress.kubernetes.io/enable-cors: "true"
    nginx.ingress.kubernetes.io/cors-allow-origin: "https://app.company.com"
    nginx.ingress.kubernetes.io/cors-allow-methods: "GET, POST, PUT, DELETE, OPTIONS"
    nginx.ingress.kubernetes.io/cors-allow-headers: "Authorization, Content-Type, X-Request-ID"
    # Timeouts (seconds)
    nginx.ingress.kubernetes.io/proxy-connect-timeout: "10"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "60"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"
    # Upload size limit
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  # TLS configuration (SSL termination)
  tls:
    - hosts:
        - api.company.com
      secretName: api-tls-secret   # the certificate lives in this Secret
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 8080
          - path: /users
            pathType: Prefix
            backend:
              service:
                name: user-service
                port:
                  number: 8080
          - path: /payments
            pathType: Prefix
            backend:
              service:
                name: payment-service
                port:
                  number: 8080
Rate limiting configuration
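One setting in this area lives outside the Ingress: the status code returned to throttled clients by limit-rps and limit-connections is controller-wide and is read from the controller ConfigMap, not from an annotation. A minimal sketch, assuming the controller reads a ConfigMap named nginx-ingress-controller in the ingress-nginx namespace (the name used elsewhere in this article; adjust it to whatever your installation uses):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-ingress-controller   # assumption: must match the controller's --configmap
  namespace: ingress-nginx
data:
  # Returned when a request-rate limit (limit-rps / limit-rpm) is exceeded; default is 503
  limit-req-status-code: "429"
  # Returned when the connection limit (limit-connections) is exceeded; default is 503
  limit-conn-status-code: "429"
```

Because this is a global setting, it changes the throttling response for every Ingress served by that controller.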
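# ingress-rate-limit.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-service-ingress
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # =============================================
    # Rate limiting
    # =============================================
    # Per-IP request rate: at most 100 requests per second (leaky bucket).
    # The limit is enforced by each controller replica separately.
    nginx.ingress.kubernetes.io/limit-rps: "100"
    # Burst allowance = limit-rps x limit-burst-multiplier (the default multiplier is 5).
    # A multiplier of 2 gives burst=200: bursts of up to 200 are tolerated, anything beyond is rejected.
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "2"
    # Per-IP connection limit: at most 50 concurrent connections
    nginx.ingress.kubernetes.io/limit-connections: "50"
    # Status code when a limit is exceeded: the default is 503, and 429 is more standard.
    # It is set with limit-req-status-code / limit-conn-status-code in the controller ConfigMap,
    # not with an annotation.
    # Rate-limit whitelist (comma-separated CIDRs)
    nginx.ingress.kubernetes.io/limit-whitelist: "10.0.0.0/8,172.16.0.0/12"
    # Global rate limit shared by all controller replicas:
    # 100 requests per 1s window, counted per client IP.
    # Requires a memcached instance configured in the controller ConfigMap
    # (global-rate-limit-memcached-host).
    nginx.ingress.kubernetes.io/global-rate-limit: "100"
    nginx.ingress.kubernetes.io/global-rate-limit-window: "1s"
    # $remote_addr is the default key; the binary form is avoided because the key is stored in memcached
    nginx.ingress.kubernetes.io/global-rate-limit-key: "$remote_addr"
    nginx.ingress.kubernetes.io/global-rate-limit-status-code: "429"
spec:
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service
                port:
                  number: 8080
Canary release configuration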
# =============================================
# Stable Ingress (main traffic)
# =============================================
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-service-stable
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service-stable
                port:
                  number: 8080
---
# =============================================
# Canary Ingress (canary traffic)
# =============================================
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: order-service-canary
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Enable canary mode
    nginx.ingress.kubernetes.io/canary: "true"
    # Precedence when several rules are set: canary-by-header > canary-by-cookie > canary-weight
    # Option 1: by weight (5% of traffic goes to the canary)
    nginx.ingress.kubernetes.io/canary-weight: "5"
    # Option 2: by header (requests carrying the header "canary: true" go to the canary;
    # without canary-by-header-value, the recognized header values are "always" and "never")
    # nginx.ingress.kubernetes.io/canary-by-header: "canary"
    # nginx.ingress.kubernetes.io/canary-by-header-value: "true"
    # Option 3: by cookie (internal testers; the cookie canary_user=always goes to the canary)
    # nginx.ingress.kubernetes.io/canary-by-cookie: "canary_user"
spec:
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /orders
            pathType: Prefix
            backend:
              service:
                name: order-service-canary
                port:
                  number: 8080
Canary rollout flow:
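# Step 1: deploy the canary version at 5% of traffic
kubectl apply -f ingress-canary-5percent.yaml
# Watch the canary metrics; if they look healthy, raise the weight to 20%
kubectl patch ingress order-service-canary \
  -n production \
  --type=json \
  -p='[{"op": "replace", "path": "/metadata/annotations/nginx.ingress.kubernetes.io~1canary-weight", "value": "20"}]'
# Continue to 50%, 80%, ...
# Once at 100%, update the stable version to the new release and delete the canary Ingress
Automated SSL certificate management (cert-manager)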
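Let's Encrypt's production endpoint enforces strict rate limits, so it can be worth validating the HTTP-01 flow against the staging endpoint before using the production issuer from this section. A minimal sketch that mirrors that issuer; the name letsencrypt-staging is an assumption, and certificates issued by staging are not trusted by browsers.

```yaml
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-staging          # hypothetical issuer name
spec:
  acme:
    # Let's Encrypt staging directory: generous rate limits, untrusted test certificates
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    email: ops@company.com
    privateKeySecretRef:
      name: letsencrypt-staging      # Secret that stores the ACME account key
    solvers:
      - http01:
          ingress:
            class: nginx
```

While testing, point the Ingress annotation cert-manager.io/cluster-issuer at letsencrypt-staging, then switch it to the production issuer once issuance succeeds.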
# cert-manager-cluster-issuer.yaml
# Prerequisite: cert-manager is installed
#   helm repo add jetstack https://charts.jetstack.io
#   helm install cert-manager jetstack/cert-manager --set installCRDs=true
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: ops@company.com
    privateKeySecretRef:
      name: letsencrypt-prod
    solvers:
      - http01:
          ingress:
            class: nginx
---
# Ingress that asks cert-manager to request the certificate automatically
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-gateway
  namespace: production
  annotations:
    kubernetes.io/ingress.class: "nginx"
    # Which ClusterIssuer to use
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  tls:
    - hosts:
        - api.company.com
      secretName: api-tls-secret   # cert-manager creates and manages this Secret automatically
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: api-gateway-service
                port:
                  number: 8080
4. Production best practices
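Performance tuning for the Nginx Ingress Controller
# nginx-ingress-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  # Must match the ConfigMap the controller was started with (--configmap);
  # the official Helm chart typically names it ingress-nginx-controller
  name: nginx-ingress-controller
  namespace: ingress-nginx
data:
  # Number of worker processes ("auto" matches the number of CPU cores)
  worker-processes: "auto"
  # Maximum connections per worker
  max-worker-connections: "65536"
  # Client keepalive timeout (seconds) and requests per keepalive connection
  keep-alive: "75"
  keep-alive-requests: "1000"
  # Upstream keepalive connection pool
  upstream-keepalive-connections: "200"
  upstream-keepalive-time: "1h"
  # Access log format (includes the request ID for tracing)
  log-format-upstream: >-
    $remote_addr - $remote_user [$time_local] "$request"
    $status $body_bytes_sent "$http_referer" "$http_user_agent"
    $request_time $upstream_response_time $upstream_addr
    $request_id
  # Enable gzip compression
  use-gzip: "true"
  gzip-level: "5"
  gzip-types: "text/plain text/css application/json application/javascript text/xml application/xml"
  # Proxy buffers: 4 buffers of 16k each
  # (the ConfigMap key for the count is proxy-buffers-number; the size comes from proxy-buffer-size)
  proxy-buffer-size: "16k"
  proxy-buffers-number: "4"
5. Pitfalls from production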
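Pitfall 1: a canary weight change does not take effect
We raised the canary weight from 5% to 20%, refreshed many times, and the traffic split still looked like roughly 5%. After a long investigation, the cause turned out to be that the Nginx configuration had not been refreshed in time.
The Nginx Ingress Controller does watch Ingress changes in real time, but it does not apply every event immediately: changes are queued, and syncs are capped by --sync-rate-limit (default 0.3 syncs per second, roughly one sync every 3 seconds), so a change can lag while the queue is busy. If it has to take effect right away, restart the controller Pods (use the Deployment name from your installation):
kubectl rollout restart deployment/nginx-ingress-controller -n ingress-nginx
Or raise the sync rate limit (at the cost of hot-reloading Nginx more often):
# Add to the controller's startup arguments
--sync-rate-limit=10   # allow up to 10 syncs per second
Pitfall 2: large file uploads rejected with 413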
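Users uploading images frequently got 413 Request Entity Too Large. Nginx's default client_max_body_size is 1 MB, which is nowhere near enough for image uploads.
Add this annotation to the Ingress:
nginx.ingress.kubernetes.io/proxy-body-size: "50m"
If the backend is Spring Boot, it needs matching configuration as well:
spring:
  servlet:
    multipart:
      max-file-size: 50MB
      max-request-size: 50MB
Pitfall 3: WebSocket long-lived connections dropped by Nginx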
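One of our push services uses WebSocket, and users reported that message delivery dropped out from time to time. The logs showed Nginx closing any connection that had been idle for 60 seconds, while the WebSocket heartbeat interval was 90 seconds, so connections were cut before the next heartbeat arrived.
WebSocket needs longer proxy timeouts on the Ingress:
annotations:
  # Idle timeouts in seconds; the 60-second default is what cut the connections
  nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
  nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
  # The controller already sets the Upgrade and Connection headers for WebSocket,
  # so no configuration-snippet is needed for them
We also shortened the WebSocket heartbeat interval from 90 seconds to 30 seconds so that a keepalive always arrives before a timeout.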
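Timeout annotations apply to every path in the Ingress that carries them, so one option is to give the WebSocket route its own Ingress and leave the regular API routes on their shorter timeouts. A minimal sketch, assuming a hypothetical push-service backend on port 8080 that serves WebSocket under /ws, and an IngressClass named nginx:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: push-service-ws              # hypothetical name
  namespace: production
  annotations:
    # Long idle timeouts only for the WebSocket route (seconds)
    nginx.ingress.kubernetes.io/proxy-read-timeout: "3600"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "3600"
spec:
  ingressClassName: nginx            # assumes an IngressClass called nginx
  rules:
    - host: api.company.com
      http:
        paths:
          - path: /ws                # hypothetical WebSocket path
            pathType: Prefix
            backend:
              service:
                name: push-service   # hypothetical backend
                port:
                  number: 8080
```

The controller merges Ingresses that share a host, so this can sit alongside the main api.company.com Ingress without changing its timeouts.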
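6. Summary
The Nginx Ingress Controller is the Swiss Army knife of K8s cluster traffic management: through annotations it exposes most of Nginx's advanced features without anyone having to write nginx.conf directly.
Rate limiting belongs in every production setup; even a generous limit at least keeps abnormal traffic from knocking over backend services. Canary release is an effective way to reduce release risk: validate a new feature on a small share of traffic first, then roll it out fully. SSL termination combined with cert-manager makes certificate management fully automatic, with no more manual renewals.
Where Ingress runs out of capability (complex traffic governance, circuit breaking, distributed tracing), a service mesh such as Istio is the next step up.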
