K8s Log Collection: Deploying Fluentd/Fluent Bit as a DaemonSet and Integrating with ELK
Audience: engineers who own K8s log collection and analysis | Reading time: ~20 minutes | Applies to: K8s 1.22+, Fluent Bit 2.x
Opening Story
During one production incident, a service became unavailable for 5 minutes around 3 p.m., and every user request returned a 500 error. Once the incident was resolved, we needed a postmortem, which meant pulling the logs of every service for those 5 minutes.
Then came the real problem: the Pods involved had long since been restarted by K8s, and their logs disappeared with them. We did not even know which Nodes the old logs had landed on, let alone how to query them. After digging for half a day, all we recovered were the current logs of a few still-running Pods, which was nowhere near enough.
To this day the root cause of that incident has never been fully established, precisely because the logs were incomplete. After that, we made log collection the top priority of our infrastructure work.
This article lays out a complete Fluent Bit + ELK log collection setup, currently the most popular and most lightweight approach for K8s.
1. Core Problem Analysis
The three K8s log collection patterns
Node-level logging agent: deploy one log agent per Node (as a DaemonSet); the agent collects the logs of every container on that Node and forwards them to central storage. This is the most common approach, and the one both Fluent Bit and Fluentd use.
Sidecar: add a dedicated log collection container to each Pod that handles the main container's logs. Highly flexible but costs more resources; suited to services with complex log formats that need custom processing.
Direct write to external storage: the application writes logs straight to the log system (e.g., ES) with no agent in between. Intrusive to the application; generally not recommended.
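The node-level pattern works because the kubelet and container runtime symlink every container's log file into /var/log/containers/ under a name of the form <pod>_<namespace>_<container>-<container-id>.log, so an agent can recover the K8s context from the filename alone. A quick shell sketch of that convention (the pod name below is invented):

```shell
# Filename convention under /var/log/containers/:
#   <pod-name>_<namespace>_<container-name>-<container-id>.log
f="orders-6d5f7c9b8-x2x4q_production_app-4f1d2c.log"

pod="${f%%_*}"     # everything before the first underscore
rest="${f#*_}"     # strip "<pod>_"
ns="${rest%%_*}"   # everything before the next underscore

echo "pod=$pod namespace=$ns"
# prints: pod=orders-6d5f7c9b8-x2x4q namespace=production
```

This filename-derived pod/namespace pair is exactly what the kubernetes filter shown later uses as its lookup key against the API server to attach labels and other metadata.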
Choosing between Fluent Bit and Fluentd
Both are CNCF projects. Fluentd is the veteran with the richer plugin ecosystem; Fluent Bit is the newer of the two, designed for embedded and resource-constrained environments, with a far smaller memory footprint and better performance.
For a K8s node-level logging agent, Fluent Bit is the better choice: memory usage is typically in the tens of MB (versus hundreds of MB for a comparable Fluentd setup), CPU overhead is low, and it natively supports K8s metadata enrichment (automatically tagging each log with the Pod name, Namespace, container name, and other K8s metadata).
2. How It Works
The Fluent Bit data pipeline
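Every record Fluent Bit handles flows through a fixed pipeline: Input → Parser → Filter → Buffer → Output. Inputs collect raw records (here, tailing log files); parsers turn raw lines into structured records; filters enrich, rewrite, or drop records in flight; the buffer (in memory by default, optionally backed by the filesystem) absorbs backpressure; outputs ship records downstream. A minimal sketch of that pipeline (tags and paths are illustrative; stdout is used so it can be dry-run locally):

```
[SERVICE]
    Flush     5                 # flush buffered records every 5 seconds

# 1) INPUT: where records come from (tail, systemd, http, ...)
[INPUT]
    Name      tail
    Path      /var/log/containers/*.log
    Tag       kube.*

# 2) FILTER: enrich, rewrite, or drop records in flight
[FILTER]
    Name      modify
    Match     kube.*
    Add       environment demo

# 3) OUTPUT: where records go; swap stdout for es in production
[OUTPUT]
    Name      stdout
    Match     kube.*
```

The full production configuration in the next section is this same pipeline with the kubernetes filter and an es output plugged in.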
3. Full Configuration
Complete Fluent Bit DaemonSet deployment
# fluent-bit-deployment.yaml
---
# ServiceAccount and RBAC (needed to read K8s metadata)
apiVersion: v1
kind: ServiceAccount
metadata:
  name: fluent-bit
  namespace: logging
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: fluent-bit-read
rules:
  - apiGroups: [""]
    resources: ["namespaces", "pods", "nodes", "nodes/proxy"]
    verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: fluent-bit-read
subjects:
  - kind: ServiceAccount
    name: fluent-bit
    namespace: logging
roleRef:
  kind: ClusterRole
  name: fluent-bit-read
  apiGroup: rbac.authorization.k8s.io
---
# Fluent Bit configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  # Main configuration file
  fluent-bit.conf: |
    [SERVICE]
        Flush             5
        Daemon            Off
        Log_Level         warn
        Parsers_File      parsers.conf
        HTTP_Server       On
        HTTP_Listen       0.0.0.0
        HTTP_Port         2020
        # Expose Fluent Bit's own metrics for Prometheus scraping
        storage.metrics   on

    # =============================================
    # INPUT: read container logs
    # =============================================
    [INPUT]
        Name              tail
        Tag               kube.*
        Path              /var/log/containers/*.log
        # Exclude logs we don't need (the logging stack itself, monitoring, etc.)
        Exclude_Path      /var/log/containers/fluent-bit-*,/var/log/containers/prometheus-*
        Parser            docker
        # Track the read offset per file so restarts don't re-ingest logs
        DB                /var/log/flb_kube.db
        Mem_Buf_Limit     50MB
        Skip_Long_Lines   On
        Refresh_Interval  10
        # Merge multiline logs (e.g. Java exception stack traces)
        Multiline.parser  docker, cri

    # =============================================
    # FILTER: attach K8s metadata
    # =============================================
    [FILTER]
        Name              kubernetes
        Match             kube.*
        Kube_URL          https://kubernetes.default.svc:443
        Kube_CA_File      /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File   /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix   kube.var.log.containers.
        Merge_Log         On
        Merge_Log_Key     log_processed
        # Drop the raw log field once it has been parsed
        Keep_Log          Off
        # Attach K8s metadata
        Labels            On
        Annotations       Off    # annotations are rarely needed; keeps log volume down
    # Tag every record with cluster and environment
    [FILTER]
        Name    modify
        Match   kube.*
        Add     cluster_name production-k8s
        Add     environment  production

    # =============================================
    # OUTPUT: ship to Elasticsearch
    # =============================================
    [OUTPUT]
        Name                es
        Match               kube.*
        Host                elasticsearch-service.logging.svc.cluster.local
        Port                9200
        # If authentication is enabled:
        # HTTP_User         elastic
        # HTTP_Passwd       ${ES_PASSWORD}
        # TLS settings:
        # TLS               On
        # TLS.Verify        On
        # TLS.CA_File       /etc/ssl/certs/es-ca.crt
        Logstash_Format     On
        Logstash_Prefix     java-logs
        # Daily indices: java-logs-2024.01.15
        Logstash_DateFormat %Y.%m.%d
        Retry_Limit         5
        Replace_Dots        On
        # Batch sends for throughput
        Buffer_Size         5MB
        Workers             4

  # Log parser definitions
  parsers.conf: |
    [PARSER]
        Name        docker
        Format      json
        Time_Key    time
        Time_Format %Y-%m-%dT%H:%M:%S.%L
        Time_Keep   On
    [PARSER]
        Name        java_log
        Format      regex
        Regex       ^(?<time>\d{4}-\d{2}-\d{2}\s\d{2}:\d{2}:\d{2}\.\d{3})\s+(?<level>[A-Z]+)\s+(?<pid>\d+)\s+---\s+\[(?<thread>[^\]]+)\]\s+(?<logger>[^\s]+)\s+:\s+(?<message>.*)$
        Time_Key    time
        Time_Format %Y-%m-%d %H:%M:%S.%L
---
# DaemonSet: one Fluent Bit instance per Node
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  namespace: logging
  labels:
    k8s-app: fluent-bit-logging
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit-logging
  template:
    metadata:
      labels:
        k8s-app: fluent-bit-logging
    spec:
      serviceAccountName: fluent-bit
      # Tolerate Node taints so every Node, including control-plane nodes, runs an agent
      tolerations:
        - key: node-role.kubernetes.io/master
          effect: NoSchedule
        - operator: "Exists"
          effect: "NoSchedule"
      containers:
        - name: fluent-bit
          image: cr.fluentbit.io/fluent/fluent-bit:2.2.2
          ports:
            - containerPort: 2020
              name: http-plugin
          env:
            - name: ES_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: es-secret
                  key: password
                  optional: true
          volumeMounts:
            # Container log directory
            - name: varlog
              mountPath: /var/log
            # Container data directory (Docker/containerd container filesystems)
            - name: varlibdockercontainers
              mountPath: /var/lib/docker/containers
              readOnly: true
            # Fluent Bit configuration
            - name: fluent-bit-config
              mountPath: /fluent-bit/etc/
            # Read-offset DB (prevents duplicate ingestion across restarts)
            - name: flb-db
              mountPath: /var/log/flb_kube.db
          resources:
            requests:
              cpu: "50m"
              memory: "32Mi"
            limits:
              cpu: "200m"
              memory: "128Mi"
          livenessProbe:
            httpGet:
              path: /
              port: 2020
            initialDelaySeconds: 10
            periodSeconds: 30
      volumes:
        - name: varlog
          hostPath:
            path: /var/log
        - name: varlibdockercontainers
          hostPath:
            path: /var/lib/docker/containers
        - name: fluent-bit-config
          configMap:
            name: fluent-bit-config
        - name: flb-db
          hostPath:
            path: /var/log/flb_kube.db
            type: FileOrCreate
      hostNetwork: false
      dnsPolicy: ClusterFirst
      terminationGracePeriodSeconds: 10

Elasticsearch + Kibana deployment (simplified)
# elasticsearch-kibana.yaml
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elasticsearch
  namespace: logging
spec:
  serviceName: elasticsearch
  replicas: 1  # 3 nodes recommended in production
  selector:
    matchLabels:
      app: elasticsearch
  template:
    metadata:
      labels:
        app: elasticsearch
    spec:
      # ES requires vm.max_map_count to be raised on the host
      initContainers:
        - name: set-vm-max-map-count
          image: busybox:1.36
          command: ["sysctl", "-w", "vm.max_map_count=262144"]
          securityContext:
            privileged: true
      containers:
        - name: elasticsearch
          image: elasticsearch:8.11.0
          env:
            - name: discovery.type
              value: single-node
            - name: ES_JAVA_OPTS
              value: "-Xms2g -Xmx2g"
            - name: xpack.security.enabled
              value: "false"  # enable in production
          ports:
            - containerPort: 9200
            - containerPort: 9300
          resources:
            requests:
              cpu: "500m"
              memory: "3Gi"
            limits:
              cpu: "2000m"
              memory: "4Gi"
          volumeMounts:
            - name: data
              mountPath: /usr/share/elasticsearch/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: alicloud-disk-efficiency
        resources:
          requests:
            storage: 500Gi

Spring Boot structured logging configuration
For logs to be efficiently searchable in ES, Spring Boot should emit structured logs in JSON:
<?xml version="1.0" encoding="UTF-8"?>
<!-- logback-spring.xml -->
<configuration>
    <springProfile name="prod">
        <appender name="STDOUT" class="ch.qos.logback.core.ConsoleAppender">
            <encoder class="net.logstash.logback.encoder.LogstashEncoder">
                <!-- Custom static fields -->
                <customFields>{"service":"order-service","version":"${APP_VERSION:-unknown}"}</customFields>
                <!-- Include MDC fields (e.g. traceId) -->
                <includeMdcKeyName>traceId</includeMdcKeyName>
                <includeMdcKeyName>spanId</includeMdcKeyName>
                <includeMdcKeyName>userId</includeMdcKeyName>
            </encoder>
        </appender>
        <root level="INFO">
            <appender-ref ref="STDOUT"/>
        </root>
    </springProfile>
</configuration>

The corresponding dependency:

<dependency>
    <groupId>net.logstash.logback</groupId>
    <artifactId>logstash-logback-encoder</artifactId>
    <version>7.4</version>
</dependency>

4. Production Best Practices
Tiered log collection strategy
Not every log is worth shipping to ES; tier them by cost and value:
Fully collected into ES: error logs (ERROR/WARN level), audit logs, and logs of business-critical operations.
Sampled: INFO-level access logs (on busy production services, keep a sample such as 10% and drop the rest) to relieve ES storage pressure.
Local only: DEBUG-level logs; view them with kubectl logs and don't ship them to ES.
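The sampling tier can be sketched with Fluent Bit's Lua filter. The script name sample.lua is an assumption, and the log_processed['level'] field relies on the Merge_Log_Key setting from the deployment above:

```
[FILTER]
    Name    lua
    Match   kube.*
    script  sample.lua
    call    sample_info
```

```lua
-- sample.lua: drop ~90% of INFO-level records, keep everything else
function sample_info(tag, timestamp, record)
    local lp = record["log_processed"]
    if lp and lp["level"] == "INFO" and math.random() > 0.1 then
        return -1, timestamp, record  -- -1 = drop the record
    end
    return 0, timestamp, record      -- 0 = keep the record unchanged
end
```

Sampling trades completeness for cost; keep it off ERROR/WARN paths so incident forensics stay complete.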
Implementing the DEBUG filter in Fluent Bit with the grep filter:

[FILTER]
    Name     grep
    Match    kube.*
    # Drop DEBUG logs
    Exclude  $log_processed['level'] DEBUG

ES index lifecycle management
With a large log volume, ES storage fills up fast. Configure ILM (Index Lifecycle Management) to manage indices automatically; note that a policy only takes effect on indices that reference it, typically via index.lifecycle.name in their index template:
PUT _ilm/policy/java-logs-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_age": "1d",
            "max_size": "50gb"
          }
        }
      },
      "warm": {
        "min_age": "7d",
        "actions": {
          "forcemerge": {"max_num_segments": 1},
          "allocate": {"number_of_replicas": 0}
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}

5. Pitfall Chronicles
Pitfall 1: Java stack traces split into separate log records
A Java exception stack trace spans many lines, but container logs report each line as its own record, so a single exception lands in ES as a dozen-plus documents, which makes investigation painful.
The fix is to configure multiline merging in Fluent Bit:
[INPUT]
    Name              tail
    ...
    # Multiline mode: a line starting with a timestamp begins a new record;
    # anything else belongs to the previous record
    Multiline.parser  multiline-java

[MULTILINE_PARSER]
    name           multiline-java
    type           regex
    flush_timeout  1000
    # A line starting with a date/time is the first line of a new record
    rule  "start_state"           "/^\d{4}-\d{2}-\d{2}/"  "java_after_timestamp"
    # Lines not starting with a digit belong to the previous record (stack trace)
    rule  "java_after_timestamp"  "/^[^\d]/"              "java_after_timestamp"

Pitfall 2: the Fluent Bit DB file blowing up disk usage
Fluent Bit records each log file's read offset in a SQLite DB. On clusters with heavy log traffic this file keeps growing, and after running long enough it can reach several GB and fill up the Node's disk.
Two mitigations: tune the SQLite settings (shown below), or drop the DB option entirely so offsets live only in memory (after a restart, Fluent Bit re-reads from the current end of each file, so a few records may be duplicated or missed):

[INPUT]
    Name        tail
    ...
    # Relax fsync and hold an exclusive lock on the DB;
    # DB.locking true noticeably slows the DB file's growth
    DB.sync     normal
    DB.locking  true

Pitfall 3: a full ES disk turning indices read-only
Once disk usage crosses ES's flood-stage watermark (95% by default; at the 85% low watermark ES merely stops allocating new shards), ES marks indices read-only. Fluent Bit's writes then fail and records pile up in memory until it OOMs.
Immediate fix: lift the read-only block and delete old indices by hand:

# Lift the read-only block
curl -X PUT "es:9200/_settings" -H 'Content-Type: application/json' \
  -d '{"index.blocks.read_only_allow_delete": null}'
# Delete the oldest indices
curl -X DELETE "es:9200/java-logs-2024.01.*"

Long-term fix: configure an ILM policy to delete old indices automatically, and alert on disk usage well in advance (alerting at 70% is a sensible threshold).
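The watermarks themselves are dynamic cluster settings. If you need different headroom before the read-only block kicks in, they can be adjusted; the values below are the ES defaults, shown for illustration rather than as a recommendation:

```
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "85%",
    "cluster.routing.allocation.disk.watermark.high": "90%",
    "cluster.routing.allocation.disk.watermark.flood_stage": "95%"
  }
}
```

Raising flood_stage buys time during an incident, but only deleting data or adding disk actually fixes the problem.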
6. Summary
The best-practice K8s logging pipeline: a Fluent Bit DaemonSet collects every container's logs, the K8s metadata filter automatically adds Pod/Namespace context, Elasticsearch stores the records, and Kibana handles search and visualization.
For Java services, strongly prefer JSON structured logs (via logstash-logback-encoder) so that ES can search them directly by field with no extra parsing rules. Also write the traceId into MDC; once collected, logs can be aggregated per request trace, which dramatically speeds up troubleshooting.
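For reference, a single event produced by the logback setup above looks roughly like this once it lands in ES; field values here are invented, and exact field names depend on the encoder version and configuration:

```json
{
  "@timestamp": "2024-01-15T15:02:11.483+08:00",
  "level": "ERROR",
  "logger_name": "com.example.order.OrderService",
  "thread_name": "http-nio-8080-exec-3",
  "message": "failed to reserve inventory",
  "service": "order-service",
  "version": "1.8.2",
  "traceId": "4f9c2a7d1be34c08",
  "kubernetes": {
    "namespace_name": "production",
    "pod_name": "order-service-6d5f7c9b8-x2x4q"
  }
}
```

Every top-level key is directly filterable in Kibana, which is the whole payoff of structured logging.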
