The formatting here is a bit messy, so readers may prefer the link below:
https://bitnoise.s3-ap-northeast-1.amazonaws.com/index.html
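Recently I needed to collect some logs with multi-line merging, so I started looking into filebeat's multiline configuration and ran into a problem along the way. Readers who already know the filebeat configuration well can jump straight to the "Problem" part below.
Version used: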
filebeat version 6.1.4 (amd64), libbeat 6.1.4
Official documentation:
https://www.elastic.co/guide/en/beats/filebeat/6.1/multiline-examples.html
https://www.elastic.co/guide/en/beats/filebeat/6.1/configuration-filebeat-options.html
A few important parameters are listed below.
include_lines
A list of regular expressions to match the lines that you want Filebeat to include. Filebeat exports only the lines that match a regular expression in the list. By default, all lines are exported.
If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines.
The following example configures Filebeat to export any lines that start with "ERR" or "WARN":
filebeat.prospectors:
- paths:
    - /var/log/myapp/*.log
  include_lines: ['^ERR', '^WARN']
Devil's whisper: a line is collected as soon as it matches any one of the regular expressions in the list.
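The "any regex in the list" rule can be sketched in a few lines of Python. This is only an illustration: Filebeat is written in Go and has its own regexp dialect, so Python's re module is an approximation, and the helper name include_line is made up for this sketch.

```python
import re

def include_line(line, include_patterns):
    """Return True if the line matches at least one pattern (hypothetical helper)."""
    # An empty list means "no filter": every line is exported.
    if not include_patterns:
        return True
    return any(re.search(p, line) for p in include_patterns)

lines = ["ERR disk full", "INFO started", "WARN slow query", "DEBUG ERR inside"]
kept = [l for l in lines if include_line(l, ["^ERR", "^WARN"])]
print(kept)
```

Because the patterns are anchored with ^, a line that merely contains ERR somewhere in the middle is not kept.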
multiline.pattern
Specifies the regular expression pattern to match. Note that the regexp patterns supported by Filebeat differ somewhat from the patterns supported by Logstash. See Regular expression support for a list of supported regexp patterns. Depending on how you configure other multiline options, lines that match the specified regular expression are considered either continuations of a previous line or the start of a new multiline event. You can set the negate option to negate the pattern.
Devil's whisper: lines that match this regex take part in multi-line merging; how they are merged depends on two of the parameters below (negate and match).
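multiline.max_lines
The maximum number of lines that can be combined into one event. If the multiline message contains more than max_lines, any additional lines are discarded. The default is 500.
Devil's whisper: the maximum number of lines that can be merged; anything beyond it is dropped. The default is 500, which keeps a single event from swallowing too many log lines.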
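As a rough picture of what "additional lines are discarded" means, here is a tiny Python sketch. The helper cap_event is hypothetical and only mimics the documented behaviour; it is not how Filebeat is implemented.

```python
def cap_event(event_lines, max_lines=500):
    """Keep at most max_lines lines of one multiline event (hypothetical helper)."""
    # Lines beyond the limit are simply dropped, not moved to a new event.
    return event_lines[:max_lines]

stack_trace = ["Exception in thread main"] + ["  at frame %d" % i for i in range(600)]
event = cap_event(stack_trace)
print(len(stack_trace), "->", len(event))
```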
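multiline.timeout
After the specified timeout, Filebeat sends the multiline event even if no new pattern is found to start a new event. The default is 5s.
Devil's whisper: the timeout for a multi-line match. Once it expires, the multiline event in progress is closed and sent, and a new multiline event begins. The default is 5 seconds.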
multiline.negate
Defines whether the pattern is negated. The default is false.
Devil's whisper: see the analysis below, where it is discussed together with match.
multiline.match
Specifies how Filebeat combines matching lines into an event. The settings are after or before. The behavior of these settings depends on what you specify for negate:
Devil's whisper: the first time I read the multiline settings above, I was completely lost.
Below is my own summary of multiline.match (in the same order as the table in the official documentation):
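Case 1:
negate: false
match: after
Lines that match the regex are merged into one line with the preceding line that does not match.
line that does not match the pattern
line that matches the pattern
line that matches the pattern
Case 2:
negate: false
match: before
Lines that match the regex are merged into one line with the following line that does not match.
line that matches the pattern
line that matches the pattern
line that does not match the pattern
Case 3:
negate: true
match: after
Lines that do not match the regex are merged into one line with the preceding line that matches.
line that matches the pattern
line that does not match the pattern
line that does not match the pattern
Case 4:
negate: true
match: before
Lines that do not match the regex are merged into one line with the following line that matches.
line that does not match the pattern
line that does not match the pattern
line that matches the pattern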
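The four cases above can be replayed with a small Python simulation. It is a sketch of the documented behaviour under my own assumptions, not Filebeat's actual code: the function merge_multiline and the sample lines are invented for illustration, Python's re module stands in for Filebeat's regexp engine, and max_lines and timeout are ignored.

```python
import re

def merge_multiline(lines, pattern, negate=False, match="after"):
    """Group lines into events, roughly following the four cases above."""
    regex = re.compile(pattern)
    events, current = [], []
    for line in lines:
        # "joins" is True when this line should be glued to a neighbour.
        joins = bool(regex.search(line)) != negate
        if match == "after":
            # Joining lines are appended to the event in progress;
            # any other line closes it and starts a new event.
            if joins and current:
                current.append(line)
            else:
                if current:
                    events.append("\n".join(current))
                current = [line]
        else:  # match == "before"
            # Joining lines are buffered; the first non-joining line
            # is added as well and then closes the event.
            current.append(line)
            if not joins:
                events.append("\n".join(current))
                current = []
    if current:  # flush whatever is left over at the end of input
        events.append("\n".join(current))
    return events

log = ["head1", " tail", " tail", "head2", " tail"]
print(merge_multiline(log, r"^ ", negate=False, match="after"))    # case 1
print(merge_multiline(log, r"^head", negate=True, match="after"))  # case 3

log2 = [" lead", " lead", "end1", " lead", "end2"]
print(merge_multiline(log2, r"^ ", negate=False, match="before"))   # case 2
print(merge_multiline(log2, r"^end", negate=True, match="before"))  # case 4
```

Cases 1 and 3 produce the same grouping here, as do cases 2 and 4: negate only decides whether the pattern describes the continuation lines or the line they attach to.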
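Problem
When include_lines is used together with multiline.pattern, the result is not what I expected. Three examples follow; pay attention to the difference between the expected result and the actual result in each.
The official documentation for include_lines contains this sentence:
If multiline is also specified, each multiline message is combined into a single line before the lines are filtered by include_lines.
My understanding:
1. When include_lines and multiline.pattern are used together, the program first merges the lines that match multiline.pattern according to the rules, then filters the result with include_lines, yielding one line.
2. Lines that match an include_lines regex are collected as single, separate lines.
filebeat.yml configuration (no other settings):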
filebeat.prospectors:
- type: log
  enabled: true
  paths:
    - /var/log/i.log
  include_lines: ['error']
  multiline.pattern: "errorA"
  multiline.negate: true
  multiline.match: after
output.file:
  path: "/var/log"
  filename: o.log
[Input sample 1]
echo 'A
B
C
errorA TEST1
1
2
3
' >>/var/log/i.log
Expected result:
errorA TEST1
1
2
3
Actual result:
errorA TEST1
1
2
3
Raw data:
{"@timestamp":"2019-06-04T10:21:37.332Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"},"source":"/var/log/i.log","offset":26,"message":"errorA TEST1\n1\n2\n3\n","prospector":{"type":"log"}}
Verdict: correct (the actual result matches the expected result).
[Input sample 2]
echo 'A
B
C
error TEST2
1
2
3
' >>/var/log/i.log
Expected result:
error TEST2
Actual result:
A
B
C
error TEST2
1
2
3
Raw data:
{"@timestamp":"2019-06-04T10:22:22.336Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"message":"A\nB\nC\nerror TEST2\n1\n2\n3\n","source":"/var/log/i.log","offset":51,"prospector":{"type":"log"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"}}
Verdict: wrong (the actual result differs from the expected result).
[Input sample 3]
echo 'error TEST3
error TEST3
error TEST3
error TEST3
' >>/var/log/i.log
Expected result (4 entries):
error TEST3
error TEST3
error TEST3
error TEST3
Actual result (1 entry):
error TEST3
error TEST3
error TEST3
error TEST3
Raw data:
{"@timestamp":"2019-06-04T10:23:37.342Z","@metadata":{"beat":"filebeat","type":"doc","version":"6.1.4"},"prospector":{"type":"log"},"beat":{"name":"localhost.localdomain","hostname":"localhost.localdomain","version":"6.1.4"},"source":"/var/log/i.log","offset":112,"message":"error TEST3\nerror TEST3\nerror TEST3\nerror TEST3\nerror TEST3\n"}
Verdict: wrong (the actual result differs from the expected result).
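Conclusion:
1. When include_lines and multiline.pattern are used together, the program first merges the lines that match multiline.pattern according to the rules, then filters the result with include_lines, yielding one line.
2. Lines that match include_lines are also merged: they are merged with both the preceding and the following lines (regardless of the negate and match rules), yielding one line.
3. With include_lines set, apart from the multiline.pattern lines being merged first, all other lines are merged into one line (equivalent to setting include_lines: ['.*']).
Devil's whisper: this collects a lot of unnecessary extra content. Is there any way around it? Split it into two configurations?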
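The conclusion above was wrong. Here is the corrected explanation.
When include_lines and multiline.pattern are used together, the program first merges the lines that satisfy the multiline rules into one line. It then checks whether that merged line matches include_lines, and only lines that match are collected.
If the lines being merged never reach a terminating line anywhere in the text, they are merged into one line automatically once multiline.max_lines or multiline.timeout is reached. The merged line still goes through the include_lines filter and is collected only if it matches.
[Input sample 2] Multi-line merge rule: lines that do not contain errorA are appended to the preceding line that contains errorA, forming one line. If the merged line contains error, it is collected.
[Input sample 2] Sample: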
echo 'A
B
C
error TEST2
1
2
3
' >>/var/log/i.log
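The program performs the multi-line merge first. The sample contains no errorA text at all, so the merge never finds a terminating line (multiline.max_lines and multiline.timeout were not set explicitly, so their defaults apply). All of the text is therefore treated as one line, and because that line matches include_lines it is collected as a single log entry.
[Input sample 3] works the same way: there is no terminating line, the merge runs first, and everything is merged into one line.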
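The merge-then-filter order can be replayed offline with a short Python sketch. It hard-codes the negate: true, match: after behaviour from the configuration in this post and uses Python's re module instead of Filebeat's own regexp engine. The function name collect is made up, and the trailing empty string in each sample is my assumption about the blank line that the echo command leaves at the end of the file.

```python
import re

def collect(lines, multiline_pattern, include_patterns):
    """Merge first (negate: true, match: after), then apply include_lines."""
    start = re.compile(multiline_pattern)
    events, current = [], []
    for line in lines:
        # A line matching the pattern starts a new event; every other
        # line is appended to the event in progress.
        if start.search(line) and current:
            events.append("\n".join(current))
            current = []
        current.append(line)
    if current:  # no terminating line: Filebeat would flush this on timeout
        events.append("\n".join(current))
    # include_lines only ever sees the merged events, never the raw lines.
    return [e for e in events
            if any(re.search(p, e) for p in include_patterns)]

sample1 = ["A", "B", "C", "errorA TEST1", "1", "2", "3", ""]
sample2 = ["A", "B", "C", "error TEST2", "1", "2", "3", ""]
print(collect(sample1, "errorA", ["error"]))
print(collect(sample2, "errorA", ["error"]))
```

For sample 1 the event "A, B, C" is built first and then dropped because it does not contain error. For sample 2 nothing ever matches errorA, so a single event holding every line is built and kept. The resulting strings are the same as the message fields in the raw data shown earlier.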