fluent-plugin-sensu
Fluentd output plugin that sends check results to sensu-client, the agent process of the Sensu monitoring framework.
Configuration
Example configuration
<match ddos>
type sensu
# Connection settings
server localhost
port 3030
# Payload settings
## The check is named "ddos_detection"
check_name ddos_detection
## The severity is read from the field "level"
check_status_field level
</match>
Plugin type
The type of this plugin is sensu.
Specify type sensu in the match section.
Connection setting
- server (default is "localhost")
  - The IP address or the hostname of the host running sensu-client daemon.
- port (default is 3030)
  - The TCP port number of the Sensu client socket on which sensu-client daemon is listening.
Check payload setting
The payload of a check result is a JSON object which contains attributes as follows. Attributes are indicated by JSONPath expressions.
- $.name
  - The check name to identify the check.
- $.output
  - An arbitrary string to describe the check result.
  - This attribute is often used to contain metric values.
- $.status
  - The severity of the check result.
  - 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN or CUSTOM).
The check result can also contain other attributes. This plugin supports the attributes below.
- $.type
  - Either "standard" or "metric".
  - If the attribute is set to "standard", the sensu-server creates an event only when the status is not OK or when the status is changed to OK.
  - If the attribute is set to "metric", the sensu-server creates an event even if the status is OK. It is useful when Sensu sends check results to metrics collectors such as Graphite.
- $.ttl
  - The time to live (TTL) in seconds, until the check result is considered stale. If the TTL expires, sensu-server creates an event.
  - This attribute is useful when you want to be notified when logs are not output for a certain period.
  - Same as freshness_threshold in Nagios.
- $.handlers
  - The names of handlers which process events created for the check.
- $.low_flap_threshold and $.high_flap_threshold
  - Threshold percentages used to determine whether the status is considered "flapping", that is, whether the state changes too frequently.
  - Same as the options in Nagios. See the description of Flap Detection in Nagios.
- $.source
  - The source of the check, such as servers or network switches.
  - If this attribute is not specified, the host of sensu-client is considered as the source.
- $.executed
  - The timestamp on which the check is executed.
  - Note that there is also another timestamp attribute named issued, which is automatically measured by sensu-client process. Uchiwa, the default dashboard of Sensu, displays issued as the timestamp of check results.
This plugin additionally adds a "fluentd" attribute to the check result. The value of the attribute is a JSON object whose elements are the inputs to the plugin.
- $.fluentd.tag
  - The tag of the Fluentd data.
- $.fluentd.time
  - The time of the Fluentd data, in seconds since the Unix epoch.
- $.fluentd.record
  - The record of the Fluentd data.
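As an illustration of the payload described above, the following Ruby sketch assembles a check result as a hash and serializes it to JSON. The helper name build_payload and the sample values are hypothetical, and the plugin's actual implementation may differ.

```ruby
require 'json'

# Hypothetical helper (not part of the plugin): assembles a check result
# containing the attributes listed above, plus the "fluentd" attribute.
def build_payload(tag, time, record, name:, output:, status:)
  {
    'name' => name,
    'output' => output,
    'status' => status,
    'executed' => time,
    'fluentd' => { 'tag' => tag, 'time' => time, 'record' => record },
  }
end

# Sample values, chosen only for illustration.
payload = build_payload('ddos', 1_420_070_400, { 'level' => 'CRITICAL' },
                        name: 'ddos_detection',
                        output: '{"level":"CRITICAL"}',
                        status: 2)
puts JSON.generate(payload)
```

The resulting JSON text is the kind of document that is written to the Sensu client socket.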
name attribute
The check name is determined as below.
- The field specified by check_name_field option, if present and valid (highest priority)
  - The valid values are strings composed of ASCII alphanumerics, underscores, periods, and hyphens.
- or check_name option, if present
  - The valid values are the same as above.
- or the tag name, if valid
  - The valid values are the same as above.
- or "fluent-plugin-sensu" (lowest priority)
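The priority order above can be sketched in Ruby as follows. This is only an illustration: the function names are hypothetical, and the sketch assumes that check_name has already been validated when the configuration was loaded.

```ruby
# Valid names: ASCII alphanumerics, underscores, periods, and hyphens.
NAME_PATTERN = /\A[A-Za-z0-9_.-]+\z/

def valid_name?(value)
  value.is_a?(String) && value.match?(NAME_PATTERN)
end

# Hypothetical function mirroring the fallback chain described above.
def determine_name(tag, record, check_name_field: nil, check_name: nil)
  if check_name_field
    candidate = record[check_name_field]
    return candidate if valid_name?(candidate)
  end
  # Assumption: check_name was validated at configuration time.
  return check_name if check_name
  return tag if valid_name?(tag)
  'fluent-plugin-sensu'
end
```

For example, a record whose name field contains a space falls back to check_name, and a tag containing an invalid character falls back to "fluent-plugin-sensu".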
output attribute
The check output is determined as below.
- The field specified by check_output_field option, if present (highest priority)
- or check_output option, if present
- or JSON notation of the record (lowest priority)
status attribute
The severity of the check result is determined as below.
- The field specified by check_status_field option, if present and permitted (highest priority)
  - The values permitted to the field for each status (case insensitive):
    - status 0: an integer 0 and strings "0", "OK"
    - status 1: an integer 1 and strings "1", "WARNING", "warn"
    - status 2: an integer 2 and strings "2", "CRITICAL", "crit"
    - status 3: an integer 3 and strings "3", "UNKNOWN", "CUSTOM"
- or check_status option, if present
  - The permitted values for each status (case insensitive):
    - status 0: 0 and OK
    - status 1: 1, WARNING, warn
    - status 2: 2, CRITICAL, crit
    - status 3: 3, UNKNOWN, CUSTOM
  - If the value is not permitted, it causes a configuration error.
- or 3, which means UNKNOWN or CUSTOM (lowest priority)
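The status rules above could be sketched in Ruby roughly as below. The names are hypothetical. Raising ArgumentError stands in for the configuration error, which a real plugin would report when the configuration is loaded rather than per record.

```ruby
# Permitted spellings for each status, compared case-insensitively.
STATUS_NAMES = {
  0 => %w[0 ok],
  1 => %w[1 warning warn],
  2 => %w[2 critical crit],
  3 => %w[3 unknown custom],
}.freeze

# Returns the status code (0..3), or nil if the value is not permitted.
def parse_status(value)
  return nil unless value.is_a?(Integer) || value.is_a?(String)
  text = value.to_s.downcase
  found = STATUS_NAMES.find { |_status, names| names.include?(text) }
  found && found.first
end

# Hypothetical function mirroring the fallback chain described above.
def determine_status(record, check_status_field: nil, check_status: nil)
  if check_status_field
    status = parse_status(record[check_status_field])
    return status if status
  end
  if check_status
    status = parse_status(check_status)
    raise ArgumentError, "invalid check_status: #{check_status}" unless status
    return status
  end
  3 # UNKNOWN or CUSTOM
end
```

With check_status_field set to level, a record containing "level": "crit" yields status 2, while a record without a permitted value falls through to check_status or to 3.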
"warn" and "crit" come from fluent-plugin-notifier.
type attribute
The check type is determined as below.
- check_type option (highest priority)
  - The value must be a string "standard" or "metric".
- or "standard" (lowest priority)
ttl attribute
The TTL in seconds until expiration is determined as below.
- check_ttl option (highest priority)
  - The value must be an integer which represents the TTL in seconds.
- or N/A (lowest priority)
  - It means no expiration detection is performed.
handlers attribute
The handlers which process check results are determined as below.
- check_handlers option (highest priority)
  - The value must be an array of strings which represent handler names.
- or ["default"] (lowest priority)
low_flap_threshold and high_flap_threshold attributes
The threshold percentages for flap detection are determined as below.
- check_low_flap_threshold and check_high_flap_threshold options (highest priority)
  - The values must be integers of threshold percentages.
- or N/A (lowest priority)
  - It means no flap detection is performed.
The two options must either be specified together or not specified at all.
If the options are specified,
the following condition must be true:
0 <= check_low_flap_threshold <= check_high_flap_threshold <= 100.
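The constraints on the two options can be expressed as a small validation routine. The sketch below is illustrative only: the function name is hypothetical, and ArgumentError stands in for whatever configuration error the plugin actually raises.

```ruby
# Hypothetical validator for the flap threshold options.
# Returns nil when flap detection is disabled (neither option given).
def validate_flap_thresholds(low, high)
  return nil if low.nil? && high.nil?
  if low.nil? || high.nil?
    raise ArgumentError, 'both thresholds must be specified together'
  end
  unless low.is_a?(Integer) && high.is_a?(Integer) &&
         0 <= low && low <= high && high <= 100
    raise ArgumentError, 'thresholds must satisfy 0 <= low <= high <= 100'
  end
  { 'low_flap_threshold' => low, 'high_flap_threshold' => high }
end
```

For instance, the pair (20, 50) is accepted, whereas (50, 20) or a lone low threshold is rejected.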
source attribute
The source of the checks is determined as below.
- The field specified by check_source_field option, if present and valid (highest priority)
- or check_source option
- or N/A (lowest priority)
  - It means the host of sensu-client is considered as the check source.
executed attribute
The executed timestamp is determined as below.
- The field specified by check_executed_field, if present and valid (highest priority)
  - The value must be an integer which represents seconds since the Unix epoch.
- or the time of the Fluentd record (lowest priority)
Buffering
The default value of flush_interval option is set to 1 second.
It means that check results are delayed at most 1 second
before being sent.
Except for flush_interval,
the plugin uses default options
for buffered output plugins (defined in Fluent::BufferedOutput class).
You can override buffering options in the configuration. For example:
<match ddos>
type sensu
...snip...
buffer_type file
buffer_path /var/lib/fluentd/buffer/ddos
flush_interval 0.1
try_flush_interval 0.1
</match>
Use case: "too many server errors" alert
Situation
Assume you have a web server which runs:
- Apache HTTP server
- Fluentd
- sensu-client
- which listens to the TCP port 3030 for Sensu client socket.
You want to be notified when Apache responds with too many server errors: for example, 5 or more errors per minute as WARNING, and 50 or more errors per minute as CRITICAL.
Configuration
The setting for Fluentd utilizes fluent-plugin-datacounter, fluent-plugin-record-reformer, and of course fluent-plugin-sensu. Install those plugins and add configuration as below.
# Parse Apache access log
<source>
type tail
tag access
format apache2
# The paths vary by setup
path /var/log/httpd/access_log
pos_file /var/lib/fluentd/pos/httpd-access_log.pos
</source>
# Count 5xx errors per minute
<match access>
type datacounter
tag count.access
unit minute
aggregate all
count_key code
pattern1 error ^5\d\d$
</match>
# Calculate the severity level
<match count.access>
type record_reformer
tag server_errors
enable_ruby true
<record>
level ${error_count < 5 ? 'OK' : error_count < 50 ? 'WARNING' : 'CRITICAL'}
</record>
</match>
# Send checks to sensu-client
<match server_errors>
type sensu
server localhost
port 3030
check_name server_errors
check_type standard
check_status_field level
check_ttl 100
</match>
The TTL is set to 100 seconds here because a check is sent every 60 seconds, and 40 seconds are added as a margin.
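The severity expression in the record_reformer section maps the per-minute error count to a level. Written as a standalone Ruby function (the name severity_level is hypothetical), the thresholds read as follows:

```ruby
# Mirrors the ternary used in the record_reformer configuration above:
# fewer than 5 errors is OK, fewer than 50 is WARNING, otherwise CRITICAL.
def severity_level(error_count)
  if error_count < 5
    'OK'
  elsif error_count < 50
    'WARNING'
  else
    'CRITICAL'
  end
end
```

Note that the boundaries are inclusive on the upper level: exactly 5 errors is already WARNING, and exactly 50 is CRITICAL.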
Alternatives
You can use record_transformer filter
instead of fluent-plugin-record-reformer
on Fluentd 0.12.0 and above.
If you are concerned with scalability, fluent-plugin-norikra may be a better option than datacounter and record_reformer.
Another alternative configuration for the use case is sending the error count to Graphite using fluent-plugin-graphite, and making Sensu monitor the value on Graphite with check-data.rb.
Installation
Install the fluent-plugin-sensu gem.
Contributing
Submit an issue or a pull request.
Feedback to @miyakawa_taku on Twitter is also welcome.