From 1c816ac9ad2eb3f68095e94f5fcc6946af423f29 Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Wed, 5 Apr 2017 19:49:00 +0200 Subject: [PATCH] Update documentation (troubleshooting, monitor Icinga 2, configs, integrations, etc.) fixes #5137 fixes #5140 fixes #1880 fixes #5142 fixes #5144 --- doc/11-cli-commands.md | 46 ++++--- doc/12-icinga2-api.md | 9 +- doc/14-features.md | 17 ++- doc/15-troubleshooting.md | 245 ++++++++++++++++++++++++++++++---- doc/18-library-reference.md | 4 +- doc/4-configuring-icinga-2.md | 71 +++++++--- doc/5-service-monitoring.md | 3 + doc/8-advanced-topics.md | 165 +++++++++++++++++------ 8 files changed, 456 insertions(+), 104 deletions(-) diff --git a/doc/11-cli-commands.md b/doc/11-cli-commands.md index 9b3301f99..1068be4f6 100644 --- a/doc/11-cli-commands.md +++ b/doc/11-cli-commands.md @@ -3,7 +3,7 @@ Icinga 2 comes with a number of CLI commands which support bash autocompletion. These CLI commands will allow you to use certain functionality -provided by and around the Icinga 2 daemon. +provided by and around Icinga 2. Each CLI command provides its own help and usage information, so please make sure to always run them with the `--help` parameter. @@ -84,13 +84,13 @@ options. Bash Auto-Completion (pressing `<TAB>`) is provided only for the corresponding context. -While `--config` will suggest and auto-complete files and directories on disk, -`feature enable` will only suggest disabled features. Try it yourself. +While `--config` suggests and auto-completes files and directories on disk, +`feature enable` only suggests disabled features. RPM and Debian packages install the bash completion files into `/etc/bash_completion.d/icinga2`. -You will need to install the `bash-completion` package if not already installed. +You need to install the `bash-completion` package if not already installed. RHEL/CentOS/Fedora: @@ -117,11 +117,13 @@ into your current session and test it: By default the `icinga2` binary loads the `icinga` library.
A different application type can be specified with the `--app` command-line option. +Note: This is only needed by developers, not by the average Icinga user. ### Libraries Instead of loading libraries using the [`library` config directive](17-language-reference.md#library) you can also use the `--library` command-line option. +Note: This is only needed by developers, not by the average Icinga user. ### Constants @@ -135,8 +137,8 @@ brackets like this: include <test.conf> -This would cause Icinga 2 to search its include path for the configuration file -`test.conf`. By default the installation path for the Icinga Template Library +This causes Icinga 2 to search its include path for the configuration file +`test.conf`. By default the installation path for the [Icinga Template Library](10-icinga-template-library.md#icinga-template-library) is the only search directory. Using the `--include` command-line option additional search directories can be @@ -145,11 +147,11 @@ added. ## CLI command: Console -The CLI command `console` can be used to evaluate Icinga 2 config expressions, e.g. to test -[functions](17-language-reference.md#functions). +The CLI command `console` can be used to debug and evaluate Icinga 2 config expressions, +e.g. to test [functions](17-language-reference.md#functions) in your local sandbox. $ icinga2 console - Icinga 2 (version: v2.4.0) + Icinga 2 (version: v2.6.0) <1> => function test(name) { <1> .. log("Hello " + name) <1> .. } @@ -159,6 +161,7 @@ The CLI command `console` can be used to evaluate Icinga 2 config expressions, e null <3> => +Further usage examples can be found in the [library reference](18-library-reference.md#library-reference) chapter. On operating systems without the `libedit` library installed there is no support for line-editing or a command history.
However you can @@ -166,18 +169,23 @@ use the `rlwrap` program if you require those features: $ rlwrap icinga2 console -The `console` can be used to connect to a running Icinga 2 instance using +The debug console can be used to connect to a running Icinga 2 instance using the [REST API](12-icinga2-api.md#icinga2-api). [API permissions](12-icinga2-api.md#icinga2-api-permissions) are required for executing config expressions and auto-completion. > **Note** -> The console does not currently support SSL certificate verification. +> +> The debug console does not currently support SSL certificate verification. +> +> Runtime modifications are not validated and might cause the Icinga 2 +> daemon to crash or behave in an unexpected way. Use these runtime changes +> at your own risk; preferably *inspect and debug objects read-only*. You can specify the API URL using the `--connect` parameter. Although the password can be specified there process arguments on UNIX platforms are usually visible to other users (e.g. through `ps`). In order to securely specify the -user credentials the console supports two environment variables: +user credentials the debug console supports two environment variables: Environment variable | Description ---------------------|------------- @@ -248,7 +256,7 @@ Here's an example that retrieves the command that was used by Icinga to check th ## CLI command: Daemon The CLI command `daemon` provides the functionality to start/stop Icinga 2. -Furthermore it provides the [configuration validation](11-cli-commands.md#config-validation). +Furthermore it allows you to run the [configuration validation](11-cli-commands.md#config-validation). # icinga2 daemon --help icinga2 - The Icinga 2 network monitoring daemon (version: v2.6.0) @@ -286,7 +294,7 @@ Furthermore it provides the [configuration validation](11-cli-commands.md#config ### Config Files -Using the `--config` option you can specify one or more configuration files.
+You can specify one or more configuration files with the `--config` option. Configuration files are processed in the order they're specified on the command-line. When no configuration file is specified and the `--no-config` is not used @@ -295,7 +303,7 @@ Icinga 2 automatically falls back to using the configuration file ### Config Validation -The `--validate` option can be used to check if your configuration files +The `--validate` option can be used to check if configuration files contain errors. If any errors are found, the exit status is 1, otherwise 0 is returned. More details in the [configuration validation](11-cli-commands.md#config-validation) chapter. @@ -374,9 +382,15 @@ nodes in a [distributed monitoring](6-distributed-monitoring.md#distributed-moni ## CLI command: Object The `object` CLI command can be used to list all configuration objects and their -attributes. The command also shows where each of the attributes was modified. +attributes. The command also shows where each of the attributes was modified and as such +provides debug information for further configuration problem analysis. That way you can also identify which objects have been created from your [apply rules](17-language-reference.md#apply). +Runtime modifications via the [REST API](12-icinga2-api.md#icinga2-api-config-objects) +are not immediately reflected in the output. Furthermore there is a known issue with +[group assign expressions](17-language-reference.md#group-assign) which are not reflected in the host object output. +You need to restart Icinga 2 in order to update the `icinga2.debug` cache file. + More information can be found in the [troubleshooting](15-troubleshooting.md#list-configuration-objects) section.
# icinga2 object --help diff --git a/doc/12-icinga2-api.md b/doc/12-icinga2-api.md index e47195eb9..c35b02fb2 100644 --- a/doc/12-icinga2-api.md +++ b/doc/12-icinga2-api.md @@ -1612,6 +1612,11 @@ The following parameters need to be specified (either as URL parameters or in a The [API permission](12-icinga2-api.md#icinga2-api-permissions) `console` is required for executing expressions. +> **Note** +> +> Runtime modifications via `execute-script` calls are not validated and might cause the Icinga 2 +> daemon to crash or behave in an unexpected way. Use these runtime changes at your own risk. + If you specify a session identifier, the same script context can be reused for multiple requests. This allows you to, for example, set a local variable in a request and use that local variable in another request. Sessions automatically expire after a set period of inactivity (currently 30 minutes). Example for fetching the command line from the local host's last check result: @@ -1695,7 +1700,9 @@ The Windows installer already includes Icinga Studio. On Debian and Ubuntu the p ### Icinga 2 Console -By default the [console CLI command](11-cli-commands.md#cli-command-console) evaluates expressions in a local interpreter, i.e. independently from your Icinga 2 daemon. Using the `--connect` parameter you can use the Icinga 2 console to evaluate expressions via the API. +By default the [console CLI command](11-cli-commands.md#cli-command-console) evaluates +expressions in a local interpreter, i.e. independently from your Icinga 2 daemon. +Add the `--connect` parameter to debug and evaluate expressions via the API. ### API Clients Programmatic Examples diff --git a/doc/14-features.md b/doc/14-features.md index 5df18951a..27aba9f3a 100644 --- a/doc/14-features.md +++ b/doc/14-features.md @@ -29,7 +29,7 @@ platforms. 
This configuration ensures that the `icinga2.log`, `error.log` and The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all configuration and status information into a database. The IDO database is used -by a number of projects including Icinga Web 1.x and 2. +by Icinga Web 2. Details on the installation can be found in the [Configuring DB IDO](2-getting-started.md#configuring-db-ido-mysql) chapter. Details on the configuration can be found in the @@ -336,7 +336,9 @@ expects the InfluxDB daemon to listen at `127.0.0.1` on port `8086`. More configuration details can be found [here](9-object-types.md#objecttype-influxdbwriter). -### GELF Writer +### Graylog Integration + +#### GELF Writer The `Graylog Extended Log Format` (short: [GELF](http://www.graylog2.org/resources/gelf)) can be used to send application logs directly to a TCP socket. @@ -358,7 +360,16 @@ Currently these events are processed: * State changes * Notifications -### Logstash Writer +### Elastic Stack Integration + +[Icingabeat](https://github.com/icinga/icingabeat) is an Elastic Beat that fetches data +from the Icinga 2 API and sends it either directly to Elasticsearch or Logstash. + +More integrations in development: +* [Logstash output](https://github.com/Icinga/logstash-output-icinga) for the Icinga 2 API. +* [Logstash Grok Pattern](https://github.com/Icinga/logstash-grok-pattern) for Icinga 2 logs. 
+ +#### Logstash Writer [Logstash](https://www.elastic.co/products/logstash) receives and processes event messages sent by Icinga 2 and the [LogstashWriter](9-object-types.md#objecttype-logstashwriter) diff --git a/doc/15-troubleshooting.md b/doc/15-troubleshooting.md index 6d7c3efcb..d372b6357 100644 --- a/doc/15-troubleshooting.md +++ b/doc/15-troubleshooting.md @@ -1,19 +1,111 @@ # Icinga 2 Troubleshooting -## Which information is required +## Required Information -* Run `icinga2 troubleshoot` to collect required troubleshooting information -* Alternative, manual steps: +Please provide any details that may help others reproduce and understand your issue. +Whether you ask on the community channels or you create an issue at [GitHub](https://github.com/Icinga), make sure +that others can follow your explanations. If necessary, draw a picture and attach it for +better illustration. This is especially helpful if you are troubleshooting a distributed +setup. + +We've come across many community questions and compiled this list. Please add your own +findings and details. + +* Describe the expected behavior in your own words. +* Describe the actual behavior in one or two sentences. +* Provide general information such as: + * How was Icinga 2 installed (and from which repository, if applicable), and which distribution are you using? * `icinga2 --version` * `icinga2 feature list` - * `icinga2 daemon --validate` - * Relevant output from your main and debug log ( `icinga2 object list --type='filelogger'` ) - * The newest Icinga 2 crash log if relevant - * Your icinga2.conf and, if you run multiple Icinga 2 instances, your zones.conf -* How was Icinga 2 installed (and which repository in case) and which distribution are you using -* Provide complete configuration snippets explaining your problem in detail -* If the check command failed, what's the output of your manual plugin tests?
-* In case of [debugging](20-development.md#development) Icinga 2, the full back traces and outputs + * `icinga2 daemon -C` + * [Icinga Web 2](https://www.icinga.com/products/icinga-web-2/) version (screenshot from System - About) + * [Icinga Web 2 modules](https://www.icinga.com/products/icinga-web-2-modules/) e.g. the Icinga Director (optional) +* Configuration insights: + * Provide complete configuration snippets explaining your problem in detail + * Your [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) file + * If you run multiple Icinga 2 instances, the [zones.conf](4-configuring-icinga-2.md#zones-conf) file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes. +* Logs + * Relevant output from your main and [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) in `/var/log/icinga2`. Please add step-by-step explanations with timestamps if required. + * The newest Icinga 2 crash log if relevant, located in `/var/log/icinga2/crash` +* Additional details + * If the check command failed, what's the output of your manual plugin tests? + * In case of [debugging](20-development.md#development) Icinga 2, the full back traces and outputs + +## Analyze your Environment + +There are many components involved on a server running Icinga 2. When you +analyze a problem, keep in mind that basic system administration knowledge +is also key to identifying bottlenecks and issues. + +> **Tip** +> +> [Monitor Icinga 2](8-advanced-topics.md#monitoring-icinga) and use the hints for further analysis. + +* Analyze the system's performance and identify bottlenecks and issues. +* Collect details about all applications (e.g. Icinga 2, MySQL, Apache, Graphite, Elastic, etc.). +* If data is exchanged via network (e.g. central MySQL cluster), make sure to monitor the available bandwidth too. +* Add graphs and screenshots to your issue description. + +Install tools which help you to do so.
Opinions differ, so let us know if you have any additions here! + +## Analyze your Linux/Unix Environment + +[htop](http://hisham.hm/htop/) is a better replacement for `top` and helps to analyze processes +interactively. + +``` +yum install htop +apt-get install htop +``` + +If you are experiencing performance issues, for example, open `htop` and take a screenshot. +Add it to your question and/or bug report. + +Analyze disk I/O performance in Grafana, take a screenshot and obfuscate any sensitive details. +Attach it when posting a question to the community channels. + +The [sysstat](https://github.com/sysstat/sysstat) package provides a number of tools to +analyze the performance on Linux. On FreeBSD you could use `systat` for example. + +``` +yum install sysstat +apt-get install sysstat +``` + +Example for `vmstat` (summary of memory, processes, etc.): + +``` +# summary +vmstat -s +# print timestamps, format in MB, stats every 1 second, 5 times +vmstat -t -S M 1 5 +``` + +Example for `iostat`: + +``` +watch -n 1 iostat +``` + +Example for `sar`: +``` +sar # cpu +sar -r # ram +sar -q # load avg +sar -b # I/O +``` + +The `iostat` binary used above is also part of `sysstat`. + +If you are missing checks and metrics found in your analysis, add them to your monitoring! + +## Analyze your Windows Environment + +On Windows, the tools found inside the [Sysinternals Suite](https://technet.microsoft.com/en-us/sysinternals/bb842062.aspx) are a good start. + +You can also start `perfmon` and analyze specific performance counters. +Keep notes which could be important for your monitoring, and add service +checks later on. ## Enable Debug Output @@ -22,14 +114,14 @@ Enable the `debuglog` feature: # icinga2 feature enable debuglog # service icinga2 restart -You can find the debug log file in `/var/log/icinga2/debug.log`. +The debug log file can be found in `/var/log/icinga2/debug.log`.
Alternatively you may run Icinga 2 in the foreground with debugging enabled. Specify the console log severity as an additional parameter argument to `-x`. # /usr/sbin/icinga2 daemon -x notice -The log level can be one of `critical`, `warning`, `information`, `notice` +The [log severity](9-object-types.md#objecttype-filelogger) can be one of `critical`, `warning`, `information`, `notice` and `debug`. ## List Configuration Objects @@ -98,10 +190,16 @@ You can also filter by name and type: [2014-10-15 14:27:19 +0200] information/cli: Parsed 175 objects. +Runtime modifications via the [REST API](12-icinga2-api.md#icinga2-api-config-objects) +are not immediately reflected in the output. Furthermore there is a known issue with +[group assign expressions](17-language-reference.md#group-assign) which are not reflected in the host object output. +You need to restart Icinga 2 in order to update the `icinga2.debug` cache file. + + ## Where are the check command definitions? Icinga 2 features a number of built-in [check command definitions](10-icinga-template-library.md#plugin-check-commands) which are -included using +included with include <itl> include <plugins> @@ -123,7 +221,8 @@ for their check result containing the executed shell command. to fetch the checkable object, its check result and the executed shell command. * Alternatively enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) and look for the executed command.
-Example for a service object query using a [regex match]() on the name: +Example for a service object query using a [regex match](18-library-reference.md#global-functions-regex) +on the name: $ curl -k -s -u root:icinga -H 'Accept: application/json' -H 'X-HTTP-Method-Override: GET' -X POST 'https://localhost:5665/v1/objects/services' \ -d '{ "filter": "regex(pattern, service.name)", "filter_vars": { "pattern": "^http" }, "attrs": [ "__name", "last_check_result" ] }' | python -m json.tool @@ -194,17 +293,99 @@ Fetch all check result events matching the `event.service` name `random`: $ curl -k -s -u root:icinga -X POST 'https://localhost:5665/v1/events?queue=debugchecks&types=CheckResult&filter=match%28%22random*%22,event.service%29' +### Late Check Results + +[Icinga Web 2](https://www.icinga.com/products/icinga-web-2/) provides +a dashboard overview for `overdue checks`. + +The REST API provides the `/v1/status` URL endpoint with some generic metrics +on Icinga and its features. + + # curl -k -s -u root:icinga 'https://localhost:5665/v1/status' | python -m json.tool | less + +You can also calculate late check results via the REST API: + +* Fetch the `last_check` timestamp from each object. +* Compare the timestamp with the current time minus a multiple of `check_interval` (increase the multiplier, e.g. to five times `check_interval`, to find results which are really late). + +You can use the [icinga2 console](11-cli-commands.md#cli-command-console) to connect to the instance, fetch all data +and calculate the differences. More information can be found in [this blogpost](https://www.icinga.com/2016/08/11/analyse-icinga-2-problems-using-the-console-api/).
+ + # ICINGA2_API_USERNAME=root ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://localhost:5665/' + + <1> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res + + [ [ "10807-host!10807-service", "2016-06-10 15:54:55 +0200" ], [ "mbmif.int.netways.de!disk /", "2016-01-26 16:32:29 +0100" ] ] + +Or if you are just interested in numbers, call [len](18-library-reference.md#array-len) on the result array `res`: + + <2> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res.len() + + 2.000000 + +If you need to analyze that problem multiple times, just add the current formatted timestamp +and repeat the commands. + + <23> => DateTime(get_time()).to_string() + + "2017-04-04 16:09:39 +0200" + + <24> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res.len() + + 8287.000000 + +More details about the Icinga 2 DSL and its possibilities can be +found in the [language](17-language-reference.md#language-reference) and [library](18-library-reference.md#library-reference) reference chapters. + +### Late Check Results in Distributed Environments + +In a distributed HA setup, each node is responsible for a load-balanced share of the checks. +Host and Service objects provide the attribute `paused`. If this is set to `false`, the current node +actively attempts to schedule and execute checks. Otherwise the node does not consider itself responsible for the object. + + <3> => var res = {}; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res[s.paused] += 1 }; res + { + @false = 2.000000 + @true = 1.000000 + } + +You may ask why this analysis is important?
In an HA zone with two members, the numbers on both nodes should be inverted. If they are +not, this may hint that the cluster nodes are in a split-brain scenario, or that you've +found a bug in the cluster. + + +If you are running a cluster setup where the master/satellite executes checks on the client via +[top down command endpoint](6-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) mode, +you might want to know which zones are affected. + +This analysis assumes that clients which are not connected have the string `connected` in their +service check result output and their state is `UNKNOWN`. + + <4> => var res = {}; for (s in get_objects(Service)) { if (s.state==3) { if (match("*connected*", s.last_check_result.output)) { res[s.zone] += [s.host_name] } } }; for (k => v in res) { res[k] = len(v.unique()) }; res + + { + Asia = 31.000000 + Europe = 214.000000 + USA = 207.000000 + } + +The result set shows the configured zones and the number of unique affected hosts. To list the host names +instead, omit the `len()` call inside the for loop. + ## Notifications are not sent -* Check the debug log to see if a notification is triggered. +* Check the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) to see if a notification is triggered. * If yes, verify that all conditions are satisfied. * Are any errors on the notification command execution logged? +Please add these details along with your own description +to any question or issue posted to the community channels. + Verify the following configuration: * Is the host/service `enable_notifications` attribute set, and if so, to which value? -* Do the notification attributes `states`, `types`, `period` match the notification conditions? -* Do the user attributes `states`, `types`, `period` match the notification conditions?
+* Do the [notification](9-object-types.md#objecttype-notification) attributes `states`, `types`, `period` match the notification conditions? +* Do the [user](9-object-types.md#objecttype-user) attributes `states`, `types`, `period` match the notification conditions? * Are there any notification `begin` and `end` times configured? * Make sure the [notification](11-cli-commands.md#enable-features) feature is enabled. * Does the referenced NotificationCommand work when executed as Icinga user on the shell? @@ -232,18 +413,33 @@ to `features-enabled` and that the latter is included in [icinga2.conf](4-config * Are the feature attributes set correctly according to the documentation? * Any errors on the logs? +Look up the [object type](9-object-types.md#object-types) for the required feature and verify it is enabled: + + # icinga2 object list --type <type> + +Example for the `graphite` feature: + + # icinga2 object list --type GraphiteWriter + ## Configuration is ignored * Make sure that the line(s) are not [commented out](17-language-reference.md#comments) (starting with `//` or `#`, or encapsulated by `/* ... */`). * Is the configuration file included in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf)? +Run the [configuration validation](11-cli-commands.md#config-validation) and add `notice` as log severity. +Search for the file which should be included, e.g. using the `grep` CLI command. + + # icinga2 daemon -C -x notice | grep command + ## Configuration attributes are inherited from Icinga 2 allows you to import templates using the [import](17-language-reference.md#template-imports) keyword. If these templates contain additional attributes, your objects will automatically inherit them. You can override or modify these attributes in the current object.
+ ## Configuration Value with Single Dollar Sign In case your configuration validation fails with a missing closing dollar sign error message, you @@ -251,6 +447,9 @@ did not properly escape the single dollar sign preventing its usage as [runtime critical/config: Error: Validation failed for Object 'ping4' (Type: 'Service') at /etc/icinga2/zones.d/global-templates/windows.conf:24: Closing $ not found in macro format string 'top-syntax=${list}'. +Correct the custom attribute value to + + "top-syntax=$${list}" ## Cluster and Clients Troubleshooting @@ -261,19 +460,19 @@ done so already. > **Note** > -> Some problems just exist due to wrong file permissions or packet filters applied. Make +> Some problems just exist due to wrong file permissions or applied packet filters. Make > sure to check these in the first place. ### Cluster Troubleshooting Connection Errors -General connection errors normally lead you to one of the following problems: +General connection errors could be one of the following problems: -* Wrong network configuration -* Packet loss on the connection +* Incorrect network configuration +* Packet loss * Firewall rules preventing traffic Use tools like `netstat`, `tcpdump`, `nmap`, etc. to make sure that the cluster communication -happens (default port is `5665`). +works (default port is `5665`). # tcpdump -n port 5665 -i any diff --git a/doc/18-library-reference.md b/doc/18-library-reference.md index 856cf4256..0a3f91225 100644 --- a/doc/18-library-reference.md +++ b/doc/18-library-reference.md @@ -4,9 +4,9 @@ These functions are globally available in [assign/ignore where expressions](3-monitoring-basics.md#using-apply-expressions), [functions](17-language-reference.md#functions), [API filters](12-icinga2-api.md#icinga2-api-filters) -and the [Icinga 2 console](11-cli-commands.md#cli-command-console). +and the [Icinga 2 debug console](11-cli-commands.md#cli-command-console). 
-You can use the [Icinga 2 console](11-cli-commands.md#cli-command-console) +You can use the [Icinga 2 debug console](11-cli-commands.md#cli-command-console) as a sandbox to test these functions before implementing them in your scenarios. diff --git a/doc/4-configuring-icinga-2.md b/doc/4-configuring-icinga-2.md index 14a5631e2..9c595bba3 100644 --- a/doc/4-configuring-icinga-2.md +++ b/doc/4-configuring-icinga-2.md @@ -4,9 +4,8 @@ This chapter provides an introduction into best practices with your Icinga 2 con The configuration files which are automatically created when installing the Icinga 2 packages are a good way to start with Icinga 2. -If you're interested in a detailed explanation of each language feature used in those -configuration files, you can find more information in the [Language Reference](17-language-reference.md#language-reference) -chapter. +The [Language Reference](17-language-reference.md#language-reference) chapter explains details +on value types (string, number, dictionaries, etc.) and the general configuration syntax. ## Configuration Best Practice @@ -17,12 +16,12 @@ decide for a possible strategy. There are many ways of creating Icinga 2 configuration objects: * Manually with your preferred editor, for example vi(m), nano, notepad, etc. +* A configuration tool for Icinga 2 e.g. the [Icinga Director](https://github.com/Icinga/icingaweb2-module-director) * Generated by a [configuration management tool](13-addons.md#configuration-tools) such as Puppet, Chef, Ansible, etc. -* A configuration addon for Icinga 2 ([Icinga Director](https://github.com/Icinga/icingaweb2-module-director)) * A custom exporter script from your CMDB or inventory tool -* your own. +* etc. 
-In order to find the best strategy for your own configuration, ask yourself the following questions: +Find the best strategy for your own configuration and ask yourself the following questions: * Do your hosts share a common group of services (for example linux hosts with disk, load, etc. checks)? * Only a small set of users receives notifications and escalations for all hosts/services? @@ -36,11 +35,12 @@ host and service basis. Then you should look for the object specific configuration setting `host_name` etc. accordingly. -Finding the best files and directory tree for your configuration is up to you. Make sure that -the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file includes them, -and then think about: +You decide on the "best" layout for configuration files and directories. Ensure that +the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file includes them. -* tree-based on locations, hostgroups, specific host attributes with sub levels of directories. +Consider these ideas: + +* tree-based on locations, host groups, specific host attributes with sub levels of directories. * flat `hosts.conf`, `services.conf`, etc. files for rule based configuration. * generated configuration with one file per host and a global configuration for groups, users, etc. * one big file generated from an external application (probably a bad idea for maintaining changes). @@ -62,12 +62,33 @@ If you are planning to use a distributed monitoring setup with master, satellite take the configuration location into account too. Everything configured on the master, synced to all other nodes? Or any specific local configuration (e.g. health checks)? -TODO +There is a detailed chapter on [distributed monitoring scenarios](6-distributed-monitoring.md#distributed-monitoring-scenarios). +Please ensure to have read the [introduction](6-distributed-monitoring.md#distributed-monitoring) at first glance. 
If you happen to have further questions, do not hesitate to join the [community support channels](https://www.icinga.com/community/get-involved/) and ask community members for their experience and best practices. +## Your Configuration + +If you prefer to organize your own local object tree, you can also remove +`include_recursive "conf.d"` from your icinga2.conf file. + +Create a new configuration directory, e.g. `objects.d`, and include it +in your icinga2.conf file. + + [root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/objects.d + + [root@icinga2-master1.localdomain /]# vim /etc/icinga2/icinga2.conf + + /* Local object configuration on our master instance. */ + include_recursive "objects.d" + +This approach is used by the [Icinga 2 Puppet module](https://github.com/Icinga/puppet-icinga2). + +If you plan a distributed setup with HA clusters and clients, please refer to [this chapter](6-distributed-monitoring.md#distributed-monitoring-top-down) +for examples with `zones.d` as configuration directory. + ## Configuration Overview ### icinga2.conf @@ -148,6 +169,10 @@ This `include_recursive` directive is used for discovery of services on remote c and their generated configuration described in [this chapter](6-distributed-monitoring.md#distributed-monitoring-bottom-up). +**Note**: This has been DEPRECATED in Icinga 2 v2.6 and is **not** required for +satellites and clients using the [top down approach](6-distributed-monitoring.md#distributed-monitoring-top-down). +You can safely disable/remove it. + /** * Although in theory you could define all your objects in this file @@ -177,7 +202,6 @@ Example: /* The directory which contains the plugins from the Monitoring Plugins project. */ const PluginDir = "/usr/lib64/nagios/plugins" - /* The directory which contains the Manubulon plugins. * Check the documentation, chapter "SNMP Manubulon Plugin Check Commands", for details.
*/ @@ -197,9 +221,24 @@ Example: The `ZoneName` and `TicketSalt` constants are required for remote client and distributed setups only. +### zones.conf + +This file can be used to specify the required [Zone](9-object-types.md#objecttype-zone) +and [Endpoint](9-object-types.md#objecttype-endpoint) configuration objects for +[distributed monitoring](6-distributed-monitoring.md#distributed-monitoring). + +By default the `NodeName` and `ZoneName` [constants](4-configuring-icinga-2.md#constants-conf) will be used. + +It also contains several [global zones](6-distributed-monitoring.md#distributed-monitoring-global-zone-config-sync) +for distributed monitoring environments. + +Make sure to modify this configuration with real names, i.e. use the FQDN +mentioned in [this chapter](6-distributed-monitoring.md#distributed-monitoring-conventions) +for your `Zone` and `Endpoint` object names. + ### The conf.d Directory -This directory contains example configuration which should help you get started +This directory contains **example configuration** which should help you get started with monitoring the local host and its services. It is included in the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file by default. @@ -207,8 +246,10 @@ It can be used as reference example for your own configuration strategy. Just keep in mind to include the main directories in the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) file. -You are certainly not bound to it. Remove it if you prefer your own -way of deploying Icinga 2 configuration. +> **Note** +> +> You can remove the include directive in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) +> if you prefer your own way of deploying Icinga 2 configuration. Further details on configuration best practice and how to build your own strategy is described in [this chapter](4-configuring-icinga-2.md#configuration-best-practice).
diff --git a/doc/5-service-monitoring.md b/doc/5-service-monitoring.md index e70008849..cdab3f757 100644 --- a/doc/5-service-monitoring.md +++ b/doc/5-service-monitoring.md @@ -183,6 +183,8 @@ Instead, choose a plugin and configure its parameters and thresholds. The follow * [disk](10-icinga-template-library.md#plugin-check-command-disk) * [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap) +* [procs](10-icinga-template-library.md#plugin-check-command-processes) +* [users](10-icinga-template-library.md#plugin-check-command-users) * [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel) * package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc. * [ssh](10-icinga-template-library.md#plugin-check-command-ssh) @@ -269,6 +271,7 @@ check [this blog entry](http://www.claudiokuenzler.com/blog/650/slow-vmware-perl * [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp) * [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap) * [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop) +* [mailq](10-icinga-template-library.md#plugin-check-command-mailq) ### Hardware Monitoring diff --git a/doc/8-advanced-topics.md b/doc/8-advanced-topics.md index 2c9622fc0..c7f62f278 100644 --- a/doc/8-advanced-topics.md +++ b/doc/8-advanced-topics.md @@ -343,7 +343,121 @@ and adds the excluded time period names as an array. } } -## Advanced Use of Apply Rules +## Check Result Freshness + +In Icinga 2 active check freshness is enabled by default. It is determined by the +`check_interval` attribute: if no check result arrives within that period of time, the result is considered stale.
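+
+As a minimal sketch, the following service relies on freshness checks for check results
+submitted by an external source. It assumes the `dummy` CheckCommand from the ITL (with its
+`dummy_state` and `dummy_text` custom attributes) and a hypothetical host custom attribute
+`passive_results` for the assign filter; the service name is made up as well. Active checks
+stay enabled (the default) because the `dummy` command is scheduled like any other active
+check: assuming each received result resets the freshness window as the thresholds below
+describe, it only runs when no result arrived in time and then reports `UNKNOWN` (state 3).
+
+```
+apply Service "external-results" {
+  check_interval = 15m // freshness window
+  retry_interval = 15m // keep the same window after a stale result
+
+  /* Executed only if no check result was received within the window. */
+  check_command = "dummy"
+  vars.dummy_state = 3 // UNKNOWN
+  vars.dummy_text = "No check result received within the freshness threshold."
+
+  assign where host.vars.passive_results == true // hypothetical custom attribute
+}
+```
+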
+ + threshold = last check execution time + check interval + +Passive check freshness is calculated from the `check_interval` attribute if set. + + threshold = last check result time + check interval + +If the freshness check fails, Icinga 2 executes a new check using the command defined in the +`check_command` attribute. + + +## Check Flapping + +The flapping algorithm used in Icinga 2 does not store the past states but +calculates the flapping threshold from a single value based on counters and +half-life values. Icinga 2 compares the value with a single flapping threshold +configuration attribute named `flapping_threshold`. + +Flapping detection can be enabled or disabled using the `enable_flapping` attribute. + + +## Volatile Services + +By default all services remain in a non-volatile state. When a problem +occurs, the `SOFT` state applies and once the check attempt counter reaches +`max_check_attempts`, a `HARD` state transition happens. +Notifications are only triggered by `HARD` state changes and are then +re-sent according to the notification `interval` attribute. + +It may be reasonable to have a volatile service which stays in a `HARD` +state type if the service stays in a `NOT-OK` state. That way each +service recheck will automatically trigger a notification unless the +service is acknowledged or in a scheduled downtime. +Enable this behavior by setting the `volatile` attribute to `true`. + +## Monitoring Icinga 2 + +Why should you do that? Icinga and its components run like any other +service application on your server. There are predictable issues +such as "disk space is running low" and your monitoring suffers from just +that. + +You also want to ensure that features and backends are running +and storing the required data: the database backend from which Icinga Web 2 +presents its dashboards, metrics forwarded to Graphite or InfluxDB, or +the entire distributed setup. + +This list isn't complete but should help with your own setup. 
+Checks specific to Windows clients are marked with "(Windows Client)".
+ +Type | Description | Plugins and CheckCommands +----------------|-------------------------------|----------------------------------------------------- +System | Filesystem | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client) +System | Memory, Swap | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client) +System | Hardware | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor) +System | Virtualization | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware) +System | Processes | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client) +System | System Activity Reports | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py) +System | I/O | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat) +System | Network interfaces | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces) +System | Users | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client) +System | Logs | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts. 
+System | NTP | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time) +System | Updates | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum) +Icinga | Status & Stats | [icinga](10-icinga-template-library.md#itl-icinga) (more below) +Icinga | Cluster & Clients | [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks) +Database | MySQL | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health) +Database | PostgreSQL | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres) +Database | Housekeeping | Check the database size and growth and analyse metrics to examine trends. +Database | DB IDO | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below) +Webserver | Apache2, Nginx, etc. | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status) +Webserver | Certificates | [http](10-icinga-template-library.md#plugin-check-command-http) +Webserver | Authorization | [http](10-icinga-template-library.md#plugin-check-command-http) +Notifications | Mail (queue) | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq) +Notifications | SMS (GSM modem) | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status) +Notifications | Messengers, Cloud services | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc. +Metrics | PNP, RRDTool | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files. 
+Metrics | Graphite | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite) +Metrics | InfluxDB | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin) +Metrics | Elastic Stack | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration) +Metrics | Graylog | [Graylog integration](14-features.md#graylog-integration) + + +The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of +Icinga 2. You can forward them to your preferred graphing solution. +If you require more metrics, you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write +your own custom check plugin, or use the built-in [object accessor functions](8-advanced-topics.md#access-object-attributes-at-runtime) +to calculate stats in-memory. + +There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL +which provides additional metrics for the IDO database. + +``` +apply Service "ido-mysql" { + check_command = "ido" + + vars.ido_type = "IdoMysqlConnection" + vars.ido_name = "ido-mysql" // the name defined in /etc/icinga2/features-enabled/ido-mysql.conf + + assign where match("master*.localdomain", host.name) +} +``` + +More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter. + +Distributed setups should include specific [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks). +You might also want to add checks for SSL certificate expiration. + + +## Advanced Configuration Hints + +### Advanced Use of Apply Rules [Apply rules](3-monitoring-basics.md#using-apply) can be used to create a rule set which is entirely based on host objects and their attributes. @@ -426,7 +540,7 @@ service checks in this example.
In addition to defining check parameters this way, you can also enrich the `display_name` attribute with more details. This will be shown in in Icinga Web 2 for example. -## Use Functions in Object Configuration +### Use Functions in Object Configuration There is a limited scope where functions can be used as object attributes such as: @@ -449,7 +563,7 @@ inside the `icinga2.log` file depending in your log severity * Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary) * Build them step-by-step. You can always refactor your code later on. -### Use Functions in Command Arguments set_if +#### Use Functions in Command Arguments set_if The `set_if` attribute inside the command arguments definition in the [CheckCommand object definition](9-object-types.md#objecttype-checkcommand) is primarily used to @@ -528,7 +642,7 @@ The more programmatic approach for `set_if` could look like this: } -### Use Functions as Command Attribute +#### Use Functions as Command Attribute This comes in handy for [NotificationCommands](9-object-types.md#objecttype-notificationcommand) or [EventCommands](9-object-types.md#objecttype-eventcommand) which does not require @@ -582,7 +696,7 @@ You can omit the `log()` calls, they only help debugging. } } -### Use Custom Functions as Attribute +#### Use Custom Functions as Attribute To use custom functions as attributes, the function must be defined in a slightly unexpected way. The following example shows how to assign values @@ -609,7 +723,7 @@ as value for `ping_wrta`, all other hosts use 100. assign where true } -### Use Functions in Assign Where Expressions +#### Use Functions in Assign Where Expressions If a simple expression for matching a name or checking if an item exists in an array or dictionary does not fit, you should consider @@ -698,7 +812,7 @@ with the `vars_app` dictionary. 
assign where check_app_type(host, "ABAP") } -## Access Object Attributes at Runtime +### Access Object Attributes at Runtime The [Object Accessor Functions](18-library-reference.md#object-accessor-functions) can be used to retrieve references to other objects by name. @@ -801,40 +915,3 @@ time of the day compared to the defined time period. } -## Check Result Freshness - -In Icinga 2 active check freshness is enabled by default. It is determined by the -`check_interval` attribute and no incoming check results in that period of time. - - threshold = last check execution time + check interval - -Passive check freshness is calculated from the `check_interval` attribute if set. - - threshold = last check result time + check interval - -If the freshness checks are invalid, a new check is executed defined by the -`check_command` attribute. - - -## Check Flapping - -The flapping algorithm used in Icinga 2 does not store the past states but -calculates the flapping threshold from a single value based on counters and -half-life values. Icinga 2 compares the value with a single flapping threshold -configuration attribute named `flapping_threshold`. - -Flapping detection can be enabled or disabled using the `enable_flapping` attribute. - - -## Volatile Services - -By default all services remain in a non-volatile state. When a problem -occurs, the `SOFT` state applies and once `max_check_attempts` attribute -is reached with the check counter, a `HARD` state transition happens. -Notifications are only triggered by `HARD` state changes and are then -re-sent defined by the `interval` attribute. - -It may be reasonable to have a volatile service which stays in a `HARD` -state type if the service stays in a `NOT-OK` state. That way each -service recheck will automatically trigger a notification unless the -service is acknowledged or in a scheduled downtime.