Drop the "-source" suffix from the configuration option. Furthermore,
with the latest IGL release 0.8.1[0], the "api-base-url" option is
renamed to just "url".
[0]: https://github.com/Icinga/icinga-go-library/pull/168
The whole StreamSorter logic is only required for Icinga Notifications.
Thus, the implementation was moved from the history package to the
notifications package, removing some unnecessary generalizations on the
way. This results in big changes to be made in the notifications
package, while other modules are mostly not affected.
- Don't validate notifications config in a background Goroutine.
- Clip pipeline slice to avoid reusing capability twice.
- Rework notification Client.buildCommonEvent and depending methods.
- Resubmit events after updating rules in one go.
- Simplify Client.fetchHostServiceName based on Julian's suggestion.
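
To illustrate the slice clipping item above: a minimal sketch, not the
actual pipeline code, of what goes wrong when two slices are appended to
the same base with spare capacity, and how slices.Clip avoids it. The
function and stage names are made up for this example.

```go
package main

import (
	"fmt"
	"slices"
)

// extend derives two pipelines from the same base by appending one stage each.
// Without clipping, both appends may write into the same spare capacity of
// base's backing array, so the second append overwrites the first.
func extend(base []string, clip bool) (a, b []string) {
	if clip {
		// Clip sets cap(base) == len(base), forcing each append to allocate.
		base = slices.Clip(base)
	}
	a = append(base, "stage-a")
	b = append(base, "stage-b")
	return a, b
}

func main() {
	base := make([]string, 1, 4) // one element, room for three more
	base[0] = "common"

	a, b := extend(base, false)
	fmt.Println("unclipped:", a, b)

	a, b = extend(base, true)
	fmt.Println("clipped:  ", a, b)
}
```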
Co-Authored-By: Julian Brost <julian.brost@icinga.com>
Do not silently drop failing callback submissions, for example to Icinga
Notifications during restarts or network disruptions. Instead, switch
the internal makeCallbackStageFunc stageFunc into a backlog mode.
This resulted in multiple changes, including removing the background
worker for notifications.Client, as otherwise the event submission
status could not be propagated back.
- Bump IGL to latest changes in Icinga/icinga-go-library#145.
- Allow specifying which pipeline keys are relevant, ignore others.
- Allow specifying which pipeline key should be parsed in which type.
- Create history.DowntimeHistoryMeta as a chimera combining
history.DowntimeHistory and history.HistoryDowntime to allow access to
event_type, distinguishing between downtime_start and downtime_end.
- Trace times for submission steps in the worker. It turns out that the
single-threaded worker blocks for roughly two seconds on each
Client.ProcessEvent method call. This might sum up to minutes if lots
of events are processed at once. My current theory is that the delay
results from the expensive bcrypt hash comparison on Notifications.
Instead of retrieving the host and service names from the RDBMS,
this commit allows us to query them from Redis. This is done to avoid
the overhead of database queries, especially when the host and service
names are always to be found in Redis. The previous implementation
simply performed two database queries for each received entity based on
their IDs, but we can perform this operation more efficiently from Redis
using the same filtering logic as before. Of course, we now have to
maintain more code needed to handle the Redis operations, but this is a
trade-off we should be willing to make for performance reasons.
This is the first version to use Icinga DB as an event source for Icinga
Notifications. If configured accordingly, Icinga DB forwards events
crafted from the Redis pipeline to the Icinga Notifications API.
This required a small refactoring of the history synchronization to
allow hooking into the Redis stream. Afterwards, the newly introduced
notifications package handles the rest.
Note: As part of this architectural change, Icinga Notifications offers
filters to be evaluated by Icinga DB. At the moment, these are SQL
queries being executed on the Icinga DB relational database. Either
consider both Icinga DB and Icinga Notifications to be part of the same
trust domain or consider the security implications.
Furthermore, this change requires a change on Icinga Notifications as
well. This will not work with the current version 0.1.1.
The changes were made by Go's modernize analyzer tool:
`go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -category=forvar -fix -test './...'`
The changes were made by Go's modernize analyzer tool:
`go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -category=efaceany -fix -test './...'`
Use a dedicated SQL connection for the schema import to eliminate side
effects on future queries.
In particular, the MySQL/MariaDB schema file began by altering SQL
system variables, such as "sql_mode". These changes persist throughout
the entire connection, affecting every subsequent query.
Our database queries require the ANSI_QUOTES[0] SQL mode. However, the
schema file had unset this mode, which resulted in failed queries and a
subsequent Icinga DB crash. This was reported independently in #1002 and
in the community forum[1].
Creating a short-lived, dedicated SQL connection for the schema import
eliminates any impact on the main database connection.
Fixes #1002.
[0]: https://dev.mysql.com/doc/refman/8.4/en/sql-mode.html#sqlmode_ansi_quotes
[1]: https://community.icinga.com/t/error-1064-42000-you-have-an-error-in-your-sql-syntax-on-initialization/15080
Ensure that FROM_UNIXTIME SQL function calls returning NULL no longer
cause icingadb-migrate to mark no data as migration-worthy.
MariaDB changed the behavior of the FROM_UNIXTIME function[^0] in
version 11.7, resulting in FROM_UNIXTIME(0) now returning NULL instead
of "1970-01-01 00:00:00". The icingadb-migrate utility uses this SQL
function within the "computeIdRange" function for "ido.from" and
"ido.to". Since "ido.from" defaults to 0, a NULL comparison occurs in
the WHERE clause. This results in an empty result set and no data is
marked as migration-worthy.
This fix wraps the user-supplied or default FROM_UNIXTIME value in a
COALESCE with either an infimum or a supremum. If the first
FROM_UNIXTIME returns NULL, the second returns a valid border.
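
A minimal sketch of how such an expression could be built, not the
actual icingadb-migrate code. The helper name, the ":from"/":to"
placeholders and the two fallback timestamps are assumptions made for
this example; the fallbacks only need to be values for which
FROM_UNIXTIME is known to return a non-NULL result.

```go
package main

import "fmt"

// Hypothetical border values, chosen for illustration only.
const (
	infimum  int64 = 1
	supremum int64 = 2147483647
)

// coalesceUnixtime wraps FROM_UNIXTIME(param) in a COALESCE so that a NULL
// result falls back to FROM_UNIXTIME(fallback).
func coalesceUnixtime(param string, fallback int64) string {
	return fmt.Sprintf("COALESCE(FROM_UNIXTIME(%s), FROM_UNIXTIME(%d))", param, fallback)
}

func main() {
	// ":from" and ":to" stand in for the named query parameters.
	fmt.Println(coalesceUnixtime(":from", infimum))
	fmt.Println(coalesceUnixtime(":to", supremum))
}
```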
Fixes #974.
[^0]: https://mariadb.com/kb/en/from_unixtime/
Ensure only valid host and service state values are migrated from the
IDO to the state_history table via icingadb-migrate.
In a recent Icinga Community post[^0], a user reported an Icinga DB Web
error caused by an invalid host state in the database. After tracing the
root cause down to icingadb-migrate, it turned out that unexpected state
values were present in the IDO database.
Since the icingadb-migrate utility does not perform any checks for the
state values, the faulty value was imported into Icinga DB's relational
database, causing issues.
This fix caps state values at their expected supremum, which is DOWN (1)
for hosts and UNKNOWN (3) for services. The PENDING (99) special case is
excluded from this logic.
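
The capping rule can be sketched as follows. This is an illustration
under the rules stated above, not the code from icingadb-migrate; the
function and constant names are chosen for this example.

```go
package main

import "fmt"

const (
	hostStateMax    uint8 = 1  // DOWN
	serviceStateMax uint8 = 3  // UNKNOWN
	statePending    uint8 = 99 // special case, passed through unchanged
)

// capState limits state to supremum, leaving PENDING untouched.
func capState(state, supremum uint8) uint8 {
	if state == statePending {
		return state
	}
	if state > supremum {
		return supremum
	}
	return state
}

func main() {
	fmt.Println(capState(2, hostStateMax))     // invalid host state, capped
	fmt.Println(capState(99, hostStateMax))    // PENDING is kept
	fmt.Println(capState(2, serviceStateMax))  // valid service state, kept
}
```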
Fixes #968.
[^0]: https://community.icinga.com/t/history-invalid-host-state-2/14866
When the icingadb application was aborted with a fatal log event, the
user received an impressive stack trace. Unfortunately, the relevant
information was not directly visible to the untrained eye.
This change now prints a normal log event at FATAL level with an error
attached. If an error implements fmt.Formatter, the zap logging
framework adds both an "error" and an "errorVerbose" field. So we end up
with a short error description, the error string concatenated from all
wrapped errors, and a stack trace.
Since error and errorVerbose are both fields, it is possible to filter
them for the systemd journald logger at a later point.
> 2025-04-15T16:43:36.974+0200 FATAL icingadb Can't connect to database {"error": "can't connect to database: dial tcp [::1]:5432: connect: connection refused", "errorVerbose": "dial tcp [::1]:5432: connect: connection refused\ncan't connect to database\ngithub.com/icinga/icinga-go-library/database.RetryConnector.Connect\n\tgithub.com/icinga/icinga-go-library@v0.6.3/database/driver.go:48\ndatabase/sql.(*DB).conn\n\tdatabase/sql/sql.go:1431\ndatabase/sql.(*DB).PingContext.func1\n\tdatabase/sql/sql.go:900\ndatabase/sql.(*DB).retry\n\tdatabase/sql/sql.go:1576\ndatabase/sql.(*DB).PingContext\n\tdatabase/sql/sql.go:899\ndatabase/sql.(*DB).Ping\n\tdatabase/sql/sql.go:917\nmain.run\n\tgithub.com/icinga/icingadb/cmd/icingadb/main.go:64\nmain.main\n\tgithub.com/icinga/icingadb/cmd/icingadb/main.go:37\nruntime.main\n\truntime/proc.go:283\nruntime.goexit\n\truntime/asm_amd64.s:1700"}
For hard state change: icinga_statehistory.state is the current (hard) state (icinga_statehistory.last_hard_state is the previous one)
For soft state change: icinga_statehistory.last_hard_state is the current hard state
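
The two rules above can be sketched as a small helper. This is only an
illustration; the struct and function names are made up, and the
state_type encoding (0 = soft, 1 = hard) is an assumption about the IDO
schema.

```go
package main

import "fmt"

// Assumed IDO state_type encoding.
const (
	stateTypeSoft uint8 = 0
	stateTypeHard uint8 = 1
)

// stateHistoryRow holds the relevant icinga_statehistory columns.
type stateHistoryRow struct {
	StateType     uint8
	State         uint8
	LastHardState uint8
}

// currentHardState returns the hard state in effect after the row's event:
// the new state for a hard state change, last_hard_state for a soft one.
func currentHardState(row stateHistoryRow) uint8 {
	if row.StateType == stateTypeHard {
		return row.State
	}
	return row.LastHardState
}

func main() {
	hard := stateHistoryRow{StateType: stateTypeHard, State: 2, LastHardState: 0}
	soft := stateHistoryRow{StateType: stateTypeSoft, State: 2, LastHardState: 0}
	fmt.Println(currentHardState(hard), currentHardState(soft))
}
```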
Setting an explicit logger for the two database connections shows faulty
connection attempts directly when they occur and not only after the
retry timeout of five minutes.
References #949.
Based on the configured log level, some useful log messages will be
discarded. While this works as intended, this will result in unbalanced
logs - HA hand over is logged, taking over is not - or missing version
strings useful for log analysis.
Fixes #918
To get rid of docker-icingadb and its additional entry point, the schema
import functionality has been implemented directly in Icinga DB. Using
the new --database-auto-import command line argument will result in an
automatic schema import if no schema is found.
The implementation is split between the already existing CheckSchema
function and the introduced ImportSchema function.
The CheckSchema function is now able to distinguish between the absence
of a schema and an incorrect schema version. Both situations return a
separate error type.
As before, CheckSchema is called in the main function. If the error type
now implies the absence of a schema (ErrSchemaNotExists) and the
--database-auto-import flag is set, the auto-import is started.
The schema import itself is performed in the new ImportSchema function,
which loads the schema from a given file and inserts it within a
transaction, allowing a rollback in case of an error.
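
The decision made in the main function can be sketched like this. It is
a simplified illustration, not the real code: ErrSchemaNotExists is the
name used above, while ErrSchemaMismatch and the shouldImport helper are
assumptions for this example.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	// ErrSchemaNotExists signals that no schema was found at all.
	ErrSchemaNotExists = errors.New("no database schema exists")
	// ErrSchemaMismatch is an assumed name for the wrong-version case.
	ErrSchemaMismatch = errors.New("unexpected database schema version")
)

// shouldImport inspects the error returned by a schema check and decides
// whether an automatic import should be started. Any other error is
// returned unchanged so the caller can abort.
func shouldImport(checkErr error, autoImport bool) (bool, error) {
	switch {
	case checkErr == nil:
		return false, nil
	case errors.Is(checkErr, ErrSchemaNotExists) && autoImport:
		return true, nil
	default:
		return false, checkErr
	}
}

func main() {
	wrapped := fmt.Errorf("checking schema: %w", ErrSchemaNotExists)

	doImport, err := shouldImport(wrapped, true)
	fmt.Println(doImport, err)

	doImport, err = shouldImport(ErrSchemaMismatch, true)
	fmt.Println(doImport, err)
}
```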
Fixes #896.
By introducing an explicit "AS" to set output names in the SELECT column
list, there are no issues with reserved names. Unfortunately, this
happened on PostgreSQL in the older version 13 with the reserved name
"name". Adding "AS" mitigates this issue.
Furthermore, I have put each column name on its own line in the SELECT
queries to ease the readability of the query itself and of future diffs.
Fixes #884.
Previously, the HA feature was allowed to open `max_connections`
database connections in parallel to other Icinga DB components. This
meant Icinga DB wasn't limited to the configured `max_connections`, but
effectively to `2 * max_connections`.
The main loop select cases for hactx.Done() and ctx.Done() were unified,
as hactx is derived from ctx. With two separate cases, a closed ctx
could be missed because select may have chosen the hactx case instead.
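
A minimal sketch of the unified case, assuming a simplified main loop.
The function names, the work channel and the returned strings are made
up for this example. Because hactx is derived from ctx, cancelling ctx
also closes hactx.Done(), so one case is enough and ctx.Err() tells the
two situations apart.

```go
package main

import (
	"context"
	"fmt"
)

// waitForStop loops until hactx is done and reports the reason.
func waitForStop(ctx, hactx context.Context, work <-chan struct{}) string {
	for {
		select {
		case <-work:
			// Regular work would be handled here.
		case <-hactx.Done():
			// Covers both a cancelled parent and a cancelled HA context.
			if ctx.Err() != nil {
				return "shutdown"
			}
			return "handover"
		}
	}
}

// demo cancels either the parent or only the derived context.
func demo(cancelParent bool) string {
	ctx, cancel := context.WithCancel(context.Background())
	defer cancel()
	hactx, cancelHA := context.WithCancel(ctx)
	defer cancelHA()

	if cancelParent {
		cancel()
	} else {
		cancelHA()
	}
	// A nil channel never becomes ready, so only hactx.Done() can fire.
	return waitForStop(ctx, hactx, nil)
}

func main() {
	fmt.Println(demo(true))
	fmt.Println(demo(false))
}
```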
Instead of declaring a field as `int64` and using helper functions for
atomic operations, it is better and recommended to use the new atomic
types like `atomic.Int64`.
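
For illustration, a small counter using the typed atomics from
sync/atomic. The Counter type and the helper function are made up for
this example and are not taken from the changed code.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Counter wraps atomic.Int64, so no plain int64 field can be accessed
// non-atomically by accident.
type Counter struct {
	value atomic.Int64
}

// Add atomically adds delta and returns the new value.
func (c *Counter) Add(delta int64) int64 { return c.value.Add(delta) }

// Val atomically loads the current value.
func (c *Counter) Val() int64 { return c.value.Load() }

// countConcurrently increments one counter from several goroutines.
func countConcurrently(workers, perWorker int) int64 {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Add(1)
			}
		}()
	}
	wg.Wait()
	return c.Val()
}

func main() {
	fmt.Println(countConcurrently(8, 1000))
}
```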
An env ID with the wrong length, either due to a copy-paste error or
human error during testing, results in a SQL CHECK CONSTRAINT violation
that is retried multiple times until it finally fails.
Historical data from an older Icinga 2 installation contained NULL
values for the name column in some rows of the icinga_commenthistory and
icinga_downtimehistory tables.
Normally this field contains something like
${name1}!${name2}!${unique_value} where ${unique_value} is based on a
timestamp for older entries and a UUID for newer ones. For a concrete
example, this could be "host.example.com!ping6!123…".
Unfortunately, using an empty string for these NULL values will cause an
error later because the new primary key will be calculated based on it.
Therefore, a new deterministic name is generated based on the primary
keys and the known name1 and name2 values.
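
A hypothetical sketch of such a fallback. The exact format used by the
fix is not reproduced here; the function name and the use of the
numeric primary key as the unique part are assumptions for this
example. The point is only that the same input row always yields the
same name.

```go
package main

import (
	"fmt"
	"strings"
)

// fallbackName builds a deterministic replacement for a NULL name from the
// row's primary key and the known name1 and name2 values. An empty name2
// (as for host-level entries) is skipped.
func fallbackName(name1, name2 string, id uint64) string {
	parts := []string{name1}
	if name2 != "" {
		parts = append(parts, name2)
	}
	parts = append(parts, fmt.Sprintf("%d", id))
	return strings.Join(parts, "!")
}

func main() {
	fmt.Println(fallbackName("host.example.com", "ping6", 42))
	fmt.Println(fallbackName("host.example.com", "", 43))
}
```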
Closes #766.