Commit graph

720 commits

Author SHA1 Message Date
Alvar
88c801bf5a
Merge pull request #1063 from Icinga/schema-align-icinga2-data-structure-and-schema
Align Icinga 2 Types with SQL Representation
2026-01-14 15:27:43 +00:00
Alvar
c32e1a4e1d
Merge pull request #1060 from Icinga/i997-retryable-queries
Retry more SQL Queries
2026-01-14 15:27:20 +00:00
Alvar
fb347f4ed6
Merge pull request #789 from Icinga/schema-upgrade-check-intermediates
CheckSchema: Verify intermediate schema upgrades
2026-01-14 15:26:38 +00:00
Alvar Penning
d7b761d9a4
overdue.initSync: Retry SQL query
The overdue sync starts with an initializing SQL query which has not
been retried so far. Thus, if there is a race between the relational database
and Icinga DB, this query might fail and cause Icinga DB to crash.

This might happen if restarting Icinga DB together with the relational
database server, for example after an upgrade.
2026-01-14 10:20:01 +01:00
Alvar Penning
597ef9ddd4
notifications.fetchCustomVarFromSql: Retry SQL query
The fetchCustomVarFromSql method is used deep in the call trace to fetch
custom variables for notification events to be submitted to the Icinga
Notifications API. If, for whatever reason, the SQL query within this
method fails, the event will be discarded.

By retrying the SQL query using our usual retry logic, temporary
database hiccups do not result in losing an event.
2026-01-14 10:10:59 +01:00
Alvar Penning
c5f1afc067
notifications: Retry Timeout for fetchHostServiceFromRedis
Set the default timeout for the retryable HGET Redis query for fetching
host or service names. Without any timeout so far, this query could fail
and be retried forever.
2026-01-14 10:04:27 +01:00
Alvar Penning
f49fac0798
CheckSchema: Verify intermediate schema upgrades
When skipping a version for an Icinga DB upgrade, all intermediate
upgrade steps must be taken. While this is already stated in the
documentation, it might be overlooked.

This happened for one community user, upgrading from v1.1.0 to v1.2.0,
skipping the intermediate schema upgrade for v1.1.1.

> https://community.icinga.com/t/icingadb-failing-exactly-5-minutes-after-start/13955

First, the necessity for all upgrades in their release order was made
more prominent in the documentation, hoping that fewer users would ignore
this when skimming the upgrade docs.

However, the real change here is adding another check to the CheckSchema
function, verifying that all schema upgrades between the lowest known
version and the highest known version in the icingadb_schema table
exist. If an intermediate schema upgrade was skipped, as in the thread
above, this raises a descriptive error.
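
The idea of such a gap check can be sketched as follows. This is not the
CheckSchema implementation; missingVersions is a hypothetical helper, and the
sketch assumes that schema versions are consecutive integers.

```go
package main

import (
	"fmt"
	"slices"
)

// missingVersions returns every version between the lowest and the highest
// entry of applied that does not occur in applied itself.
// Assumption of this sketch: schema versions are consecutive integers.
func missingVersions(applied []int) []int {
	if len(applied) == 0 {
		return nil
	}

	seen := make(map[int]struct{}, len(applied))
	for _, v := range applied {
		seen[v] = struct{}{}
	}

	var missing []int
	for v := slices.Min(applied); v <= slices.Max(applied); v++ {
		if _, ok := seen[v]; !ok {
			missing = append(missing, v)
		}
	}

	return missing
}

func main() {
	// Hypothetical table content where the upgrade to version 5 was skipped.
	if missing := missingVersions([]int{3, 4, 6}); len(missing) > 0 {
		fmt.Printf("schema upgrades for versions %v were skipped\n", missing)
	}
}
```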
2026-01-14 09:55:56 +01:00
Alvar
4e3fe1c0d5
Merge pull request #1058 from Icinga/overdue-sync-redis-key-namespace
overdue: Temporary Key in "icingadb:" Namespace
2026-01-13 16:05:13 +00:00
Alvar Penning
167076cc4b
Align Icinga 2 Types with SQL Representation
Certain Icinga 2 object fields of a floating type are incorrectly stored
as unsigned integers in the schema. Since none of those columns are in
the history tables, changing them was considered not too invasive.

Furthermore, some struct fields were changed from "float64" to
"types.Float", since the SQL schema supports NULL values.

Fixes #882.
2026-01-13 16:53:38 +01:00
Alexander A. Klimov
b29fa56c88
Test icingaredis.SetChecksums()
2026-01-09 10:22:38 +01:00
Alvar Penning
3a6df542f7
overdue: Temporary Key in "icingadb:" Namespace
The get_overdues.lua script uses a temporary Redis key to store data. So
far, this key was a random UUID, not being prefixed or namespaced. This
does not work when applying Redis ACLs on keys, as this random key is
unpredictable. Now, this key is prefixed with "icingadb:temp:".

This was initially reported in the Community Forum[^0] where the user
applied ACLs to the Redis user for Icinga DB.

It was easy to reproduce this by creating or reconfiguring a dedicated
Redis user, allowing all operations on keys in the "icinga:" and
"icingadb:" namespaces.

> 127.0.0.1:6380> ACL SETUSER icingadb on >icingadb ~icinga:* ~icingadb:* +@all
> OK
> 127.0.0.1:6380> ACL LIST
> 1) "user default on nopass sanitize-payload ~* &* +@all"
> 2) "user icingadb on sanitize-payload #1631be4f74353b72282ba144d82b6764f885feefc99c15c2c5f37b5c65bb3006 ~icinga:* ~icingadb:* resetchannels +@all"

After a while, the previous code failed as expected.

> 2026-01-07T11:22:10.253Z    FATAL   icingadb        NOPERM No permissions to access a key
> can't execute Redis script
> github.com/icinga/icingadb/pkg/icingadb/overdue.Sync.sync
>     /go/src/github.com/Icinga/icingadb/pkg/icingadb/overdue/sync.go:164
> github.com/icinga/icingadb/pkg/icingadb/overdue.Sync.Sync.func3
>     /go/src/github.com/Icinga/icingadb/pkg/icingadb/overdue/sync.go:70
> golang.org/x/sync/errgroup.(*Group).Go.func1
>     /go/pkg/mod/golang.org/x/sync@v0.19.0/errgroup/errgroup.go:93
> runtime.goexit
>     /usr/local/go/src/runtime/asm_amd64.s:1700

With this change, Icinga DB only uses these two namespaces and
continues to operate.

[^0]: https://community.icinga.com/t/redis-user-acl-for-icingadb/15309
2026-01-07 12:33:13 +01:00
Alvar Penning
ea3d8e6b07
notifications: Close Prepared Statement
The prepared statement used to fetch custom vars from SQL for
notification events is not closed. This results in leaking prepared
statements, eventually resulting in the rejection of new prepared
statements.

> notifications: Cannot build event from history entry error="cannot
> build event for \"[...]\",\"[...]\": Error 1461 (42000): Can't create
> more than max_prepared_stmt_count statements (current value: 16382)"

Monitoring MariaDB's prepared_stmt_count for roughly ten minutes each,
before and after this change, shows a steadily increasing number of open
prepared statements before the patch.

                                Before patch
 180 +-----------------------------------------------------------------+
     |       +        +       +       +       +        +     AAAA      |
 160 |-+                                                   AAA       +-|
     |                                                    AA           |
 140 |-+                                              AAAAA          +-|
     |                                             AAAA                |
 120 |-+                                         AAA                 +-|
     |                                       AAAAA                     |
 100 |-+                               AAAAAAA                       +-|
  80 |-+                            AAAA                             +-|
     |                             AA                                  |
  60 |-+                         AAA                                 +-|
     |                    AAAAAAAA                                     |
  40 |-+               AAAA                                          +-|
     |             AAAAA                                               |
  20 |-+        AAA                                                  +-|
     |       +AAA     +       +       +       +        +       +       |
   0 +-----------------------------------------------------------------+  9
1.76579 1.76579x 1.76579 1.76579 1.76579 1.76579x 1.76579 1.765 1.76579x10
                               Unix timestamp

The second graph, a record after the patch, contains measurements at the
zero level.

                                 After patch
   1 +-----------------------------------------------------------------+
     |       +        +       +       +       +        +       +       |
     |                                                                 |
 0.8 |-+                                                             +-|
     |                                                                 |
     |                                                                 |
     |                                                                 |
 0.6 |-+                                                             +-|
     |                                                                 |
     |                                                                 |
 0.4 |-+                                                             +-|
     |                                                                 |
     |                                                                 |
     |                                                                 |
 0.2 |-+                                                             +-|
     |                                                                 |
     |       +        +       +       +       +        +       +       |
   0 +-----------------------------------------------------------------+  9
1.76579 1.76579x 1.76579 1.76579 1.76579 1.76579x 1.76579 1.765 1.76579x10
                                   Unix timestamp
2025-12-15 11:23:44 +01:00
Alvar Penning
f8c2ab4b17
notifications: Speed up StreamSorter Tests
Allow configurable timeouts for the StreamSorter, to set them to a
fraction of their default for the tests. Now the tests are done in three
seconds instead of three minutes.

While doing so, another race condition with the test logging was
unveiled. Since this race results from a closing test context and test
logger, there was not much to do and I decided to just drop the logging
message, which was used only for tests anyway.
2025-11-17 09:20:07 +01:00
Alvar Penning
2a5fde1594
notifications: Mute and Unmute Events
Populate the Event's Mute field for muting and unmuting for flapping
events and acknowledgements.
2025-11-17 09:20:07 +01:00
Alvar Penning
46b1c6d673
notifications: TypeAcknowledgementCleared Message
Change the message for TypeAcknowledgementCleared events to a more
obvious one.
2025-11-17 09:20:07 +01:00
Alvar Penning
e012ef6d1b
notifications: Import StreamSorter Logic
The whole StreamSorter logic is only required for Icinga Notifications.
Thus, the implementation was moved from the history package to the
notifications package, removing some unnecessary generalizations on the
way. This results in big changes to be made in the notifications
package, while other modules are mostly not affected.
2025-11-17 09:20:07 +01:00
Alvar Penning
c6368b1f82
notifications.Client: Allow Parameters of any Type
The parameters can be not only strings, but anything to PHP's liking. In
one example, an integer was observed. Since Parameters is converted to
an []any later anyway, this is no real change in behavior.
2025-11-17 09:20:07 +01:00
Alvar Penning
0cd4978419
history.StreamSorter: Few comments, No Data Races
After Julian reworked big parts of the StreamSorter for the better, I
went over the code multiple times and added a few comments for parts I
had to read twice.

Within the tests, there might be a data race when zaptest is used after
the test's context is done. Since there were a few log messages
potentially occurring after the test's end, a guard was added to ensure
no verbose log messages are being produced if the context is done.
2025-11-17 09:20:07 +01:00
Julian Brost
6569487fbb
StreamSorter: improve output channel close behavior and simplify implementation
This commit is pretty much an overhaul of the implementation to allow for a
more straightforward way to close the output channel. The main changes to the
implementation are:

- StreamSorter now provides a method PipelineFunc that can directly be used in
  a history sync pipeline. This allows StreamSorter to handle the in + out
  stream pair internally, so that it closes out after in was closed and all
  messages from it were passed to out.
- The two worker goroutines were combined into a single one and the secondary
  queue was removed. All pending messages remain in the heap and will only be
  removed from the heap when they are about to be passed to the callback.
- The worker now handles all operations (send and close) on the output stream.
2025-11-17 09:20:07 +01:00
Alvar Penning
18518cf813
history.streamSorterSubmissions: Use Pointer
Next to some other small cleanups, the streamSorterSubmissions slice
type now references pointers.
2025-11-17 09:20:07 +01:00
Alvar Penning
3e4f05d9fd
notifications: Fix flat customvars
Next to minor changes, the custom variables are now fetched from
customvar_flat, being in their flat format.
2025-11-17 09:20:07 +01:00
Alvar Penning
d97e1624dc
telemetry: Remove leftover Stats.Callback
Seems to have been forgotten in e475a5ef91.
2025-11-17 09:20:07 +01:00
Alvar Penning
b61b0ab279
history.StreamSorter: Cleanup Output Channel
Introduce the StreamSorter.CloseOutput method to remove all submissions
for a certain output channel from both workers before closing the
channel.

The motivation behind this change is to have a single point where the
output channel is closed while no submissions are being sent into an
already closed channel.
2025-11-17 09:20:07 +01:00
Alvar Penning
9dee483ed1
history.StreamSorter: Various Fixes
- Store the streamSorterSubmission submission time as a time.Time
  instead of a nanosecond timestamp, comparing the time.Timer's
  monotonic clock.
- Replace time-based buckets in StreamSorter.submissionWorker by a heap
  to be pushed and popped. However, only submissions of a certain age
  are being forwarded. Reduces complexity quite a bit.
- Reduce complexity of StreamSorter.queueWorker by getting rid of
  unnecessary channel signals by checking for new queue events for
  processing at the loop start.
2025-11-17 09:20:07 +01:00
Alvar Penning
1c2e14a5e2
history.parseRedisStreamId: Remove regex
With parseRedisStreamId being in the "hot path", the quite simple
regular expression was exchanged with an even simpler string split.
However, as network operations precede and follow this, the benefit of
this optimization might be questionable.
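
Parsing a Redis stream ID of the form "<milliseconds>-<sequence>" without a
regular expression might look like the sketch below. parseStreamID is a
hypothetical stand-in for parseRedisStreamId, and strings.Cut is used here in
place of whichever split function the actual code uses.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseStreamID is a hypothetical stand-in for parseRedisStreamId. It splits a
// Redis stream ID such as "1700000000000-5" into its millisecond timestamp and
// its sequence number.
func parseStreamID(id string) (ms, seq uint64, err error) {
	msPart, seqPart, found := strings.Cut(id, "-")
	if !found {
		return 0, 0, fmt.Errorf("stream ID %q lacks a '-' separator", id)
	}

	if ms, err = strconv.ParseUint(msPart, 10, 64); err != nil {
		return 0, 0, fmt.Errorf("invalid timestamp in stream ID %q: %w", id, err)
	}
	if seq, err = strconv.ParseUint(seqPart, 10, 64); err != nil {
		return 0, 0, fmt.Errorf("invalid sequence in stream ID %q: %w", id, err)
	}

	return ms, seq, nil
}

func main() {
	ms, seq, err := parseStreamID("1700000000000-5")
	fmt.Println(ms, seq, err)

	_, _, err = parseStreamID("not-a-stream-id")
	fmt.Println(err != nil)
}
```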
2025-11-17 09:20:07 +01:00
Alvar Penning
45877e72a3
telemetry: Undo Stats rework
Effectively reverting cf4bd92611 and
passing a pointer to the relevant com.Counter to the history sync.
2025-11-17 09:20:07 +01:00
Alvar Penning
a8ff0a23e5
notifications: Simplify Icinga DB Web Rule Evaluation
- Ignore the "config" part of the JSON struct which is only relevant for
  Icinga DB Web.
- Remove unnecessary string conversions.
- Small code changes/improvements.
2025-11-17 09:20:07 +01:00
Alvar Penning
8baa5e7f5a
history: SyncCallbackConf For Common Callback Conf
Refactor multiple variables into common struct to ease handling.
2025-11-17 09:20:07 +01:00
Alvar Penning
ebcdadbd44
notifications: Custom Vars From SQL, Output Format
Rework the prior custom variable fetching code to no longer fetch
everything in a looping fashion from Redis, but send SQL queries for
custom variables now.

In addition, service objects now contain both the service and host
custom variables, prefixed by "host.vars." or "service.vars.".
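
The described output format can be illustrated as follows. mergeCustomVars and
the use of plain string maps are assumptions of this sketch; only the two
prefixes are taken from the text above.

```go
package main

import "fmt"

// mergeCustomVars is a hypothetical helper combining host and service custom
// variables into one map, using the prefixes named in the commit message.
func mergeCustomVars(host, service map[string]string) map[string]string {
	merged := make(map[string]string, len(host)+len(service))
	for name, value := range host {
		merged["host.vars."+name] = value
	}
	for name, value := range service {
		merged["service.vars."+name] = value
	}
	return merged
}

func main() {
	merged := mergeCustomVars(
		map[string]string{"os": "Linux"},
		map[string]string{"os": "irrelevant", "port": "443"},
	)

	// Equally named variables of host and service do not collide.
	fmt.Println(merged["host.vars.os"], merged["service.vars.os"], merged["service.vars.port"])
}
```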
2025-11-17 09:20:07 +01:00
Alvar Penning
4a4792dfee
history: StreamSorter for Notifications Callback
The StreamSorter was added to history, allowing to collect messages from
multiple Redis streams, sorting them based on the timestamp in the
Stream ID, and ejecting them back.

This is used for the callback stage, required by Icinga Notifications. In
the Notifications context, an ordered stream is required.

Despite my best intention, it felt like I have created an Erlang.
2025-11-17 09:20:07 +01:00
Alvar Penning
b8e11b390e
notifications: Evaluate Icinga DB Web Rule Filter
The rules are no longer just plain SQL queries, but now have their own
JSON format, introduced by Icinga DB Web. This format is now supported
by Client.evaluateRulesForObject.

- https://github.com/Icinga/icingadb-web/pull/1289
- https://github.com/Icinga/icingadb/pull/998#issuecomment-3442298348
2025-11-17 09:20:07 +01:00
Alvar Penning
5abb8b4212
notifications: Fetch customvars from Redis
After reintroducing Event.ExtraTags in the IGL and Icinga Notifications,
Icinga DB populates events by their custom variables.

At the moment, the required customvars are fetched from Redis for each
event. Due to the Redis schema, at least one HGETALL with manual
filtering is required. This might be a good candidate for further
caching, and cache invalidation.
2025-11-17 09:20:07 +01:00
Alvar Penning
1ec561415d
Minor Tweaks for Icinga Notifications Integration
- Don't validate notifications config in a background Goroutine.
- Clip pipeline slice to avoid reusing capacity twice.
- Rework notification Client.buildCommonEvent and depending methods.
- Resubmit events after updating rules in one go.
- Simplify Client.fetchHostServiceName based on Julian's suggestion.

Co-Authored-By: Julian Brost <julian.brost@icinga.com>
2025-11-17 09:20:07 +01:00
Alvar Penning
ad26a7857d
Configurable callback sync telemetry stat name
Refactor the telemetry.Stats to allow custom names. This enabled dynamic
callback names for the Redis history sync, used by Icinga Notifications.
2025-11-17 09:20:07 +01:00
Alvar Penning
432db22f82
notifications: Reflect RulesInfo IGL update
The RulesInfo type was simplified. Rules are no longer a custom struct,
but just represented by the map key and a filter expression string.
2025-11-17 09:20:07 +01:00
Alvar Penning
30b5c45162
notifications: Don't abort for faulty object rules
When a faulty object filter expression, such as a syntactically incorrect
one, was loaded, each evaluation fails. However, prior to this change, the
submission logic was exited, making Icinga DB unable to recover. Now,
the event is treated as if no rule has matched, and a new rule version
can be loaded.
2025-11-17 09:20:07 +01:00
Alvar Penning
37212a2b64
notifications: Send relative Icinga Web 2 URLs
There is no need to let each Icinga Notifications source know the root
URL of Icinga Web 2. Since the latest IGL and IN change, partial URLs
relative to Icinga Web 2 are supported.
2025-11-17 09:20:07 +01:00
Alvar Penning
b6e9b61b15
history: Retry failing callback submissions
Do not silently drop failing callback submissions, such as those to Icinga
Notifications during restarts or network disruptions. Instead, switch the
internal makeCallbackStageFunc stageFunc into a backlog mode.

This resulted in multiple changes, including removing the background
worker for notifications.Client, as otherwise the event submission
status could not be propagated back.
2025-11-17 09:20:07 +01:00
Alvar Penning
3a7e1f4aff
notifications: IGL Changes For Rules
The rules and rule version are now part of the Event.

Also rename the Client method receiver variable.
2025-11-17 09:20:07 +01:00
Alvar Penning
697eca139d
Notifications: Address Code Review
- Bump IGL to latest changes in Icinga/icinga-go-library#145.
- Allow specifying which pipeline keys are relevant, ignore others.
- Allow specifying which pipeline key should be parsed in which type.
- Create history.DowntimeHistoryMeta as a chimera combining
  history.DowntimeHistory and history.HistoryDowntime to allow access to
  event_type, distinguishing between downtime_start and downtime_end.
- Trace times for submission steps in the worker. Turns out, the single
  threaded worker blocks roughly two seconds for each
  Client.ProcessEvent method call. This might sum up to minutes if lots
  of events are processed at once. My current theory is that the delay
  results from the expensive bcrypt hash comparison on Notifications.
2025-11-17 09:20:07 +01:00
Yonas Habteab
49b7d98084
Drop superfluous rulesMutex
There won't be any concurrent access to the rules, so we don't need to
guard it with a mutex.
2025-11-17 09:20:07 +01:00
Yonas Habteab
7e5b8e5385
Retrieve host and service names from Redis
Instead of retrieving the host and service names from the used RDBMS,
this commit allows us to query them from Redis. This is done to avoid
the overhead of database queries, especially when the host and service
names are always to be found in Redis. The previous implementation
simply performed two database queries with each received entity based on
their IDs, but we can perform this operation more efficiently from Redis
using the same filtering logic as before. Of course, we now have to
maintain more code needed to handle the Redis operations, but this is a
trade-off we should be willing to make for performance reasons.
2025-11-17 09:20:07 +01:00
Yonas Habteab
ca3f7c5c9d
Reevaluate rules immediately after refetching them
Otherwise, posting the entity in a `go s.Submit(entity)`
manner in the background will mess up the order of events
as there might be another event in the queue affecting the
same entity.

Apart from that, the log entry "submitted event ..." is also
downgraded to debug level, as it creates too much noise at the
info level without saying anything relevant to an end user.
2025-11-17 09:20:07 +01:00
Yonas Habteab
f170f3763e
Don't limit queries referencing to {host,service}_id & Env ID params
Instead allow them to reference any columns of the database entity as
long as that entity provides it. It also removes the retry mechanism
used to execute the queries as this would block the worker
unnecessarily.
2025-11-17 09:20:07 +01:00
Yonas Habteab
95da9ee443
Use the newly introduced notifications event utils from igl
Most of the notifications-related code from here was moved to the
Icinga Go Library, so this commit removes the now obsolete parts from here.
2025-11-17 09:20:07 +01:00
Alvar Penning
848807d96c
Initial Icinga Notifications Source
This is the first version to use Icinga DB as an event source for Icinga
Notifications. If configured accordingly, Icinga DB forwards events
crafted from the Redis pipeline to the Icinga Notifications API.

This required a small refactoring of the history synchronization to
allow hooking into the Redis stream. Afterwards, the newly introduced
notifications package handles the rest.

Note: As part of this architectural change, Icinga Notifications offers
filters to be evaluated by Icinga DB. At the moment, these are SQL
queries being executed on the Icinga DB relational database. Either
consider both Icinga DB and Icinga Notifications to be part of the same
trust domain or consider the security implications.

Furthermore, this change requires a change on Icinga Notifications as
well. This will not work with the current version 0.1.1.
2025-11-17 09:20:07 +01:00
Alvar Penning
d6f67074e1
golangci-lint: Address forcetypeassert
There were multiple occurrences of forcetypeassert throughout the
codebase. In most cases, an if guard was added to return an error.
Sometimes, a panic was raised as refactoring would be too much. And
once, a nolint annotation was added as the actual check was out of
scope.
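
The "if guard" mentioned above refers to the two-value form of a type
assertion. A minimal sketch, with asString being a hypothetical example
function:

```go
package main

import "fmt"

// asString uses the two-value ("comma, ok") form of a type assertion and
// returns an error instead of panicking when v holds another type.
func asString(v any) (string, error) {
	s, ok := v.(string)
	if !ok {
		return "", fmt.Errorf("expected string, got %T", v)
	}
	return s, nil
}

func main() {
	fmt.Println(asString("icingadb"))
	fmt.Println(asString(42))
}
```

A single-value assertion such as v.(string) would panic for the second call,
which is what the forcetypeassert linter warns about.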
2025-10-14 15:12:57 +02:00
Alvar
5ee8f30644
Merge pull request #1008 from Icinga/schema-import-sql-conn-i1002
Dedicated SQL Connection For Schema Import
2025-08-27 12:03:56 +00:00
Yonas Habteab
ef1886365e
Replace strings.Split() by strings.SplitSeq()
The changes are made by:
`go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -category=stringsseq -fix -test './...'`
2025-08-26 15:21:16 +02:00
Yonas Habteab
c9ebe8cd1d
Replace 3-clause for loop with range based loop
The changes are made by:
`go run golang.org/x/tools/gopls/internal/analysis/modernize/cmd/modernize@latest -category=rangeint -fix -test './...'`
2025-08-26 15:17:48 +02:00