Commit graph

6874 commits

Author SHA1 Message Date
Yonas Habteab
1f92ec656b
Merge pull request #10523 from Icinga/dependency-eval-complexity
Prevent worst-case exponential complexity in dependency evaluation
2025-08-05 11:57:47 +02:00
Julian Brost
63e9ef58ba Prevent worst-case exponential complexity in dependency evaluation
So far, calling Checkable::IsReachable() traversed all possible paths to it's
parents. In case a parent is reachable via multiple paths, all it's parents
were evaluated multiple times, result in a worst-case exponential complexity.

With this commit, the implementation keeps track of which checkables were
already visited and uses the already-computed reachability instead of repeating
the computation, ensuring a worst-case linear runtime within the graph size.
2025-08-04 10:42:20 +02:00
Julian Brost
43f1e6f3a1 Move code involved in recursive dependency evaluation to helper class
Checkable::IsReachable() and DependencyGroup::GetState() call each other
recursively. Moving them to a common helper class allows adding caching to them
in a later commit without having to pass a cache between the functions (through
a public interface) or resorting to thread_local variables.
2025-08-04 10:42:20 +02:00
Julian Brost
51ec73cbd9 Send signals as Icinga user in safe-reload and logrotate
In contrast to the regular `kill` binary, `icinga2 internal signal` drops
permissions before sending the signal. This is important as the PID file can be
written by the Icinga user, dropping the permissions prevents that user from
using this to send signals to processes it is not supposed to signal.

SIGUSR1 wasn't among the list of signals supported by `icinga2 internal
signal`, so it is added there.
2025-08-01 11:27:48 +02:00
Julian Brost
a49ec1015d Allow intrusive_ptr<const T> for objects
This allows using ref-counted pointers to const objects. Adds a second typedef
so that T::ConstPtr can be used similar to how T::Ptr currently is.
2025-07-30 16:42:27 +02:00
Julian Brost
ebd4fd1933 Log: don't construct std::ostringstream for no-op messages
This commit removes the existing m_IsNoOp bool and instead wraps the m_Buffer
std::ostringstream into std::optional. Functionally, this is pretty much the
same, with the exception that std::ostringstream is no longer constructed for
messages that will be discarded later.
2025-07-29 10:27:38 +02:00
Julian Brost
6487497665 Log: use std::forward in operator<< and remove overload for const char*
There already is a template operator<< implemented, so far only for const
references though. Changing this to perfectly forward the argument to the
corresponding operator in the underlying std::ostringstring allows handling all
the cases there, removing the need for a separate overload for const char*.
2025-07-29 10:27:38 +02:00
Yonas Habteab
ce3275d27f Disallow stage deletions during reload
Once the new worker process has read the config, it also includes a
`include */include.conf` statement within the config packages root
directory, and from there on we must not allow to delete any stage
directory from the config package. Otherwise, when the worker actually
evaluates that include statement, it will fail to find the directory
where the include file is located, or the `active.conf` file, which is
included from each stage's `include.conf` file, thus causing the worker
fail.

Co-Authored-By: Johannes Schmidt <johannes.schmidt@icinga.com>
2025-07-24 16:02:30 +02:00
Yonas Habteab
1ac4d83963 Use AtomicFile where applicable in ConfigPackageUtility 2025-07-24 10:54:39 +02:00
Yonas Habteab
35f42fa5a3 Handle concurrent config package updates gracefully
Previously, we used a simple boolean to track the state of the package updates,
and didn't reset it back when the config validation was successful because it was
assumed that if we successfully validated the config beforehand, then the worker
would also successfully reload the config afterwards, and that the old worker would
be terminated. However, this assumption is not always true due to a number of reasons
that I can't even think of right now, but the most obvious one is that after we successfully
validated the config, the config  might have changed again before the worker was able
to reload it. If that happens, then the new worker might fail to successfully validate
the config due to the recent changes, in which case the old worker would remain active,
and this flag would still be set to true, causing any subsequent requests to fail with a
`423` until you manually restart the Icinga 2 service.

So, in order to prevent such a situation, we are additionally tracking the last time a reload
failed and allow to bypass the `m_RunningPackageUpdates` flag only if the last reload failed
time was changed since the previous request.
2025-07-24 10:54:39 +02:00
Julian Brost
827f85c327
Merge pull request #10387 from Icinga/cnt-msg
Introduce Endpoint#messages_received_per_type
2025-07-16 17:29:24 +02:00
Julian Brost
1f15f0ff07 JsonEncoder: wrap writer for flushing
This commit intruduces a small helper class that wraps any writer and
provides a flush operation that performs the corresponding action if the
writer is an AsyncJsonWriter and does nothing otherwise.
2025-07-11 16:10:22 +02:00
Yonas Habteab
82b80e24c1 fix comment 2025-07-11 14:05:54 +02:00
Yonas Habteab
cd1ab7548c Rename AsyncJsonWriter::Flush() -> MayFlush() to reflect its usage 2025-07-11 13:55:33 +02:00
Yonas Habteab
89418f38ee JsonEncoder: let the serializer replace invalid UTF-8 characters
Replacing invalid UTF-8 characters beforehand by our selves doesn't make
any sense, the serializer can literally perform the same replacement ops
with the exact same Unicode replacement character (U+FFFD) on its own.
So, why not just use it directly? Instead of wasting memory on a temporary
`String` object to always UTF-8 validate every and each value, we just
use the serializer to directly to dump the replaced char (if any) into
the output writer. No memory waste, no fuss!
2025-07-10 18:09:21 +02:00
Yonas Habteab
dad4c0889f JsonEncoder: lock olock conditionally & flush output regularly 2025-07-10 18:09:21 +02:00
Yonas Habteab
398b5e3193 Implement LockIfRequired() method for Namespace, Dictionary & Array 2025-07-10 18:09:21 +02:00
Yonas Habteab
57726fbb66 Do not require olock on frozen Namespace, Dictionary & Array 2025-07-10 18:09:21 +02:00
Yonas Habteab
2461e0415d Introduce JsonEncode helper function
It's just a wrapper around the `JsonEncoder` class to simplify its usage.
2025-07-10 18:09:21 +02:00
Yonas Habteab
9dd2e2a3ec Introduce JsonEncoder class 2025-07-10 18:09:21 +02:00
Yonas Habteab
1c61bced03 Introduce AsyncJsonWriter output adapter interface 2025-07-09 13:41:15 +02:00
Yonas Habteab
8ef921aa5e Implement bool operator for ObjectLock 2025-07-08 18:24:16 +02:00
Yonas Habteab
4c0628c24d Allow to defer lock on ObjectLock 2025-07-08 18:24:16 +02:00
Yonas Habteab
455d6fcde1 Introduce ValueGenerator class 2025-07-08 18:24:16 +02:00
Julian Brost
0ebcd2662d No longer allow overriding the frozen attribute of containers
The Array, Dictionary, and Namespace types provide a Freeze() method that makes
them read-only. So far, there was the possibility to call some methods with
`overrideFrozen=true` which would then bypass the corresponding check and allow
modification of the data structures nonetheless.

With 24b57f0d3a, this possibility was already
removed from the Namespace type. However, for interface compatibility, it kept
the parameter and just ignores it, throwing an exception on any modification on
a frozen instance.

The only place using `overrideFrozen` was processing of the `-D`/`--define`
command line flag that allows setting additional variables in the DSL. At the
time it is evaluated, there are no user-created data structures yet that could
be frozen, so the only frozen objects that could be encountered are Namespaces
(Icinga doesn't freeze other types by itself) and for these, `overrideFrozen`
already has no effect.

Hence, there is no harm in removing `overrideFrozen` altogether. This
simplifies the code and also means that frozen objects are now indeed read-only
without exceptions, allowing further optimizations regarding locking in the
future.
2025-07-08 14:16:20 +02:00
Chris Malton
ec48dae331 Correct a problem with expiry times not being passed through to AcknowledgeProblem 2025-06-23 11:57:29 +02:00
Julian Brost
1aa62d4bb9
Merge pull request #10420 from Icinga/bundled-perfdata-writers-fix
Serialize fields before queueing them to the workqueue
2025-06-17 10:17:27 +02:00
Johannes Schmidt
82bb636d2b Use WaitGroup to wait for or abort HTTP requests
The wait group gets passed to HttpServerConnection, then down to the
HttpHandlers. For those handlers that modify the program state, the
wait group is locked so ApiListener will wait on Stop() for the
request to complete. If the request iterates over config objects,
a further check on the state of the wait group is added to abort early
and not delay program shutdown. In that case, 503 responses will be
sent to the client.

Additionally, in HttpServerConnection, no further requests than the
one already started will be allowed once the wait group is joining.
2025-06-13 14:48:15 +02:00
Johannes Schmidt
33777f6f3f Disconnect JSON-RPC clients on ApiListner::Stop() 2025-06-13 14:48:15 +02:00
Johannes Schmidt
00802ed9fa Stop ApiListener::ListenerCoroutineProc() when Stop() is called 2025-06-13 14:48:11 +02:00
Johannes Schmidt
157e3750e3 Add IsLockable method to WaitGroup 2025-06-13 14:48:07 +02:00
Julian Brost
b27310fb6c
Merge pull request #10467 from Icinga/icingadb-calceventid-no-double-timestamptomilliseconds
IcingaDB::CalcEventID: No milliseconds as eventTime
2025-06-10 17:09:16 +02:00
Julian Brost
dcd9a2dd41
Merge pull request #10457 from Icinga/remove-superfluous-dsl-functions
Drop superfluous (broken) DSL functions
2025-06-10 16:33:37 +02:00
Julian Brost
c41fe682c5
Merge pull request #10452 from Icinga/streamline-redis-and-db-values
IcingaDB: Make Redis & DB values consistent
2025-06-10 11:30:07 +02:00
Julian Brost
a15b706ee3
Merge pull request #10372 from WuerthPhoenix/10364-race-condition-while-calculating-object-state
Keep object locked until events are dispatched.
2025-06-06 17:19:21 +02:00
Alvar Penning
9cdbbf4ede
IcingaDB::CalcEventID: No milliseconds as eventTime
CalcEventID's internal logic uses the TimestampToMilliseconds function
to convert the given eventTime to milliseconds. Within this function,
the timestamp is capped to prevent an overflow.

On three occasions, the input timestamp given to CalcEventID had already
been converted using TimestampToMilliseconds. The second
TimestampToMilliseconds function then checked the value and always
returned the capped maximum value. Consequently, CalcEventID returned
the same hash value for different timestamps.

This affected SendFlappingChange, SendAcknowledgementSet, and
SendAcknowledgementCleared. For example, when two acknowledgments were
created for the same service, the calculated event_id representing the
history table row would be identical.

Fixes #10465
2025-06-06 16:50:01 +02:00
Julian Brost
4d9e9e7ed6
Merge pull request #9924 from ymartin-ovh/pr-9916
Fix `/v1/actions/` deadlock & nullptr dereference
2025-06-06 15:15:51 +02:00
Julian Brost
4e08adc532
Merge pull request #10161 from Icinga/authBYhost-10157
ApiListener::UpdateObjectAuthority(): distribute auth. by object's host
2025-06-06 15:03:32 +02:00
Yonas Habteab
9e65a8b63b Fix compiler warnings of missing NotificationTypeAll case 2025-06-06 13:31:44 +02:00
Yonas Habteab
186571ec99 Refresh the states & types bitsets whenever states & types attrs change
Since the types and states attributes are user configurable and allowed to change at
runtime, we need to update the actual filter bitsets whenever these attributes change.
Otherwise, the filter bitsets would be stale and not reflect their current state.
2025-06-06 13:31:44 +02:00
Yonas Habteab
fd1927115a IcingaDB: Make is_acknowledged a bool & add is_sticky_acknowledgement field 2025-06-06 13:31:44 +02:00
Yonas Habteab
76d5915b3f IcingaDB: Set notification_histor#type to its string representation
So that Icinga DB (Go) daemon doesn't have to make the mappings again.
2025-06-06 13:31:44 +02:00
Yonas Habteab
7037b18b34 IcingaDB: Send the int representation of states & types filter 2025-06-06 13:31:44 +02:00
Yonas Habteab
ef1c0eb9b3 IcingaDB: Set state_type to hard or soft and not int 2025-06-06 13:31:44 +02:00
Yonas Habteab
953a2e2e96 Merge {host,service}::StateTypeToString() & drop unused StateTypeFromString() 2025-06-06 13:31:44 +02:00
Yonas Habteab
5d11df1abf IcingaDB: Send the string representation of comment#entry_type to Redis 2025-06-06 13:31:44 +02:00
Yonas Habteab
bc5db9834f Drop System#track_parents DSL function
No external user needs to manipulate the actual object dependency
graphs. This was maybe introduced for debugging purposes at that time
but if someone messes with this in prod - good luck with that. Oh, apart
from that it's broken :( and doesn't track parents as its implies but
children.
2025-06-03 17:09:57 +02:00
Yonas Habteab
9577af8e6d Drop Checkable#process_check_result() DSL function
Not sure why it's introduced in the first place, maybe for debugging
purposes at the early stage of Icinga 2 dev but I failed to see an
actual useful use case for it that's worth its maintenance burden. So,
this commit dropped it entirely from the DSL language.
2025-06-03 17:09:57 +02:00
Yannick Martin
e6fc1b91a7
AddComment: return Comment::Ptr instead of String containing the name
Mimic 88e5744d54 and address nullptr on concurrent
  add-comment / remove-comment actions.
2025-06-03 17:00:07 +02:00
Yonas Habteab
3d04eb456a
Merge pull request #10415 from Icinga/abort-no-endpoint-conns
Abort connections with no valid endpoint
2025-06-02 16:21:46 +02:00
William Calliari
9d40de78eb
Address comments from review 2025-05-29 09:16:46 +02:00
William Calliari
abf2fb392b
Keep object locked until events are dispatched. 2025-05-29 09:16:44 +02:00
Julian Brost
c253e7eb6e
Merge pull request #10397 from Icinga/activation-priority-10179
Checkable#ProcessCheckResult(): discard🗑️ CR or delay its producers shutdown
2025-05-28 12:30:40 +02:00
Yonas Habteab
7d2f1c2030 Drop Windows VISTA from the supported platform
Boost `1.88.0` introduced a feature [^1] that makes use of the Windows API, but it
uses API functions that are only available with `PSAPI_VERSION >= 2` and
Windows VISTA only supports `PSAPI_VERSION == 1`. Actually, that new feature
can also be disabled by setting the `BOOST_STACKTRACE_DISABLE_OFFSET_ADDR_BASE`
macro, but since it seems to be a useful feature and isn't even disabled by default,
we can just drop it that ancient Windows version instead of disabling it.

[^1]: https://github.com/boostorg/stacktrace/pull/200
2025-05-28 09:39:03 +02:00
Yonas Habteab
d265329a17
Merge commit from fork
Fix for master
2025-05-27 13:50:26 +02:00
Yannick Martin
ab58ff2928
fix api deadlock that can appears on two simultaneous actions
With this mutex, we can have deadlock in the following case:
    1/ Thread A processes a /v1/actions/acknowledge-problem request and locks the checkable
    2/ Thread B processes a /v1/actions/add-comment and enters first the ConfigItem::ActivateItems() method and locks the static mutex there and starts the just created comment object, which triggers the OnCommentAdded() event.
    3/ Thread A wants to activate the just created ack comment as well but since the mutex is already locked by TB, it blocks.
    4/ Thread B's OnCommentAdded() even dispatch causes the IcingaDB::CommentAddedHandler() to be called and implicitly triggers full state update for the checkable. Now, the state serialization of that checkable (remember that's the same checkable currently locked by TA) also includes computing its severity, thus it calls either service->GetSeverity() or host->GetSeverity(). However, since computing the checkable severity (as of now) requires acquiring the object lock, and boom - they deadlock each other.
2025-05-26 15:05:42 +02:00
Alexander Aleksandrovič Klimov
56d9f38b35
Merge pull request #10456 from Icinga/SharedObject-delete
SharedObject: delete unused methods
2025-05-26 10:00:52 +02:00
Alexander A. Klimov
4f351f625f SharedObject: delete unused methods
None of the derived classes use them, none shall have to explicitly delete them.
2025-05-23 15:47:02 +02:00
Alexander A. Klimov
36743f3100 Checkable#ProcessCheckResult(): discard CR or delay its producers shutdown 2025-05-23 14:53:58 +02:00
Alexander A. Klimov
f4691dd054 Require to pass WaitGroup::Ptr to several methods
Namely:

Checkable#ProcessCheckResult()
ClusterCheckTask::ScriptFunc()
ClusterZoneCheckTask::ScriptFunc()
DummyCheckTask::ScriptFunc()
ExceptionCheckTask::ScriptFunc()
IcingaCheckTask::ScriptFunc()
IfwApiCheckTask::ScriptFunc()
NullCheckTask::ScriptFunc()
PluginCheckTask::ScriptFunc()
RandomCheckTask::ScriptFunc()
SleepCheckTask::ScriptFunc()
IdoCheckTask::ScriptFunc()
IcingadbCheck::ScriptFunc()
CheckCommand#Execute()
Checkable#ExecuteCheck()
ClusterEvents::ExecuteCheckFromQueue()
ExternalCommandProcessor::Process*CheckResult()
ExternalCommandCallback
ExternalCommandProcessor::Execute()
ExternalCommandProcessor::ExecuteFromFile()
ExternalCommandProcessor::ProcessFile()
LivestatusQuery#ExecuteCommandHelper()
LivestatusQuery#Execute()
2025-05-23 14:53:58 +02:00
Alexander A. Klimov
c7cca7b460 Add a StoppableWaitGroup, and join it on #Stop(), to:
ApiListener
CheckerComponent
ExternalCommandListener
LivestatusListener
2025-05-23 14:53:58 +02:00
Alexander A. Klimov
18fb93fc11 Introduce WaitGroup and StoppableWaitGroup 2025-05-23 14:53:58 +02:00
Alexander A. Klimov
77b86bba52 Move l_MyCapabilities -> ApiCapabilities::MyCapabilities 2025-05-23 10:44:16 +02:00
Alexander A. Klimov
6cd83ba2b8 ApiListener::UpdateObjectAuthority(): distribute auth. by object's host
Pin child objects of hosts (HOST!...) to the same endpoint as the host.
This reduces cross-object action latency withing the same host.
If all endpoints know this algorithm, we can use it.
2025-05-23 10:44:16 +02:00
Yonas Habteab
2b0f73987e
Merge pull request #10443 from Icinga/use-correct-timeout-for-command-endpoints
Checkable: Use correct timeout for rescheduling remote checks
2025-05-23 10:14:49 +02:00
Alexander A. Klimov
18f810a1ea For the same zone, call ApiListener::UpdateObjectAuthority() in icinga::Hello
after setting remote capabilities. They'll become important for teamwork in a zone.
2025-05-22 15:01:06 +02:00
Alexander Aleksandrovič Klimov
ec2080dcc1
Merge pull request #9731 from Icinga/fix-compiler-warnings-by-copy-constructing-loop-variables-explicitly
Fix compiler warnings by (copy-)constructing loop variables explicitly or not at all
2025-05-21 14:26:47 +02:00
Alexander A. Klimov
22e75f08fa Fix compiler warnings by not unnecessarily (copy-)constructing loop variables 2025-05-21 11:36:32 +02:00
Julian Brost
4023128be4 VerifyCertificate: Work around issue in OpenSSL < 1.1.0 causing invalid certifcates being treated as valid
Old versions of OpenSSL stored a valid flag in the certificate (see inline code
comment for details) that if already set, causes parts of the verification to
be skipped and return that the certificate is valid, even if it's not actually
signed by the CA in the trust store.

This issue was assigned CVE-2025-48057.
2025-05-21 10:50:12 +02:00
Julian Brost
00864d1096 VerifyCertificate: fix use after free
`X509_STORE_CTX_get_error(csc)` was called after `X509_STORE_CTX_free(csc)`.
This is fixed by automatically freeing variables at the end of the function
using `std::unique_ptr`.
2025-05-21 10:46:25 +02:00
Alexander Aleksandrovič Klimov
5c20b1ae12
Merge pull request #10444 from Icinga/ProcessingResult-NoCheckResult
Remove unused ProcessingResult::NoCheckResult
2025-05-20 12:54:41 +02:00
Alexander A. Klimov
69d2a0442a Remove unused ProcessingResult::NoCheckResult
No one passes a NULL CR to Checkable#ProcessCheckResult() anymore.
2025-05-20 10:41:26 +02:00
Alexander A. Klimov
55fc0e51ff Checkable#process_check_result(): don't pass NULL CR to Checkable#ProcessCheckResult()
It's ignored anyway.
2025-05-20 10:33:20 +02:00
Yonas Habteab
44b3382f5f
Merge pull request #10442 from Icinga/jschmidt/fix-compiler-warnings
Fix two low-hanging fruit compiler warnings
2025-05-19 14:40:55 +02:00
Yonas Habteab
0c2215a9f8 Checkable: Use correct timeout for rescheduling remote checks
Previously, the `command#timeout` which by default is `1m`, was used to reschedule
the just sent remote check. However, this results into a bunch of extra checks being
sent to the remote host, even though the first one is still running. That's because
if one want to override the default timeout of the command for a specific host/service,
one has to set the `checkable#check_timeout` attribute to the desired value. So, this
commit makes sure that the `checkable#check_timeout` attribute (if set) is used to
reschedule the remote check.
2025-05-19 14:07:33 +02:00
Johannes Schmidt
f8d3bacc29 Fix warnings related to enum integer conversion 2025-05-19 12:31:22 +02:00
Yonas Habteab
8a1d9df767
Merge pull request #10070 from Icinga/time-period-schedule-next-check-on-next-transition-9984
If skipped due to time period, schedule next check on next transition
2025-05-19 12:29:09 +02:00
Johannes Schmidt
6a6c494279 Mark MakeName and ParseName virtual methods as override 2025-05-19 11:33:22 +02:00
Yonas Habteab
45c651499b
Merge pull request #10379 from Icinga/set-cancel-time-conditionally
IcingaDB: Sync downtime `cancel_time` conditionally
2025-05-16 12:18:20 +02:00
Yonas Habteab
83a0f9d217
Merge pull request #10361 from Icinga/reset-no-more-notifications-only-on-recovery
Notification: Reset internal states on (missed)recovery
2025-05-16 09:53:10 +02:00
Yonas Habteab
7acec6fc36 IcingaDB: Set downtime cancel_time conditionally
If the downtime ended automatically `cancel_time` should just be `NULL`
instead of a `0` timestamp.
2025-05-16 09:49:58 +02:00
Yonas Habteab
5ea666a7ad IcingaDB: Don't set cancel_time for downtime start event
It's a downtime start event there's now way the downtime could be
cancelled before it even started.
2025-05-16 09:49:16 +02:00
Julian Brost
1a386ad55d
Merge pull request #10265 from Icinga/RedisConnection-spinlock
RedisConnection#Connect(): get rid of spin lock
2025-05-14 15:06:58 +02:00
Yonas Habteab
cef6fb77e5 Serialize fields before queueing the event to the workqueue 2025-05-14 14:42:04 +02:00
Alexander A. Klimov
daeab09334 If skipped due to time period, schedule next check on next transition
and not after yet another check interval. Otherwise checks done every 24h may get suppressed due to being re-scheduled outside time period every 24h.
2025-05-14 12:47:34 +02:00
Alexander A. Klimov
2739f7f189 RedisConnection#Connect(): get rid of spin lock
Instead of IoEngine::YieldCurrentCoroutine(yc) until m_Queues.FutureResponseActions.empty(), async-wait a CV which is updated along with m_Queues.FutureResponseActions.
2025-05-14 12:28:11 +02:00
Alexander A. Klimov
060d8b185e Introduce AsioDualEvent 2025-05-14 12:24:28 +02:00
Yonas Habteab
a589b87d6c Remove unused parameters 2025-05-13 15:31:29 +02:00
Yonas Habteab
2e19fce31d Remove some superfluous if statements
They're just useless, since a `CheckResult` handler is never going to be
called without a check result and a checkable can't exist without a
checkcommand.
2025-05-13 15:31:29 +02:00
Yonas Habteab
d750bff193 Notification: Fix incorrectly dropped recovery & ACK notifications
Previously, recovery and ACK notifications were not delivered to users
who weren't notified about the problem state while having a configured
`Problem` type filter. However, since the type filter can also be
configured on the `Notification` object level, this resulted to an
incorrect behaviour. This PR changes the existing logic so that the
recovery and ACK notifications gets dropped only if the `Problem` filter
is configured on both the `User` and `Notification` object levels.
2025-05-13 09:46:35 +02:00
Alvar Penning
7e65a60a5d
Fix PerfdataValue Counter Parsing
Ensure that the counter unit of measurement, "c", is parsed correctly
for performance data values again.

A prior refactoring in 720a88c29a changed
the parsing logic, resulting in an incorrect behavior for counter units.
By passing the raw input into the l_CsUoMs map first, the "c" UoM is
removed. Moving the explicit counter check before passing the raw unit
into the map resolves this issue.

Fixes #9540.
2025-05-12 16:34:05 +02:00
Yonas Habteab
4596b44171 Reset no_more_notifications on filter mismatch correctly
Previously, if you enable flapping for a Checkable but the corresponding
`Notification` object does not have `FlappingStart` or `FlappingEnd`
types set, the `no_more_notifications` flag wasn't reset to false again.
This commit ensures that this flag is always reset on `Recovery` even
the type filter does not match including when we miss the `Recovery` due
to Flapping state.
2025-05-12 12:03:13 +02:00
Yonas Habteab
9166326876 Notification: Reset notified problem users on flapping end as well 2025-05-12 12:03:13 +02:00
Yonas Habteab
86365a4e2b Notification: Clear last notified state per user on flapping end as well 2025-05-12 12:03:13 +02:00
Yonas Habteab
89f12c2323 Notification: Reset no_more_notifications only on recovery 2025-05-12 12:03:13 +02:00
Julian Brost
b2b47981a5
Merge pull request #10422 from Icinga/mktime-dst-consistency
Ensure consistent mktime() DST behavior across different implementations
2025-04-30 16:51:05 +02:00
Julian Brost
cc48c924ae Load Notification objects after User and UserGroup
Notification objects can refer User and Group objects similar to how they can
refer Host and Service objects, so that dependency feels quite natural. Note
that for evaluating most configuration, this order doesn't really matter, the
configuration will successfully evaluate in either case, the difference can be
noticed mainly in more advanced configurations, for example when dynamically
assigning user based on their groups. When accessing user objects from the
Notification object definition (like in the following example), without this
change, only groups configured directly in groups attribute of User objects are
visible and those added via assign clauses in UserGroup objects are missing.
With this commit, these are also visible.

    apply Notification "n" to Host {
        for (var u in get_objects(User)) {
            log(u.name + " -> " + Json.encode(u.groups))
        }

        # [...]
    }
2025-04-29 12:14:46 +02:00
Alexander A. Klimov
98d097517b Introduce Endpoint#messages_received_per_type 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
3dd7b15808 Count incoming messages per type and endpoint 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
c566f6dc31 ApiFunction: store own name 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
331ba1f661 Rename AsioConditionVariable to AsioEvent
The current implementation is rather similar to Python's threading.Event, than to a CV.
2025-04-29 11:39:42 +02:00
Julian Brost
5404143dee Ensure consistent mktime() DST behavior across different implementations
There are inputs to mktime() where the behavior is not specified and there's
also no single obviously correct behavior. In particular, this affects how
auto-detection of whether DST is in effect is done when tm_isdst = -1 is set
and the time specified does not exist at all or exists twice on that day.

If different implementations are used within an Icinga 2 cluster, that can lead
to inconsistent behavior because different nodes may interpret the same
TimePeriod differently.

This commit introduces a wrapper to mktime(), namely Utility::NormalizeTm()
that implements the behavior provided by glibc. The choice for glibc's behavior
is pretty arbitrary, it was simply picked because most systems that are
officially/fully supported use it (with the only exception being Windows), so
this should give the least possible amount of user-visible changes.

As part of this commit, the closely related helper function mktime_const() is
also moved to Utility::TmToTimestamp() and made a wrapper around the newly
introduced NormalizeTm().
2025-04-28 13:38:55 +02:00
Julian Brost
a65f2d6b41
Merge pull request #10417 from Icinga/inverted-hacluster-check
Fix inverted `IsHACluster` check
2025-04-25 11:01:52 +02:00
Johannes Schmidt
353386f404 Abort verified JSON-RPC connections with no valid endpoint 2025-04-23 16:55:16 +02:00
Yonas Habteab
1da497be89 Fix inverted IsHACluster check 2025-04-23 15:18:06 +02:00
Johannes Schmidt
43f78a4b86 Fix SIGABRT not causing a core dump
A second abort() is needed at the end of `SigAbrtHandler()` to trigger the SIG_DFL action (in this case the core dump).

Also since `AttachDebugger()` disables the ability to dump core, so
it gets reenabled after returning from it.
2025-04-23 09:13:04 +02:00
Alexander A. Klimov
c2ddd20ef3 Fix compiler warnings by (copy-)constructing loop variables explicitly
for (const T& needle : haystack) creates the illusion that haystack is a
container of T and we're just borrowing needle. In these cases that's not true.
2025-04-22 13:55:49 +02:00
Julian Brost
d3fae440d4
SpawnCoroutine: move callback into wrapper lambda
f isn't used otherwise in the function, so if possible, it can just be moved into the lambda, avoiding a copy.

Co-authored-by: Alexander Aleksandrovič Klimov <alexander.klimov@icinga.com>
2025-04-15 15:10:12 +02:00
Julian Brost
d1d399f8b3 Avoid multiple #if in a single function call expression
Simply giving two entire call expressions for either Boost version greatly
improves readability in my opinion.
2025-04-14 17:30:19 +02:00
Julian Brost
ccfc72267f Prefer icinga::String::GetData() over icinga::String::CStr()
Creating the string_view from the std::string (as returned by GetData()) uses
the stored length instead of having to detect it by finding '\0'.
2025-04-14 17:30:19 +02:00
Alexander A. Klimov
fb2b2e2d5b Don't use removed boost::asio::spawn() overload if Boost >= v1.87 2025-04-14 17:30:19 +02:00
Alexander A. Klimov
0662f2b719 In a coroutine, re-throw everything ex. std::exception (and inheritors)
not just boost::coroutines::detail::forced_unwind.

This is needed because as of Boost 1.87, boost::asio::spawn() uses Fiber, not Coroutine v1.
https://github.com/boostorg/asio/commit/df973a85ed69f021

This is safe because every actual exception shall inherit from std::exception. Except forced_unwind and its Fiber equivalent, so that `catch(const std::exception&)` doesn't catch them and only them.
2025-04-14 17:30:19 +02:00
Alexander Aleksandrovič Klimov
011c67964e Don't use boost::asio::io_context::strand method removed in Boost 1.87 2025-04-14 17:30:19 +02:00
Alexander Aleksandrovič Klimov
7bd35d8c6b Don't use boost::asio::ip::tcp::resolver::query
It was removed in Boost 1.87.
2025-04-14 17:30:19 +02:00
Yonas Habteab
9cc3971288
Merge pull request #10352 from Icinga/checkable-checkercomponent-fixed-timestamp-debug-logs
Fixed double output for timestamps in debug log
2025-04-14 12:16:51 +02:00
Alvar Penning
2ce34e8134
Fixed double output for timestamps in debug log
The timestamps used both in the CheckerComponent and Checkable debug
logs were printed in the scientific notation, making them effectively
useless.

> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 14:53:16 +0100): Object 'host!service', Next Check: 2025-02-26 14:53:16 +0100(1.74058e+09).
> debug/Checkable: Update checkable 'host!service' with check interval '300' from last check time at 2025-02-26 14:48:47 +0100 (1.74058e+09) to next check time at 2025-02-26 14:58:12 +0100 (1.74058e+09).

Switching to std::fixed actually shows the complete Unix timestamp.

> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 15:36:44 +0000): Object 'host!service', Next Check: 2025-02-26 15:36:44 +0000 (1740584204).
> debug/Checkable: Update checkable 'host!service' with check interval '60' from last check time at 2025-02-26 15:37:11 +0000 (1740584232) to next check time at 2025-02-26 15:38:09 +0000 (1740584290).
2025-04-14 10:09:42 +02:00
Julian Brost
8d607d2ef7
Merge pull request #10074 from open-i-gmbh/feature/tags-for-elasticsearchwriter-6837
Feature/tags for elasticsearchwriter
2025-04-10 10:11:15 +02:00
Julian Brost
5a6b2044b1
Merge pull request #10290 from Icinga/icingadb-dependencies-sync
Sync dependencies to Redis
2025-04-04 15:13:05 +02:00
Julian Brost
31a224c509 Checkable::GetSeverity(): always take reachability into account
So far, Service::GetSeverity() only considered the state of its own host, i.e.
the implicit service to its own host dependency, and treated it similar to
acknowledgements and downtimes. In contrast, Host::GetSeverity() considered
reachability and treated it like a state, i.e. for the severity calculation,
the host was either up, down, or unreachable.

This commit changes the following things:
1. Make the service severity also consider explicitly configured dependencies
   by using IsReachable().
2. Prefer acknowledgements and downtimes over unreachability in the severity
   calculation so that if an already acknowledged or in-downtime services (i.e.
   already handled service) becomes unreachable, it shouln't become more
   severe.
3. To unify host and service severities a bit, hosts now use the same logic
   that treats reachability more like acknowledgements/downtimes instead of
   like a state (changing the other way around would the state from the check
   plugin would not affect the severity for unrachable services anymore).
2025-03-31 15:23:51 +02:00
Julian Brost
1e05a166f1 Host::GetSeverity(): remove empty line at end of method 2025-03-31 15:23:51 +02:00
Julian Brost
d8271c6568 Host::GetSeverity(): remove explicit unlocking
No change in functionality. The ObjectLock destructor will implicitly release
the locks when returning from the function.
2025-03-31 15:23:51 +02:00
Julian Brost
2ebee010f0 Host::GetHost(): return early to remove a nesting level
No change in functionality. The first two branches actually set the final
return value for the method, so they can just return directly, removing the
need to have the rest of the function inside an else block.
2025-03-31 15:23:51 +02:00
Julian Brost
6443f8997f Host::GetSeverity(): add braces to if statements
No change in functionality, just makes the code a bit nicer.
2025-03-31 15:23:51 +02:00
Julian Brost
c899d52e2f Service::GetSeverity(): remove explicit unlocking
No change in functionality. The ObjectLock destructor will implicitly release
the locks when returning from the function.
2025-03-31 15:23:50 +02:00
Julian Brost
01acfb47a9 Service::GetHost(): return early to remove a nesting level
No change in functionality. The first two branches actually set the final
return value for the method, so they can just return directly, removing the
need to have the rest of the function inside an else block.
2025-03-31 15:23:50 +02:00
Julian Brost
5ca6047b35 Service::GetSeverity(): replace switch with if
No change in functionality, just making the code a bit more compact.
2025-03-31 15:23:50 +02:00
Julian Brost
a1865e1b43 Service::GetSeverity(): simplify nested if, add braces
No change in functionality, just making the code a bit nicer and more compact.
2025-03-31 15:23:50 +02:00
Yonas Habteab
bc2c750551 IcingaDB: Don't stream runtime state updates to Redis 2025-03-26 10:48:37 +01:00
Alexander A. Klimov
a943c4588b Zone#GetEndpoints(): return endpoints in the specified order, not randomly
ApiListener#RelayMessageOne() relays every given message to the first connected endpoint Zone#GetEndpoints() returns. Randomness in combination with bad luck can direct more traffic (from a particular network segment) to one master than the admin wants.

This change lets the Zone#endpoints order prefer one endpoint over the other.
2025-03-25 13:04:41 +01:00
Julian Brost
061338156c
Merge pull request #10345 from Icinga/remove-child-downtimes
ApiActions: Remove child downtimes recursively
2025-03-21 16:37:43 +01:00
Alexander Aleksandrovič Klimov
adde9cc53b
Merge pull request #10222 from Icinga/Registry-cleanup
Clean up Registry class
2025-03-21 11:00:49 +01:00
Julian Brost
065118bc22 Make DependencyGroup::State an enum
The previous struct used two bools to represent three useful states. Make this
more explicit by having these three states as an enum.
2025-03-19 16:28:00 +01:00
Yonas Habteab
864e2aaae0 Drop superfluous mutex lock & don't manually unpack std::tuple 2025-03-19 16:28:00 +01:00
Julian Brost
693d094ebc DependencyGroup: don't change the keys of m_Members after construction
This prevents the use of DependencyGroup for storing the dependencies during
the early registration (m_DependencyGroupsPushedToRegistry = false),
m_PendingDependencies is introduced as a replacement to store the dependencies
at that time.
2025-03-19 16:28:00 +01:00
Yonas Habteab
945a79e37f IcingaDB: Don't send useless dependencies state updates 2025-03-19 16:28:00 +01:00
Yonas Habteab
da637c3741 IcingaDB: Always send dependencies state HSET updates to Redis 2025-03-19 16:28:00 +01:00
Yonas Habteab
21cd5e00fa Dependency: Don't allow to update {period,states,ignore_soft_states} at runtime 2025-03-19 16:28:00 +01:00
Yonas Habteab
a9bb11b16d (Un)register dependencies from parent prior to child Checkable 2025-03-19 16:28:00 +01:00
Yonas Habteab
7fbb8f7452 Evaluate dependency group state only for a specific child
Previously the dependency state was evaluated by picking the first
dependency object from the batched members. However, since the
dependency `disable_{checks,notifications` attributes aren't taken into
account when batching the members, the evaluated state may yield a wrong
result for some Checkables due to some random dependency from other
Checkable of that group that has the `disable_{checks,notifications`
attrs set. This commit forces the callers to always provide the child
Checkable the state is evaluated for and picks only the dependency
objects of that child Checkable.
2025-03-19 16:28:00 +01:00
Julian Brost
ce1ed8556c Simplify DependencyGroup::GetState() implementation
The new implementation just counts reachable and available parents and
determines the overall result by comparing numbers, see inline comments for
more information.

This also fixes an issue in the previous implementation: if it didn't return
early from the loop, it would just return the state of the last parent
considered which may not actually represent the group state accurately.
2025-03-19 16:28:00 +01:00
Yonas Habteab
0ab50fd82d IcingaDB: Process dependencies runtime updates 2025-03-19 16:28:00 +01:00
Yonas Habteab
915ea6427e Use GetParents() in FireSppressedNotifications()
It's way efficient than accessing them through the dependency objects,
plus we won't have any duplicates.
2025-03-19 16:28:00 +01:00
Yonas Habteab
8640a3f84e Checkable: Extract parents directly from dependency groups 2025-03-19 16:28:00 +01:00
Yonas Habteab
806fff950c Checkable: Emit boost signals when changing dependency groups at runtime 2025-03-19 16:28:00 +01:00
Yonas Habteab
67a4889945 Checkable: Delay dependency group global registration on startup 2025-03-19 16:28:00 +01:00
Julian Brost
26f46fe021 Simplify dependency group registration
Co-Authored-By: Yonas Habteab <yonas.habteab@icinga.com>
2025-03-19 16:28:00 +01:00
Yonas Habteab
aed1bb6294 IcingaDB: Introduce ExecuteRedisTransaction() helper method 2025-03-19 16:28:00 +01:00
Yonas Habteab
db3f8dec27 IcingaDB: Sync dependencies initial states on config dump 2025-03-19 16:28:00 +01:00
Yonas Habteab
f502993eb4 IcingaDB: Sync dependencies states to Redis 2025-03-19 16:28:00 +01:00
Yonas Habteab
c6466ee0ea IcingaDB: Dump checkables dependencies config to redis correctly 2025-03-19 15:28:31 +01:00
Richard Mortimer
63926c6e0d
Process: Clean up process table entry even when kill(2) fails with ESRCH (#10375)
* Icinga daemon leaves zombie processes on very busy system

On a very heavily loaded system the process group kill can
be delayed until after the regular TERM signal has caused
the process to exit. In this situation the waitpid call
is valid and reaps the zombie process that would otherwise
be left behind.

* Update AUTHORS file
2025-03-18 11:29:00 +01:00
Alexander A. Klimov
a9e9e14fce Remove unused Registry#Clear() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
4d7361527c Remove unused Registry#RegisterIfNew() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
07b274ec45 Remove unused Registry#Unregister() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
402a6bbf40 Remove unused EventQueue::Unregister() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
d19c0637ee Remove unused EventQueue::UnregisterIfUnused() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
41f61ccba4 Remove unused ApiFunction::Unregister() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
cce03c5903 Remove unused ApiAction::Unregister() 2025-03-18 11:22:56 +01:00
Yonas Habteab
5e902fe4a7
Merge pull request #10380 from Icinga/sync-notified-problem-users-correctly
ClusterEvents: Sync & process notification `notified_problem_users`
2025-03-18 10:27:28 +01:00
Yonas Habteab
66cc6a4d8a ClusterEvents: Sync & process notification notified_problem_users 2025-03-14 14:13:55 +01:00
Yonas Habteab
3d761c0296 ApiActions: Remove child downtimes recursively
Services downtimes scheduled via the `all_services` flag get already
removed automatically when removing their parent downtimes (introduced
with #8913). Now, this commit makes it possible to perform the same actions
for all child downtimes, i.e. not only for those of service objects, but
for all child objects represented in the dependency tree.
2025-03-13 12:13:45 +01:00
Yonas Habteab
fa63fda75b ApiListener: Simplify deferred SSL shutdown in NewClientHandlerInternal() 2025-03-13 12:12:28 +01:00
Yonas Habteab
4bfaefadfa IcingaDB: Bump expected redis version to 6 2025-03-12 16:32:01 +01:00
Yonas Habteab
d094581b4b Checkable: Use redundancy groups state in IsReachable 2025-03-12 16:32:01 +01:00
Yonas Habteab
27f11a0955 Checkable: Introduce HasAnyDependencies() method 2025-03-12 16:32:01 +01:00
Yonas Habteab
ff0dabe287 Checkable: Store dependencies grouped by their redundancy group 2025-03-12 16:31:59 +01:00
Yonas Habteab
1820955993 Add DependencyGroup::GetState() helper method 2025-03-12 16:31:14 +01:00
Yonas Habteab
d7c9e6687e Introduce DependencyGroup helper class 2025-03-12 16:31:12 +01:00
Yonas Habteab
93d9fad565 Checkable: Drop unused failedDependency argument from IsReachable() 2025-03-12 16:19:22 +01:00
Julian Brost
67664ad7b7 Checkable::GetAllChildrenInternal: remove redundant emplace call
`checkable` is already added to the set by the insert call above, so calling
emplace for the same checkable doesn't do anything useful and can be removed.
2025-03-12 16:19:22 +01:00
Yonas Habteab
c465f45200 Rewrite Checkable::GetAllChildrenInternal() method
The previous wasn't per-se wrong, but it was way too inefficient. With
this commit each and every Checkable is going to be visited only once,
and we won't traverse the same Checkable's children multiple times
somewhere in the dependency chain.
2025-03-12 16:19:22 +01:00
Yonas Habteab
e0ce0ccff6 Activate Dependency objects before their parent objects 2025-03-12 16:19:22 +01:00
Yonas Habteab
c02b9d74a9 IcingaDB: Send reachablity state updates for all children 2025-03-12 16:19:22 +01:00
Yonas Habteab
772420a438 Checkable: Don't always trigger reachablity changed signal
But only when the current check result being processed affects the child
Checkables in any way.
2025-03-12 16:19:22 +01:00
Yonas Habteab
c64ae1af0f Dependency: Don't allow to change redundancy_group at runtime
Otherwise, it would require too much code changes to properly handle
redundancy group runtime modification in Icinga DB for no real benefit.
2025-03-12 16:19:22 +01:00
Yonas Habteab
6321606671 IcingaDB: Sync affects_children as part of runtime state updates 2025-03-12 16:19:22 +01:00
Yonas Habteab
297b62d841 IcingaDB: Add affected_children to Host/Service Redis updates 2025-03-12 16:19:22 +01:00
Yonas Habteab
d6b289e1cd Checkable: Introduce GetAllChildrenCount() method
The previous limit (32) doesn't seem to make sense, and appears to be some random number.
So, this limit is set to 256 to match the limit in IsReachable().
2025-03-12 16:19:22 +01:00
Alvar Penning
ef93f945a2 IcingaDB: Start keeping track of Host/Service to Dependency relationship
This does not work in this state!
Trying to refresh Dependency if a Host or Service being member
of this Dependency has a state change.
2025-03-12 16:19:22 +01:00
Julian Brost
e6ad2199fc
Merge pull request #10360 from Icinga/dependency-cycle-detection
Rework dependency cycle detection
2025-03-12 15:58:44 +01:00
Julian Brost
8e7e687b96 Unify depependency cycle check code.
This commit removes a distinction in how dependency objects are checked for
cycles in the resulting graph depending on whether they are part of the
initially loaded configuration during process startup or as part of a runtime
update.

The DependencyCycleChecker helper class is extended with a mechanism that
allows additional dependencies to be considered during the cycle search. This
allows using it to check for cycles before actually registering the
dependencies with the checkables.

The aforementioned case-distinction for initial/runtime-update config is
removed by making use of the newly added BeforeOnAllConfigLoaded signal to
perform the cycle check at once for each batch of dependencies inside
ConfigItem::CommitNewItems() for both cases now. During the initial config
loading, there can be multiple batches of dependencies as objects from apply
rules are created separately, so parts of the dependency graph might be visited
multiple times now, however that is limited to a minimum as only parts of the
graph that are reachable from the newly added dependencies are searched.
2025-03-12 11:53:30 +01:00
Julian Brost
c1b270f39f Rework dependency cycle check
This commit groups a bunch of structs and static functions inside
dependency.cpp into a new DependencyCycleChecker helper class. In the process,
the implementation was changed a bit, the behavior should be unchanged except
for a more user-friendly error message in the exception.
2025-03-12 11:53:30 +01:00
Julian Brost
500ad70b8c Implement std::hash<boost::intrusive_ptr<T>> for old Boost versions
Boost only implements it iself starting from version 1.74, but a specialization
of std::hash<> can be added trivially to allow the use of
std::unordered_set<boost::intrusive_ptr<T>> and
std::unordered_map<boost::intrusive_ptr<K>, V>.

Being unable to use such types already came up a few types in the past, often
resulting in the use of raw pointer instead which always involves an additional
"is this safe?"/"could the object go out of scope?" discussion. This commit
simply solves this for the future by simply allowing the use of intrusive_ptr
in unordered containers.
2025-03-12 11:53:30 +01:00
Julian Brost
4b18f62a11 Add ConfigType::BeforeOnAllConfigLoaded signal
Allows to hook into the config loading process just before OnAllConfigLoaded()
is called on a bunch of individual config objects. Allows doing some operations
more efficiently at once for all objects.

Intended use: when adding a number of dependencies, it has to be checked
whether this uses any cycles. This can be done more efficiently if all
dependencies are checked at once. So far, this is with a case-distinction for
initially loaded files in DaemonUtility::LoadConfigFiles() and for dependencies
created by runtime updates in Dependency::OnAllConfigLoaded(). The mechanism
added by this commit allows to unify the handling of both cases (done in a
following commit).
2025-03-12 11:53:30 +01:00
Yonas Habteab
206d7cda1b
Merge pull request #10359 from Icinga/do-not-publish-useless-stats
IcingaDB: Don't publish useless data to Redis
2025-03-07 12:51:10 +01:00
Yonas Habteab
3e9292a349 Value: Add a specialized rvalue reference of Get()
The move `String(Value&&)` constructor tries to partially move `String`
values from a `Value` type. However, since there was no an appropriate
`Value::Get<T>()` implementation that binds to the requested move
operation, the compiler will actually not move the value but copy it
instead as the only available implementation of `Value::Get<T>()`
returns a const reference `const T&`. This commit adds a new overload
that returns a non-const reference and allows to optionally move the string
value of a Value type.
2025-03-07 10:16:31 +01:00
Yonas Habteab
6a888e1494 String: Mark move constructor & assignment op as noexcept
The Icinga DB code performs intensive operations on certain STL containers,
primarily on `std::vector<String>`. Specifically, it inserts 2-3 new elements
at the beginning of a vector containing thousands of elements. Without this commit,
all the existing elements would be unnecessarily copied just to accommodate the new
elements at the front. By making this change, the compiler is able to optimize STL
operations like `push_back`, `emplace_back`, and `insert`, enabling it to prefer the
move constructor over copy operations, provided it is guaranteed that no exceptions
will be thrown.
2025-03-06 13:02:40 +01:00
Yonas Habteab
6ca0611f3d IcingaDB: Don't publish useless data to Redis
The Icinga DB daemon processes the data from the `IcingaApplication`
type only and Icinga DB Web also uses only those stats. However, before
this commit, Icinga DB published all kinds of useless stats to Redis
each second, like the number of (un)reachable hosts, services, and so
on, which is waste of CPU and some other resources. This commit reduces
the published data drastically to only those simple stats coming from
the `IcingaApplication` type.
2025-03-04 17:34:38 +01:00
Julian Brost
21c9ad5323
Merge pull request #10332 from Icinga/do-not-close-connection-in-request-cert-handler
Don't abruptly close anonymous connections
2025-02-04 10:58:17 +01:00
Alexander Aleksandrovič Klimov
065dfe4c40
Merge pull request #9928 from Icinga/no-data-received-on-new-api-connection
API: also log error behind "No data received on new API connection"
2025-02-03 15:39:26 +01:00
Yonas Habteab
25bbac1677 Don't abruptly close anonymous connections
This was mistakenly introduced with PR #7686 due to too many open
connections (#7680). This was wrong in the sense that closing the
connection is simply out of place here and should have been handled
differently. After we revised the RPC connection disconnect procedure
with `v2.14.4`, it becomes clear why it is wrong, because the connection
is closed abruptly before the corresponding response (`result`) has
even been written. Now if you remove the disconnect here, shouldn't the
issue #7680 occur again, you ask? The answer is no, because we now also
have a maximum timeout of `10s` for anonymous connections, after which
they are automatically closed. Thanks to the introduction of this
timeout by @julianbrost in #8479, this `Disconnect()` call has become
superfluous.
2025-01-30 17:45:27 +01:00
Julian Brost
51c6a58657
Merge pull request #9943 from Icinga/renegotiation-openbsd
Disable TLS renegotiation and fix compile error on OpenBSD
2025-01-30 15:50:07 +01:00
Alexander A. Klimov
e1a4390b9c Fix compile error on OpenBSD which has no SSL_OP_NO_RENEGOTIATION 2025-01-29 17:42:10 +01:00
Alexander A. Klimov
411c57aac5 API: also log error behind "No data received on new API connection" 2025-01-24 11:28:16 +01:00
Julian Brost
78883669d3
Merge pull request #8169 from Icinga/bugfix/object-query-all-attrs-8167
GET /v1/objects/*: handle "attrs":[] as expected
2025-01-24 09:14:17 +01:00
Sebastian Grund
7d12c1a524 Add tags functionality to ElasticSearchWriter 2025-01-24 08:51:53 +01:00
Alexander A. Klimov
e18c923abb GET /v1/objects/*: handle "attrs":[] as expected
... i.e. yield no attrs and not all.

refs #8167
2025-01-21 11:36:55 +01:00
Alexander Aleksandrovič Klimov
866db3ba3c
Merge pull request #10137 from Icinga/win-progfiles-icinga2-var
On Windows, don't create C:\Program Files\Icinga2\var during MSI build
2025-01-16 12:02:33 +01:00
Julian Brost
4ffe88e263
Merge pull request #9732 from Icinga/silence-compiler-warnings-in-code-we-don-t-maintain
Silence compiler warnings in code we don't maintain
2025-01-15 16:33:24 +01:00
Alexander A. Klimov
6195a457a7 Silence compiler warnings in code we don't maintain 2025-01-14 11:48:33 +01:00
Julian Brost
1f047ebbf5
Merge pull request #10058 from Icinga/error-timestamp-out-of-range-53323
Ido*sqlConnection#FieldToEscapedString(): don't write out of range time
2025-01-14 09:43:37 +01:00
Julian Brost
55829c4f55
Merge pull request #10077 from RincewindsHat/reject_invalid_perfdata
Reject infinite performance data values
2025-01-13 12:00:12 +01:00
Julian Brost
fb50e4b1f1
Merge pull request #10188 from Icinga/icingadb-heartbeat-both-responsible
IcingaDB Check: Multiple Responsible Instances
2025-01-13 11:56:19 +01:00
Lorenz Kästle
e7381193c8
Reject infinite performance data values
Some fault monitoring plugins may return "inf" or "-inf" as
values due to a failure to initialize or other errors.

This patch introduces a check on whether the parse value is infinite
(or negative infinite) and rejects the data point if that is the case.

The reasoning here is: There is no possible way a value of "inf" is ever
a true measuring or even useful. Furthermore, when passed to the
performance data writers, it may be rejected by the backend and lead
to further complications.
2025-01-09 11:46:34 +01:00
Yonas Habteab
1425641931 Don't endlessly wait on writer coroutine on disconnect 2025-01-08 16:30:36 +01:00
Yonas Habteab
41373ad0e5 Log before & after an RPC client is disconnected 2025-01-08 16:30:36 +01:00
Yonas Habteab
3af7cfe2ec JsonRpcConnection: Don't drop client from cache prematurely
PR #7445 incorrectly assumed that a peer that had already disconnected
and never reconnected was due to the endpoint client being dropped after
a successful socket shutdown. However, the issue at that time was that
there was not a single timeout guards that could cancel the `async_shutdown`
call, petentially blocking indefinetely. Although removing the client from
cache early might have allowed the endpoint to reconnect, it did not
resolve the underlying problem. Now that we have a proper cancellation
timeout, we can wait until the currently used socket is fully closed
before dropping the client from our cache. When our socket termination
works reliably, the `ApiListener` reconnect timer should attempt to
reconnect this endpoint after the next tick. Additionally, we now have
logs both for before and after socket termination, which may help
identify if it is hanging somewhere in between.
2025-01-08 16:30:36 +01:00
Alexander A. Klimov
8f72891228 Document Timeout 2025-01-07 18:20:54 +01:00
Alexander A. Klimov
3ca7ff7bf4 Timeout: explicitly delete #Timeout(const Timeout&), #Timeout(Timeout&&), #operator=(const Timeout&), #operator=(Timeout&&) 2025-01-07 18:20:52 +01:00
Alexander A. Klimov
27e0e236cb Move Timeout instances from heap to stack 2025-01-07 18:20:50 +01:00
Alexander A. Klimov
d77d7506f1 Don't call Timeout#Cancel() where Timeout#~Timeout() is called 2025-01-07 18:20:14 +01:00
Alexander A. Klimov
959b162913 Timeout#~Timeout(), #Cancel(): support boost::asio::io_context running on multiple threads 2025-01-07 18:19:42 +01:00
Alexander A. Klimov
cb51649363 Timeout#Timeout(): drop unnecessary template parameters 2025-01-07 18:19:39 +01:00
Alexander A. Klimov
d2285bcf0e While using Timeout, don't unnecessarily keep the strand alive via smart pointer 2025-01-07 18:18:46 +01:00
Alexander A. Klimov
faaeb4eb2e Timeout: use a plain callback, not an unnecessary coroutine 2025-01-07 18:18:24 +01:00
Alexander A. Klimov
92ab913226 Timeout#Timeout(): don't pass yield_context to callback
It's not used. Also, the callback shall run completely at once. This ensures that it won't (continue to) run once another coroutine on the strand calls Timeout#Cancel().
2025-01-07 18:18:18 +01:00
Julian Brost
880632b93a
Merge pull request #9861 from ymartin-ovh/issue-9752
icinga2: address comment loading where host reference is not found
2025-01-07 14:12:03 +01:00
Julian Brost
cf125dd8d5 Simplify DependencyGraph:RemoveDependency() method 2025-01-07 11:07:46 +01:00
Yonas Habteab
ff0e12e6ac ApiListener: Sync runtime configs in order 2025-01-07 11:07:46 +01:00
Yonas Habteab
015374e69d DependencyGraph: Allow lookups by parent & child dependencies 2025-01-07 11:07:46 +01:00
Alexander Aleksandrovič Klimov
383773eb2b
Merge pull request #10264 from Icinga/DependencyGraph-ConfigObject
DependencyGraph: use ConfigObject*, not Object*
2024-12-18 13:36:56 +01:00
Alexander A. Klimov
3a09cf72d6 DependencyGraph: use ConfigObject*, not Object*
This saves dynamic_cast<ConfigObject*> + if() on every item of GetChildren().
2024-12-17 18:33:05 +01:00
Julian Brost
452386cdb6
Merge pull request #10005 from Icinga/graceful-tls-disconnect
Add a dedicated method for disconnecting TLS connections
2024-12-12 16:20:14 +01:00
Julian Brost
3642ca3369
Merge pull request #10263 from Icinga/DependencyGraph-parent-child
DependencyGraph: switch "parent" and "child" terminology
2024-12-12 15:13:08 +01:00
Julian Brost
a506d562ae Add comment for remaining uses of async_shutdown() why it's safe
The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle
the TLS shutdown properly with a timeout since it involves a timeout. However,
the implementation of this timeout involves spwaning coroutines which are
redundant in some cases. This commit adds comments to the remaining calls of
async_shutdown() stating why calling it is safe in these places.
2024-12-12 12:10:59 +01:00
Julian Brost
e6d103d0dd HttpServerConnection: use AsioTlsStream::GracefulDisconnect()
This new helper function has proper timeout handling which was missing here.
2024-12-12 12:10:59 +01:00
Julian Brost
007e3fbe7e JsonRpcConnection: use AsioTlsStream::GracefulDisconnect()
This new helper functions allows deduplicating the timeout handling for
`async_shutdown()`.
2024-12-12 12:10:59 +01:00
Julian Brost
56d5811283 AsioTlsStream: add GracefulDisconnect() and ForceDisconnect()
Calling `AsioTlsStream::async_shutdown()` performs a TLS shutdown which
exchanges messages (that's why it takes a `yield_context`) and thus has the
potential to block the coroutine. Therefore, it should be protected with a
timeout. As `async_shutdown()` doesn't simply take a timeout, this has to be
implemented using a timer. So far, these timers are scattered throughout the
codebase with some places missing them entirely. This commit adds helper
functions to properly shutdown a TLS connection with a single function call.
2024-12-12 12:10:59 +01:00
Alexander A. Klimov
188ba53b74 DependencyGraph: switch "parent" and "child" terminology
The .ti files call `DependencyGraph::AddDependency(this, service.get())`. Obviously, `service.get()` is the parent and `this` (Downtime, Notification, ...) is the child. The DependencyGraph terminology should reflect this not to confuse its future users.
2024-12-04 10:57:30 +01:00
Alexander Aleksandrovič Klimov
8f51f54f19
Merge pull request #10221 from Icinga/Al2Klimov-patch-7
JsonRpcConnection: don't write new messages on shutdown
2024-11-29 09:24:10 +01:00
Yonas Habteab
4564c068fe JsonRpcConnection: Log message processing time stats
Co-Authored-By: Julian Brost <julian.brost@icinga.com>
2024-11-27 09:57:38 +01:00
Yonas Habteab
e0b053cbe1 HttpServerConnection: Log noticable CPU semaphore wait time 2024-11-27 09:57:38 +01:00
Yonas Habteab
3218908595
Merge pull request #10214 from Icinga/useless-http-coroutines
HttpServerConnection: Don't spawn useless coroutines
2024-11-19 15:53:54 +01:00
Yonas Habteab
2931aea9bb
Merge pull request #7818 from Icinga/bugfix/no_more_notifications-7758
Don't set Notification#no_more_notifications on custom notifications
2024-11-15 14:43:12 +01:00
Alexander A. Klimov
35a705752f Don't set Notification#no_more_notifications on custom notifications 2024-11-15 13:03:22 +01:00
Alvar Penning
0bbe7a9b2f
IcingaDB Check: Multiple Responsible Instances
By design, only one Icinga 2 instance should be responsible in the HA
context. If this promise is broken, the Icinga 2 IcingaDB check should
report it.

The code did not check for invalid data in icingadb:telemetry:heartbeat.
With this change, it will go CRITICAL with a descriptive message and
report the actual number of icingadb_responsible_instances in the
performance data.
2024-11-15 12:56:45 +01:00
Yonas Habteab
5c0f9bfdaa HttpServerConnection: Don't spawn useless coroutines
Currently, for each `Disconnect()` call, we spawn a coroutine, but every
one of them is just usesless, except the first one. However, since all
`Disconnect()` usages share the same asio strand and cannot interfere
with each other, spawning another coroutine within `Disconnect()` isn't
even necessary. When a coroutine calls `Disconnect()` now, it will
immediately initiate an async shutdown of the socket, potentially causing
the coroutine to yield and allowing the others to resume. Therefore, the
`m_ShuttingDown` flag is still required by the coroutines to be checked
regularly.
2024-11-14 16:47:01 +01:00
Yonas Habteab
d68ee3fcf8
Merge pull request #10224 from Icinga/Empty-constant
Make icinga::Empty constant to prevent accidental changes
2024-11-14 10:35:36 +01:00
Julian Brost
67175c43c0
Merge pull request #10102 from Icinga/icingadb-redis-username
Icinga DB: Config no_user_modify and Support Redis username authentication
2024-11-12 17:04:20 +01:00
Julian Brost
5817e7666b
Merge commit from fork
Security: fix TLS certificate validation bypass
2024-11-12 15:01:57 +01:00
Alexander A. Klimov
09160ea9eb Make icinga::Empty constant to prevent accidental changes 2024-11-11 16:31:04 +01:00
Alexander Aleksandrovič Klimov
aa7f159a0f
JsonRpcConnection: don't write new messages on shutdown
In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
2024-11-07 17:32:12 +01:00
Alexander Aleksandrovič Klimov
9a8620d923
Merge pull request #10213 from Icinga/do-not-read-data-on-disconnect
JsonRpcConnection: Don't read any data on shutdown
2024-11-07 12:32:02 +01:00
Alexander Aleksandrovič Klimov
fb64c4f057
Atomic#Atomic(): remove superfluous atomic write 2024-11-06 11:37:02 +01:00
Alexander Aleksandrovič Klimov
a77259adc1
Atomic<T>#Atomic(T): fix C++ compliance
by not calling `std::atomic<T>::atomic(void)`.

After the latter the instance "does not contain a T object, and its only valid uses are destruction and initialization by std::atomic_init" which we don't call. So the only safe option is `std::atomic<T>::atomic(T)`.

https://en.cppreference.com/w/cpp/atomic/atomic/atomic
2024-11-05 13:15:22 +01:00
Yonas Habteab
1c34610a78 JsonRpcConnection: Don't read any data on shutdown
When the `Desconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which might be blocking. However, in order not to block
indefintely, the `Timeout` class cancels all operations on the socket
after `10` seconds. Though, the timeout does not trigger the handler
immediately; it creates spawns another coroutine using the same strand
as in the `JsonRpcConnection` class. This can cause unexpected delays if
e.g. `HandleIncomingMessages` gets resumed before the coroutine from the
timeout class. Apart from that, the coroutine for writing messages uses
the same condition, making the two symmetrical.
2024-10-31 17:09:13 +01:00
Yonas Habteab
d894792c36
Merge pull request #10209 from Icinga/log-error-context-only-once
ApiListener: Log error context only once
2024-10-31 13:14:42 +01:00
Alexander Aleksandrovič Klimov
5f487aff1b
Merge pull request #10201 from Icinga/Validation-failed
Remove redundant "Validation failed" prefix from ValidationError exceptions
2024-10-31 12:30:39 +01:00
Yonas Habteab
8574357443 ApiListener: Log error context only once
When logging at the warning level, the logger will automatically look up
for registered context and append them to the log entry accordingly.
2024-10-30 16:55:13 +01:00
Yonas Habteab
e8b7baa298 JsonRpcConnection: Drop unused m_NextHeartbeat variable 2024-10-30 14:31:48 +01:00