icingadb

mirror of https://github.com/Icinga/icingadb.git synced 2026-05-28 04:35:54 -04:00

Author	SHA1	Message	Date
Eric Lippmann	7c068d4adf	Use `icinga-go-library`	2024-05-24 09:56:28 +02:00
Eric Lippmann	c070615e64	Move Redis related code to `redis`	2024-05-22 11:51:22 +02:00
Eric Lippmann	77ccdfc303	Move type related utility functions from `internal` to `types`	2024-05-22 11:51:21 +02:00
Eric Lippmann	5029e328c8	Unify notation of `n * time.Duration`	2024-04-11 13:01:31 +02:00
Alvar Penning	779afd1da3	Enhance HA "Taking over", "Handing over" logging The reason for a switch in the HA roles was not always directly clear. This change now introduces additional debug logging, indicating the reasoning for either taking over or handing over the HA responsibility. First, some logic was moved from the SQL query selecting active Icinga DB instances to Go code. This allowed distinguishing between no available responsible instances and responsible instances with an expired heartbeat. As the HA's peer timeout is logically bound to the Redis timeout, it will now reference this timeout with an additional grace timeout. Doing so eliminates a race between a handover and a "forceful" takeover. As the old code signaled a takeover merely because no other instance was active, it now additionally checks whether it is already the active/responsible node. In this case, the takeover logic, which would be interrupted later anyway because the node is already responsible, can be skipped. Next to the additional logging messages, both the takeover and handover channels now transport a string communicating the reason instead of an empty struct{}. By doing so, both the "Taking over" and "Handing over" log messages are enriched with the reason. This also required a change in the suppressed logging handling of the HA.realize method, which got its logging enabled through the shouldLog parameter. Now, there are both recurring events, which might be suppressed, and state-changing events, which should be logged. Therefore, and because the logTicker's functionality was not clear to me at first glance, I renamed it to routineLogTicker. While dealing with the code, some function signature documentation was added to ease understanding, both for me and for future readers. Additionally, the error handling of the SQL query selecting active Icinga DB instances was changed slightly to also handle wrapped sql.ErrNoRows errors. Closes #688.	2024-04-02 13:23:11 +02:00
Eric Lippmann	e31b101f4f	Upgrade `go-redis` to `v9` Co-Authored-By: Alvar Penning <alvar.penning@icinga.com>	2024-03-22 15:32:15 +01:00
Alexander A. Klimov	5a79a72ff5	Heartbeat#sendEvent(m): nil-check m before dereferencing it as it can be nil.	2023-01-19 16:55:11 +01:00
Eric Lippmann	cd96f0de6f	Block XREADs for a maximum of one second I observed that blocking XREADs without a timeout (BLOCK 0), across multiple consecutive Redis restarts and I/O timeouts, exceed the internal Redis retries and eventually lead to fatal errors. @julianbrost looked at this for clarification; here is his finding: go-redis only considers a command successful when it returned something, so a successfully started blocking XREAD consumes a retry attempt each time the underlying Redis connection is terminated. If this happens often before any element appears in the stream, this error is propagated. (This also means that even with this PR, when restarting Redis often enough so that a query never reaches the BLOCK 1sec, this would still happen.) https://github.com/Icinga/icingadb/pull/504#issuecomment-1164589244	2022-06-28 16:09:29 +02:00
Alexander A. Klimov	e1ff704aff	Write own heartbeat into icingadb:telemetry:heartbeat including version, current DB error and HA status quo.	2022-06-23 18:31:45 +02:00
Eric Lippmann	ccda48234e	Use custom logger for accessing the interval for periodic logging	2021-11-05 17:57:22 +01:00
Eric Lippmann	8ce917d45a	Remove waiting for heartbeat message If a heartbeat is pending, we log it every 60 seconds anyway.	2021-11-05 17:52:11 +01:00
Eric Lippmann	8a03745273	Speak of Icinga heartbeat not Icinga 2 heartbeat	2021-11-05 17:18:03 +01:00
Julian Brost	9b02b18f46	Use new environment ID https://github.com/Icinga/icinga2/pull/9036 introduced a new environment ID for Icinga DB that's written to the icinga:stats stream as field "icingadb_environment". This commit updates the code to make use of this ID instead of the one derived from the Icinga 2 Environment constant.	2021-11-03 15:47:38 +01:00
Julian Brost	217ab03e59	heartbeat: wrap messages with a timestamp Track when a heartbeat was received to allow other components to check when it will expire.	2021-10-04 16:58:35 +02:00
Julian Brost	8b2cb3acb8	heartbeat: use a single channel for all beat/loss events Using Cond does not allow reliably catching all events, as a listener only receives events that occur after it starts listening. For heartbeat loss events it's important to catch them reliably so as not to remain in an HA active state incorrectly. fixes #360	2021-10-04 16:36:09 +02:00
Julian Brost	17321cdfc3	Fix use of wrong log function on heartbeat loss Has to use the Warnw function as it passes additional zap attributes.	2021-09-23 09:27:26 +02:00
Eric Lippmann	0b1610c69b	Use cancelCtx() instead of just cancel()	2021-08-09 10:29:47 +02:00
Eric Lippmann	725e70f0b9	Pointer receivers, Cond usage, pass ctx and Godoc for Heartbeat Heartbeat now uses pointer receivers for its methods because some methods actually change the heartbeat values. The context is no longer stored in the structure, but passed to the controller loop. The beat and the lost channels are replaced by Cond and the last heartbeat is stored independently to not be affected by a slow HA receiver. If the database connections are occupied by the config, HA cannot update the instance and does not read from the beat channel in time. In addition, heartbeat errors are no longer swallowed, but handled in HA.	2021-07-20 10:17:05 +02:00
Eric Lippmann	e12425d8dc	Wrap errors	2021-06-21 12:13:24 +02:00
Alexander A. Klimov	35349262ce	Use time.NewTicker(), not time.Tick()	2021-05-28 14:24:36 +02:00
Eric Lippmann	372f5cae7c	Also log environment info	2021-05-25 16:25:04 +02:00
Noah Hilverling	44c734f72d	Improve database and HA logging	2021-05-25 09:49:48 +02:00
Alexander A. Klimov	1026d4cabf	Wrap Redis errors	2021-05-19 11:57:58 +02:00
Alexander A. Klimov	4dffbad76e	Make channels more specific	2021-03-15 16:34:58 +01:00
Eric Lippmann	77267fa60c	Introduce type icingaredis.Heartbeat	2021-03-04 00:49:23 +01:00

25 commits