Commit graph

602 commits

Author SHA1 Message Date
Alexander A. Klimov
b84fc62112 Simplify BulkChunkSplitPolicy
refs #136
2022-03-10 17:04:22 +01:00
Alexander A. Klimov
8da164c50f Throw BulkChunkSplitPolicy#Reset() away
refs #136
2022-03-10 17:04:22 +01:00
Alexander A. Klimov
10a70e8b71 Pass BulkChunkSplitPolicyFactory-s, not BulkChunkSplitPolicy-s
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
3eb14274dd DB#BuildDeleteStmt(): don't pre-rebind ?s
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
7f4b895ea9 NamedBulkExec(): allow custom BulkChunkSplitPolicy
By the way avoid duplicate rows in the same upsert chunk to avoid Postgres
error 21000 (ON CONFLICT DO UPDATE command cannot affect row a second time).

refs #136
2022-03-10 17:04:21 +01:00
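The duplicate-row problem described in this commit can be illustrated with a minimal sketch. It assumes a hypothetical helper `splitOnDupID` (not the repository's actual `BulkChunkSplitPolicy` API) that starts a new chunk whenever an ID would appear twice in the same chunk, or when the chunk reaches a maximum size:

```go
package main

import "fmt"

// splitOnDupID is a hypothetical helper for illustration only. It groups ids
// into chunks so that no chunk contains the same ID twice. A new chunk starts
// when an ID repeats within the current chunk or the chunk holds maxSize IDs.
func splitOnDupID(ids []string, maxSize int) [][]string {
	var chunks [][]string
	var current []string
	seen := make(map[string]struct{})

	for _, id := range ids {
		_, dup := seen[id]
		if len(current) > 0 && (dup || len(current) >= maxSize) {
			// Close the current chunk and start a fresh one.
			chunks = append(chunks, current)
			current = nil
			seen = make(map[string]struct{})
		}
		current = append(current, id)
		seen[id] = struct{}{}
	}
	if len(current) > 0 {
		chunks = append(chunks, current)
	}
	return chunks
}

func main() {
	// "a" and "c" repeat, so they end up in separate chunks.
	fmt.Println(splitOnDupID([]string{"a", "b", "a", "c", "c"}, 10))
}
```

Each resulting chunk could then be sent as one upsert statement without touching the same row twice, which is the condition Postgres error 21000 complains about.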
Alexander A. Klimov
db2c3af769 BulkEntities(): allow custom BulkChunkSplitPolicy
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
d4f2c13d9c NewEntityBulker(): allow custom BulkChunkSplitPolicy
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
3854424a91 Introduce SplitOnDupId 2022-03-10 17:04:21 +01:00
Alexander A. Klimov
6b59f2e47c Introduce NeverSplit 2022-03-10 17:04:21 +01:00
Alexander A. Klimov
dccf02e11d Introduce BulkChunkSplitPolicy 2022-03-10 17:04:21 +01:00
Alexander A. Klimov
5b87fd94ee Don't re-invent sqlx.DB#Rebind()
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
908bb42004 Postgres: upsert: only handle primary key conflicts
refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
eca23a95ed DB#Build*Stmt(): quote table names
Rationale: see https://dba.stackexchange.com/q/73136

refs #136
2022-03-10 17:04:21 +01:00
Alexander A. Klimov
1c3bfcf99d Support Postgres
refs #136
2022-03-09 18:49:45 +01:00
Alexander A. Klimov
ad895b560d Handle Postgres-specific errors
refs #136
2022-03-09 18:49:45 +01:00
Alexander A. Klimov
aed0e9124f Avoid SQL syntax Postgres doesn't understand
refs #136
2022-03-09 18:49:45 +01:00
Alexander A. Klimov
879b683041 Add previous_soft_state to host_state and service_state
refs #437
2022-02-17 13:48:17 +01:00
Eric Lippmann
b65e427232
Merge pull request #432 from Icinga/bugfix/crash-after-interuption-of-mysql-connection-427
HA#realize(): re-try transaction on temporary error
2022-02-03 16:58:29 +01:00
Eric Lippmann
bf92290507 Don't embed TLS config
When embedding TLS, the field name is implicitly specified as TLS,
which conflicts with the YAML key tls,
which currently does not allow TLS to be enabled.
I'm not entirely sure if this is a bug in goccy/go-yaml,
but if a struct field name other than TLS is specified,
the configuration works again.
2022-02-03 13:10:16 +01:00
Eric Lippmann
59cf6ccc41 Rename TLS.Tls to TLS.Enable
Enable is more self-documenting.
2022-02-03 10:03:46 +01:00
Alexander A. Klimov
943133a4ac HA#realize(): re-try transaction on temporary error
and don't just crash.

refs #427
2022-02-02 13:23:37 +01:00
Noah Hilverling
12b99eda89 Make sure Redis connection pool is large enough
The default Redis PoolSize is 10 * CPU, which isn't enough to run Icinga DB on machines with only 1 CPU.
Most of our open connections come from blocking XREADs (e.g. Heartbeat, Runtime, Overdue, History).
2021-12-09 15:55:24 +01:00
Eric Lippmann
68943a64c4
Merge pull request #410 from Icinga/bugfix/improve-notification-recipient
Make NotificationRecipient struct readable
2021-11-12 18:07:06 +01:00
Noah Hilverling
a938cfa175 Make NotificationRecipient struct readable 2021-11-12 16:33:33 +01:00
Eric Lippmann
f21f50e958 Reduce max_hmget_connections to 8 2021-11-12 16:29:59 +01:00
Eric Lippmann
69a0897847 Change default configuration file path 2021-11-12 16:29:59 +01:00
Julian Brost
f710c48394
Merge pull request #408 from Icinga/retry-broken
Fix broken retry mechanics
2021-11-12 14:32:31 +01:00
Eric Lippmann
a811c43727
Merge pull request #394 from Icinga/bugfix/unbulk-rtu
RuntimeUpdates#Sync(): force FIFO for config updates
2021-11-12 12:56:34 +01:00
Eric Lippmann
6a8163cdbc Runtime Updates: Use proper buffer channel sizes 2021-11-12 12:39:24 +01:00
Alexander A. Klimov
7d6474f6b5 Add special handling for entity bulks of size 1
Co-authored-by: Eric Lippmann <eric.lippmann@icinga.com>
2021-11-12 12:39:22 +01:00
Alexander A. Klimov
d356909edc Add special handling for bulks of size 1
Co-authored-by: Eric Lippmann <eric.lippmann@icinga.com>
2021-11-12 12:38:33 +01:00
Alexander A. Klimov
fecf332b8e RuntimeUpdates#Sync(): force FIFO for config updates
so they don't interfere.

Co-authored-by: Eric Lippmann <eric.lippmann@icinga.com>
2021-11-12 12:35:12 +01:00
Yonas Habteab
54c563d1c7 Add service_state.host_id column 2021-11-12 11:28:02 +01:00
Yonas Habteab
6faa84d94a Downtime: Add duration & scheduled_duration columns 2021-11-12 11:28:02 +01:00
Eric Lippmann
7f3621e980 Fix broken retry mechanics
dc7511c introduced a parent context check to suppress logging if
it was canceled.
If it was not canceled, the check overwrites the variable err,
which resets every previous real error to nil,
which then leads to fatals because the OnError callback or
the IsRetryable function expect a non-nil error.
Since I cannot reproduce which logs should have been suppressed by
the changes, I removed them completely.
2021-11-11 21:26:11 +01:00
Noah Hilverling
c19cfdf406 Schema: Prefix command_id with command type (check, event, notification)
Signed-off-by: Eric Lippmann <eric.lippmann@icinga.com>
2021-11-09 15:11:09 +01:00
Eric Lippmann
94abcef7bf
Merge pull request #388 from Icinga/bugfix/gpl2
Replace Apache 2.0 licensed gopkg.in/yaml.v3 with MIT licensed github.com/goccy/go-yaml
2021-11-09 11:13:29 +01:00
Eric Lippmann
ea74dc172a Rename periodic.Stoper to periodic.Stopper 2021-11-05 17:57:27 +01:00
Eric Lippmann
886c60c95a Use custom logger with 1 second interval for delta tests 2021-11-05 17:57:27 +01:00
Eric Lippmann
ccda48234e Use custom logger for accessing the interval for periodic logging 2021-11-05 17:57:22 +01:00
Eric Lippmann
8ec157e39b Add periodic logging for runtime updates 2021-11-05 17:52:48 +01:00
Eric Lippmann
6232773943 Use debug instead of info for some log messages
These log messages are not relevant for the info level.
2021-11-05 17:52:48 +01:00
Eric Lippmann
2d4b5419af Log which history sync started 2021-11-05 17:52:48 +01:00
Eric Lippmann
5fd4d35907 Remove syncing $subject log message
This info message just pollutes the logs and
for debugging we log the execution anyway
2021-11-05 17:52:46 +01:00
Eric Lippmann
43bcd2bbee Remove syncing $redisKey log message
This info message just pollutes the logs and
for debugging we log the execution anyway.
2021-11-05 17:52:11 +01:00
Eric Lippmann
8ce917d45a Remove waiting for heartbeat message
If a heartbeat is pending,
we log it every 60 seconds anyway.
2021-11-05 17:52:11 +01:00
Eric Lippmann
bd23f17eda Use pkg periodic for database logs 2021-11-05 17:52:10 +01:00
Eric Lippmann
8e7564b2aa Log which delta finished 2021-11-05 17:18:05 +01:00
Eric Lippmann
d6a28d7672 Use pkg periodic for history sync logs 2021-11-05 17:18:05 +01:00
Eric Lippmann
4b239d69bb Use debug instead of info for overdue refresh logs 2021-11-05 17:18:05 +01:00
Eric Lippmann
12525c7872 Use pkg periodic for overdue sync logs 2021-11-05 17:18:05 +01:00
Eric Lippmann
b10d038ba8 Use internal.LoggingInterval() 2021-11-05 17:18:05 +01:00
Eric Lippmann
dbb64a0de3 Log how many items to insert 2021-11-05 17:18:05 +01:00
Eric Lippmann
addfabbde1 Speak of items instead of rows 2021-11-05 17:18:05 +01:00
Eric Lippmann
5f1639aca2 Use pkg periodic for Redis logs 2021-11-05 17:18:05 +01:00
Eric Lippmann
a6e02e7f3c Introduce Counter.Total() 2021-11-05 17:18:05 +01:00
Eric Lippmann
c335a3c99c Introduce package periodic 2021-11-05 17:18:05 +01:00
Eric Lippmann
986e685ee0 Allow to configure interval for periodic logging 2021-11-05 17:18:05 +01:00
Eric Lippmann
b067ed2147 Introduce Counter.Reset() 2021-11-05 17:18:05 +01:00
Eric Lippmann
8a03745273 Speak of Icinga heartbeat not Icinga 2 heartbeat 2021-11-05 17:18:03 +01:00
Eric Lippmann
dc7511cd25 Don't log if context is canceled 2021-11-05 17:16:57 +01:00
Julian Brost
54dbe0cfbe
Merge pull request #391 from Icinga/bugfix/multi-environment
Better handling of multiple environments
2021-11-05 16:55:21 +01:00
Julian Brost
82cf600c55
Merge pull request #401 from Icinga/flawed-config-keys-and-validation
Fix flawed config keys and validation
2021-11-04 15:01:51 +01:00
Julian Brost
3342191b5e Exit on environment ID changes
There's a small risk that when the environment ID changes, Icinga DB could
write into the wrong environment in the database. Therefore,
https://github.com/Icinga/icinga2/pull/9036 introduced a new default
environment ID based on the CA public key so that there should be no cases
where it's required to change the actual environment ID. So if this happens
nonetheless, just bail out.
2021-11-03 15:47:38 +01:00
Julian Brost
9b02b18f46 Use new environment ID
https://github.com/Icinga/icinga2/pull/9036 introduced a new environment ID for
Icinga DB that's written to the icinga:stats stream as field
"icingadb_environment". This commit updates the code to make use of this ID
instead of the one derived from the Icinga 2 Environment constant.
2021-11-03 15:47:38 +01:00
Eric Lippmann
a081927672 Only sync entities that belong to the current environment
Previously, we selected each entity from the database.
Now we only select entities that belong to the current environment.
2021-11-03 15:47:38 +01:00
Eric Lippmann
6b5aac9154 Config: Validate max_rows_per_transaction 2021-11-03 15:26:20 +01:00
Eric Lippmann
dac4a7246a Config: Validate max_placeholders_per_statement 2021-11-03 15:26:01 +01:00
Eric Lippmann
15eb9b471f Don't use CamelCase for config keys 2021-11-03 15:25:16 +01:00
Eric Lippmann
563aafaf90 Config: Validate xread_count 2021-11-03 15:23:40 +01:00
Eric Lippmann
084e0409bc Restart HA after environment change
If the environment changes during runtime, we have to restart HA
in order to stop a possibly running config sync and start a new
one.
2021-11-03 14:51:19 +01:00
Eric Lippmann
26a184d953 Make v1.Environment#Name types.String
The default environment of Icinga is the empty string.
In Golang, the zero value of string is also the empty string.
But it makes sense to distinguish whether the name is not set
or set to the empty string. That is possible with this change.
2021-11-03 14:49:56 +01:00
Eric Lippmann
54f9ef5a12 Insert environment
With this change Icinga DB will insert the environment after each
heartbeat takeover if it does not already exist in the database as
the environment may have changed, although this is likely to happen
very rarely.

Instead of checking whether the environment already exists,
it uses an INSERT statement that does nothing if it does.
2021-11-03 14:49:56 +01:00
Eric Lippmann
4a659fd5c4 Add db.BuildIgnoreStmt() 2021-11-03 14:49:56 +01:00
Julian Brost
f290a0c9b7
Merge pull request #392 from Icinga/feature/history-deterministic-ids
Make {NotificationHistory,StateHistory,History*}#Id UUID -> SHA1
2021-11-03 13:18:11 +01:00
Alexander A. Klimov
b39eac660f Replace Apache 2.0 licensed gopkg.in/yaml.v3 with MIT licensed github.com/goccy/go-yaml
... not to have GPLv2<->Apache 2.0 (app<->deps) license conflicts.
2021-11-03 12:22:21 +01:00
Alexander A. Klimov
d903d05c82 Make History*#Id UUID -> SHA1 2021-11-03 12:15:25 +01:00
Alexander A. Klimov
52ae34a6f8 Make {NotificationHistory,StateHistory}#Id UUID -> SHA1 2021-11-03 12:15:25 +01:00
Eric Lippmann
e433aa7ec3 Move custom var sync to a new method 2021-10-26 09:31:41 +02:00
Eric Lippmann
65440ee8fe Expect no custom var clear events during runtime 2021-10-26 09:31:41 +02:00
Eric Lippmann
b48792cf36 Handle flat custom vars explicitly in runtime updates
This also requires explicit handling of custom variables as we need
to multiplex the original values to handle flat custom variables.
2021-10-26 09:31:41 +02:00
Eric Lippmann
4d65c62f77 Handle contracts.Initer in common.NewSyncSubject()
contracts.EntitiyFactoryFunc.WithInit() checked for
contracts.Initer every time.
Now it is only done once in common.NewSyncSubject().
2021-10-26 09:31:36 +02:00
Eric Lippmann
c78326ad1b Use SyncSubject in RuntimeUpdates.Sync() 2021-10-26 09:27:00 +02:00
Eric Lippmann
fe6915447e Add method SyncSubject.Name() 2021-10-26 09:27:00 +02:00
Eric Lippmann
a73a882c6f Introduce function ExpandCustomvars() 2021-10-26 09:27:00 +02:00
Eric Lippmann
d017a05d05 Export DB.getSemaphoreForTable() 2021-10-26 09:27:00 +02:00
Eric Lippmann
16dd4663ad Move method DB.getSemaphoreForTable() 2021-10-26 09:27:00 +02:00
Eric Lippmann
44b45fc429 Use context from errgroup 2021-10-26 09:27:00 +02:00
Alexander A. Klimov
b5e024e68d Sync state runtime updates ASAP
I.e. don't wait for the complete initial sync first.
2021-10-14 10:15:28 +02:00
Eric Lippmann
537a4cf37f Allow to configure the logging output 2021-10-13 09:46:12 +02:00
Eric Lippmann
a9afcea25c Allow systemd-journald (and console) as log outputs 2021-10-13 09:46:12 +02:00
Eric Lippmann
b582995e37 Introduce zapcore.Core that sends logs to systemd-journald 2021-10-13 09:46:12 +02:00
Eric Lippmann
9e49b62c4d Use the app name as the default logger name 2021-10-13 09:46:12 +02:00
Eric Lippmann
3061e3d0c5 Introduce function utils.AppName() 2021-10-13 09:46:12 +02:00
Eric Lippmann
66d34b4a9f Add ConvertCamelCase utility function 2021-10-13 09:46:12 +02:00
Eric Lippmann
bdeed69337 Move logging from internal to pkg 2021-10-13 09:20:55 +02:00
Eric Lippmann
d8ba0c374a
Merge pull request #364 from Icinga/feature/history-sync-foreign-keys
Add foreign key constraints to history tables
2021-10-07 18:38:33 +02:00
Julian Brost
7c782e3eb8 History sync: use information from notification stream for user_notification_history table 2021-10-05 18:35:02 +02:00
Julian Brost
8b4e4d68a6 History sync: use indefinitely blocking XREAD
Just like we do it throughout the rest of the code.
2021-10-05 18:35:02 +02:00
Julian Brost
bfcc324535 History sync: rewrite to use a sequential pipeline
This is in preparation for adding foreign key constraints to the history
tables. For this, it is required to insert the rows into the different history
tables in a defined order.
2021-10-05 18:35:02 +02:00
Julian Brost
82530c771d Redis/DB: export options member
This change allows the history sync to use values configured in these options.
2021-10-05 18:34:55 +02:00
Julian Brost
c5af0cd287 HA: only set realize timeout when active
When inactive, this is the only query running so it has to retry for longer to
eventually trigger a fatal error if the database is gone for too long (5
minutes at the moment).
2021-10-04 16:58:35 +02:00
Julian Brost
239d2ea410 HA: after heartbeat expiry, stop writing to database and hand over
If it's not possible for Icinga DB to write through the heartbeat within its
validity period it cannot signal to other instances that it still is alive and
has the hand over. There's also no point in retrying for this individual
heartbeat any longer.
2021-10-04 16:58:35 +02:00
Julian Brost
a34aef4fc5 retry: if stopped due to outer context, return that error
If there is an outer context that is canceled or exceeds its deadline before
the internal timeout is reached, its error should be passed on as the failure
didn't happen due to retry giving up.
2021-10-04 16:58:35 +02:00
Julian Brost
217ab03e59 heartbeat: wrap messages with a timestamp
Track when a heartbeat was received to allow other components to check when it
will expire.
2021-10-04 16:58:35 +02:00
Julian Brost
8b2cb3acb8 heartbeat: use a single channel for all beat/loss events
Using Cond does not allow to reliably catch all events as one will only receive
events that occur after starting to listen. For heartbeat loss events it's
important to reliably catch them to not remain in an HA active state incorrectly.

fixes #360
2021-10-04 16:36:09 +02:00
Julian Brost
a1b78e0f23 Add XMessageBulker
Generics would be nice but we don't have them yet unfortunately, so for now, yet
another copy of Bulker (as it already exists for EntityBulker).
2021-10-04 14:44:50 +02:00
Alexander Aleksandrovič Klimov
d99e0586a5
Merge pull request #236 from Icinga/feature/tls
Support TLS
2021-10-04 12:27:21 +02:00
Alexander A. Klimov
82c26b187e Support TLS 2021-09-30 12:25:23 +02:00
Ravi Kumar Kempapura Srinivasa
414057830e Allow to configure logging in the YAML configuration 2021-09-28 17:30:11 +02:00
Ravi Kumar Kempapura Srinivasa
acde6ade69 Introduce Logging config struct 2021-09-28 17:30:11 +02:00
Julian Brost
65074f5755
Merge pull request #370 from Icinga/feature/icingadb-scheduling_source-160
Include CheckResult#scheduling_source in state and history
2021-09-27 17:32:42 +02:00
Julian Brost
6e3df7d63b
Merge pull request #373 from Icinga/feature/single-threaded-delta
Rewrite delta to use only a single goroutine
2021-09-24 16:53:23 +02:00
Julian Brost
0c9fb2f22f
Merge pull request #369 from Icinga/feature/log-reconnects-351
Log all different failed and recovered reconnects to backends
2021-09-24 11:58:34 +02:00
Julian Brost
66d9b0e6e6 Rewrite delta to use only a single goroutine 2021-09-24 11:52:15 +02:00
Julian Brost
1e9a88bee6 Add tests and benchmarks for delta computation 2021-09-24 11:52:13 +02:00
Alexander A. Klimov
b4bfee92d9 Log all recovered reconnects to backends
... to give the admin the all-clear.

refs #351
2021-09-23 16:07:41 +02:00
Julian Brost
e0c903bfdc Redis HYield: remove duplicates returned by HSCAN
fixes #349
2021-09-23 14:36:51 +02:00
Julian Brost
4457f9f440
Merge pull request #365 from Icinga/data-races
Fix data races
2021-09-23 12:32:19 +02:00
Eric Lippmann
454381c820 Use uint64 instead of Counter
Use uint64 as there is no longer any concurrent access.
2021-09-23 12:18:08 +02:00
Eric Lippmann
98202e1257 Use buffered channel
Use a buffered channel so that the next HSCAN call does not have
to wait until the previous result has been processed.
2021-09-23 09:37:31 +02:00
Eric Lippmann
c1e722f5fa Do not close channel too early
This fixes a data race where the pairs channel was closed too early
when the context is canceled and therefore the outer errgroup
returns from Redis operations before Wait() is called on the inner
errgroup. Unfinished Go methods in the inner errgroup would then
try to work on a closed channel.
2021-09-23 09:37:31 +02:00
Eric Lippmann
7351559793 Use pointer receiver for Counter.Val()
This fixes a data race as Val() was previously operating on a copy
of the counter while Inc() and Add() may have changed the original
value.
2021-09-23 09:37:31 +02:00
Eric Lippmann
9ce2cff5c0 Introduce WaiterFunc type
The WaiterFunc type is an adapter to allow the use of ordinary
functions as Waiter.
2021-09-23 09:37:31 +02:00
Julian Brost
17321cdfc3 Fix use of wrong log function on heartbeat loss
Has to use the Warnw function as it passes additional zap attributes.
2021-09-23 09:27:26 +02:00
Alexander A. Klimov
82d8f830af Include CheckResult#scheduling_source in state and history
refs #160
2021-09-22 17:30:13 +02:00
Alexander Aleksandrovič Klimov
585d1e6bb5
Merge pull request #368 from Icinga/bugfix/icinga-db-does-not-exit-when-reconnecting-to-the-database-350
On shutdown: give up HA handover after 3s, not 5m
2021-09-22 16:22:51 +02:00
Alexander A. Klimov
f554fa9dfe Log all different failed reconnects to backends
E.g. the first "connection refused" and the first "hostname mismatch".

refs #351
2021-09-22 16:15:37 +02:00
Alexander A. Klimov
321db0eecf Introduce Settings#OnSuccess
refs #351
2021-09-22 15:35:08 +02:00
Alexander A. Klimov
653df82a1e Introduce Settings#OnError
refs #351
2021-09-22 15:34:39 +02:00
Alexander A. Klimov
ea7668d99a HA#Close(): allow custom context
refs #350
2021-09-22 14:12:27 +02:00
Alexander A. Klimov
5a146645f2 HA#removeInstance(): allow custom context
refs #350
2021-09-22 14:12:27 +02:00
Alexander A. Klimov
8d57ec107a WithBackoff(): aggregate optional settings in one struct
refs #351
2021-09-22 13:37:12 +02:00
Alexander Aleksandrovič Klimov
4371d04d5e
Merge pull request #356 from Icinga/bugfix/mustpackany
Clarify what MustPackAny() does
2021-09-21 16:16:56 +02:00
Alexander Aleksandrovič Klimov
956590cb99
Merge pull request #301 from Icinga/feature/scheduled_by
Introduce downtime#scheduled_by
2021-09-02 10:31:59 +02:00
Julian Brost
be9054628a Ensure extra config options are properly initialized
YAML is decoded by the structure of the YAML source document, not the Go
destination data structure. Therefore, the old code did not always call
UnmarshalYAML() on all sub-structs. Therefore, defaults were not always set but
zero values were used, resulting in all kinds of strange behavior.

This commit changes the code so that it no longer relies on individual
UnmarshalYAML() functions to set the defaults for each sub-struct but instead
just sets all of them when creating the surrounding Config instance. It also
moves the config validation to separate Validate() functions.
2021-09-01 18:49:38 +02:00
Alexander A. Klimov
3f8703310c Clarify what MustPackAny() does
It always packs a slice, even if only one item given.
2021-08-20 11:54:00 +02:00
Eric Lippmann
0bf4f4df0a
Merge pull request #343 from Icinga/config-options
Separate required and optional configuration for database and Redis
2021-08-10 12:33:22 +02:00
Alexander Aleksandrovič Klimov
5a91d8acf6
Merge pull request #338 from Icinga/feature/support-icingadb-version-again-335
Support --version
2021-08-10 11:17:51 +02:00
Eric Lippmann
fbbb9bfacd Don't allow 0 for timeout redis option
0 stands for deactivate, which makes no sense here.
2021-08-10 09:29:27 +02:00
Eric Lippmann
8232000524 Don't allow 0 for max_connections database option
0 stands for deactivate, which makes no sense here.
2021-08-10 08:55:24 +02:00
Eric Lippmann
50473bca70 Remove UnmarshalYAML
Config options are no longer inlined.
2021-08-09 22:06:55 +02:00
Eric Lippmann
1c386c9c2f Don't inline Database options
There is now the options key to separate required and optional
configuration. Before, both were mixed.
2021-08-09 22:06:50 +02:00
Eric Lippmann
8927e942f1 Remove UnmarshalYAML
Config options are no longer inlined.
2021-08-09 22:06:50 +02:00
Eric Lippmann
559b27cd8b Don't inline Redis options
There is now the options key to separate required and optional
configuration. Before, both were mixed.
2021-08-09 21:48:27 +02:00
Eric Lippmann
f9e12d9df7 Move method 2021-08-09 21:45:08 +02:00
Alexander A. Klimov
825dcbc817 Introduce downtime#scheduled_by 2021-08-09 20:12:10 +02:00
Eric Lippmann
783b3a6bfe
Merge pull request #323 from Icinga/feature/downtime-parent-downtime-id
Downtime: Add parent_id
2021-08-09 13:43:16 +02:00
Eric Lippmann
0a521a6e4a Add missing doc in meta 2021-08-09 10:30:53 +02:00
Eric Lippmann
1d5ae198aa Add missing doc in meta 2021-08-09 10:30:53 +02:00
Eric Lippmann
a4b77c6a45 Add missing doc in sync 2021-08-09 10:30:53 +02:00
Eric Lippmann
77ab2753f9 Add missing doc in ha 2021-08-09 10:30:53 +02:00
Eric Lippmann
ac0f26e59b Add missing doc in entitiesbyid 2021-08-09 10:30:53 +02:00
Eric Lippmann
1d361594ee Add missing doc in dump_signals 2021-08-09 10:30:53 +02:00
Eric Lippmann
fc5e2882ff Add missing doc in delta 2021-08-09 10:30:53 +02:00
Eric Lippmann
1fda4ea6ee Rename start() to run() 2021-08-09 10:30:53 +02:00
Eric Lippmann
a13788073d Add missing doc entitiy_bulker 2021-08-09 10:30:53 +02:00
Eric Lippmann
2c3a58e365 Add missing doc in bulker 2021-08-09 10:30:53 +02:00
Eric Lippmann
bf415f2e1c Add missing doc in stats_message 2021-08-09 10:30:53 +02:00
Eric Lippmann
ff88cb73f7 Add missing doc in icinga_status 2021-08-09 10:30:53 +02:00
Eric Lippmann
92bc1b26c7 Add missing doc in redis utils 2021-08-09 10:30:53 +02:00
Eric Lippmann
fee30380d5 Add missing doc in client 2021-08-09 10:30:53 +02:00
Eric Lippmann
7d59a98f90 Add missing doc in utils 2021-08-09 10:30:53 +02:00
Eric Lippmann
d1c20b6946 Add missing doc in db 2021-08-09 10:30:53 +02:00
Eric Lippmann
270f1930aa Remove useless comments 2021-08-09 10:30:53 +02:00
Eric Lippmann
7bda89e79d Return error instead of panicking 2021-08-09 10:29:47 +02:00
Eric Lippmann
ee36691f3f Remove --datadir config flag
It's currently not used anywhere.
2021-08-09 10:29:47 +02:00
Eric Lippmann
858dbe7481 Remove config.ValidateFile()
YAML already complains that the file is a directory:
"can't parse YAML file pkg: yaml: input error: read pkg: is a directory"
2021-08-09 10:29:47 +02:00
Eric Lippmann
0b1610c69b Use cancelCtx() instead of just cancel() 2021-08-09 10:29:47 +02:00
Eric Lippmann
f3f07a29cc Always use data as parameter name in UnmarshalJSON() 2021-08-09 10:29:47 +02:00
Eric Lippmann
63b8d98237 Always use text as parameter name in UnmarshalText() 2021-08-09 10:29:47 +02:00
Eric Lippmann
42935ae962 Fix comments 2021-08-09 10:29:47 +02:00
Eric Lippmann
83866f3a70 Remove unused variables Yes and No 2021-08-09 10:29:47 +02:00
Eric Lippmann
ec70babc91 Fix typo 2021-08-09 10:29:47 +02:00
Eric Lippmann
d40768ee64 Fix different receiver names 2021-08-09 10:29:47 +02:00
Eric Lippmann
e1d27bd93f Remove unused function PipeError 2021-08-09 10:29:47 +02:00
Eric Lippmann
f7be60623c Use QueryxContext() instead of Query() 2021-08-09 10:29:47 +02:00
Alexander A. Klimov
41e473b244 Remove unused functions 2021-08-09 10:29:47 +02:00
Eric Lippmann
868d46219a
Merge pull request #340 from Icinga/retry-badconn
Also retry driver.ErrBadConn
2021-08-09 10:27:03 +02:00
Alexander A. Klimov
46da390499 Validate specified config file only once
... after processed --version to allow --version w/o config file.

refs #335
2021-08-06 12:40:15 +02:00
Alexander A. Klimov
31fd1963f2 Support --version
refs #335
2021-08-06 12:40:15 +02:00
Eric Lippmann
ebd3f10d69 Log MySQL driver errors at the debug level instead of discarding them
The MySQL driver logs something like unexpected EOF, busy buffer,
broken pipe, bad connection, invalid connection etc. in the event
of an error. Most of these are retryable and are handled either by
the MySQL driver or our retry logic. Previously these messages
were discarded, but since they can be useful when debugging
connectivity issues, we'll now log them with the debug severity as
otherwise they would add too much noise to the logs where you might
see error messages with no actual effect.
2021-08-06 10:44:50 +02:00
Eric Lippmann
9b5e016e57 Also retry driver.ErrBadConn
ErrBadConn is returned by a driver to signal that a driver.Conn is
in a bad state. We have to retry with these errors as well.
2021-08-06 10:02:14 +02:00
Eric Lippmann
b5b169aea8
Merge pull request #312 from Icinga/bulk-statement-size
Bulk statement size
2021-08-05 00:40:25 +02:00
Alexander A. Klimov
ec37415261 AcknowledgementHistory#Author: make missing value NULL, not ""
refs #305
2021-08-04 12:02:08 +02:00
Eric Lippmann
bbba443529 Reduce the size of bulk statements and make the size configurable
Reduce the size of the bulk create, update, and delete statements to
also reduce query execution time so that the database server
can better execute statements in parallel.
2021-08-04 00:14:31 +02:00
Alexander Aleksandrovič Klimov
261c5aab8d
Merge pull request #322 from Icinga/feature/icingadb-last_comment_id
Icinga DB: introduce Checkable#last_comment_id
2021-08-03 16:29:40 +02:00
Eric Lippmann
e35a1609fc Stream state updates from icinga:runtime:state
Icinga now sends runtime updates in two separate channels,
icinga:runtime for config updates and icinga:runtime:state for
state updates. With this change, Icinga DB reads from these two
streams. This is a preparation so that state updates can be
streamed directly after a (re)start of Icinga or Icinga DB without
waiting for the config sync, as it is currently done.
2021-08-03 14:06:55 +02:00
Alexander A. Klimov
7346edb836 Icinga DB: introduce Checkable#last_comment_id 2021-07-26 18:10:28 +02:00
Noah Hilverling
9f6c73ca56 Downtime: Add parent_id 2021-07-23 13:54:39 +02:00
Eric Lippmann
ab4caa32a2
Merge pull request #319 from Icinga/heartbeat
Pointer receivers, Cond usage, pass ctx and Godoc for Heartbeat
2021-07-21 19:06:13 +02:00
Eric Lippmann
725e70f0b9 Pointer receivers, Cond usage, pass ctx and Godoc for Heartbeat
Heartbeat now uses pointer receivers for its methods because
some methods actually change the heartbeat values.
The context is no longer stored in the structure,
but passed to the controller loop.
The beat and the lost channels are replaced by Cond and
the last heartbeat is stored independently to not be affected by
a slow HA receiver. If the database connections are occupied by
the config, HA cannot update the instance and does not read from
the beat channel in time.
In addition, heartbeat errors are no longer swallowed,
but handled in HA.
2021-07-20 10:17:05 +02:00
Eric Lippmann
b7ecfb9df2 Do not store ctx inside Cond and add Godoc 2021-07-15 14:06:21 +02:00
Eric Lippmann
320fcd84bb
Merge pull request #308 from Icinga/feature/normalized_performance_data
Introduce *_state#normalized_performance_data
2021-07-13 15:16:38 +02:00
Eric Lippmann
2eb914a48c
Merge pull request #303 from Icinga/bugfix/cfg-segv
Fix SEGV due to empty config
2021-07-08 10:50:58 +02:00
Alexander A. Klimov
9bf1512e06 Introduce *_state#normalized_performance_data 2021-07-07 19:02:55 +02:00
Alexander A. Klimov
f47b6e7657 Fix SEGV due to empty config 2021-07-06 18:42:40 +02:00
Eric Lippmann
43624ecebe Use bulk upsert for bulk updates in sync 2021-07-05 08:51:32 +02:00
Eric Lippmann
f5777f1055 Use bulk upsert in runtime updates 2021-07-01 11:20:43 +02:00
Eric Lippmann
73865f819a Activate bulk upsert
sqlx >= v1.3.4 supports `VALUES(col_name)` to refer to column values.
2021-07-01 11:15:49 +02:00
Eric Lippmann
5cba0f9e22 Return number of placeholders 2021-07-01 11:13:17 +02:00
Eric Lippmann
7eb55ae81b Support DriverContext
Our driver now supports DriverContext and uses contexts.
This basically supports that the connect function returns
immediately when a context is cancelled.
2021-06-22 14:54:33 +02:00
Eric Lippmann
40bbe3117f
Merge pull request #289 from Icinga/error-handling
Error handling
2021-06-21 17:58:45 +02:00
Eric Lippmann
40095ad0db Call errors.Wrap*() unconditionally where appropriate 2021-06-21 13:20:44 +02:00
Eric Lippmann
dac942a9a8 Use errors.As() and Is() in favor of Unwrap() 2021-06-21 13:20:44 +02:00
Eric Lippmann
e12425d8dc Wrap errors 2021-06-21 12:13:24 +02:00
Eric Lippmann
6b2ecec65b Replace custom err w/ func returning a wrapped err
This ensures that there's also a stack trace associated with the error.
2021-06-21 12:13:24 +02:00
Eric Lippmann
8e216d2f83 Fix imports 2021-06-21 12:13:24 +02:00
Alexander A. Klimov
8cbf24932e Add stack of current goroutine to errors 2021-05-31 16:53:57 +02:00
Alexander A. Klimov
c25c47ad72 Compare errors smartly, i.e. unwrap them first 2021-05-31 16:52:47 +02:00
Eric Lippmann
a693b52118 HA: Respect context when operating channels
The transmission processes of the handover and takeover channel can
block because in the event of a context cancellation, nobody reads
it anymore.
2021-05-31 15:31:55 +02:00
Julian Brost
31aed435cc HA: context used to start transaction must not be canceled before the transaction is committed
This was introduced by 621c1b9537 and leads to
tx.Commit() always returning a "context canceled" error.
2021-05-31 14:48:15 +02:00
Alexander A. Klimov
b0354e3503 Don't misuse loop as if 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
afe0a90487 s/CondClosed/ErrCondClosed/ 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
1329c68cf3 Use !bytes.Equal(x,y), not bytes.Compare(x,y)!=0 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
a4abdce1f0 Use time.Since(x), not time.Now().Sub(x) 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
35349262ce Use time.NewTicker(), not time.Tick() 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
5a084ba7a9 Simplify code 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
c636de294a Un-capitalize error messages 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
dac83f2773 Drop unused stuff 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
c3ea4d9490 Avoid unreachable code 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
621c1b9537 Ensure context cancellation 2021-05-28 14:24:36 +02:00
Alexander A. Klimov
cabcd458ff Don't "misuse" unsafe.Pointer 2021-05-28 14:24:36 +02:00
Eric Lippmann
9ba2c01ce2 Reduce "Can't update or insert instance. Retrying" log noise
Only log after the third retry with the info level. Also, before
executing the transaction, sleep dependent on the retry count.
2021-05-25 16:35:41 +02:00
Eric Lippmann
372f5cae7c Also log environment info 2021-05-25 16:25:04 +02:00
Eric Lippmann
a1ffc53998 Log first Redis connection error while retrying 2021-05-25 14:00:06 +02:00
Eric Lippmann
be3180a54c Log first database connection error while retrying 2021-05-25 14:00:05 +02:00
Noah Hilverling
44c734f72d Improve database and HA logging 2021-05-25 09:49:48 +02:00
Julian Brost
37538b891c db.BulkExec(): Use ExecContext() instead of Query()
Query() returns a cursor that has to be closed to not block the underlying
database connection. Use ExecContext() instead to both avoid the cursor leak
and also properly pass the context.

This bug led to the DB connection pool being blocked completely after a certain
number of runtime delete queries.
2021-05-21 11:24:53 +02:00
Eric Lippmann
c5e875cfce Merge pull request #64 from lippserd/feature/mysql-retry-5m
Re-use Redis dialer logic in retry.WithBackoff() and the SQL driver
2021-05-21 10:39:29 +02:00
Alexander A. Klimov
867d5b67dd SQL driver: de-duplicate retry.WithBackoff() logic 2021-05-20 13:13:55 +02:00
Alexander A. Klimov
e4e138aaa4 Redis dialer: de-duplicate retry.WithBackoff() logic 2021-05-20 12:10:20 +02:00
Noah Hilverling
7bbc4e931e Merge pull request #61 from lippserd/bugfix/change-id-fields-to-match-sql-schema
Change ID fields to match SQL schema
2021-05-20 10:46:06 +02:00
Eric Lippmann
91052b1f69 Merge pull request #62 from lippserd/feature/wrap-redis-err
Wrap Redis errors
2021-05-20 09:05:18 +02:00
Alexander A. Klimov
5bcd5339b4 retry.WithBackoff(): return the most descriptive error 2021-05-19 19:28:22 +02:00
Alexander A. Klimov
f77d394041 retry.WithBackoff(): add optional timeout 2021-05-19 19:12:18 +02:00
Alexander A. Klimov
00fe3fe6f7 retry.WithBackoff(): pass a context to the function to be tried 2021-05-19 19:00:55 +02:00
Julian Brost
7056cf2b92 Don't execute runtime update upsert queries concurrently
These updates must be executed in order, therefore prevent concurrency by using
a separate semaphore.
2021-05-19 18:32:47 +02:00
Eric Lippmann
ff5bded004 Merge pull request #26 from lippserd/feature/redis-retry
Survive a Redis restart
2021-05-19 16:57:24 +02:00
Eric Lippmann
417ba462ed Merge pull request #31 from lippserd/feature/overdue
Sync overdue indicators
2021-05-19 16:57:03 +02:00
Alexander A. Klimov
1026d4cabf Wrap Redis errors 2021-05-19 11:57:58 +02:00
Alexander A. Klimov
d08f32397a Introduce icingaredis.WrapCmdErr() 2021-05-19 11:57:58 +02:00
Alexander A. Klimov
05c1361f1a Introduce utils.Ellipsize() 2021-05-19 11:55:17 +02:00
Eric Lippmann
20465d078a Merge pull request #57 from lippserd/bugfix/include-instance-id-in-ha-query
HA: Add own instance ID to responsibility query
2021-05-19 09:48:48 +02:00
Noah Hilverling
26086db211 HA: Add own instance ID to responsibility query
Without this PR Icinga DB thinks someone else is responsible, when it is responsible. This leads to Icinga DB only updating the heartbeat after the timeout. This could also cause random responsibility switching.
2021-05-19 09:32:12 +02:00
Alexander A. Klimov
3e45567368 config.Redis#NewClient(): work-around go-redis/redis#1737
... by re-trying once more often than there are connections in the pool.
2021-05-18 18:47:41 +02:00
Alexander A. Klimov
45b626c914 Redis: retry dial on syscall.ECONNREFUSED 2021-05-18 18:47:41 +02:00
Noah Hilverling
1282df748f Change ID fields to match SQL schema 2021-05-18 09:42:46 +02:00
Noah Hilverling
586e99a548 types.AcknowledgementState: Fix state map 2021-05-17 09:39:29 +02:00
Alexander A. Klimov
b31e012cf0 Sync overdue indicators 2021-05-12 18:51:13 +02:00