From c821e733644f9521a201157cb94aebc09ef59a8c Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Tue, 16 Apr 2019 16:37:38 +0200 Subject: [PATCH 1/5] Cache the API package stage name with a active-stage fallback This prevents reading the file everytime the stageName is required for when creating a runtime object via REST API. --- doc/15-troubleshooting.md | 89 ++++++++++++++++++++++++++++- doc/16-upgrading-icinga-2.md | 22 +++++++ lib/remote/apilistener.cpp | 55 ++++++++++++++++++ lib/remote/apilistener.hpp | 11 ++++ lib/remote/configpackageutility.cpp | 65 +++++++++++++++++++-- lib/remote/configpackageutility.hpp | 2 + 6 files changed, 237 insertions(+), 7 deletions(-) diff --git a/doc/15-troubleshooting.md b/doc/15-troubleshooting.md index a4a7b8910..f7d46f7b8 100644 --- a/doc/15-troubleshooting.md +++ b/doc/15-troubleshooting.md @@ -747,7 +747,7 @@ $ curl -k -s -u root:icinga -H 'Accept: application/json' -X DELETE 'https://loc } ``` -## REST API Troubleshooting: No Objects Found +### REST API Troubleshooting: No Objects Found Please note that the `404` status with no objects being found can also originate from missing or too strict object permissions for the authenticated user. @@ -761,6 +761,93 @@ In order to analyse and fix the problem, please check the following: - use an administrative account with full permissions to check whether the objects are actually there. - verify the permissions on the affected ApiUser object and fix them. +### Missing Runtime Objects (Hosts, Downtimes, etc.) + +Runtime objects consume the internal config packages shared with +the REST API config packages. Each host, downtime, comment, service, etc. created +via the REST API is stored in the `_api` package. + +This includes downtimes and comments, which where sometimes stored in the wrong +directory path, because the active-stage file was empty/truncated/unreadable at +this point. + +Wrong: + +``` +/var/lib/icinga2/api/packages/_api//conf.d/downtimes/1234-5678-9012-3456.conf +``` + +Correct: + +``` +/var/lib/icinga2/api/packages/_api/abcd-ef12-3456-7890/conf.d/downtimes/1234-5678-9012-3456.conf +``` + +At creation time, the object lives in memory but its storage is broken. Upon restart, +it is missing and e.g. a missing downtime will re-enable unwanted notifications. + +`abcd-ef12-3456-7890` is the active stage name which wasn't correctly +read by the Icinga daemon. This information is stored in `/var/lib/icinga2/api/packages/_api/active-stage`. + +2.11 now limits the direct active-stage file access (this is hidden from the user), +and caches active stages for packages in-memory. + +Bonus on startup/config validation: Icinga now logs a critical message when a deployed +config package is broken. + +``` +icinga2 daemon -C + +[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation. +``` + +In order to fix the broken config package, and mark a deployed stage as active +again, carefully do the following steps with creating a backup before: + +Navigate into the API package prefix. + +``` +cd /var/lib/icinga2/api/packages +``` + +Change into the broken package directory and list all directories and files +ordered by latest changes. + +``` +cd _api +ls -lahtr + +drwx------ 4 michi wheel 128B Mar 27 14:39 .. +-rw-r--r-- 1 michi wheel 25B Mar 27 14:39 include.conf +-rw-r--r-- 1 michi wheel 405B Mar 27 14:39 active.conf +drwx------ 7 michi wheel 224B Mar 27 15:01 abcd-ef12-3456-7890 +drwx------ 5 michi wheel 160B Apr 26 12:47 . +``` + +As you can see, the `active-stage` file is missing. When it is there, verify that its content +is set to the stage directory as follows. + +If you have more than one stage directory here, pick the latest modified +directory. Copy the directory name `abcd-ef12-3456-7890` and +add it into a new file `active-stage`. This can be done like this: + +``` +echo "abcd-ef12-3456-7890" > active-stage +``` + +Re-run config validation. + +``` +icinga2 daemon -C +``` + +The validation should not show an error. + +> **Note** +> +> The internal `_api` config package structure may change in the future. Do not modify +> things in there manually or with scripts unless guided here or asked by a developer. + ## Certificate Troubleshooting diff --git a/doc/16-upgrading-icinga-2.md b/doc/16-upgrading-icinga-2.md index 665f41b23..377806817 100644 --- a/doc/16-upgrading-icinga-2.md +++ b/doc/16-upgrading-icinga-2.md @@ -102,6 +102,28 @@ The deprecated `concurrent_checks` attribute in the [checker feature](09-object- has no effect anymore if set. Please use the [MaxConcurrentChecks](17-language-reference.md#icinga-constants-global-config) constant in [constants.conf](04-configuring-icinga-2.md#constants-conf) instead. +### REST API + +#### Config Packages + +Deployed configuration packages require an active stage, with many previous +allowed. This mechanism is used by the Icinga Director as external consumer, +and Icinga itself for storing runtime created objects inside the `_api` +package. + +This includes downtimes and comments, which where sometimes stored in the wrong +directory path, because the active-stage file was empty/truncated/unreadable at +this point. + +2.11 makes this mechanism more stable and detects broken config packages. + +``` +[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation. +``` + +In order to fix this, please follow [this troubleshooting entry](15-troubleshooting.md#troubleshooting-api-missing-runtime-objects). + + ## Upgrading to v2.10 ### Path Constant Changes diff --git a/lib/remote/apilistener.cpp b/lib/remote/apilistener.cpp index 39d82ef2f..713163d7d 100644 --- a/lib/remote/apilistener.cpp +++ b/lib/remote/apilistener.cpp @@ -6,6 +6,7 @@ #include "remote/endpoint.hpp" #include "remote/jsonrpc.hpp" #include "remote/apifunction.hpp" +#include "remote/configpackageutility.hpp" #include "base/convert.hpp" #include "base/defer.hpp" #include "base/io-engine.hpp" @@ -134,6 +135,9 @@ void ApiListener::OnConfigLoaded() Log(LogWarning, "ApiListener", "Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/"); } + /* Cache API packages and their active stage name. */ + UpdateActivePackageStagesCache(); + /* set up SSL context */ std::shared_ptr cert; try { @@ -1537,6 +1541,57 @@ Endpoint::Ptr ApiListener::GetLocalEndpoint() const return m_LocalEndpoint; } +void ApiListener::UpdateActivePackageStagesCache() +{ + boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); + + for (auto package : ConfigPackageUtility::GetPackages()) { + String activeStage; + + try { + activeStage = ConfigPackageUtility::GetActiveStageFromFile(package); + } catch (const std::exception& ex) { + Log(LogCritical, "ApiListener") + << ex.what(); + continue; + } + + Log(LogNotice, "ApiListener") + << "Updating cache: Config package '" << package << "' has active stage '" << activeStage << "'."; + + m_ActivePackageStages[package] = activeStage; + } +} + +void ApiListener::SetActivePackageStage(const String& package, const String& stage) +{ + boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); + m_ActivePackageStages[package] = stage; +} + +String ApiListener::GetActivePackageStage(const String& package) +{ + boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); + + if (m_ActivePackageStages.find(package) == m_ActivePackageStages.end()) + BOOST_THROW_EXCEPTION(ScriptError("Package " + package + " has no active stage.")); + + return m_ActivePackageStages[package]; +} + +void ApiListener::RemoveActivePackageStage(const String& package) +{ + /* This is the rare occassion when a package has been deleted. */ + boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); + + auto it = m_ActivePackageStages.find(package); + + if (it == m_ActivePackageStages.end()) + return; + + m_ActivePackageStages.erase(it); +} + void ApiListener::ValidateTlsProtocolmin(const Lazy& lvalue, const ValidationUtils& utils) { ObjectImpl::ValidateTlsProtocolmin(lvalue, utils); diff --git a/lib/remote/apilistener.hpp b/lib/remote/apilistener.hpp index e6390d944..dcd9e0424 100644 --- a/lib/remote/apilistener.hpp +++ b/lib/remote/apilistener.hpp @@ -84,6 +84,11 @@ public: static Value ConfigUpdateObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params); static Value ConfigDeleteObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params); + /* API config packages */ + void SetActivePackageStage(const String& package, const String& stage); + String GetActivePackageStage(const String& package); + void RemoveActivePackageStage(const String& package); + static Value HelloAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params); static void UpdateObjectAuthority(); @@ -175,6 +180,12 @@ private: void SendRuntimeConfigObjects(const JsonRpcConnection::Ptr& aclient); void SyncClient(const JsonRpcConnection::Ptr& aclient, const Endpoint::Ptr& endpoint, bool needSync); + + /* API Config Packages */ + mutable boost::mutex m_ActivePackageStagesLock; + std::map m_ActivePackageStages; + + void UpdateActivePackageStagesCache(); }; } diff --git a/lib/remote/configpackageutility.cpp b/lib/remote/configpackageutility.cpp index fad266b53..c385a5d93 100644 --- a/lib/remote/configpackageutility.cpp +++ b/lib/remote/configpackageutility.cpp @@ -1,6 +1,7 @@ /* Icinga 2 | (c) 2012 Icinga GmbH | GPLv2+ */ #include "remote/configpackageutility.hpp" +#include "remote/apilistener.hpp" #include "base/application.hpp" #include "base/exception.hpp" #include "base/utility.hpp" @@ -34,6 +35,14 @@ void ConfigPackageUtility::DeletePackage(const String& name) if (!Utility::PathExists(path)) BOOST_THROW_EXCEPTION(std::invalid_argument("Package does not exist.")); + ApiListener::Ptr listener = ApiListener::GetInstance(); + + /* config packages without API make no sense. */ + if (!listener) + BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured.")); + + listener->RemoveActivePackageStage(name); + Utility::RemoveDirRecursive(path); Application::RequestRestart(); } @@ -157,10 +166,7 @@ void ConfigPackageUtility::WriteStageConfig(const String& packageName, const Str void ConfigPackageUtility::ActivateStage(const String& packageName, const String& stageName) { - String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage"; - std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc); - fpActiveStage << stageName; - fpActiveStage.close(); + SetActiveStage(packageName, stageName); WritePackageConfig(packageName); } @@ -242,8 +248,11 @@ std::vector ConfigPackageUtility::GetStages(const String& packageName) return stages; } -String ConfigPackageUtility::GetActiveStage(const String& packageName) +String ConfigPackageUtility::GetActiveStageFromFile(const String& packageName) { + /* Lock the transaction, reading this only happens on startup or when something really is broken. */ + boost::mutex::scoped_lock lock(GetStaticMutex()); + String path = GetPackageDir() + "/" + packageName + "/active-stage"; std::ifstream fp; @@ -255,11 +264,55 @@ String ConfigPackageUtility::GetActiveStage(const String& packageName) fp.close(); if (fp.fail()) - return ""; + BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation.")); return stage.Trim(); } +String ConfigPackageUtility::GetActiveStage(const String& packageName) +{ + ApiListener::Ptr listener = ApiListener::GetInstance(); + + /* config packages without API make no sense. */ + if (!listener) + BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured.")); + + String activeStage; + + /* First use runtime state. */ + try { + activeStage = listener->GetActivePackageStage(packageName); + } catch (const std::exception& ex) { + /* Fallback to reading the file, happens on restarts. */ + activeStage = GetActiveStageFromFile(packageName); + + /* When we've read something, correct memory. */ + if (!activeStage.IsEmpty()) + listener->SetActivePackageStage(packageName, activeStage); + else + BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation.")); + } + + return activeStage; +} + +void ConfigPackageUtility::SetActiveStage(const String& packageName, const String& stageName) +{ + ApiListener::Ptr listener = ApiListener::GetInstance(); + + /* config packages without API make no sense. */ + if (!listener) + BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured.")); + + listener->SetActivePackageStage(packageName, stageName); + + /* Also update the marker on disk for restarts. */ + String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage"; + + std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc); //TODO: fstream exceptions + fpActiveStage << stageName; + fpActiveStage.close(); +} std::vector > ConfigPackageUtility::GetFiles(const String& packageName, const String& stageName) { diff --git a/lib/remote/configpackageutility.hpp b/lib/remote/configpackageutility.hpp index db475c8f2..ff1c2ad61 100644 --- a/lib/remote/configpackageutility.hpp +++ b/lib/remote/configpackageutility.hpp @@ -32,7 +32,9 @@ public: static String CreateStage(const String& packageName, const Dictionary::Ptr& files = nullptr); static void DeleteStage(const String& packageName, const String& stageName); static std::vector GetStages(const String& packageName); + static String GetActiveStageFromFile(const String& packageName); static String GetActiveStage(const String& packageName); + static void SetActiveStage(const String& packageName, const String& stageName); static void ActivateStage(const String& packageName, const String& stageName); static void AsyncTryActivateStage(const String& packageName, const String& stageName, bool reload); From 0d6d48fd59902379678ab79a095f122f349684dd Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Fri, 26 Apr 2019 14:43:10 +0200 Subject: [PATCH 2/5] Daemon: Deal with exceptions from broken _api package --- lib/cli/daemoncommand.cpp | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/lib/cli/daemoncommand.cpp b/lib/cli/daemoncommand.cpp index c5b0376a0..adbb1afa5 100644 --- a/lib/cli/daemoncommand.cpp +++ b/lib/cli/daemoncommand.cpp @@ -289,7 +289,13 @@ int DaemonCommand::Run(const po::variables_map& vm, const std::vector Date: Fri, 26 Apr 2019 14:43:38 +0200 Subject: [PATCH 3/5] Cluster: Don't try to sync objects from broken _api package --- lib/remote/apilistener-configsync.cpp | 10 +++++++++- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/lib/remote/apilistener-configsync.cpp b/lib/remote/apilistener-configsync.cpp index e06d1d887..6b4bec138 100644 --- a/lib/remote/apilistener-configsync.cpp +++ b/lib/remote/apilistener-configsync.cpp @@ -282,7 +282,15 @@ void ApiListener::UpdateConfigObject(const ConfigObject::Ptr& object, const Mess params->Set("version", object->GetVersion()); if (object->GetPackage() == "_api") { - String file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName()); + String file; + + try { + file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName()); + } catch (const std::exception& ex) { + Log(LogNotice, "ApiListener") + << "Cannot sync object '" << object->GetName() << "': " << ex.what(); + return; + } std::ifstream fp(file.CStr(), std::ifstream::binary); if (!fp) From 2bca7a5bb5cb52149da0bfd93e6d43d3842ab753 Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Fri, 26 Apr 2019 14:51:28 +0200 Subject: [PATCH 4/5] Repair broken API config packages at runtime This means a new timer which checks every 5m whether the active-stage can be read, and if not, it overwrites the file on disk with the details from memory. --- lib/remote/apilistener.cpp | 26 ++++++++++++++++++++++++++ lib/remote/apilistener.hpp | 3 +++ lib/remote/configpackageutility.cpp | 17 ++++++++++++----- lib/remote/configpackageutility.hpp | 1 + 4 files changed, 42 insertions(+), 5 deletions(-) diff --git a/lib/remote/apilistener.cpp b/lib/remote/apilistener.cpp index 713163d7d..04eb4c804 100644 --- a/lib/remote/apilistener.cpp +++ b/lib/remote/apilistener.cpp @@ -271,6 +271,11 @@ void ApiListener::Start(bool runtimeCreated) m_CleanupCertificateRequestsTimer->Start(); m_CleanupCertificateRequestsTimer->Reschedule(0); + m_ApiPackageIntegrityTimer = new Timer(); + m_ApiPackageIntegrityTimer->OnTimerExpired.connect(std::bind(&ApiListener::CheckApiPackageIntegrity, this)); + m_ApiPackageIntegrityTimer->SetInterval(300); + m_ApiPackageIntegrityTimer->Start(); + OnMasterChanged(true); } @@ -1563,6 +1568,27 @@ void ApiListener::UpdateActivePackageStagesCache() } } +void ApiListener::CheckApiPackageIntegrity() +{ + boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); + + for (auto package : ConfigPackageUtility::GetPackages()) { + String activeStage; + try { + activeStage = ConfigPackageUtility::GetActiveStageFromFile(package); + } catch (const std::exception& ex) { + /* An error means that the stage is broken, try to repair it. */ + String activeStageCached = m_ActivePackageStages[package]; + + Log(LogInformation, "ApiListener") + << "Repairing broken API config package '" << package + << "', setting active stage '" << activeStageCached << "'."; + + ConfigPackageUtility::SetActiveStageToFile(package, activeStageCached); + } + } +} + void ApiListener::SetActivePackageStage(const String& package, const String& stage) { boost::mutex::scoped_lock lock(m_ActivePackageStagesLock); diff --git a/lib/remote/apilistener.hpp b/lib/remote/apilistener.hpp index dcd9e0424..5818ec994 100644 --- a/lib/remote/apilistener.hpp +++ b/lib/remote/apilistener.hpp @@ -124,6 +124,8 @@ private: Timer::Ptr m_ReconnectTimer; Timer::Ptr m_AuthorityTimer; Timer::Ptr m_CleanupCertificateRequestsTimer; + Timer::Ptr m_ApiPackageIntegrityTimer; + Endpoint::Ptr m_LocalEndpoint; static ApiListener::Ptr m_Instance; @@ -131,6 +133,7 @@ private: void ApiTimerHandler(); void ApiReconnectTimerHandler(); void CleanupCertificateRequestsTimerHandler(); + void CheckApiPackageIntegrity(); bool AddListener(const String& node, const String& service); void AddConnection(const Endpoint::Ptr& endpoint); diff --git a/lib/remote/configpackageutility.cpp b/lib/remote/configpackageutility.cpp index c385a5d93..b56bbb9fc 100644 --- a/lib/remote/configpackageutility.cpp +++ b/lib/remote/configpackageutility.cpp @@ -269,6 +269,17 @@ String ConfigPackageUtility::GetActiveStageFromFile(const String& packageName) return stage.Trim(); } +void ConfigPackageUtility::SetActiveStageToFile(const String& packageName, const String& stageName) +{ + boost::mutex::scoped_lock lock(GetStaticMutex()); + + String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage"; + + std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc); //TODO: fstream exceptions + fpActiveStage << stageName; + fpActiveStage.close(); +} + String ConfigPackageUtility::GetActiveStage(const String& packageName) { ApiListener::Ptr listener = ApiListener::GetInstance(); @@ -307,11 +318,7 @@ void ConfigPackageUtility::SetActiveStage(const String& packageName, const Strin listener->SetActivePackageStage(packageName, stageName); /* Also update the marker on disk for restarts. */ - String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage"; - - std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc); //TODO: fstream exceptions - fpActiveStage << stageName; - fpActiveStage.close(); + SetActiveStageToFile(packageName, stageName); } std::vector > ConfigPackageUtility::GetFiles(const String& packageName, const String& stageName) diff --git a/lib/remote/configpackageutility.hpp b/lib/remote/configpackageutility.hpp index ff1c2ad61..11b2b9977 100644 --- a/lib/remote/configpackageutility.hpp +++ b/lib/remote/configpackageutility.hpp @@ -35,6 +35,7 @@ public: static String GetActiveStageFromFile(const String& packageName); static String GetActiveStage(const String& packageName); static void SetActiveStage(const String& packageName, const String& stageName); + static void SetActiveStageToFile(const String& packageName, const String& stageName); static void ActivateStage(const String& packageName, const String& stageName); static void AsyncTryActivateStage(const String& packageName, const String& stageName, bool reload); From 502c43fb12678d6eae01583db32e3291279e6d35 Mon Sep 17 00:00:00 2001 From: Michael Friedrich Date: Tue, 30 Apr 2019 12:19:35 +0200 Subject: [PATCH 5/5] Active packages: Don't try to fix broken config packages which are not cached yet --- lib/remote/apilistener.cpp | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/lib/remote/apilistener.cpp b/lib/remote/apilistener.cpp index 04eb4c804..f4903e6a6 100644 --- a/lib/remote/apilistener.cpp +++ b/lib/remote/apilistener.cpp @@ -1578,7 +1578,12 @@ void ApiListener::CheckApiPackageIntegrity() activeStage = ConfigPackageUtility::GetActiveStageFromFile(package); } catch (const std::exception& ex) { /* An error means that the stage is broken, try to repair it. */ - String activeStageCached = m_ActivePackageStages[package]; + auto it = m_ActivePackageStages.find(package); + + if (it == m_ActivePackageStages.end()) + continue; + + String activeStageCached = it->second; Log(LogInformation, "ApiListener") << "Repairing broken API config package '" << package