* Reload all scrape pools concurrently
At the moment all scrape pools that need to be reloaded are reloaded one by one. While reloads are ongoing mtxScrape is locked.
For each pool that's being reloaded we need to wait until all targets are updated.
This whole process can take a while and the more scrape pools to reload the longer.
At the same time all pools are independent and there's no real reason to do them one-by-one.
Reload each pool in a seperate goroutine so we finish config reload as ASAP as possible and unlock the mtxScrape.
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
* Address PR review feedback
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
---------
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
Addresses https://github.com/prometheus/prometheus/issues/16371
This will help with migrating to native histograms with `convert_classic_histograms_to_nhcb` since users may still need to keep the classic histograms during a migration
Signed-off-by: chardch <otwordsne@gmail.com>
The new metric_name_escaping_scheme config option works in parallel with metric_name_validation_scheme and controls which escaping scheme is requested when scraping. When not specified, the scheme will request underscores if the validation scheme is set to legacy, and will request allow-utf-8 when the validation scheme is set to utf8. This setting allows users to allow utf8 names if they like, but explicitly request an escaping scheme rather than UTF-8.
Fixes https://github.com/prometheus/prometheus/issues/16034
Built on https://github.com/prometheus/prometheus/pull/16080
Signed-off-by: Owen Williams <owen.williams@grafana.com>
No need to validate, track, add sample, add exemplar if series isn't
going to be ingested. Basically this is the same as if this series was
dropped by relabelling.
Fixes#16217
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Global and Data Source configurations can specify legacy mode, but Prometheus now requires that the overall validation mode be set to UTF-8
Signed-off-by: Owen Williams <owen.williams@grafana.com>
We use the cache iteration number to detect when the same metric has
occurred twice in a scrape. We need to bump this number at the end of
every scrape, not just successful scrapes.
The `iter` number is also used:
* After a successful scrape, to delete older metrics and metadata.
* To detect when metadata changed in this scrape.
None of those additional cases is broken by incrementing the number
on error.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
..instead of *int64. This is as an optimization and ease of use. We already
accepted in many places (proto histograms, PRW) that CT (or any timestamp really) 0
means not set.
Signed-off-by: bwplotka <bwplotka@gmail.com>
The `Accept` header should not include `escape=allow-utf-8` unless
explicitly requested.
Conveniently, there was already a test covering this header's value, it
just required updating so it also asserts that this value in the header
is not set in the cases we don't expect it to be set. I also converted
those tests into table tests to help make failures clearer.
Issue: https://github.com/prometheus/prometheus/issues/15857
Signed-off-by: Matt Hughes <mhughes@uw.co.uk>
Change case order for switch scrapeLoop
This commit changes the ordering in error identification switch cases for better production performance and adds reasonings behind error switch case order as comments.
---------
Signed-off-by: Laimis Juzeliūnas <asnelaimis@gmail.com>
* model/textparse: Change parser interface Metric(...) string to Labels(...)
Simplified the interface given no one is using the return argument.
Renamed for clarity too.
Found and discussed https://github.com/prometheus/prometheus/pull/15731#discussion_r1950916842
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Fixed comments; optimized not needed copy for om and text.
Signed-off-by: bwplotka <bwplotka@gmail.com>
---------
Signed-off-by: bwplotka <bwplotka@gmail.com>
Also:
* split benchmark functions to make sure no one compares across parsers.
* testdata file have meaningful names reflecting the type representation
* promtestdata.txt now has all types, taken directly from long running Prometheus (https://demo.do.prometheus.io/)
Needed for https://github.com/prometheus/prometheus/pull/15731
Signed-off-by: bwplotka <bwplotka@gmail.com>
The was a bug (due to confusion?) on the local metadata cache that is cached
by metric family not the series metric name. The fix is to NOT use that local cache
at all (it's still needed for current metadata API implementation, added TODO
on how we can get rid of it).
I went ahead and also rename Metric field in metadata structs to MetricFamily to make
clear it's not always __name__.
Signed-off-by: bwplotka <bwplotka@gmail.com>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Returning 0 from Append means 'unknown', so the series is never cached.
Return arbitrary numbers instead.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of storing discovered labels on every target, recompute them if
required. The `Target` struct now needs to hold some more data required
to recompute them, such as ScrapeConfig.
This moves the load from every Prometheus all of the time, to just when
someone views Service Discovery in the UI.
The way `PopulateLabels` is used changes; you are no longer expected to
call it with a part-populated `labels.Builder`.
The signature of `Target.Labels` changes to take a `labels.Builder`
instead of a `ScratchBuilder`, for consistency with `DiscoveredLabels`.
This will save a lot of work when many targets are filtered out in
relabeling. Combine with `keep_dropped_targets` to avoid ever computing
most labels for dropped targets.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.
I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.
Fixes#15538
Previous attempt: https://github.com/prometheus/prometheus/pull/15634
Signed-off-by: bwplotka <bwplotka@gmail.com>
Don't forget to set `metrics` field as otherwise scraping will lead to a
nil panic in case the body size limit is reached.
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
Improves upon #15434, better resolves#15433.
This commit introduces a few changes to ensure safer handling of the
JSONFileLogger:
- the JSONFileLogger struct now implements the slog.Handler interface,
so it can directly be used to create slog Loggers. This pattern more
closely aligns with upstream slog usage (which is generally based around
handlers), as well as making it clear that devs are creating a whole new
logger when interacting with it (vs silently modifying internal configs
like it did previously).
- updates the `promql.QueryLogger` interface to be a union of the
methods of both the `io.Closer` interface and the `slog.Handler`
interface. This allows for plugging in/using slog-compatible loggers
other than the JSONFileLogger, if desired (ie, for downstream projects).
- introduces new `scrape.FailureLogger` interface; just like
`promql.QueryLogger`, it is a union of `io.Closer` and `slog.Handler`
interfaces. Similar logic applies to reasoning.
- updates tests where needed; have the `FakeQueryLogger` from promql's
engine_test implement the `slog.Handler`, improve JSONFileLogger test
suite, etc.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Resolves: #15433
When I converted prometheus to use slog in #14906, I update both the
`QueryLogger` interface, as well as how the log calls to the
`QueryLogger` were built up in `promql.Engine.exec()`. The backing
logger for the `QueryLogger` in the engine is a
`util/logging.JSONFileLogger`, and it's implementation of the `With()`
method updates the logger the logger in place with the new keyvals added
onto the underlying slog.Logger, which means they get inherited onto
everything after. All subsequent calls to `With()`, even in later
queries, would continue to then append on more and more keyvals for the
various params and fields built up in the logger. In turn, this causes
unbounded growth of the logger, leading to increased memory usage, and
in at least one report was the likely cause of an OOM kill. More
information can be found in the issue and the linked slack thread.
This commit does a few things:
- It was referenced in feedback in #14906 that it would've been better
to not change the `QueryLogger` interface if possible, this PR
proposes changes that bring it closer to alignment with the pre-3.0
`QueryLogger` interface contract
- reverts `promql.Engine.exec()`'s usage of the query logger to the
pattern of building up an array of args to pass at once to the end log
call. Avoiding the repetitious calls to `.With()` are what resolve the
issue with the logger growth/memory usage.
- updates the scrape failure logger to use the update `QueryLogger`
methods in the contract.
- updates tests accordingly
- cleans up unused methods
Builds and passes tests successfully. Tested locally and confirmed I
could no longer reproduce the issue/it resolved the issue.
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Need to extend `newTestScrapeMetrics`` to get at the Registry.
`gatherLabels` function could go upstream to `client_golang`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Before this change, logs would show like:
```
{...,"target":"http://localhost:8080/metrics","!BADKEY":"Get ..."}
```
After this change
```
{...,"msg":"Get ...","job_name":...,"target":...}
```
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
* NHCB: scrape use state field and not booleans
From comment https://github.com/prometheus/prometheus/pull/14978#discussion_r1800898724
Also make compareLabels read only and move storeLabels to the first
processed classic histogram series.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Speed up TestConvertClassicHistogramsToNHCB 3x
Reduce the startup time and timeouts
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* lint fix
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
scrape: Remove implicit fallback to the Prometheus text format
Remove implicit fallback to the Prometheus text format in case of invalid/missing Content-Type and fail the scrape instead. Add ability to specify a `fallback_scrape_protocol` in the scrape config.
---------
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: Alex Greenbank <alex.greenbank@grafana.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
* scrape/scrape_test.go: reduce the time it takes to reload the manager
TestNativeHistogramMaxSchemaSet took over 3x5s to complete because
there's a minimum reload interval.
I've made the testcases run in parallel and reduced the reload interval
to 10ms. Now the test runs in around 0.1-0.2 seconds.
Ran test 10000 times to check if it's flaky.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>