Commit graph

675 commits

Author SHA1 Message Date
Dimitar Dimitrov
8b45384a44
Make templateData a struct
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-12-24 11:10:13 +02:00
Julien Duchesne
615195372d
Ruler: Move inner eval function (#15688)
Refactoring in prevision of https://github.com/prometheus/prometheus/pull/15681

By moving the `eval` function outside of the for loop, we can modify the rule execution order more easily (without huge git changes)

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-17 21:11:55 +00:00
Julien Duchesne
e2f037e554
rules: Add new RuleEvaluationTimeSum field to groups (#15672)
* feat(ruler): Add new `RuleEvaluationTimeSum` field to groups
Coupled with a metric: `rule_group_last_rule_duration_sum_seconds`

This will give us more observability into how fast a group runs with or without concurrency

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Update rules/group.go

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Apply suggestions from code review

Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

* Remove `in seconds`. A duration is a duration

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>

---------

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
2024-12-13 21:42:46 +00:00
Julien Duchesne
7802ca263d
RuleDependencyController: Fix for indeterminate conditions (#15560)
The dependency map being empty meant that all the rules were being set as having no dependencies or dependents. Which is the opposite of what we want
Added two new tests that verify the behavior, they failed before the fix, running all the rules concurrently

Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
2024-12-13 16:48:29 +00:00
Anand Rajagopal
ceb2f653ba Add an option to restore new rule groups added to existing rule manager
Signed-off-by: Anand Rajagopal <anrajag@amazon.com>
2024-12-13 03:15:53 +00:00
Arve Knudsen
c2e28f21ba
rules.NewGroup: Fix when no logger is passed (#15356)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-21 16:53:06 +01:00
Matthieu MOREL
af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Vanshika
cccbe72514
TSDB: Fix some edge cases when OOO is enabled (#14710)
Fix some edge cases when OOO is enabled

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-23 17:34:28 +02:00
Bryan Boreham
70e2d23027
Merge pull request #11474 from clwluvw/group-label
[FEATURE] rules: add labels at group level
2024-10-21 14:47:12 +01:00
Dimitar Dimitrov
5b2cfc5e45
Merge branch 'main' into dimitar/ruler/unsupported-logger
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-10-09 16:53:21 +02:00
TJ Hoplock
6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Dimitar Dimitrov
b544127d9a
ruler: correct logging of alert name
The error messages we see in the embedded ruler in Mimir right now are contain `alert="unsupported value type"` and `data="unsupported value type"`

```
ts=2024-10-04T09:59:56.900015691Z caller=alerting.go:384 level=warn component=ruler insight=true user=XXX alert="unsupported value type" msg="Expanding alert template failed" err="error executing template __alert_service_mesh_down_alert: template: __alert_service_mesh_down_alert:1:139: executing \"__alert_service_mesh_down_alert\" at <.registry_name>: can't evaluate field registry_name in type struct { Labels map[string]string; ExternalLabels map[string]string; ExternalURL string; Value interface {} }" data="unsupported value type"
```

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2024-10-04 12:19:12 +02:00
Nathan Baulch
50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
Arve Knudsen
99204f23ee Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-29 09:52:54 +02:00
riskrole
406bf775aa chore: fix some comments
Signed-off-by: riskrole <yuhang@before.tech>
2024-08-28 11:26:57 +08:00
Arve Knudsen
c9a460d570 Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-26 12:17:10 +02:00
Max Amin
84b819a69f
feat: add Google cloud roundtripper for remote write (#14346)
* feat: Google Auth for remote write

Signed-off-by: Max Amin <maxamin@google.com>

---------

Signed-off-by: Max Amin <maxamin@google.com>
2024-07-30 16:25:19 +01:00
Seena Fallah
f253d36361 rule: allow merging labels from group level
Support merging labels from groups to rule labels

Signed-off-by: Seena Fallah <seenafallah@gmail.com>
2024-07-26 20:18:05 +02:00
Arve Knudsen
7c873004c7 Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-07-26 11:48:33 +02:00
gotjosh
465891cc56
Rules: Refactor concurrency controller interface (#14491)
* Rules: Refactor concurrency controller interface

Even though the main purpose of this refactor is to modify the interface of the concurrency controller to accept a Context. I did two drive-by modifications that I think are sensible:

1. I have moved the check for dependencies on rules to the controller itself - this aligns with how the controller should behave as it is a deciding factor on wether we should run concurrently or not.
2. I cleaned up some unused methods from the days of the old interface before #13527 changed it.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-07-22 14:11:18 +01:00
Arve Knudsen
fbc9eddfaf Refactor engine creation in tests
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-07-14 13:58:51 +02:00
Arve Knudsen
fec6adadcd Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-07-14 13:19:11 +02:00
Saswata Mukherjee
398f42de5f
Add label-matcher support to Rules API (#10194)
* Add label-matcher support to Rules API

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Implement suggestions

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Match any matcherSet instead of all

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Don't treat labels.Labels as slice

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Remove non-templated check and fix tests

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Update docs

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comments

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* fix comment

Signed-off-by: Yijie Qin <qinyijie@amazon.com>

* Add comment for matching logic, fix tests after rebase

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>

---------

Signed-off-by: Saswata Mukherjee <saswataminsta@yahoo.com>
Signed-off-by: Yijie Qin <qinyijie@amazon.com>
Co-authored-by: Yijie Qin <qinyijie@amazon.com>
2024-07-10 13:18:29 +01:00
Arve Knudsen
e8ae8cf012 Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-07-01 10:47:21 +02:00
Raphael Silva
e0c9b2ee19 Fix linting errors
Signed-off-by: Raphael Silva <rapphil@gmail.com>
2024-06-28 23:44:08 +00:00
Raphael Silva
cd5a7b5020 Make rules Manager Update method no-op after Close
This has to be done because Close and Update methods are accessed concurrently.

Signed-off-by: Raphael Silva <rapphil@gmail.com>
2024-06-28 23:39:46 +00:00
Jeanette Tan
dda5f48c9e Merge branch 'main' into nhcb-review-2 2024-06-20 22:50:00 +08:00
Oleg Zaytsev
4c1e71fa0b
Reduce the flakiness of TestAsyncRuleEvaluation (#14300)
* Reduce the flakiness of TestAsyncRuleEvaluation

This tests sleeps for 15 millisecond per rule group, and then comprares
the entire execution time to be smaller than a multiple of that delay.

The ruleCount is 6, so it assumes that the test will come to the
assertions in less than 90ms.

Meanwhile, the Github's Windows runner:
- ...Huh, oh? What? How much time? milliwhat? Sorry I don't speak that.

TL;DR, this increases the delay to 250 millisecond. This won't prevent
the test from being flaky, but will reduce the flakiness by several
orders of magnitude and hopefully won't be an issue anymore.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

* Make tests parallel

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-06-14 15:02:46 +02:00
Arve Knudsen
b7320ef636 Merge remote-tracking branch 'prometheus/main' into arve/close-engine 2024-06-14 10:51:35 +02:00
Jeanette Tan
14f8dded39 Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-06-07 19:17:14 +08:00
Jeanette Tan
9adc1699c3 fix according to code review
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-06-07 18:50:59 +08:00
Marco Pracucci
edd558884b
Fix Group.Equals() to take in account the new queryOffset too (#14273)
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-06-06 18:47:36 +01:00
Arve Knudsen
e57aac8084 Merge remote-tracking branch 'prometheus/main' into arve/close-engine
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-06-05 11:37:44 +02:00
gotjosh
37b408c6cd
Feature: Allow configuration of a rule evaluation delay (#14061)
* [PATCH] Allow having evaluation delay for rule groups

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Fix lint

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Move the option to ManagerOptions

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* [PATCH] Include evaluation_delay in the group config

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>

* Fix comments

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add a server configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Appease the linter #1

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add the new server flag documentation

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve documentation of the new flag and configuration

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Use named parameters for clarity on the `Rule` interface

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `initial` to the flag help

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Change the CHANGELOG area from `ruler` to `rules`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename evaluation_delay to `rule_query_offset`/`query_offset` and make it a global configuration option.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

E Your branch is up to date with 'origin/gotjosh/evaluation-delay'.

* more docs

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve wording on CHANGELOG

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Add `RuleQueryOffset` to the default config in tests in case it changes

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Update docs/configuration/recording_rules.md

Co-authored-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Rename `RuleQueryOffset` to `QueryOffset` when in the group context.

Signed-off-by: gotjosh <josue.abreu@gmail.com>

* Improve docstring and documentation on the `rule_query_offset`

Signed-off-by: gotjosh <josue.abreu@gmail.com>

---------

Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
Signed-off-by: gotjosh <josue.abreu@gmail.com>
Co-authored-by: Ganesh Vernekar <ganeshvern@gmail.com>
Co-authored-by: Julius Volz <julius.volz@gmail.com>
2024-05-30 11:49:50 +01:00
Arve Knudsen
0cc99e677a promql.Engine: Add Close method
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-05-28 12:01:47 +02:00
Julien
d1eff95faf
Merge pull request #14100 from bboreham/windows-flake
[TEST] Rules: Sleep 15ms to fit Windows behaviour better
2024-05-16 12:04:42 +02:00
Oleksandr Redko
f10c3454e9 Enable perfsprint linter and fix up code
Signed-off-by: Oleksandr Redko <oleksandr.red+github@gmail.com>
2024-05-15 17:51:05 +03:00
Bryan Boreham
10eb23bd6b [TEST] Rules: Sleep 15ms to fit Windows behaviour better
On Windows, Go will sleep 15ms if you ask for less.  TestAsyncRuleEvaluation
compares actual delay to the nominal time, so using 15ms should work
better on Windows, and be hardly noticeable elsewhere.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-14 17:45:42 +01:00
Jeanette Tan
f028496133 Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-05-14 16:20:15 +08:00
Bryan Boreham
3fd24d1cd7
Merge pull request #13999 from bboreham/extract-promqltest
[Test] Extract most PromQL test code into separate packages
2024-05-09 13:23:11 +01:00
Bryan Boreham
8fd96241ab test: add promqltest package references
To packages outside of promql.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-05-08 16:08:04 +01:00
Jeanette Tan
796b1bbfde Merge branch 'main' into nhcb
Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-05-08 19:11:39 +08:00
gotjosh
c10186eeea
BUGFIX: Mark the rule's restoration process as completed always (#14048)
* BUGFIX: Mark the rule's restoration process as completed always

In https://github.com/prometheus/prometheus/pull/13980 I introduced a change to reduce the number of queries executed when we restore alert statuses.

With this, the querying semantics changed as we now need to go through all series before we enter the alert restoration loop and I missed the fact that exiting early when there are no rules to restore would lead to an incomplete restoration.

An alert being restored is used as a proxy for "we're now ready to write `ALERTS/ALERTS_FOR_SERIES` metrics" so as a result we weren't writing the series if we didn't restore anything the first time around.
---------

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-05-03 14:23:46 +01:00
gotjosh
1dd0bff4f1
Merge pull request #13980 from prometheus/gotjosh/restore-only-with-rule-query
Rule Manager: Only query once per alert rule when restoring alert state
2024-04-30 15:29:21 +01:00
gotjosh
379dec9d36
querier.Select cannot return a nil series set.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 13:09:30 +01:00
gotjosh
05ca082b07
Rename alerts to expectedAlerts in the test case input
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:43:09 +01:00
gotjosh
f63dbc3db2
Remove duplicated sorted and assignment of expected alerts.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:39:07 +01:00
gotjosh
63b09944b8
Use labels.Len() instead of manually counting the labels
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:25:48 +01:00
gotjosh
ccfafae36d
Rename QueryforStateSeries to QueryForStateSeries
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:19:18 +01:00
gotjosh
151f6e0ed6
Add an assertion on the count of alerts before adding an active alert
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-30 12:17:56 +01:00
George Robinson
dde2e5eb73
Improve comments around resending resolved alerts (#13990)
Signed-off-by: George Robinson <george.robinson@grafana.com>
2024-04-25 14:18:50 +02:00
gotjosh
cc2207148e
fix typo
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 19:20:57 +01:00
gotjosh
2de2fee035
Allow the result map for the series set before hand with a hint.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 19:10:34 +01:00
gotjosh
6cfc584308
- Add a changelog entry
- Improve variable name of the map produced by the series set

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 19:02:47 +01:00
gotjosh
fa75985c1c
Use the string representation of the labels instead of the hash
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh
276201598c
Fix tests and a bug with the series lookup logic.
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh
e6dcbd2e26
bug: nil check against the series set not errors
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh
4daaa59c08
Rule Manager: Only query once per alert rule when restoring alert state
Prometheus restores alert state between restarts and updates. For each rule, it looks at the alerts that are meant to be active and then queries the `ALERTS_FOR_STATE` series for _each_ alert within the rules.

If the alert rule has 120 instances (or series) it'll execute the same query with slightly different labels.

This PR changes the approach so that we only query once per alert rule and then match the corresponding alert that we're about to restore against the series-set. While the approach might use a bit more memory at start-up (if even?) the restore proccess is only ran once per restart so I'd consider this a big win.

This builds on top of #13974

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 18:46:05 +01:00
gotjosh
5beb2fe005
Improve the metric description
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 15:24:35 +01:00
gotjosh
381a77ac1e
Change variable name to restoreStartTime from now and introduce a log line to record total time
Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-24 14:21:11 +01:00
György Krajcsovits
bcafa5f1f9 Merge remote-tracking branch 'upstream/main' into update-nhcb 2024-04-24 11:06:59 +02:00
gotjosh
e7219e3d36
Rule Manager: Add rule_group_last_restore_duration_seconds to measure restore time per rule group
When a rule group changes or prometheus is restarted we need to ensure we restore the active alerts that were firing for a corresponding rule, for that Prometheus uses the `ALERTS_FOR_STATE` series to query the previous state and restore it. If a given rule has high cardinality (think 100s of 1000s for series) this proccess can take a bit of time - this is the first of a series of PRs to improve this problem and I'd like to start with exposing the time it takes to restore a rule group as a gauge.

Signed-off-by: gotjosh <josue.abreu@gmail.com>
2024-04-23 09:57:08 +01:00
Björn Rabenstein
4ec5c25393
Merge pull request #13731 from suntala/suntala/native-histogram-template
histograms: support expansion of native histogram values in templating
2024-04-11 13:24:26 +02:00
Matthieu MOREL
6f595c6762
golangci-lint: enable whitespace linter (#13905)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-04-11 09:27:54 +01:00
suntala
44f385fd51 Support expansion of native histogram values in alert templates
Co-authored-by: Aleks Fazlieva <britishrum@users.noreply.github.com>
Signed-off-by: suntala <arati.rana@grafana.com>
2024-03-26 22:30:01 +01:00
György Krajcsovits
a3d1a46eda Merge branch 'main' into nhcb 2024-03-22 14:51:48 +01:00
Łukasz Mierzwa
3bb27c33e9 Use consistent keys for logs
Rule warnings are logged with numDropped=N while every other component uses num_dropped=N:

```
notifier/notifier.go:		level.Warn(n.logger).Log("msg", "Alert batch larger than queue capacity, dropping alerts", "num_dropped", d)
notifier/notifier.go:		level.Warn(n.logger).Log("msg", "Alert notification queue full, dropping alerts", "num_dropped", d)
storage/remote/write_handler.go:		_ = level.Warn(h.logger).Log("msg", "Error on ingesting out-of-order exemplars", "num_dropped", outOfOrderExemplarErrs)
rules/group.go:				level.Warn(logger).Log("msg", "Error on ingesting out-of-order result from rule evaluation", "num_dropped", numOutOfOrder)
rules/group.go:				level.Warn(logger).Log("msg", "Error on ingesting too old result from rule evaluation", "num_dropped", numTooOld)
rules/group.go:				level.Warn(logger).Log("msg", "Error on ingesting results from rule evaluation with different value but same timestamp", "num_dropped", numDuplicates)
scrape/scrape.go:		level.Warn(sl.l).Log("msg", "Error on ingesting out-of-order samples", "num_dropped", appErrs.numOutOfOrder)
scrape/scrape.go:		level.Warn(sl.l).Log("msg", "Error on ingesting samples with different value but same timestamp", "num_dropped", appErrs.numDuplicates)
scrape/scrape.go:		level.Warn(sl.l).Log("msg", "Error on ingesting samples that are too old or are too far into the future", "num_dropped", appErrs.numOutOfBounds)
scrape/scrape.go:		level.Warn(sl.l).Log("msg", "Error on ingesting out-of-order exemplars", "num_dropped", appErrs.numExemplarOutOfOrder)
```

Rename numDropped to num_dropped for consistency.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-03-21 15:59:20 +00:00
Charles Korn
4e77e8e5ef
Allow using alternative PromQL engines for rule evaluation
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-03-06 14:54:33 +11:00
machine424
f477e0539a
Move from golang.org/x/exp/slices into slices now that we only support Go >= 1.21
Prevent adding back golang.org/x/exp/slices.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-02-28 14:54:53 +01:00
György Krajcsovits
5d0a0a7542 Add custom buckets to native histogram model (#13592)
* add custom buckets to native histogram model
* simple copy for custom bounds
* return errors for unsupported add/sub operations
* add test cases for string and update appendhistogram in scrape to account for new schema
* check fields which are supposed to be unused but may affect results in equals
* allow appending custom buckets histograms regardless of max schema

Signed-off-by: Jeanette Tan <jeanette.tan@grafana.com>
2024-02-28 14:06:43 +01:00
Bryan Boreham
3716326f3f rules: call NewScratchBuilder
Need to initialize ScratchBuilder with a SymbolTable.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-26 11:45:25 +00:00
Bryan Boreham
c0e36e6bb3 Standardise exemplar label as "trace_id"
This is consistent with the OpenTelemetry standard, and an example in OpenMetrics.

https://github.com/open-telemetry/opentelemetry-specification/blob/89aa01348139/specification/metrics/data-model.md#exemplars
https://github.com/OpenObservability/OpenMetrics/blob/138654493130/specification/OpenMetrics.md#exemplars-1

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-15 14:20:08 +00:00
Bryan Boreham
17f48f2b3b Tests: use replacement DeepEquals in more places
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:32:33 +00:00
Bryan Boreham
39af788dbd Tests: use replacement DeepEquals using go-cmp
Use DeepEqual replacement using go-cmp, which is more flexible.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-02-08 19:30:20 +00:00
Marco Pracucci
5ee3fbe825
Decouple ruler dependency controller from concurrency controller
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-02-02 10:06:37 +01:00
Marco Pracucci
cbbbd6e70a
Remove superfluous nil check in Group.metrics
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:21:57 +01:00
Marco Pracucci
046cd7599f
Introduced sequentialRuleEvalController
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:19:18 +01:00
Marco Pracucci
23f89c18b2
Improved RuleConcurrencyController interface doc
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:18:29 +01:00
Marco Pracucci
2764c46531
Added more test cases to TestDependenciesEdgeCases
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:18:03 +01:00
Marco Pracucci
52bc568d04
Add more test cases to TestDependenciesEdgeCases
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:17:13 +01:00
Marco Pracucci
21a03dc018
Simplify the design to update concurrency controller once the rule evaluation has done
Signed-off-by: Marco Pracucci <marco@pracucci.com>
2024-01-29 10:16:31 +01:00
Danny Kopping
7aa3b10c3f
Block until all rules, both sync & async, have completed evaluating
Updated & added tests
Review feedback nits
Return empty map if not indeterminate
Use highWatermark to track inflight requests counter
Appease the linter
Clarify feature flag

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping
f922534c4d
Refactoring for performance, and to allow controller to be overridden
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping
94cdfa30cd
Refactoring
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping
0dc7036db3
Optimising dependencies/dependents funcs to not produce new slices each request
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:41 +01:00
Danny Kopping
e7758d187e
Refactor concurrency control
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:08:39 +01:00
Danny Kopping
940f83a540
Implementation
NOTE:
Rebased from main after refactor in #13014

Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2024-01-29 10:07:15 +01:00
Filip Petkovski
583f3e587c
Optimize histogram iterators (#13340)
Optimize histogram iterators

Histogram iterators allocate new objects in the AtHistogram and
AtFloatHistogram methods, which makes calculating rates over long
ranges expensive.

In #13215 we allowed an existing object to be reused
when converting an integer histogram to a float histogram. This commit follows
the same idea and allows injecting an existing object in the AtHistogram and
AtFloatHistogram methods. When the injected value is nil, iterators allocate
new histograms, otherwise they populate and return the injected object.

The commit also adds a CopyTo method to Histogram and FloatHistogram which
is used in the BufferedIterator to overwrite items in the ring instead of making
new copies.

Note that a specialized HPoint pool is needed for all of this to work 
(`matrixSelectorHPool`).

---------

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2024-01-23 17:02:14 +01:00
Filip Petkovski
10a82f87fd
Enable reusing memory when converting between histogram types
The 'ToFloat' method on integer histograms currently allocates new memory
each time it is called.

This commit adds an optional *FloatHistogram parameter that can be used
to reuse span and bucket slices. It is up to the caller to make sure the
input float histogram is not used anymore after the call.

Signed-off-by: Filip Petkovski <filip.petkovsky@gmail.com>
2023-12-08 10:22:59 +01:00
Matthieu MOREL
9c4782f1cc
golangci-lint: enable testifylint linter (#13254)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2023-12-07 11:35:01 +00:00
Björn Rabenstein
a43669e611
Merge pull request #12928 from alexandear/ci-enable-godot
ci(lint): enable godot; append dot at the end of comments
2023-11-01 17:15:41 +01:00
Oleksandr Redko
fa90ca46e5 ci(lint): enable godot; append dot at the end of comments
Signed-off-by: Oleksandr Redko <Oleksandr_Redko@epam.com>
2023-10-31 19:53:38 +02:00
Charles Korn
9a8dbf06bc
Address PR feedback
Co-authored-by: Julien Pivotto <roidelapluie@o11y.eu>
Signed-off-by: Charles Korn <charleskorn@users.noreply.github.com>
2023-10-31 09:56:05 +11:00
Charles Korn
667a1efb04
Add trace ID to log lines emitted during rule evaluation
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-10-26 16:14:54 +11:00
Charles Korn
fc132a4557
Use common logger instance to reduce duplication in Group.Eval()
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2023-10-26 16:14:12 +11:00
Danny Kopping
498b836654
Refactoring manager.go into separate concerns
Signed-off-by: Danny Kopping <danny.kopping@grafana.com>
2023-10-21 11:11:11 +02:00
Goutham Veeramachaneni
86729d4d7b
Update exp package (#12650) 2023-09-21 22:53:51 +02:00
Arve Knudsen
6daee89e5f
Add context argument to Querier.Select (#12660)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2023-09-12 12:37:38 +02:00
Michael Hoffmann
4d8e380269
promql: allow tests to be imported (#12050)
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
2023-08-18 20:48:59 +02:00
Julien Pivotto
782e6f64fb
Merge pull request #11295 from dimitarvdimitrov/dimitar/simplify-evalTimestamp
Simplify rule group's EvalTimestamp formula
2023-07-18 13:21:20 +02:00