We don't hold db.mtx lock when trying to read db.blocks here so we need a read lock around this loop.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race.
The problem is that CleanTombstones() is a public method while reloadBlocks() is internal.
CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all, it expects the public method through which it was called to do it.
So having a race between these two is not unexpected and we shouldn't really be testing this.
db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen.
If it would happen it would be only because some other method doesn't aquire db.ctmx lock, something this test cannot detect.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
This partially reverts ae3d392aa9.
ae3d392aa9 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks(),
previous db.mtx would be locked only during critical part of db.reloadBlocks().
The motivation was to protect against races:
9e0351e161 (r555699794)
The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods.
TestTombstoneCleanRetentionLimitsRace was added to catch this but I wasn't able to ever get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones().
To make things more complicated CleanupTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().
The problem with this change is that db.reloadBlocks() can take a very long time, that's because it might need to load very large blocks from disk, which is slow.
While db.mtx is locked a large chunk of the db is locked, including queries, since db.mtx read lock is needed for db.Querier() call.
One of the issues this manifests itself as is a gap in all metrics and blocked queries just after a large block compaction happens.
When compaction merges multiple day-or-more blocks into a week-or-more block it create a single very big block.
After that block is written it needs to be loaded and that seems to be taking many seconds (30-45), during which mtx is held and everything is blocked.
Turns out that there is another lock that is more fine grained and aimed at this specific use case:
// cmtx ensures that compactions and deletions don't run simultaneously.
cmtx sync.Mutex
All calls to reloadBlocks() are wrapped inside cmtx lock. The only exception is db.reload() which this change fixes.
We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx.
Looking at the code I think it is sufficient to hold cmtx and skip a reloadBlocks() wide mtx call.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
This also improves the documentation in the following ways:
- Clarifies that `eval` requires no annotations.
- Clarifies that `eval_ordered` ignores annotations.
- Clarifies that `eval_ordered` does not work with matrix returns
(which could very well be created by instant queries).
- Clarifies that there are more `eval` commands than just `eval`.
- Improves wording for `eval_ordered`.
- Replaces `...` by the typographical correct `…`.
- Fixes a numerical error in an example.
Signed-off-by: beorn7 <beorn@grafana.com>
Besides making eval_ordered ignore annotations, this does the following:
- Adds a test to verify that eval_ordered indeed ignores an info
annotations now, while eval complains about it, eval_info recognizes
it and, eval_warn flags the missing of the warn annotation.
- Refactors the annotation check into its own method.
- Moves closing of the query to the appropriate place where it wasn't
so far.
Signed-off-by: beorn7 <beorn@grafana.com>
This commit introduced two field in `/status` endpoint:
- The node currently serving the request.
- The current server time for debugging time drift issues.
fixes#15394.
Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
* Rules: Store dependencies instead of boolean
To improve https://github.com/prometheus/prometheus/pull/15681 further, we'll need to store the dependencies and dependents of each
Right now, if a rule has both (at least 1) dependents and dependencies, it is not possible to determine the order to run the rules and they must all run sequentially
This PR only changes the dependents and dependencies attributes of rules, it does not implement a new topological sort algorithm
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
* Store a slice of Rule instead
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
* Add `BenchmarkRuleDependencyController_AnalyseRules` for future reference
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
---------
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
* `RuleConcurrencyController`: Add `SplitGroupIntoBatches` method
The concurrency implementation can now return a slice of concurrent rule batches
This allows for additional concurrency as opposed to the current interface which is limited by the order in which the rules have been loaded
Also, I removed the `concurrencyController` attribute from the group. That information is duplicated in the opts.RuleConcurrencyController` attribute, leading to some confusing behavior, especially in tests.
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
* Address PR comments
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
* Apply suggestions from code review
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
---------
Signed-off-by: Julien Duchesne <julien.duchesne@grafana.com>
Signed-off-by: Julien Duchesne <julienduchesne@live.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
BuildCompliantName was renamed to BuildCompliantMetricName, and it no longer takes UTF8 support into consideration. It focuses on building a metric name that follows Prometheus conventions.
A new function, BuildMetricName, was added to optionally add unit and type suffixes to OTLP metric names without translating any characters to underscores(_).
When creating dummy data for benchmarks, call `Commit()` periodically to
avoid growing the appender to enormous size.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
It was bumped during 3.0 with the adoption of log/slog and other dep
updates.
```
~/go/src/github.com/prometheus/prometheus (main [ ]) -> grep '^go' go.mod
go 1.22.0
```
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Exported the CheckpointPrefix constant to be used in other packages.
Updated references to the constant in db.go and checkpoint.go files.
This change improves code readability and maintainability.
Signed-off-by: johncming <johncming@yahoo.com>
Co-authored-by: johncming <conjohn668@gmail.com>
Returning 0 from Append means 'unknown', so the series is never cached.
Return arbitrary numbers instead.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>