Commit graph

304 commits

Author SHA1 Message Date
György Krajcsovits
1f0cc810fd
fix(tsdb): wal/wbl do pass ST when requested
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:52:46 +01:00
György Krajcsovits
8e2169fc8d
fix(test): removed wrong function.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:51:38 +01:00
György Krajcsovits
cf929d6460
fix(chunkenc): get rid of Iterator.Encoding()
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:51:38 +01:00
György Krajcsovits
7991bcbff9
feat(tsdb): adopt head append changes from 18026
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:51:33 +01:00
György Krajcsovits
a5394a8434
fix(tsdb): extra chunk creation by using wrong default encoding
Also add tests by Claude Opus 4.5.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:50:01 +01:00
György Krajcsovits
e40f988f5c
feat(tsdb): allow appending to ST capable XOR chunk optionally
See PR description for up-to-date info on details.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:50:00 +01:00
George Krajcsovits
dc8613df54
fix(tsdb): missing passing head option to wal/wbl write (#18113)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-02-18 18:49:25 +01:00
bwplotka
5ac1080a60 refactor: sed enableStStorage/enableSTStorage
Signed-off-by: bwplotka <bwplotka@gmail.com>
2026-02-17 11:11:46 +00:00
Owen Williams
b57f5b59b3
tsdb: ST-in-WAL: Counter implementation and benchmarks (#17671)
Initial implementation of https://github.com/prometheus/prometheus/issues/17790.
Only implements ST-per-sample for Counters. Tests and benchmarks updated.

Note: This increases the size of the RefSample object for all users, whether st-per-sample is turned on or not.

Signed-off-by: Owen Williams <owen.williams@grafana.com>
2026-02-12 13:17:50 -05:00
Bartlomiej Plotka
eefa6178fb
fix: fix rare race on empty head.initialized() vs head.initTime() (#17963)
* fix: fix rare race on empty head.initialized() vs head.initTime()

Relates to https://github.com/prometheus/prometheus/issues/17941

Signed-off-by: bwplotka <bwplotka@gmail.com>

* Apply suggestions from code review

Co-authored-by: Owen Williams <owen.williams@grafana.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

* addressed comments

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Owen Williams <owen.williams@grafana.com>
2026-02-02 09:13:02 +00:00
Arve Knudsen
d9db76631d
tsdb: fix flaky TestWaitForPendingReadersInTimeRange tests (#17985)
The tests were flaky because they used hard-coded time.After(550ms)
waits, which had only 50ms margin over WaitForPendingReadersInTimeRange's
500ms poll interval. On slow CI runners, this margin wasn't reliable.

Use synctest for deterministic time control:
- Wrap test logic in synctest.Test() to use fake time
- Use synctest.Wait() to let goroutines reach dormant state
- Use time.Sleep() to advance fake time past the poll interval
- No more timing-dependent assertions

This makes the tests both reliable and ~60x faster (0.05s vs 3s).

Fixes both TestWaitForPendingReadersInTimeRange and
TestWaitForPendingReadersInTimeRange_AppenderV2.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2026-02-01 15:52:26 +00:00
Ganesh Vernekar
9b444b57af tsdb: Add StaleHead and GC for stale series in the Head block
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2026-01-23 17:59:41 -08:00
Arve Knudsen
572f247b4d
tsdb: add auto-cleanup to newTestHead and remove redundant cleanup calls (#17890)
Add automatic cleanup to newTestHeadWithOptions so that heads created
with newTestHead are automatically closed when the test ends. This
simplifies test code by removing the need for manual cleanup in most
cases.

Changes:
- Add t.Cleanup in newTestHeadWithOptions immediately after creating
  the head, using _ = h.Close() to handle double-close gracefully
- Remove redundant t.Cleanup, defer, and explicit Close calls from
  tests that use newTestHead
- Add cleanup for heads created with NewHead directly in restart
  patterns (e.g., restartHeadAndVerifySeriesCounts, startHead)
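
The auto-cleanup pattern described above can be sketched roughly as follows. This is a minimal illustration, not the actual `newTestHeadWithOptions` code: `fakeHead`, `fakeT`, `cleanupRegistrar` and `newFakeTestHead` are hypothetical names introduced here so the example runs outside `go test`. In a real test, `*testing.T` would be passed, since it provides `Cleanup`.

```go
package main

import (
	"errors"
	"fmt"
)

// cleanupRegistrar is the subset of testing.TB this sketch needs.
// *testing.T satisfies it in a real test.
type cleanupRegistrar interface {
	Cleanup(func())
}

// fakeHead is a hypothetical stand-in for the TSDB head.
type fakeHead struct {
	closed     bool
	closeCalls int
}

var errAlreadyClosed = errors.New("head already closed")

// Close returns an error on the second and later calls.
func (h *fakeHead) Close() error {
	h.closeCalls++
	if h.closed {
		return errAlreadyClosed
	}
	h.closed = true
	return nil
}

// newFakeTestHead registers the cleanup right after creating the head.
// The error is discarded so that a test which already closed the head
// explicitly does not fail during cleanup.
func newFakeTestHead(t cleanupRegistrar) *fakeHead {
	h := &fakeHead{}
	t.Cleanup(func() { _ = h.Close() })
	return h
}

// fakeT records cleanups and runs them in reverse order, as testing.T does.
type fakeT struct{ cleanups []func() }

func (f *fakeT) Cleanup(fn func()) { f.cleanups = append(f.cleanups, fn) }

func (f *fakeT) finish() {
	for i := len(f.cleanups) - 1; i >= 0; i-- {
		f.cleanups[i]()
	}
}

func main() {
	t := &fakeT{}
	h := newFakeTestHead(t)
	_ = h.Close() // explicit close inside the "test"
	t.finish()    // the registered cleanup closes again; the error is ignored
	fmt.Println(h.closed, h.closeCalls)
}
```

With this shape, callers no longer need their own `defer h.Close()`, and a leftover explicit close is harmless.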

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2026-01-19 12:57:05 +01:00
György Krajcsovits
adf734db7a
update remaining tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-01-14 13:15:16 +01:00
György Krajcsovits
28dca34f4f
auto update head sample use in tests
find . -name "*.go" -type f -exec sed -E -i \
's/([^[:alpha:]]sample\{)([^,{:]+,[^,]+,[^,]+,[^,]+\})/\10, \2/g' {} +

I've omitted tsdb/ooo_head.go from the commit because I'm also adding todo
there.
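
For readers less used to sed, the same rewrite can be approximated with Go's `regexp` package. This is only a sketch of what the expression does: `addLeadingZero` is a hypothetical helper, not something from the commit, and unlike line-oriented sed a Go negated character class can also match across newlines.

```go
package main

import (
	"fmt"
	"regexp"
)

// positionalSample mirrors the sed expression: a `sample{` literal that is
// not preceded by a letter, followed by exactly four comma-separated fields.
// The first field must not contain ':' so keyed literals are skipped.
var positionalSample = regexp.MustCompile(
	`([^[:alpha:]]sample\{)([^,{:]+,[^,]+,[^,]+,[^,]+\})`)

// addLeadingZero inserts "0, " as the new first field, the equivalent of
// the sed replacement `\10, \2`.
func addLeadingZero(src string) string {
	return positionalSample.ReplaceAllString(src, "${1}0, ${2}")
}

func main() {
	fmt.Println(addLeadingZero("x := []sample{1, 2, nil, nil}"))
	fmt.Println(addLeadingZero("s := sample{t: 1, f: 2, h: nil, fh: nil}"))
}
```

The first line gains a leading `0`, while the keyed literal on the second line is left alone. Literals that already have five fields no longer match, so running the rewrite twice should not add a second zero.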

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2026-01-14 13:15:13 +01:00
Ben Kochie
e14795bbf4
Remove copyright date from headers (#17785)
Remove copyright dates from various files as part of [PROM-50].

[PROM-50]: https://github.com/prometheus/proposals/blob/main/proposals/0050-remove-copyright-dates.md

Signed-off-by: SuperQ <superq@gmail.com>
2026-01-05 13:46:21 +01:00
Bartlomiej Plotka
be419d80dc
Merge pull request #17629 from prometheus/bwplotka/a2-tsdb
refactor(appenderV2)[PART1]: add AppenderV2 interface; add TSDB AppenderV2 implementation
2025-12-09 11:41:00 +00:00
bwplotka
0b70a07572 refactor(appenderV2): add TSDB AppenderV2 implementation
Signed-off-by: bwplotka <bwplotka@gmail.com>

tmp

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-12-09 10:39:43 +00:00
dongjiang
3239723098
Update golangci-lint and add modernize check (#17640)
* add modernize check

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

* fix golangci lint

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>

---------

Signed-off-by: dongjiang1989 <dongjiang1989@126.com>
2025-12-05 09:29:10 +01:00
Bartlomiej Plotka
f50ff0a40a
feat: rename CreatedTimestamp to StartTimestamp (#17523)
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.

```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st

```

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-11-13 14:17:51 +00:00
Jan Fajerski
49254f45e9
Merge pull request #17351 from bboreham/simplify-precreate
TSDB: Allocate series ID after seriesLifecycleCallback; simplify code.
2025-11-07 14:39:51 +01:00
Ben Kochie
48956f60d7
Update modernize (#17471)
Apply additional Go modernize tool improvements.

Signed-off-by: SuperQ <superq@gmail.com>
2025-11-04 05:13:49 +00:00
Arve Knudsen
df8a9076b9
tsdb: Reduce TestHeadSeriesChunkRace number of iterations to 100 (#17410)
Reduce tsdb.TestHeadSeriesChunkRace number of iterations from 1000 to
100, to stop this test from timing out under CI.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-10-28 13:57:20 +01:00
Bryan Boreham
42b52ecc4b TSDB: Allocate series ID after seriesLifecycleCallback
This callback is not used by Prometheus, but in downstream projects it
is wasteful to allocate an ID only to abandon it.

Remove lengthy comment which I feel is distracting from the flow.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-10-17 11:06:22 +01:00
Bryan Boreham
2852c9c431 [REFACTOR] TSDB: Simplify series creation
Refactor the code so that everything proceeds linearly.

Also renamed `getOrSet` to `setUnlessAlreadySet` to emphasise that the
caller is expecting it not to be set.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-10-17 10:46:22 +01:00
beorn7
ad7d1aed99 Phase out native histogram feature flag
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .

This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.

To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.

Further implications:

- The default scrape protocols now depend on the
  `scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".

Documentation beyond that for the feature flag and the scrape
config is deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-15 14:50:52 +02:00
Patryk Prus
dc3e6af91a
tsdb: Fix appended sample count metrics when converting float staleness markers to histograms (#17241)
tsdb: Fix appended sample count metrics when converting histogram staleness markers

Signed-off-by: Patryk Prus <p@trykpr.us>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-09-30 16:49:54 +00:00
György Krajcsovits
30f941c57c
fix(wal): ignore invalid native histogram schemas on load
Reduce the resolution of histograms as needed and ignore invalid
schemas while emitting a warning log.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-24 11:41:25 +02:00
beorn7
7e82bdb75b tsdb: Fix commit order for mixed-typed series
Fixes https://github.com/prometheus/prometheus/issues/15177

The basic idea here is to divide the samples to be committed into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.
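
The batching idea can be illustrated with a small, self-contained sketch. It is not the appender code from this commit: `sample`, `sampleType` and `splitByTypeChange` are hypothetical names, and the sketch assumes the per-series type map only needs to cover the batch currently being built.

```go
package main

import "fmt"

type sampleType int

const (
	floatSample sampleType = iota
	histogramSample
)

// sample is a simplified, hypothetical pending sample.
type sample struct {
	series uint64
	typ    sampleType
	t      int64
}

// splitByTypeChange cuts the pending samples into sub-batches. A new batch
// starts whenever a series receives a sample whose type differs from the
// type that series already has in the current batch. Committing and logging
// the batches one after another then preserves the order per series.
func splitByTypeChange(samples []sample) [][]sample {
	var batches [][]sample
	var cur []sample
	seen := map[uint64]sampleType{} // type of each series in the current batch
	for _, s := range samples {
		if prev, ok := seen[s.series]; ok && prev != s.typ {
			batches = append(batches, cur)
			cur = nil
			seen = map[uint64]sampleType{}
		}
		seen[s.series] = s.typ
		cur = append(cur, s)
	}
	if len(cur) > 0 {
		batches = append(batches, cur)
	}
	return batches
}

func main() {
	pending := []sample{
		{series: 1, typ: floatSample, t: 10},
		{series: 2, typ: floatSample, t: 10},
		{series: 1, typ: histogramSample, t: 20}, // type change: new batch
		{series: 2, typ: histogramSample, t: 20},
		{series: 1, typ: histogramSample, t: 30},
	}
	for i, b := range splitByTypeChange(pending) {
		fmt.Printf("batch %d: %d samples\n", i, len(b))
	}
}
```

The map is the "cost of the stone" mentioned above: one entry per series touched in the current batch.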

This does a few other things that I could not resist picking up along
the way:

- It adds more zeropool.Pools and uses the existing ones more
  consistently. My understanding is that this was merely an oversight.
  Maybe the additional pool usage will compensate for the increased
  memory demand of the map.

- Create the synthetic zero sample for histograms a bit more
  carefully. So far, we created a sample that always went into its own
  chunk. Now we create a sample that is compatible enough with the
  following sample to go into the same chunk. This changed the test
  results quite a bit. But IMHO it makes much more sense now.

- Continuing past efforts, I changed more namings of `Samples` into
  `Floats` to keep things consistent and less confusing. (Histogram
  samples are also samples.) I still avoided changing names in other
  packages.

- I added a few shortcuts `h := a.head`, saving many characters.

TODOs:

- Address @krajorama's TODOs about commit order and staleness handling.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7
747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency about
what was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
Bryan Boreham
498f63e60b
Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples
TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints
2025-08-20 11:15:10 +01:00
Ganesh Vernekar
a86d9a3858
Merge pull request #16925 from prometheus/codesome/stale-series-tracking
tsdb: Track stale series in the Head block based on stale sample
2025-08-19 15:35:19 -07:00
Ganesh Vernekar
3904b3cd5f Restore stale series count from chunk snapshots
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar
b29ce3e489 Restore stale series count on WAL replay
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar
0c3d3d7466 Test the stale series tracking in Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
pipiland2612
82a4b12507 Add t.Parallel() for ./tsdb
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-12 14:12:42 +02:00
Patryk Prus
676f7665fa
Use testutil.RequireEqual to handle dedupelabels in test
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:52:03 -04:00
Patryk Prus
ead6dc32b9
Fix test
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:34:56 -04:00
Patryk Prus
5cb0192626
Address linter errors
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:25:14 -04:00
Patryk Prus
0fea41ed53
Refactor keep function to work for both agent and non-agent implementations
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:12:47 -04:00
Patryk Prus
6875022873
Update head.walExpiries with record timestamps during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:12:47 -04:00
Patryk Prus
218558f543
Store mint rather than the last WAL segment in head.walExpiries during head GC
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:12:41 -04:00
Matthieu MOREL
cef219c31c chore: enable unused-receiver rule from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-08-04 09:43:33 +00:00
George Krajcsovits
1d79f0f47e
chore(tsdb): add a few more testcases for unlock of unlocked mtx 16332 (#16848)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-07-09 16:24:46 +02:00
Banana Duck
89f011ba13
fix: unlock of unlocked mutex (#16332)
* fix: unlock on unlocked mutex

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>

* test coverage

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>

---------

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>
Co-authored-by: alhanaqtah.usama <alhanaqtah.usama@DEV-254.local>
2025-07-09 15:37:55 +02:00
Andre Branchizio
b07b552139
[PERF] TSDB: Pass down label value limit into implementation (#16158)
* allow limiting label values calls

Signed-off-by: Andre Branchizio <andrejbranch@gmail.com>
2025-05-06 18:54:48 +01:00
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
Bryan Boreham
ca416c580c
Merge branch 'main' into slicelabels
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-05-02 10:31:57 +01:00
Bryan Boreham
8487ed8145
Merge pull request #16440 from bboreham/faster-benchmark-loadwls
[TESTS] TSDB: Faster WAL benchmarks
2025-04-22 15:59:03 +01:00
Bryan Boreham
1d4b1d76a5 [TESTS] More efficient label creation in BenchmarkLoadWLs
Use the Builder abstraction instead of going via a map.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-04-16 18:02:47 +01:00