prometheus

mirror of https://github.com/prometheus/prometheus.git synced 2026-06-09 08:32:26 -04:00

Author	SHA1	Message	Date
George Krajcsovits	2aefd00c24	tsdb/chunkenc: take a generic Appender as the prev parameter (#18857 ) The chunkenc.Appender interface's AppendHistogram and AppendFloatHistogram methods used to require a typed previous appender (HistogramAppender or FloatHistogramAppender). Callers were forced to type-assert s.app to that concrete type before each call, discarding it (and the cross-chunk counter-reset signal it carries) whenever the actual concrete type didn't match -- for example when a chunk used a different histogram encoding. Change the signature to accept the generic chunkenc.Appender interface and move the concrete-type check inside each implementation, onto the code path that actually needs the previous appender's state (the new-chunk branch where setCounterResetHeader runs). The check goes through small private interfaces -- histogramAppendable and floatHistogramAppendable -- so any appender type that exposes the appropriate appendable() method can serve as prev, and types that don't (xor, xor2, or a histogram appender on the opposite-kind code path) are silently ignored. This prepares the ground for #18609, which introduces HistogramSTAppender and FloatHistogramSTAppender. Both embed their non-ST counterparts and will satisfy the new interfaces automatically, so they can be passed as prev without a special case in the caller. Callers in tsdb/head_append.go and tsdb/ooo_head.go are simplified accordingly. The Appender consumers in storage/series.go and tsdb/querier.go were already passing nil and continue to do so unchanged. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2026-06-09 10:41:43 +02:00
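A minimal Go sketch of the pattern described above: prev is accepted as the generic interface, and the concrete capability is checked through a small private interface only on the path that needs it. All names here (Appender, histogramAppendable, counterResetSource and the appender types) are hypothetical simplifications, not the real chunkenc API. In particular, the real appendable() method has a different, richer signature.

```go
package main

// Appender stands in for chunkenc.Appender; the real interface has more
// methods. Every name in this block is a hypothetical simplification.
type Appender interface {
	Encoding() string
}

// histogramAppendable mirrors the private-interface idea from the commit.
// The real appendable() method has a different signature.
type histogramAppendable interface {
	appendable() bool
}

type XORAppender struct{}

func (XORAppender) Encoding() string { return "xor" }

type HistogramAppender struct{ stateKnown bool }

func (HistogramAppender) Encoding() string { return "histogram" }

func (a HistogramAppender) appendable() bool { return a.stateKnown }

// HistogramSTAppender embeds HistogramAppender, so method promotion makes it
// satisfy histogramAppendable without any special case in callers.
type HistogramSTAppender struct{ HistogramAppender }

// counterResetSource represents the new-chunk path. A prev that does not
// implement histogramAppendable (including a nil interface, where the
// comma-ok assertion simply yields ok == false) is silently ignored.
func counterResetSource(prev Appender) string {
	p, ok := prev.(histogramAppendable)
	if !ok || !p.appendable() {
		return "unknown"
	}
	return "prev"
}
```

Because HistogramSTAppender only embeds HistogramAppender, it qualifies as prev with no extra code in the caller, which is the property the commit relies on for #18609.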
Yuri Nikolic	ed12b940fd	tsdb: capture chunk-boundary samples in CompactStaleHead Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>	2026-06-03 15:58:05 +02:00
Đurica Yuri Nikolić	5ddb7e49e3	tsdb: store maxt timestamp in walExpiries on stale series eviction (#18847 ) [BUGFIX] tsdb: store a millisecond timestamp (not a WAL segment number) in `walExpiries` when a series is evicted via `CompactStaleHead`/`CompactSelectedSeries`, so the series' label record is correctly retained in the next WAL checkpoint and replays cleanly. Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>	2026-06-03 14:56:17 +01:00
RoyS	f3d653fb5c	chore: fix typos in comments (#18834 ) * chore: fix typos in comments Fix three minor typos in source comments: - scrape: mimicks -> mimics - tsdb: descibes -> describes - ui/codemirror-promql: theses -> these Signed-off-by: RoySerbi <roy676564@gmail.com> * ci: retrigger CI to clear known 32-bit flake Empty commit to retrigger CI. The previous run failed only on 'Go tests for 32-bit x86' due to the known intermittent flake in TestRemoteWrite_PerQueueMetricsAfterRelabeling (see #17356), which is unrelated to this comment-only PR. Signed-off-by: RoySerbi <roy676564@gmail.com> --------- Signed-off-by: RoySerbi <roy676564@gmail.com>	2026-06-02 15:34:02 +02:00
Bryan Boreham	87866e0c3f	Merge pull request #18838 from prometheus/tsdb/fix-histogram-pending-commit-condition tsdb: fix pendingCommit condition for classic histogram append	2026-06-02 10:32:06 +01:00
György Krajcsovits	c2b77f753b	tsdb: add regression tests for histogram pendingCommit on append error Add TestAppendHistogramErrorDoesNotSetPendingCommit (V1) and TestHeadAppenderV2_HistogramErrorDoesNotSetPendingCommit (V2), each covering the integer and float histogram branches. The integer V1 branch previously set s.pendingCommit on the error path, which left the flag stuck on existing series whenever an append was rejected (e.g. ErrOutOfOrderSample). Because the failed sample is never added to the appender's batch, Commit/Rollback never clears pendingCommit for that series, and head GC at tsdb/head.go treats it as still in use. The V1 integer subtest fails on main without the prior commit; both subtests pass with it. The V2 paths already use err == nil and the V2 test is a lock-in; inverting the V2 condition locally confirms the test would catch a similar regression there. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2026-06-02 09:33:40 +02:00
Weixie Cui	56918ba034	tsdb: fix pendingCommit condition for classic histogram append AppendHistogram used err != nil when deciding to set pendingCommit for integer histograms, while the float histogram branch uses err == nil. Align the classic histogram path so pendingCommit is set only after a successful appendableHistogram check, matching appendableFloatHistogram. Signed-off-by: Weixie Cui <cuiweixie@gmail.com>	2026-06-02 09:15:05 +02:00
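A simplified sketch of the corrected condition, assuming a hypothetical series type and appendableCheck helper in place of memSeries and appendableHistogram:

```go
package main

import "errors"

var errOutOfOrderSample = errors.New("out of order sample")

// series is a hypothetical stand-in for memSeries with only the fields
// this example needs.
type series struct {
	maxTime       int64
	pendingCommit bool
}

// appendableCheck plays the role of appendableHistogram in this sketch.
func (s *series) appendableCheck(t int64) error {
	if t <= s.maxTime {
		return errOutOfOrderSample
	}
	return nil
}

// appendSample sets pendingCommit only after a successful check. The buggy
// variant used err != nil, which flagged exactly the rejected samples; since
// those never reach the batch, Commit/Rollback never cleared the flag.
func appendSample(s *series, t int64) error {
	err := s.appendableCheck(t)
	if err == nil {
		s.pendingCommit = true
	}
	return err
}
```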
Iheanacho Amarachi Sharon	f9ba49a9b6	tsdb: complete TestHistogramCounterResetHeader for integer histograms (#18289 ) Signed-off-by: Amarachi Iheanacho <amarachi.iheanacho@siderolabs.com>	2026-06-02 09:04:42 +02:00
Bartlomiej Plotka	178c53d83c	Merge pull request #18813 from miguelbernadi/improve-hostogram-allocations tsdb/record: eliminate prev pointer escapes in V2 histogram WAL decoder	2026-06-01 10:54:02 +01:00
Bryan Boreham	489d90e717	Merge pull request #18735 from colega/implement-head-stale-index-reader-without-sacrificing-performance tsdb: Implement `headStaleIndexReader` methods properly	2026-06-01 10:48:08 +01:00
Miguel Bernabeu Diaz	423c7878de	Update tsdb/record/record.go Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com> Signed-off-by: Miguel Bernabeu Diaz <miguelbernadi@gmail.com>	2026-06-01 11:13:00 +02:00
Oleg Zaytsev	84b870106e	Remove unnecessary method and rename Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2026-06-01 10:19:06 +02:00
Miguel Bernabeu Diaz	b4db611d52	tsdb/record: eliminate prev pointer escapes in V2 histogram decoder histogramSamplesV2 and floatHistogramSamplesV2 tracked the previous sample's Ref and ST via a *RefHistogramSample pointer (prev). Taking the address of a loop-local variable (prev = &rh) forced the compiler to heap-allocate rh on every iteration; the first iteration also allocated a separate sentinel struct. The pointed-to fields were only ever read as two int64 scalars, so the pointer added zero semantic value. Replace prev with two scalar variables (prevRef, prevST) and a boolean sentinel. rh no longer has its address taken and stays on the stack. This affects every caller of dec.HistogramSamples that produces V2 records (EnableSTStorage=true): WAL replay, the WAL watcher (remote write tail), and checkpoint creation. Benchmarks (go test -count=6 -benchmem, benchstat):
BenchmarkDecodeHistogramSamples (tsdb/record)
                 allocs/op before  allocs/op after  vs base
  buckets=0/v2   2.001k ± 0%       1.000k ± 0%      -50.02% (p=0.002)
  buckets=4/v2   4.001k ± 0%       3.000k ± 0%      -25.02% (p=0.002)
  buckets=16/v2  4.001k ± 0%       3.000k ± 0%      -25.02% (p=0.002)
                 B/op before       B/op after       vs base
  buckets=0/v2   187.5Ki ± 0%      156.2Ki ± 0%     -16.68% (p=0.002)
  buckets=4/v2   250.0Ki ± 0%      218.8Ki ± 0%     -12.51% (p=0.002)
  buckets=16/v2  437.5Ki ± 0%      406.2Ki ± 0%      -7.15% (p=0.002)
BenchmarkLoadWLs end-to-end WAL replay (tsdb), stStorage=true only
                            allocs/op before  allocs/op after  vs base
  histogramSeriesPct=1.000  19.70M ± 0%       14.90M ± 0%      -24.39% (p=0.002)
  histogramSeriesPct=0.500  10.47M ± 0%       8.06M ± 0%       -23.00% (p=0.002)
                            B/op before       B/op after       vs base
  histogramSeriesPct=1.000  1.539Gi ± 0%      1.394Gi ± 0%      -9.42% (p=0.002)
  histogramSeriesPct=0.500  1051.3Mi ± 0%     975.1Mi ± 0%      -7.25% (p=0.002)
                            sec/op before     sec/op after     vs base
  histogramSeriesPct=1.000  824.9m ± 0%       762.6m ± 1%       -7.55% (p=0.002)
  histogramSeriesPct=0.500  488.6m ± 1%       451.4m ± 1%       -7.61% (p=0.002)
V1 paths and float-only shapes are unchanged (p >> 0.05 throughout). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Miguel Bernabeu Diaz <miguel.bernabeu@coralogix.com>	2026-05-28 21:40:52 +02:00
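A sketch of the decoding pattern, assuming a hypothetical delta encoding in which each sample's Ref and ST are stored relative to the previous sample and the first sample holds absolute values. The names encodedSample, RefSample and decodeSamples are illustrative, not the tsdb/record API, and the histogram payload is omitted.

```go
package main

// encodedSample is a hypothetical wire form: Ref and ST are deltas from the
// previous sample, except for the first sample, which stores absolute values.
type encodedSample struct {
	RefDelta int64
	STDelta  int64
}

// RefSample is a simplified decoded sample (no histogram payload).
type RefSample struct {
	Ref int64
	ST  int64
}

// decodeSamples keeps the previous Ref and ST in two scalars plus a boolean
// sentinel instead of a pointer to the previous decoded struct, so the
// loop-local rs never has its address taken.
func decodeSamples(in []encodedSample, out []RefSample) []RefSample {
	var (
		prevRef, prevST int64
		hasPrev         bool
	)
	for _, e := range in {
		var rs RefSample
		if hasPrev {
			rs.Ref = prevRef + e.RefDelta
			rs.ST = prevST + e.STDelta
		} else {
			rs.Ref = e.RefDelta
			rs.ST = e.STDelta
			hasPrev = true
		}
		prevRef, prevST = rs.Ref, rs.ST
		out = append(out, rs)
	}
	return out
}
```

Whether a value actually stays on the stack can be checked with `go build -gcflags=-m`, which reports the variables that escape to the heap.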
Miguel Bernabeu Diaz	badf9da96a	tsdb/record,tsdb: add native histogram WAL decode benchmarks Add two benchmark components to measure the native histogram decode hot path, which is shared by WAL replay, WAL watcher (remote write), and checkpoint creation. tsdb/record: BenchmarkDecodeHistogramSamples isolates the V1 and V2 histogram decoder paths across bucket counts (0, 4, 16), giving a precise per-sample allocation signal for decoder changes. tsdb: BenchmarkLoadWLs gains two new shapes: - all-histogram (histogramSeriesPct=1.0, bucketsPerHistogram=8): mirrors the existing "In between" float shape for direct comparison. - mixed (histogramSeriesPct=0.5, bucketsPerHistogram=8): models a deployment partway through migrating to native histograms. Both shapes are parameterised over stStorage (V1 vs V2 encoding) via the existing enableSTStorage loop, so benchstat can show the V1/V2 delta without additional test infrastructure. The subtest names include histogramSeriesPct and bucketsPerHistogram only when non-zero, leaving existing float-only subtest names unchanged. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Miguel Bernabeu Diaz <miguel.bernabeu@coralogix.com>	2026-05-28 21:40:12 +02:00
Owen Williams	134051d480	tsdb: Add TODOs for ST-in-WAL work (#18773 ) Comment-only changes. This will make it easier for me to track my work. Signed-off-by: Owen Williams <owen.williams@grafana.com>	2026-05-22 13:37:35 -04:00
Julien Pivotto	fae25e1405	tsdb: replace default encoding cases with explicit cases in snapshot encode/decode Replace the catch-all default branch in encodeToSnapshotRecord and decodeSeriesFromChunkSnapshot with an explicit EncFloatHistogram case and a default that panics (encode) or returns an error (decode), making unknown encodings immediately visible rather than silently mishandling them. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-05-21 12:47:55 +02:00
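A sketch of the explicit-case style, using a hypothetical Encoding type whose constant values do not match the real chunkenc numbering. It follows the decode side of the commit and returns an error for unknown encodings; the encode side would panic instead.

```go
package main

import "fmt"

// Encoding is a hypothetical stand-in for chunkenc.Encoding.
type Encoding uint8

const (
	EncNone Encoding = iota
	EncXOR
	EncHistogram
	EncFloatHistogram
	EncXOR2
)

// snapshotKind lists every known encoding explicitly. An unknown value is
// reported as an error instead of silently landing in a catch-all branch.
func snapshotKind(e Encoding) (string, error) {
	switch e {
	case EncXOR, EncXOR2:
		return "float", nil
	case EncHistogram:
		return "histogram", nil
	case EncFloatHistogram:
		return "float histogram", nil
	default:
		return "", fmt.Errorf("unknown chunk encoding %d", e)
	}
}
```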
Owen Williams	5fe52643a0	tsdb: Rewrite TestCancelCompactions to run faster (#18632 ) * Rewrite TestCancelCompactions to run faster --------- Signed-off-by: Owen Williams <owen.williams@grafana.com> Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>	2026-05-21 09:06:10 +02:00
Julien	0f5727f420	tsdb: count EncXOR2 chunks as float samples; fix snapshot encoding (#18739 ) EncXOR2 is a float encoding and must be treated like EncXOR in all places that enumerate chunk types: - compact.go: NumFloatSamples was not incremented for EncXOR2 chunks during compaction, leading to under-reported block stats. - head_wal.go: encodeToSnapshotRecord fell through to the default (FloatHistogram) branch for EncXOR2 head chunks, which would corrupt chunk snapshots; the decode path already handled EncXOR2 correctly.
- ooo_head.go: update stale comment to mention EncXOR2. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-05-20 15:20:37 +00:00
Oleg Zaytsev	959bc1c90e	Unexport sortedStaleSeriesRefsNoOOOData Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2026-05-20 10:07:38 +02:00
Oleg Zaytsev	58898e8031	Implement headStaleIndexReader methods properly Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>	2026-05-20 09:41:16 +02:00
Bartlomiej Plotka	ccf503efe6	Merge pull request #18629 from prometheus/owilliams/flakefix tsdb: fix init race that lets initialized() return true before maxTime is set	2026-05-18 14:21:59 +02:00
Arve Knudsen	43e5fc6a24	perf(tsdb): inline oversize-chunk check in populateChunksFromIterable (#18699 ) The oversize-chunk trigger introduced in #18692 was implemented as a closure defined inside the per-sample loop in populateChunksFromIterable and invoked once at the `if` condition. Replace it with a plain conditional and hoist `len(currentChunk.Bytes())` out of the switch so the two encoding cases don't repeat the same expression.
The new shape preserves the original `||` short-circuit: the size check is only evaluated when neither the encoding nor the start-timestamp capability forces a new chunk, which also keeps `currentChunk` non-nil at the point of read. `gcflags=-m=2` reports the closure body inlined and the symbol table shows no separate `func1` symbol, yet benchstat shows a measurable speedup. The most likely explanation: the closure body inlines, but the `funcval` struct (capturing `currentChunk` and `currentValueType`) is still stack-constructed on each iteration, which is invisible to escape analysis but a real per-iteration cost in a hot loop. Benchmark, `go test -count=6 -benchmem -bench=BenchmarkQuerierSelectWithOutOfOrder -benchtime=5s -run=^$ ./tsdb/`, Intel Xeon Platinum 8280 @ 2.70 GHz (linux/amd64), 1M-series head, query selectivity varies:
                            sec/op main   sec/op optimized  vs base
  Head/1of1000000-16        301.5m ± 4%   257.0m ± 4%       -14.74% (p=0.002 n=6)
  Head/10of1000000-16       305.6m ± 3%   260.4m ± 2%       -14.80% (p=0.002 n=6)
  Head/100of1000000-16      303.9m ± 2%   259.7m ± 2%       -14.54% (p=0.002 n=6)
  Head/1000of1000000-16     303.8m ± 2%   267.0m ± 2%       -12.13% (p=0.002 n=6)
  Head/10000of1000000-16    318.1m ± 1%   278.9m ± 8%       -12.33% (p=0.002 n=6)
  Head/100000of1000000-16   364.1m ± 7%   352.8m ± 4%       ~ (p=0.065 n=6)
  Head/1000000of1000000-16  1.115 ± 2%    1.089 ± 26%       ~ (p=0.394 n=6)
  geomean                   377.8m        337.3m            -10.71%
allocs/op and B/op unchanged. The two largest-selectivity cases trend faster but are dominated by the per-sample append cost, so the relative delta is smaller and lost in variance. `TestChunkQuerier_OverlappingInOrderAndOOOChunks` continues to exercise the overflow path. ```release-notes NONE ``` Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2026-05-15 11:40:31 +00:00
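A sketch of the inlined condition, with a hypothetical chunk type and needNewChunk helper standing in for the logic inside populateChunksFromIterable:

```go
package main

// chunk is a hypothetical stand-in for the chunk being re-encoded.
type chunk struct {
	enc   string
	bytes []byte
}

// needNewChunk is one plain conditional rather than a closure. The ||
// operands are evaluated left to right and stop at the first true one, so
// current is dereferenced only after the nil check, and the size check runs
// only when no earlier condition already forces a new chunk.
func needNewChunk(current *chunk, sampleEnc string, stForcesNewChunk bool, maxBytes int) bool {
	return current == nil ||
		current.enc != sampleEnc ||
		stForcesNewChunk ||
		len(current.bytes) >= maxBytes
}
```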
George Krajcsovits	8a9f4ff440	fix(tsdb): chunk overflow on ooo query (#18692 ) * fix(tsdb): chunk overflow on ooo query Protect against and fix overflow of chunks with more than 2^16-1 samples when chunks are re-encoded, for example when in-order and OOO samples overlap during compaction or a query. Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>	2026-05-14 22:22:52 +02:00
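A sketch of the protection, assuming (as the 2^16-1 limit suggests) a 16-bit per-chunk sample counter; splitIntoChunks is a hypothetical helper, not the tsdb code:

```go
package main

import "math"

// maxSamplesPerChunk assumes a 16-bit per-chunk sample counter, matching
// the 2^16-1 limit the commit protects against.
const maxSamplesPerChunk = math.MaxUint16

// splitIntoChunks returns the sample count of each chunk produced when n
// samples are re-encoded, cutting a new chunk whenever the current one is
// full instead of letting the counter wrap around.
func splitIntoChunks(n, maxPerChunk int) []int {
	var sizes []int
	for n > 0 {
		k := maxPerChunk
		if n < k {
			k = n
		}
		sizes = append(sizes, k)
		n -= k
	}
	return sizes
}
```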
Arve Knudsen	cf4505c6cd	tsdb: skip entire stripes in mmapHeadChunks via per-stripe ready count (#18541 ) * tsdb: extract stripeSeries.refStripe helper Extract the repeated ref-to-stripe-index calculation into a method on stripeSeries, replacing five inline copies that used two different casting styles (int and uint64). The helper computes with uint64 internally so it is correct on 32-bit architectures. * tsdb: skip entire stripes in mmapHeadChunks via per-stripe ready count Add a per-stripe mmapReady counter to stripeSeries that tracks how many series in each stripe have headChunkCount >= 2 (i.e., are ready for mmapping). mmapHeadChunks skips stripes where the counter is zero, avoiding the RLock and map iteration entirely. --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2026-05-14 18:24:56 +02:00
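A simplified sketch of the per-stripe counter and the refStripe helper. The stripeSeries type here is hypothetical: it stores only a head-chunk count per series and uses modulo for the stripe index, which may differ from the real implementation.

```go
package main

import (
	"sync"
	"sync/atomic"
)

// stripeSeries is a hypothetical, heavily simplified striped series map.
type stripeSeries struct {
	locks     []sync.RWMutex
	counts    []map[uint64]uint32 // per stripe: series ref -> head chunk count
	mmapReady []atomic.Int32      // per stripe: series with count >= 2
}

func newStripeSeries(n int) *stripeSeries {
	s := &stripeSeries{
		locks:     make([]sync.RWMutex, n),
		counts:    make([]map[uint64]uint32, n),
		mmapReady: make([]atomic.Int32, n),
	}
	for i := range s.counts {
		s.counts[i] = map[uint64]uint32{}
	}
	return s
}

// refStripe maps a series ref to a stripe index, computing in uint64 so the
// result is correct even where int is 32 bits wide.
func (s *stripeSeries) refStripe(ref uint64) int {
	return int(ref % uint64(len(s.locks)))
}

// setHeadChunks updates a series' head chunk count and keeps the stripe's
// ready counter in sync when the series crosses the ">= 2" threshold.
func (s *stripeSeries) setHeadChunks(ref uint64, n uint32) {
	i := s.refStripe(ref)
	s.locks[i].Lock()
	defer s.locks[i].Unlock()
	old := s.counts[i][ref]
	s.counts[i][ref] = n
	switch {
	case old < 2 && n >= 2:
		s.mmapReady[i].Add(1)
	case old >= 2 && n < 2:
		s.mmapReady[i].Add(-1)
	}
}

// stripesToVisit returns the stripes an mmap pass would lock; stripes with a
// zero ready counter are skipped without taking their lock.
func (s *stripeSeries) stripesToVisit() []int {
	var out []int
	for i := range s.mmapReady {
		if s.mmapReady[i].Load() == 0 {
			continue
		}
		out = append(out, i)
	}
	return out
}
```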
Owen Williams	9ec9b1e3c6	Comments to explain what's going on Signed-off-by: Owen Williams <owen.williams@grafana.com>	2026-05-12 11:00:12 -04:00
Owen Williams	9e16c38214	Merge remote-tracking branch 'origin/main' into owilliams/flakefix	2026-05-12 10:42:09 -04:00
Owen Williams	da1f89e736	tsdb(wal): st-per-sample for histograms initial code and benchmarks (#18221 ) Implements ST for Histograms and Float Histograms (and their custom bucket cousins) in WAL. New tests, new benchmarks. Part of https://github.com/prometheus/prometheus/issues/17790 ```release-notes [CHANGE] Adds Start Time value to all WAL Histogram samples in memory, and therefore may increase memory usage. ``` Signed-off-by: Owen Williams <owen.williams@grafana.com>	2026-05-06 14:33:03 -04:00
Owen Williams	1cdee43726	tsdb: fix init race that lets initialized() return true before maxTime is set initTime previously set minTime first and maxTime second. Because Head.initialized() keys only off minTime, a concurrent Head.Appender call could observe initialized() == true while maxTime was still math.MinInt64. h.appender() then computes appendableMinValidTime as MaxTime() - chunkRange/2, which underflows to a large positive number and rejects in-range samples with ErrOutOfBounds. Set maxTime first, then minTime. The CAS-loser wait now spins on minTime instead of maxTime, preserving the existing anti-deadlock timeout. AppenderV2 shares the same gate, so this single change covers both paths. The TestHead_InitAppenderRace_ErrOutOfBounds test added in #17963 is now stable across 1000 iterations (and 100 iterations under -race). Relates to #17941 Builds on #17963 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Signed-off-by: Owen Williams <owen.williams@grafana.com>	2026-05-06 11:16:47 -04:00
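A sketch of the ordering fix with a hypothetical head type holding atomic minTime and maxTime. initialized() keys off minTime only, so the CAS winner publishes maxTime first, and the loser waits on initialized() with a bounded spin. The one-second timeout and the minValidTime helper are arbitrary choices for the sketch, not the Prometheus code.

```go
package main

import (
	"math"
	"runtime"
	"sync/atomic"
	"time"
)

// head is a hypothetical simplification of the TSDB head's time bounds.
type head struct {
	minTime, maxTime atomic.Int64
}

func newHead() *head {
	h := &head{}
	h.minTime.Store(math.MaxInt64)
	h.maxTime.Store(math.MinInt64)
	return h
}

func (h *head) initialized() bool { return h.minTime.Load() != math.MaxInt64 }

// initTime lets exactly one caller win the CAS on maxTime; the winner then
// stores minTime, so initialized() == true implies a valid maxTime.
func (h *head) initTime(t int64) {
	if h.maxTime.CompareAndSwap(math.MinInt64, t) {
		h.minTime.Store(t)
		return
	}
	// Loser: wait, with a bound, until the winner has published minTime.
	deadline := time.Now().Add(time.Second)
	for !h.initialized() && time.Now().Before(deadline) {
		runtime.Gosched()
	}
}

// minValidTime mirrors the MaxTime() - chunkRange/2 computation. With an
// unset maxTime of math.MinInt64 the subtraction wraps around to a large
// positive value, which is the failure mode the ordering prevents.
func minValidTime(maxT, chunkRange int64) int64 {
	return maxT - chunkRange/2
}
```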
Owen Williams	a5876a0143	tsdb: Reduce test flakiness (#18577 ) I have seen some flakiness in these tests, including timeouts. LLM suggested these fixes to make them more deterministic. They look good to me. Signed-off-by: Owen Williams <owen.williams@grafana.com>	2026-04-27 10:19:15 +02:00
Denys Sedchenko	ca578101af	feat(tsdb/agent): Implement checkpoint based on series in memory (#17948 ) Adds a CheckpointFromInMemorySeries option to agent.Options that enables a faster checkpoint implementation, which skips re-reading segments and uses in-memory data instead.
* feat: impl agent-specific checkpoint dir * feat: impl ActiveSeries interface * feat: use new checkpoint impl * feat: hide new checkpoint impl behind a feature flag * feat: add benchmark * feat: add benchstat case * feat: use feature flag in bench * feat: use same labels for persisted state and append * feat: set WAL segment size * feat: add checkpoint size metric and bump series size * feat: wal replay test * feat: expose new checkpoint opts in cmd flags * feat: update cli doc * add ActiveSeries and DeletedSeries doc Signed-off-by: x1unix <9203548+x1unix@users.noreply.github.com> Signed-off-by: Denys Sedchenko <9203548+x1unix@users.noreply.github.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2026-04-24 19:42:26 +02:00
Julien Pivotto	f69db5bc54	storage: introduce search interface with scoring and filtering Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-04-23 15:05:48 +02:00
Julien	3b9caf6564	Merge pull request #18569 from roidelapluie/roidelapluie/labelnames-limit tsdb: apply LabelNames limit from LabelHints in blockBaseQuerier	2026-04-23 12:28:19 +02:00
Julien Pivotto	a5b5a3329c	tsdb: apply LabelNames limit from LabelHints in blockBaseQuerier Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-04-23 11:05:17 +02:00
George Krajcsovits	c84b0acdb4	test(tsdb): add OOO error coverage for ST zero sample appends (#18554 ) * test(tsdb): add OOO error coverage for ST zero sample appends Add unit tests exercising the out-of-order error paths in AppendSTZeroSample, AppendHistogramSTZeroSample (AppenderV1), and the best-effort ST injection in AppenderV2.Append. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * make format Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> * test(tsdb): add TestHeadAppenderV2_BestEffortSTZeroSample_OOO The three OOO cases added to TestHeadAppenderV2_Append_EnableSTAsZeroSample use a single appender so headChunks is nil at append time; the zero sample enters the batch and is rejected silently in commitFloats, never reaching the error-handling branch at line 374 of bestEffortAppendSTZeroSample. Add a dedicated test that commits the first sample before appending the second. This makes headChunks non-nil, so appendFloat/appendHistogram/ appendFloatHistogram returns ErrOutOfOrderSample at append time and the branch at line 374 is actually executed. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> --------- Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>	2026-04-23 09:48:12 +02:00
Arve Knudsen	c7b2210ac3	tsdb: cache collected head chunks on ChunkReader for O(1) lookup (#18302 ) tsdb: cache collected head chunks on ChunkReader for O(1) lookup The query path calls s.chunk() once per chunk meta via ChunkOrIterableWithCopy. Each call walks the head chunks linked list from the head to the target position. For a series with N head chunks iterated oldest-first, total work is O(N²). Cache the collected []*memChunk slice on headChunkReader, keyed by series ref, head pointer, and mmapped chunks length. Collected once per series under lock; reused on subsequent chunk lookups for the same series. The backing array is reused across series (zero alloc after first use). Series with 0 or 1 head chunks skip the cache entirely to avoid per-series overhead that dominates for typical workloads where most series have a single head chunk. The cache is gated behind an enableCache flag, toggled via an optional chunkCacheToggler interface only when hints.Step > 0 (range queries). Instant queries only need one chunk per series, so the cache overhead is not recouped. Also replace O(N²) linked-list traversals in appendSeriesChunks with O(N) collectHeadChunks + slice iteration, and thread reusable headChunksBuf through the index reader paths to avoid per-series allocations. --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2026-04-17 18:34:41 +02:00
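A sketch contrasting the per-lookup list walk with collecting the chunks once into a reusable slice. memChunk, buildList, atOldestFirst and collectHeadChunks are hypothetical names; the cache keying (series ref, head pointer, mmapped chunk count) described above is not modeled.

```go
package main

// memChunk is a hypothetical list node; prev points to the next older chunk,
// so the list is linked from newest to oldest.
type memChunk struct {
	id   int
	prev *memChunk
}

// buildList creates n chunks with ids 0 (oldest) to n-1 (newest, the head).
func buildList(n int) *memChunk {
	var head *memChunk
	for i := 0; i < n; i++ {
		head = &memChunk{id: i, prev: head}
	}
	return head
}

// atOldestFirst walks from the newest chunk to find the i-th oldest one.
// Calling it for every i costs O(N^2) in total.
func atOldestFirst(head *memChunk, i int) *memChunk {
	n := 0
	for c := head; c != nil; c = c.prev {
		n++
	}
	c := head
	for steps := n - 1 - i; steps > 0; steps-- {
		c = c.prev
	}
	return c
}

// collectHeadChunks fills buf (reused across calls) with the chunks in
// oldest-first order in one O(N) walk; each later lookup is O(1).
func collectHeadChunks(head *memChunk, buf []*memChunk) []*memChunk {
	buf = buf[:0]
	for c := head; c != nil; c = c.prev {
		buf = append(buf, c)
	}
	for i, j := 0, len(buf)-1; i < j; i, j = i+1, j-1 {
		buf[i], buf[j] = buf[j], buf[i]
	}
	return buf
}
```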
Arve Knudsen	98809e40c6	tsdb: Skip clean series during periodic head chunk mmap (#18272 ) tsdb: Skip clean series during periodic head chunk mmap The periodic mmapHeadChunks cycle previously acquired a per-series lock on every series, even though typically >99% have nothing to mmap. This was identified as a CPU bottleneck in Grafana Mimir. Add a headChunkCount field (sync/atomic.Uint32) to memSeries that tracks the number of head chunks. It is incremented in cutNewHeadChunk and the histogram new-chunk paths, and reset by mmapChunks and truncateChunksBefore. mmapHeadChunks uses a lock-free Load to skip series with fewer than 2 head chunks, avoiding the per-series lock for clean series. sync/atomic.Uint32 (4 bytes) is used instead of go.uber.org/atomic (8 bytes) to fit in existing struct padding without growing memSeries. Chunk counts are bounded by the 3-byte field in HeadChunkRef, so cannot overflow uint32. Also fix pre-existing comment inaccuracies in the touched code: headChunks.next -> headChunks.prev, mmapHeadChunks() -> mmapChunks() in the doc comment, and a grammar error. --------- Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>	2026-04-14 17:11:35 +02:00
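A sketch of the lock-free skip, with a hypothetical memSeries that holds only a mutex, the atomic head-chunk count, and a counter standing in for the m-mapping work:

```go
package main

import (
	"sync"
	"sync/atomic"
)

// memSeries is a hypothetical simplification holding only what the skip
// check needs.
type memSeries struct {
	mtx            sync.Mutex
	headChunkCount atomic.Uint32
	mmapped        int
}

// mmapHeadChunks skips series with fewer than two head chunks using a
// lock-free Load and takes the per-series lock only for the rest. In this
// sketch all but the newest head chunk are "mmapped". It returns how many
// series were locked.
func mmapHeadChunks(series []*memSeries) (locked int) {
	for _, s := range series {
		if s.headChunkCount.Load() < 2 {
			continue
		}
		s.mtx.Lock()
		n := s.headChunkCount.Load()
		s.mmapped += int(n - 1)
		s.headChunkCount.Store(1)
		s.mtx.Unlock()
		locked++
	}
	return locked
}
```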
Julien Pivotto	2828c543bc	tsdb: reduce chunk segment size in TestDiskFillingUpAfterDisablingOOO The test only writes ~80 samples, so the default 512MB chunk segment pre-allocation during compaction is unnecessary. Use 1MB instead to avoid large file allocations on constrained CI environments. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-04-02 12:23:55 +02:00
Ayoub Mrini	0dd834e924	Merge pull request #18406 from machine424/depll test: migrate TestDelayedCompaction to synctest to eliminate flakiness	2026-04-01 16:40:50 +02:00
Björn Rabenstein	4280662cdf	Merge pull request #18304 from crawfordxx/fix-typos-in-comments Fix typos in comments and metric help strings	2026-04-01 13:45:59 +02:00
Jorge Creixell	4b562bba6e	tsdb: fix prometheus_tsdb_head_chunks going negative after WAL replay (#18401 ) * tsdb: fix prometheus_tsdb_head_chunks going negative after WAL replay When truncateStaleSeries deletes a series (writing a full-range tombstone to the WAL) and the same label set is immediately re-created, WAL replay queues the following sequence on the same processor shard for the shared memSeries pointer:
  reset(mSeries, M mmappedChunks, walRef=old)
  deleteSeriesByID(old)
  reset(mSeries, N mmappedChunks, walRef=new)
deleteSeriesByID correctly subtracts M from the gauge but does not clear series.mmappedChunks.
The subsequent reset subtracts M again, driving prometheus_tsdb_head_chunks negative when M > N. Fix by setting series.mmappedChunks = nil in deleteSeriesByID after accounting for those chunks. Fixes #10884 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> Signed-off-by: Jorge Creixell <jcreixell@gmail.com> * Simplify test - Re-use appending helper - Cleanup comments Signed-off-by: Jorge Creixell <jcreixell@gmail.com> * Improve comments in test Signed-off-by: Jorge Creixell <jcreixell@gmail.com> * Fix formatting Signed-off-by: Jorge Creixell <jcreixell@gmail.com> * Improve comment Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com> Signed-off-by: Jorge Creixell <jcreixell@gmail.com> --------- Signed-off-by: Jorge Creixell <jcreixell@gmail.com> Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>	2026-04-01 11:30:33 +02:00
Rushabh Mehta	a2172f91c1	tsdb: Find the last series ID on startup from the last series id file and WAL scan (#18333 ) * Add logic to Head.Init(...) for fast startup Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Add unit tests Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Empty commit to retrigger CI Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Empty commit to retrigger CI Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Make readSeriesStateFile return a struct directly, fix small nits, remove test Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Fix test for readSeriesStateFile function Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Fix some more nits, add extra testcase Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> --------- Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com>	2026-03-31 21:45:53 -07:00
Bartlomiej Plotka	fb38463dfb	Merge pull request #18321 from atoulme/aix aix: support the aix/ppc64 compilation target	2026-03-31 16:42:20 +02:00
Julien	4b4d5157b8	chunkenc: add tests for XOR2 active ST delta and value branches (#18363 ) Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-03-31 15:15:48 +02:00
Kyle Eckhart	37d85980a3	tsdb/agent: fix getOrCreate race (#18292 ) * tsdb/agent: fix race in getOrCreate and consolidate series lookup * tsdb/agent: fix transition window race in SetUnlessAlreadySet * tsdb/agent: address review feedback and improve BenchmarkGetOrCreate Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com> --------- Signed-off-by: Kyle Eckhart <kgeckhart@users.noreply.github.com>	2026-03-31 15:08:58 +02:00
machine424	86215cf91f	test: migrate TestDelayedCompaction to synctest to eliminate flakiness The previous implementation relied on real wall-clock time and busy-loops (time.Sleep + polling loops) to detect when compaction had finished, making it both slow and flaky (especially on busy CI envs and also on Windows due to timer imprecision). Now both the subtests run on Windows. The delay value can be increased (1s → 5s) at zero cost to test runtime. Also cleaned up shared logic into small helpers and split the no-delay and delay-enabled cases into separate subtests for clarity. Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2026-03-30 23:58:08 +02:00
machine424	dcfb8ce59c	chore: remove util/testutil/synctest now that we use Go>=1.25 Signed-off-by: machine424 <ayoubmrini424@gmail.com>	2026-03-30 19:48:39 +02:00
Julien Pivotto	3856195bb8	tsdb: use float64 for retention percentage The retention.percentage config field was typed as uint, which silently truncated fractional values. Setting percentage: 1.5 in prometheus.yml resulted in a retention of 1%, with no warning or error. Remove the redundant MaxPercentage > 100 clamp in main.go; the config UnmarshalYAML already returns an error for out-of-range values before this code is reached. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-03-26 12:39:22 +01:00
Julien Pivotto	7a1a5e285f	chunkenc: add extra tests Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-03-25 09:59:12 +01:00
Julien Pivotto	d8607cbd9b	tsdb/chunkenc: optimise XOR2 and varbit hot paths Use writeBitsFast instead of writeBits in putVarbitInt/putVarbitUint, combining prefix and value into a single call per bucket. Inline the common fast paths in XOR2 Append to avoid encodeJoint and putVarbitInt calls for the typical dod=0 and 13-bit dod cases. Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>	2026-03-25 09:09:46 +01:00
Rushabh Mehta	df61021436	tsdb: Add `series_state.json` file to `wal/` directory to track state (#18303 ) * Add series_state.json file creation and updation logic. Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Make comments follow the guidelines. 
Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Fix linter complaints Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Put PR behind feature flag fast-startup Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Marshal updated information to file directly Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Fix linter failures Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Move series state code from head.go to head_wal.go Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Fix nits Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> * Add unit test Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com> --------- Signed-off-by: Rushabh Mehta <mehtarushabh2005@gmail.com>	2026-03-23 20:46:04 -07:00


1656 commits