Commit graph

1522 commits

Author SHA1 Message Date
Sujal Shah
17c58f5fce wal: ignore os.ErrNotExist errors in DirSize during WAL size calculation
This change updates `DirSize` to ignore `os.ErrNotExist` errors,
since they are expected during normal WAL cleanup. All other errors
continue to propagate.

Fixes: #17005
Signed-off-by: Sujal Shah <sujalshah28092004@gmail.com>
2025-08-05 22:41:46 +05:30
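The behaviour described in the commit above can be sketched as follows. This is a minimal illustration, not the actual `DirSize` code: the function name `dirSize`, the use of `filepath.WalkDir`, and the temporary directory layout are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// dirSize returns the total size in bytes of the non-directory entries
// under dir. Entries that disappear while the walk is in progress
// (os.ErrNotExist) are skipped; any other error is returned to the caller.
// Hypothetical helper, not the Prometheus implementation.
func dirSize(dir string) (int64, error) {
	var size int64
	err := filepath.WalkDir(dir, func(_ string, d fs.DirEntry, err error) error {
		if err != nil {
			if errors.Is(err, os.ErrNotExist) {
				return nil // removed concurrently, e.g. by WAL cleanup
			}
			return err
		}
		if d.IsDir() {
			return nil
		}
		info, err := d.Info()
		if err != nil {
			if errors.Is(err, os.ErrNotExist) {
				return nil // file vanished between listing and stat
			}
			return err
		}
		size += info.Size()
		return nil
	})
	return size, err
}

func main() {
	dir, err := os.MkdirTemp("", "wal-size-")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)
	if err := os.WriteFile(filepath.Join(dir, "00000001"), []byte("12345"), 0o600); err != nil {
		panic(err)
	}
	fmt.Println(dirSize(dir))                           // size of the one file written above
	fmt.Println(dirSize(filepath.Join(dir, "missing"))) // a missing root is treated as empty
}
```

Checking with `errors.Is` rather than comparing errors directly also matches wrapped errors such as `*fs.PathError`.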
Bryan Boreham
e068c7332d [REFACTOR] TSDB: Clarify intersectPostings
This is intended to make `intersectPostings` easier to follow.

Instead of cryptic `arr` and `cur`, name the members `postings` and
`current`.

Instead of updating `cur` to intermediate values encountered during
operations, introduce a local variable `target` meaning the ref we might
expect to find next, and only update `current` when an intersection is
found.

Name the function which implements seeking `Seek` instead of `doNext`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-05 13:09:29 +01:00
Alan Protasio
25aee26a57
Improving "Sparse postings" intersection (#13971)
Let's take the given example:

P1: [2, 5, 9, 18, 21]
P2: [3, 7, 14, 19, 21]
P3: [1, 21]

Currently, we would only advance through P1 and P2 until discovering
an intersection and then checking P3. In essence, the traversal order
was: 2, 3, 5, 7, 9, 14, 18, 19, 21 (intersection found).

With the proposed change, P3 is also examined even if P1 and P2
haven't found an intersection yet. This adjustment allows for the
possibility of skipping some iterations.

Post-change, the traversal order becomes: 2, 3, 21 (3 iterations instead of 9).

Signed-off-by: alanprot <alanprot@gmail.com>
2025-08-05 12:22:54 +01:00
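The idea in the commit above, seeking every list to the current candidate instead of only the first two, can be sketched on plain sorted slices. This is an illustrative sketch under stated assumptions, not the TSDB `intersectPostings` code: the function `intersect`, the slice-based representation, and the binary-search seek are all invented for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// intersect returns the values present in every list. Each list must be
// sorted in strictly increasing order, as postings lists are.
// Hypothetical sketch, not the Prometheus implementation.
func intersect(lists ...[]uint64) []uint64 {
	if len(lists) == 0 {
		return nil
	}
	pos := make([]int, len(lists)) // read position in each list
	var out []uint64
	var target uint64 // smallest value that could still be in the intersection
	for {
		agreed := 0 // consecutive lists found positioned exactly on target
		for i := 0; agreed < len(lists); i = (i + 1) % len(lists) {
			l, start := lists[i], pos[i]
			// Seek list i forward to its first value >= target.
			pos[i] = start + sort.Search(len(l)-start, func(k int) bool {
				return l[start+k] >= target
			})
			if pos[i] == len(l) {
				return out // one list is exhausted, nothing more can match
			}
			if v := l[pos[i]]; v > target {
				target, agreed = v, 1 // overshoot: v becomes the new candidate
			} else {
				agreed++
			}
		}
		out = append(out, target)
		for i := range pos {
			pos[i]++ // step past the match in every list
		}
	}
}

func main() {
	p1 := []uint64{2, 5, 9, 18, 21}
	p2 := []uint64{3, 7, 14, 19, 21}
	p3 := []uint64{1, 21}
	fmt.Println(intersect(p1, p2, p3))
}
```

With the three lists from the commit message, the candidate moves through 2, 3 and then 21, because the seek on P3 jumps straight to 21 before P1 and P2 are advanced any further.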
Matthieu MOREL
cef219c31c chore: enable unused-receiver rule from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-08-04 09:43:33 +00:00
pipiland2612
8b24acb729 Remove label index and label offset index
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-01 13:50:49 +03:00
Julius Volz
2e709c6567
Merge pull request #16695 from sujalshah-bit/block_endpoint
api: Create `/status/tsdb/blocks` endpoint.
2025-07-31 18:15:49 +02:00
George Krajcsovits
3f59fe1a80
fix(chunkenc): appending histograms with empty buckets (#16893)
* test(chunkenc): appending histograms with empty buckets and gaps

Append such native histograms that have empty buckets and gaps
between the indexes of those buckets.

There is a special case for appending counter native histograms to a chunk in TSDB: if we append a histogram that is missing some buckets that are already in the chunk, then usually that's a counter reset. However, if the missing bucket is empty, meaning its value is 0, then we don't consider it missing.

For this case to trigger, we need to write empty buckets into the chunk. Normally native histograms are compacted when we emit them, so this is very rare, and compact makes sure that there are no multiple continuous empty buckets with gaps between them.

The code that I've added in #14513 did not take into account that you can bypass compact and write histograms with many empty buckets, with gaps between them. These are still valid, so the code has to account for them.

The main fix is in the expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions. I've also refactored them for clarity. Consequently, I needed to fix insert and adjustForInserts to also allow gaps between inserts.

I've added some new test cases (data driven test would be nice here, too many cases). And removed the deprecated old function that didn't deal with empty buckets at all.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
2025-07-24 18:01:02 +02:00
machine424
9a0bbb60bc test(tsdb): disable TestDelayedCompaction/delayed_compaction_enabled on windows
as flaky because of Time imprecision

fixes https://github.com/prometheus/prometheus/issues/16450

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-07-22 15:30:05 +01:00
Charles Korn
46acc974c0
fix(remote): Unregister metrics emitted by remote.WriteStorage when closed (#16868)
* Unregister metrics emitted by `remote.WriteStorage` when closed

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback: add test

Signed-off-by: Charles Korn <charles.korn@grafana.com>

---------

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2025-07-17 11:32:15 +02:00
socialsister
869c946370 chore: fix some minor issues in comments
Signed-off-by: socialsister <seekseat@qq.com>
2025-07-16 11:24:42 +01:00
Nicolás Pazos
b43a07248f tsdb tests: fix mockIndex implementation
Signed-off-by: Nicolás Pazos <npazosmendez@gmail.com>
2025-07-10 15:59:38 -03:00
machine424
846acc10bb chore(tsdb): remove NewLeveledCompactorWithChunkSize constructor as unused, library users can redefine it on their side
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-07-09 17:10:13 +01:00
George Krajcsovits
1d79f0f47e
chore(tsdb): add a few more testcases for unlock of unlocked mtx 16332 (#16848)
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-07-09 16:24:46 +02:00
Banana Duck
89f011ba13
fix: unlock of unlocked mutex (#16332)
* fix: unlock on unlocked mutex

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>

* test coverage

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>

---------

Signed-off-by: Usama Alhanaqtah <a.usama@yandex.ru>
Co-authored-by: alhanaqtah.usama <alhanaqtah.usama@DEV-254.local>
2025-07-09 15:37:55 +02:00
liangmulu
b1a7df2c0c chore: fix some minor issues in comments
Signed-off-by: liangmulu <liangmulu@outlook.com>
2025-07-09 18:05:41 +08:00
Ahmed Hassan
01be7bfb2e add NumFloatSamples to TSDB block stats
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
2025-07-07 13:48:18 -07:00
sujal shah
4408a6bcaf api: Create /status/tsdb/blocks endpoint.
this endpoint serves blocks data to the client.

Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
2025-07-04 03:13:54 +05:30
Ahmed Hassan
6d77b47d13 add numHistogramSamples to block stats
Signed-off-by: Ahmed Hassan <afayekhassan@gmail.com>
2025-07-02 19:52:04 -07:00
machine424
5ac1e6a656
fix(test): fall back to default direct I/O requirements in tests when statx isn't supported by using a higher level util
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-05-29 09:35:33 +02:00
Charles Korn
ab1b1db128
tsdb: fix issue where a new segment file is created for every chunk if WithSegmentSize not called (#16635)
* tsdb: fix issue where a new segment file is created for every chunk

Signed-off-by: Charles Korn <charles.korn@grafana.com>

* Address PR feedback

Signed-off-by: Charles Korn <charles.korn@grafana.com>

---------

Signed-off-by: Charles Korn <charles.korn@grafana.com>
2025-05-28 18:21:59 +02:00
Ayoub Mrini
317acb3d68
refactor: use the built-in max/min to simplify the code (#16617)
Signed-off-by: carrychair <linghuchong404@gmail.com>
2025-05-27 14:42:50 +02:00
Ayoub Mrini
2edc3ed6c5
feat(tsdb): introduce --use-uncached-io feature flag and allow using it for chunks writing (#15365)
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Signed-off-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-05-21 14:42:30 +02:00
carrychair
e83dc66bdb refactor: use the built-in max/min to simplify the code
Signed-off-by: carrychair <linghuchong404@gmail.com>
2025-05-20 14:36:39 +08:00
György Krajcsovits
772d5ab433
Merge branch 'main' into krajo/intern-custom-values 2025-05-20 08:23:15 +02:00
hardlydearly
ba4b058b7a refactor: use slices.Contains to simplify code
Signed-off-by: hardlydearly <799511800@qq.com>
2025-05-09 08:27:10 +02:00
György Krajcsovits
6c646657d5
perf(chunkenc): intern the custom values for native histograms
The custom values are the "le" bucket boundaries of native histograms
with custom buckets. They are never modified. It is ok to not copy them
when iterating a chunk, just reference them.

If we ever have a function that modifies the custom values, like
'trim' for example, that function will have to make a copy on write.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-05-07 14:40:45 +02:00
Andre Branchizio
b07b552139
[PERF] TSDB: Pass down label value limit into implementation (#16158)
* allow limiting label values calls

Signed-off-by: Andre Branchizio <andrejbranch@gmail.com>
2025-05-06 18:54:48 +01:00
Dimitar Dimitrov
7e49b91d9a
tsdb/errors.MultiError: support errors.As (#16544)
* tsdb/errors.MultiError: implement Unwrap

The multierror was hiding some errors in Mimir. I also added unit tests because I had them handy from a similar change Yuri and I did in XXX some time ago

---------

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-06 13:45:16 +00:00
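Why an `Unwrap` method makes hidden errors visible again can be shown with a small stand-in type. This sketch assumes Go 1.20 or later, where `errors.Is` and `errors.As` traverse an `Unwrap() []error` method; the `multiError` type, `sample`, and `pathOf` are hypothetical names and do not reproduce the `tsdb/errors` package.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"strings"
)

// multiError is a hypothetical stand-in for an aggregate error type.
type multiError []error

func (m multiError) Error() string {
	msgs := make([]string, 0, len(m))
	for _, err := range m {
		msgs = append(msgs, err.Error())
	}
	return strings.Join(msgs, "; ")
}

// Unwrap exposes the collected errors so that errors.Is and errors.As
// (Go 1.20+) can inspect each of them.
func (m multiError) Unwrap() []error { return m }

// sample builds an aggregate error that contains a *fs.PathError.
func sample() error {
	return multiError{
		errors.New("compaction failed"),
		&fs.PathError{Op: "open", Path: "chunks/000001", Err: fs.ErrNotExist},
	}
}

// pathOf extracts the path from the first *fs.PathError found in err.
func pathOf(err error) (string, bool) {
	var pe *fs.PathError
	if errors.As(err, &pe) {
		return pe.Path, true
	}
	return "", false
}

func main() {
	err := sample()
	fmt.Println(pathOf(err))
	fmt.Println(errors.Is(err, fs.ErrNotExist))
}
```

Without the `Unwrap` method, both lookups in `main` would fail, because the aggregate would look like an opaque error to the `errors` package.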
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
Bryan Boreham
ca416c580c
Merge branch 'main' into slicelabels
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-05-02 10:31:57 +01:00
Yuchen Wang
5630a3906a
fix typo (#16480)
Signed-off-by: Yuchen Wang <yuchen.wang@databricks.com>
2025-04-25 09:27:58 +02:00
Bryan Boreham
8487ed8145
Merge pull request #16440 from bboreham/faster-benchmark-loadwls
[TESTS] TSDB: Faster WAL benchmarks
2025-04-22 15:59:03 +01:00
Bryan Boreham
a11772234d
Merge pull request #16333 from colega/fix-series-create-gc-race
fix: race condition between series creation and garbage collection
2025-04-17 12:15:11 +01:00
machine424
a825d448da feat(tsdb/(head|agent)): dereference the pools at the end of the WL replay to
not wait for an extra GC cycle until the built-in cleanup mechanism
kicks in

See https://github.com/prometheus/prometheus/pull/15778

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-04-17 13:06:08 +02:00
Bryan Boreham
1d4b1d76a5 [TESTS] More efficient label creation in BenchmarkLoadWLs
Use the Builder abstraction instead of going via a map.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-04-16 18:02:47 +01:00
Bryan Boreham
848df13d3a [TESTS] Faster WAL Benchmarks by reusing buffer
Less garbage collection.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-04-16 17:58:09 +01:00
Lukasz Mierzwa
bb76966992 Use stringlabels by default
This removes the stringlabels build tag, makes that implementation the default one, and moves the old labels implementation under the slicelabels build tag.
Fixes #16064.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-04-15 17:52:24 +01:00
Alex Le
bce72b93d9
tsdb: Introduced new constructor for LeveledCompactor to take in metrics (#16408)
* Introduced new constructor for LeveledCompactor to take in metrics

Signed-off-by: Alex Le <leqiyue@amazon.com>

* Added Metrics to LeveledCompactorOptions

Signed-off-by: Alex Le <leqiyue@amazon.com>

---------

Signed-off-by: Alex Le <leqiyue@amazon.com>
2025-04-11 09:17:45 +01:00
Alex Le
701d13abf9
Make sure LeveledCompactor respects context cancellation while opening blocks (#16407)
Signed-off-by: Alex Le <leqiyue@amazon.com>
2025-04-08 09:04:23 +01:00
Oleg Zaytsev
f5f91a9ca4
defer a.unmarkCreatedSeriesAsPendingCommit()
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2025-03-31 10:06:28 +02:00
Ben Kochie
a721daf981
Log WAL segment loading time (#16336)
Improve readability of "WAL segment loaded" by logging the duration
of each load. This helps make it easier to spot slow WAL file load
times.

Signed-off-by: SuperQ <superq@gmail.com>
2025-03-31 06:05:14 +02:00
Ryan Wu
7d73c1d3f8
refactor[discovery, tsdb]: simplify error handling and remove redundant checks (#16328)
* refactor: simplify error handling and remove redundant checks

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Add the comment for return of reloading blocks failure

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Add the comment for return of reloading blocks failure

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

---------

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-03-27 12:20:59 +01:00
Oleg Zaytsev
e4fe8d8684
Create memSeries with pendingCommit=true
This fixes TestHead_RaceBetweenSeriesCreationAndGC.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2025-03-27 11:11:57 +01:00
Oleg Zaytsev
df33f1aace Add TestHead_RaceBetweenSeriesCreationAndGC
This test consistently fails missing ~10 series.
If it doesn't fail on your machine, just increase totalSeries, that's
how race conditions work.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2025-03-27 10:56:24 +01:00
Matthieu MOREL
5fa1146e21
chore: enable gci linter (#16245)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-03-22 15:46:13 +00:00
Patryk Prus
2f43a5a3ab
TSDB: don't process exemplars older than minValidTime during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-19 16:24:08 -04:00
Ganesh Vernekar
bc595263c1
Merge pull request #16231 from pr00se/multiref-improvements
TSDB: Handle metadata/tombstones/exemplars for duplicate series during WAL replay
2025-03-19 16:15:50 -04:00
pudongair
308c8c48c1
chore: fix some comments (#16237)
Signed-off-by: pudongair <744355276@qq.com>
2025-03-19 16:28:34 +01:00
Ziqi Zhao
f6903bcc22
Let HistogramAppender.appendable return CounterResetHeader instead of… (#16195)
Let HistogramAppender.appendable return CounterResetHeader instead of boolean

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>

---------

Signed-off-by: Ziqi Zhao <zhaoziqi9146@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-03-18 17:40:27 +01:00
Patryk Prus
e4e1b515bc
TSDB: Handle metadata/tombstones/exemplars for duplicate series during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-18 12:22:33 -04:00
Fiona Liao
37c2ebb5fd
Make out-of-order native histograms flag a no-op and always enable (#16207)
* Remove experimental out-of-order native histogram flag

This feature has been available in Prometheus since September 2024,
and has no known issues. Therefore proposing to remove the flag
entirely and always have it on. Note that there are still two
settings that need to be configured (out-of-order time window > 0
and native histograms enabled) for this feature to work.

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update CHANGELOG

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Keep feature flag with warning

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update CHANGELOG

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

* Update tsdb/head_append.go

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Update CHANGELOG.md

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Update tsdb/head_append.go

Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>

* Additional cleanup of comments and test names

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>

---------

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Signed-off-by: Fiona Liao <fiona.y.liao@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-03-18 10:59:02 +00:00
Patryk Prus
86eeaf1886
Skip writing series records uniformly across the benchmark, so we skip some OOO series as well
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
2147538d1e
Add missing series refs to benchmark
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
401dbacf2e
Add counters for unknown series references during WAL/WBL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:53 -04:00
Patryk Prus
85fa39032e
TSDB: Track count of unknown series referenced during WAL replay
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-17 15:17:48 -04:00
Bryan Boreham
30d04792ca
[PERF] Remote-write: re-use memory to read WAL data (#16197)
The `:=` causes new variables to be created, which means the outer
slice stays at nil, and new memory is allocated every time round the
loop.

Extracted from https://github.com/prometheus/prometheus/pull/16182
Credit to @bwplotka.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-03-11 10:49:51 +00:00
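The effect of `:=` described in the commit above can be reproduced in isolation. The following is a minimal sketch with invented names (`decode`, `shadowed`, `reused`); it is not the remote-write code, only an illustration of the scoping rule.

```go
package main

import "fmt"

// decode appends n items to buf and returns the extended slice. It stands
// in for a record decoder that can reuse the caller's buffer.
func decode(n int, buf []int) ([]int, error) {
	for i := 0; i < n; i++ {
		buf = append(buf, i)
	}
	return buf, nil
}

// shadowed uses := inside the loop, which declares a new samples variable
// on every iteration. The outer slice stays nil, so each call to decode
// starts from an empty buffer and has to allocate.
func shadowed(rounds int) int {
	var samples []int
	for r := 0; r < rounds; r++ {
		samples, err := decode(4, samples[:0]) // right-hand side reads the outer, nil slice
		if err != nil {
			return -1
		}
		_ = samples
	}
	return cap(samples) // capacity of the outer slice
}

// reused assigns with = to the outer variables, so the buffer grown in one
// iteration is handed back to decode in the next.
func reused(rounds int) int {
	var samples []int
	var err error
	for r := 0; r < rounds; r++ {
		samples, err = decode(4, samples[:0])
		if err != nil {
			return -1
		}
	}
	return cap(samples)
}

func main() {
	fmt.Println("outer capacity with :=", shadowed(3))
	fmt.Println("outer capacity with = ", reused(3))
}
```

The first function reports a capacity of zero for the outer slice, which is the symptom the commit message describes: the memory allocated inside the loop is never carried over.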
Bartlomiej Plotka
7a7bc65237
Add util/compression package to consolidate snappy/zstd use in Prometheus. (#16156)
# Conflicts:
#	tsdb/db_test.go

Apply suggestions from code review

tmp

Addressed comments.

Update util/compression/buffers.go

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
2025-03-10 10:36:26 +00:00
Arve Knudsen
56929ffa42 Upgrade to Go v1.24 (#16180)
* Upgrade to Go v1.24
* Upgrade golangci-lint

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-03-07 11:28:26 +01:00
Patryk Prus
61aa82865d
TSDB: keep duplicate series records in checkpoints while their samples may still be present (#16060)
Renames the head's deleted map to walExpiries, and creates entries for any
duplicate series records encountered during WAL replay, with the expiry set
to the highest current WAL segment number. Any subsequent WAL
checkpoints will see the duplicate series entry in the walExpiries map, and
keep the series record until the last WAL segment that could contain its
samples is deleted.

Other considerations:

WBL: series records aren't written to the WBL, so there are no duplicates to deal with
agent mode: has its own WAL replay logic that handles duplicate series records differently, and is outside the scope of this PR
2025-03-05 13:45:08 -05:00
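The bookkeeping described in the commit above can be sketched as a map from series reference to the last WAL segment that may still hold its samples. Everything here is an assumption made for illustration: the reduced `head` type, the method names, and in particular the exact comparison used in `keepInCheckpoint` are not taken from the Prometheus source.

```go
package main

import "fmt"

// head is a hypothetical, heavily reduced stand-in for the TSDB head.
type head struct {
	// walExpiries maps a series ref to the last WAL segment that may
	// still contain samples written against that series record.
	walExpiries map[uint64]int
}

// noteDuplicate records that ref was seen as a duplicate series record
// during replay while lastSegment was the highest WAL segment.
func (h *head) noteDuplicate(ref uint64, lastSegment int) {
	h.walExpiries[ref] = lastSegment
}

// keepInCheckpoint reports whether the record for ref should be copied
// into a checkpoint that replaces all segments up to and including
// truncatedUpTo. The record is kept while a later segment may still
// reference it (assumed semantics).
func (h *head) keepInCheckpoint(ref uint64, truncatedUpTo int) bool {
	last, ok := h.walExpiries[ref]
	return ok && last > truncatedUpTo
}

func main() {
	h := &head{walExpiries: map[uint64]int{}}
	h.noteDuplicate(42, 7)
	fmt.Println(h.keepInCheckpoint(42, 5)) // segments 6 and 7 may still hold samples
	fmt.Println(h.keepInCheckpoint(42, 7)) // every segment that could reference it is gone
	fmt.Println(h.keepInCheckpoint(1, 5))  // unknown ref, nothing to keep
}
```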
Arve Knudsen
7cbf749096
Upgrade to github.com/oklog/ulid/v2 (#16168)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-03-05 16:03:25 +01:00
Bryan Boreham
42d55505f9
Merge pull request #12659 from prymitive/memChunk
Short-cut common memChunk operations
2025-02-25 11:33:56 +00:00
Matthieu MOREL
c7d4b53ec1 chore: enable unused-parameter from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-19 19:50:28 +01:00
Ayoub Mrini
e04913aea2
Merge pull request #15778 from machine424/reuse-pools
feat(tsdb/(head|agent)): reuse pools across segments to reduce garbage during WL replay
2025-02-17 12:48:17 +01:00
Bartlomiej Plotka
de23a9667c
prw2: Split PRW2.0 from metadata-wal-records feature (#16030)
Rationales:

* metadata-wal-records might be deprecated and replaced going forward: https://github.com/prometheus/prometheus/issues/15911
* PRW 2.0 works without metadata just fine (although it sends untyped metrics as expected).

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-02-13 12:16:33 +00:00
machine424
d644324407
feat(tsdb/(head|agent)): reuse pools across segments to avoid generating garbage during WL replay
This is part of the "reduce WAL replay overhead/garbage" effort to help with https://github.com/prometheus/prometheus/issues/6934.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-02-10 22:40:24 +01:00
Matthieu MOREL
b472ce7010 chore: enable early-return from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-10 22:08:43 +01:00
Bryan Boreham
b74cebf6bf
Merge pull request #12920 from prymitive/compactLock
Fix locks in db.reloadBlocks()
2025-02-10 17:35:09 +00:00
Dimitar Dimitrov
686dcc7b0d
headIndexReader: reduce debug logging (#15993)
Around Mimir compactions we see logging in ShardedPostings do massive allocations and drive GC up to 50% of CPU.

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-02-07 15:46:55 +00:00
SuryaPrakash
cb3b17a14c
fix: os.MkdirTemp with t.TempDir (#15860)
Signed-off-by: Surya Prakash <surya0prakash@proton.me>
2025-01-31 14:32:20 +00:00
Alan Protasio
9d1abbb9ed
Call PostCreation callback only after the new series is added to the mempostings (#15579)
Signed-off-by: alanprot <alanprot@gmail.com>
2025-01-28 12:11:58 +01:00
Jan Fajerski
6823f58e59
Merge pull request #15732 from bboreham/benchmark-setup-append-periodically
TSDB benchmarks: Commit periodically to speed up init
2025-01-28 11:35:04 +01:00
Bryan Boreham
6ba25ba93f tsdb tests: avoid 'defer' till end of function
'defer' only runs at the end of the function, so explicitly close the
querier after we finish with it. Also check it didn't error.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
Bryan Boreham
2f615a200d tsdb tests: restrict some 'defer' operations
'defer' only runs at the end of the function, so introduce some more
functions / move the start, so that 'defer' can run at the end of the
logical block.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
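The point about `defer` made in the two commits above can be illustrated with a toy resource. This is a generic sketch with hypothetical names (`resource`, `deferredAtEnd`, `deferredPerIteration`), not the test code that was changed.

```go
package main

import "fmt"

// resource is a toy stand-in for something that must be closed, such as
// a querier. Close records itself in a shared event log.
type resource struct {
	id  int
	log *[]string
}

func (r resource) Close() { *r.log = append(*r.log, fmt.Sprintf("close %d", r.id)) }

// deferredAtEnd defers Close inside a loop. Nothing is closed until the
// enclosing function returns, so all resources stay open together.
func deferredAtEnd() []string {
	var log []string
	func() {
		for i := 0; i < 3; i++ {
			r := resource{id: i, log: &log}
			defer r.Close()
			log = append(log, fmt.Sprintf("use %d", i))
		}
	}()
	return log
}

// deferredPerIteration wraps the loop body in a function, so each defer
// runs at the end of its own iteration.
func deferredPerIteration() []string {
	var log []string
	for i := 0; i < 3; i++ {
		func() {
			r := resource{id: i, log: &log}
			defer r.Close()
			log = append(log, fmt.Sprintf("use %d", i))
		}()
	}
	return log
}

func main() {
	fmt.Println(deferredAtEnd())
	fmt.Println(deferredPerIteration())
}
```

In the first variant all three "use" events come before any "close" event, and the closes run in reverse order. In the second, each resource is closed before the next one is opened.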
Bryan Boreham
f4fbe47254 tsdb tests: avoid capture-by-reference in goroutines
Only one version of the variable is captured; this is a source of race conditions.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-01-27 19:59:43 +00:00
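The capture problem named in the commit above comes from several closures sharing one variable. The sketch below uses invented names and keeps the shared-variable case sequential so that its result is deterministic; with goroutines, the same sharing becomes a data race. The second function shows one common remedy, passing the value as an argument.

```go
package main

import (
	"fmt"
	"sync"
)

// sharedCapture builds one closure per item, but every closure reads the
// same variable, so all of them observe its final value.
func sharedCapture(items []int) []int {
	var item int // single variable captured by reference
	var readers []func() int
	for _, item = range items {
		readers = append(readers, func() int { return item })
	}
	out := make([]int, 0, len(readers))
	for _, read := range readers {
		out = append(out, read())
	}
	return out
}

// perGoroutineCopy passes the value as an argument, so each goroutine
// works on its own copy and writes to its own slot in out.
func perGoroutineCopy(items []int) []int {
	out := make([]int, len(items))
	var wg sync.WaitGroup
	var item int
	for i := range items {
		item = items[i]
		wg.Add(1)
		go func(idx, v int) {
			defer wg.Done()
			out[idx] = v * 10
		}(i, item)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(sharedCapture([]int{1, 2, 3}))
	fmt.Println(perGoroutineCopy([]int{1, 2, 3}))
}
```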
piguagua
a82f2b8168
chore: fix function name and struct name in comment (#15827)
Signed-off-by: piguagua <piguagua@aliyun.com>
2025-01-17 21:26:08 +01:00
Julius Volz
0d7db907a9
Merge pull request #15785 from crystalstall/main
refactor: using slices.Contains to simplify the code
2025-01-13 10:31:41 +01:00
crystalstall
616914abe2 refactor: using slices.Contains to simplify the code

Signed-off-by: crystalstall <crystalruby@qq.com>
2025-01-11 00:41:51 +08:00
Lukasz Mierzwa
e3728122b2 Update comments for methods that require a lock
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:20:10 +00:00
Lukasz Mierzwa
a1740cd2e7 Remove unnecessary locks
Compact() is an uppercase function that deals with locks on its own, so we shouldn't have a lock around it.

Signed-off-by: Lukasz Mierzwa <lukasz@cloudflare.com>
2025-01-09 17:06:05 +00:00
Łukasz Mierzwa
d106b3beb7 Wrap db.blocks read in a read lock
We don't hold db.mtx lock when trying to read db.blocks here so we need a read lock around this loop.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:06:05 +00:00
Łukasz Mierzwa
92788d313a Remove TestTombstoneCleanRetentionLimitsRace
This test ensures that running db.reloadBlocks() and db.CleanTombstones() at the same time doesn't race.
The problem is that CleanTombstones() is a public method while reloadBlocks() is internal.
CleanTombstones() sets db.cmtx lock while reloadBlocks() is not protected by any locks at all, it expects the public method through which it was called to do it.
So having a race between these two is not unexpected and we shouldn't really be testing this.
db.cmtx ensures that no other function can be modifying the list of open blocks and so the scenario tested here cannot happen.
If it did happen, it would be only because some other method doesn't acquire the db.cmtx lock, something this test cannot detect.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:06:03 +00:00
Łukasz Mierzwa
b880cea613 Fix locks in db.reloadBlocks()
This partially reverts ae3d392aa9.

ae3d392aa9 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks(),
previously db.mtx would be locked only during the critical part of db.reloadBlocks().
The motivation was to protect against races:
9e0351e161 (r555699794)
The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods.
TestTombstoneCleanRetentionLimitsRace was added to catch this but I wasn't able to ever get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones().
To make things more complicated, CleanTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().

The problem with this change is that db.reloadBlocks() can take a very long time, that's because it might need to load very large blocks from disk, which is slow.
While db.mtx is locked a large chunk of the db is locked, including queries, since db.mtx read lock is needed for db.Querier() call.
One of the issues this manifests itself as is a gap in all metrics and blocked queries just after a large block compaction happens.
When compaction merges multiple day-or-more blocks into a week-or-more block it creates a single very big block.
After that block is written it needs to be loaded and that seems to be taking many seconds (30-45), during which mtx is held and everything is blocked.

Turns out that there is another lock that is more fine grained and aimed at this specific use case:

// cmtx ensures that compactions and deletions don't run simultaneously.
cmtx sync.Mutex

All calls to reloadBlocks() are wrapped inside cmtx lock. The only exception is db.reload() which this change fixes.
We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx.

Looking at the code I think it is sufficient to hold cmtx and skip a reloadBlocks() wide mtx call.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2025-01-09 17:05:39 +00:00
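The locking arrangement argued for in the commit above can be sketched with two mutexes: a coarse one that serialises reloads and a read/write one that is held only for the final swap. This is a simplified model under stated assumptions, not the `tsdb.DB` code; the `db` type, its `reload` and `query` methods, and the `load` callback are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// db is a hypothetical, reduced model of a database with two locks.
type db struct {
	cmtx   sync.Mutex   // serialises compactions, deletions and reloads
	mtx    sync.RWMutex // guards blocks; queries take the read lock
	blocks []string
}

// reload runs the slow load while holding only cmtx, then takes mtx just
// long enough to swap in the result, so queries are not blocked while
// blocks are being read from disk.
func (d *db) reload(load func() []string) {
	d.cmtx.Lock()
	defer d.cmtx.Unlock()

	loaded := load() // slow part, done without mtx

	d.mtx.Lock()
	d.blocks = loaded
	d.mtx.Unlock()
}

// query reads the current block list under the read lock.
func (d *db) query() int {
	d.mtx.RLock()
	defer d.mtx.RUnlock()
	return len(d.blocks)
}

func main() {
	d := &db{}
	started := make(chan struct{})
	release := make(chan struct{})
	done := make(chan struct{})

	go func() {
		d.reload(func() []string {
			close(started) // the slow load has begun
			<-release      // simulate a long read from disk
			return []string{"block-1", "block-2"}
		})
		close(done)
	}()

	<-started
	fmt.Println("blocks visible during reload:", d.query()) // does not wait for the load
	close(release)
	<-done
	fmt.Println("blocks visible after reload:", d.query())
}
```

If `reload` held `mtx` for its whole duration instead, the first `query` call in `main` would wait for the simulated load, which corresponds to the blocked queries the commit message reports.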
Arve Knudsen
f030894c2c
Fix issues raised by staticcheck (#15722)
Fix issues raised by staticcheck

We are not enabling staticcheck explicitly, though, because it has too many false positives.

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-01-09 17:51:26 +01:00
Ben Ye
919a5b657e
Expose ListPostings Length via Len() method (#15678)
tsdb: expose remaining ListPostings Length

Signed-off-by: Ben Ye <benye@amazon.com>

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2025-01-07 17:58:26 +01:00
György Krajcsovits
1e420ef373 Merge branch 'main' into cedwards/nhcb-wal-wbl
# Conflicts:
#	tsdb/tsdbutil/histogram.go
2025-01-02 12:50:19 +01:00
György Krajcsovits
a7ccc8e091 record_test.go: avoid captures, simply return test refs
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-01-02 12:45:20 +01:00
Bryan Boreham
096e2aa7bd
Merge pull request #14518 from bboreham/faster-listpostings-merge
TSDB: Optimization: Merge postings using concrete type
2025-01-02 10:43:45 +00:00
Bryan Boreham
b2fa1c9524 TSDB benchmarks: Commit periodically to speed up init
When creating dummy data for benchmarks, call `Commit()` periodically to
avoid growing the appender to enormous size.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-30 17:42:56 +00:00
johncming
061400e31b
tsdb: export CheckpointPrefix constant (#15636)
Exported the CheckpointPrefix constant to be used in other packages.
Updated references to the constant in db.go and checkpoint.go files.
This change improves code readability and maintainability.

Signed-off-by: johncming <johncming@yahoo.com>
Co-authored-by: johncming <conjohn668@gmail.com>
2024-12-29 17:54:45 +01:00
Carrie Edwards
1508149184 Update benchmark test and comment 2024-12-27 09:09:13 -08:00
Bryan Boreham
cfa32f3d28 TSDB: Move merge of head postings into index
This enables it to take advantage of a more compact data structure
since all postings are known to be `*ListPostings`.

Remove the `Get` member which was not used for anything else, and fix up
tests.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 19:22:30 +00:00
Bryan Boreham
0a8779f46d TSDB: Make mergedPostings generic
Now we can call it with more specific types which is more efficient than
making everything go through the `Postings` interface.

Benchmark the concrete type.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 17:09:21 +00:00
Bryan Boreham
1b22242024 TSDB BenchmarkMerge: run fewer sizes
As long as we run small and big sizes, we don't need all the sizes in between.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 17:09:21 +00:00
Bryan Boreham
e630ffdbed TSDB: extend BenchmarkMemPostings_PostingsForLabelMatching to check merge speed
We need to create more postings entries so the merger has some work to do.
Not material for the regexp ones as they match so few series.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-20 17:09:21 +00:00
Björn Rabenstein
318d6bc4bf
Merge pull request #15548 from TinfoilSubmarine/fix/386-test-failures
test: fixes for 32-bit archs
2024-12-18 15:49:30 +01:00
Björn Rabenstein
ff398062cb
Merge pull request #15679 from colega/update-comment-on-mempostings-lvs
Update comment on MemPostings.lvs
2024-12-17 19:41:56 +01:00
Oleg Zaytsev
c8359fcd6b
Fix bug in lbl!~".+" shortcut (#15684)
We were appending to the wrong slice, so instead of removing values, we
were adding them.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-12-17 17:34:24 +01:00
Oleg Zaytsev
17d5bc4e54
Update comment on MemPostings.lvs
There was a missing verb there.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-12-16 17:20:51 +01:00
Joel Beckmeyer
39f5a07236 fix TestOOOHeadChunkReader_Chunk on 32-bit
Signed-off-by: Joel Beckmeyer <joel@beckmeyer.us>
2024-12-16 10:45:07 -05:00
Bryan Boreham
ac4f8a5e23
[ENHANCEMENT] TSDB: Improve calculation of space used by labels (#13880)
* [ENHANCEMENT] TSDB: Improve calculation of space used by labels

The labels for each series in the Head take up some space in the
Postings index, but far more space in the `memSeries` structure.

Instead of having the Postings index calculate this overhead, which is
a layering violation, have the caller pass in a function to do it.

Provide three implementations of this function for the three Labels
versions.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-12-16 09:42:52 +00:00
David Ashpole
953a873342
update links to openmetrics to reference the v1.0.0 release
Signed-off-by: David Ashpole <dashpole@google.com>
2024-12-13 21:32:27 +00:00
György Krajcsovits
df88de5800 Fix lint for real
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-12 12:52:01 +01:00
György Krajcsovits
cf36792e14 Fix unused import
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-12 12:49:28 +01:00
György Krajcsovits
fdb1516af1 Fix lint errors
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-12 12:47:43 +01:00
György Krajcsovits
d64d1c4c0a Benchmark encoding classic and nhcb
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-12 10:59:06 +01:00
György Krajcsovits
a325ff142c fix(test): do not run automatic WAL truncate during test
Remove the 2 minute timeout as the default is 2 hours and wouldn't
interfere with the test. Otherwise the extra samples combined with
race detection can push the test over 2 minutes and make it fail.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-10 17:30:46 +01:00
György Krajcsovits
07276aeece fix(test): if we are dereferencing a slice we should check its len
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-10 16:25:50 +01:00
György Krajcsovits
8f572fe905 fix(lint): linter errors
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-10 16:25:20 +01:00
György Krajcsovits
b94c87bea6 fix(test): TestCheckpoint segment size too low
The segment size was too low for the additional NHCB data, thus it created
more segments than expected. This meant that less were in the lower
numbered segments, which meant more was kept.

FAIL: TestCheckpoint (4.05s)
  FAIL: TestCheckpoint/compress=none (0.22s)
        checkpoint_test.go:361:
            	Error Trace:	/home/krajo/go/github.com/prometheus/prometheus/tsdb/wlog/checkpoint_test.go:361
            	Error:      	"0.8586956521739131" is not less than "0.8"
            	Test:       	TestCheckpoint/compress=none

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-12-10 16:16:46 +01:00
György Krajcsovits
efdd0880c1 Merge branch 'main' into cedwards/nhcb-wal-wbl
# Conflicts:
#	tsdb/docs/format/wal.md
2024-12-10 14:33:35 +01:00
bwplotka
eeef17ea0a docs: Added native histogram WAL record documentation.
Signed-off-by: bwplotka <bwplotka@gmail.com>
2024-12-09 11:47:28 +00:00
Carrie Edwards
1933ccc9be Fix test
2024-12-06 14:55:19 -08:00
Carrie Edwards
a046417bc0 Use new record type only for NHCB
2024-12-06 13:46:20 -08:00
Carrie Edwards
45944c1847 Extend tsdb agent tests with custom bucket histograms
2024-12-05 09:21:47 -08:00
Carrie Edwards
6b44c1437f Fix comment and histogram record string
2024-12-05 09:21:47 -08:00
Carrie Edwards
f8a39767a4 Update WAL doc to include native histogram encodings
2024-12-05 09:21:47 -08:00
Carrie Edwards
6684344026 Rename old histogram record type, use old names for new records
2024-12-05 09:21:47 -08:00
Carrie Edwards
454f6d39ca Add separate handling for histograms and custom bucket histograms
2024-12-05 09:21:47 -08:00
Carrie Edwards
37df50adb9 Attempt for record type
2024-12-05 09:21:47 -08:00
Carrie Edwards
cfcd51538d Remove references to custom values record
2024-12-05 09:21:47 -08:00
Carrie Edwards
6d413fad36 Use histogram records for custom value handling
2024-12-05 09:21:47 -08:00
Carrie Edwards
aa144b7263 Handle custom buckets in WAL and WBL
2024-12-05 09:21:47 -08:00
Antoine Pultier
f1340bac64
documentation: put back trailing punctuation.
markdownlint wasn't happy about the trailing punctuation in the headings.

Signed-off-by: Antoine Pultier <antoine.pultier@sintef.no>
2024-12-03 14:36:56 +01:00
Antoine Pultier
5c2fd7988b
Merge remote-tracking branch 'upstream/main' into patch-2
Signed-off-by: Antoine Pultier <antoine.pultier@sintef.no>
2024-12-03 14:32:28 +01:00
Antoine Pultier
6046769941
tsdb documentation: Improve Chunk documentation
Signed-off-by: Antoine Pultier <45740+fungiboletus@users.noreply.github.com>
2024-12-03 14:24:50 +01:00
Oleg Zaytsev
cd1f8ac129
MemPostings: keep a map of label values slices (#15426)
While investigating lock contention on `MemPostings`, we saw that lots
of locking is happening in `LabelValues` and
`PostingsForLabelsMatching`, both copying the label values slices while
holding the mutex.

This adds an extra map that holds an append-only label values slice for
each one of the label names. Since the slice is append-only, it can be
copied without holding the mutex.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-11-29 12:52:56 +01:00
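A minimal Go sketch of the idea described in cd1f8ac129 above, not the actual `MemPostings` code. The type `labelValuesIndex` and its methods are hypothetical names chosen for illustration. It shows why an append-only slice can be copied outside the lock: the reader takes only the slice header while holding the read lock, and later appends either write past that header's length or move to a new backing array, so the elements the reader sees are never modified.

```go
package main

import (
	"fmt"
	"sync"
)

// labelValuesIndex is a hypothetical stand-in for the extra map described in
// the commit: one append-only slice of values per label name.
type labelValuesIndex struct {
	mtx    sync.RWMutex
	values map[string][]string
}

func newLabelValuesIndex() *labelValuesIndex {
	return &labelValuesIndex{values: map[string][]string{}}
}

// add appends a value under the write lock. Existing elements are never
// modified or removed, which is what makes copying outside the lock safe.
func (ix *labelValuesIndex) add(name, value string) {
	ix.mtx.Lock()
	ix.values[name] = append(ix.values[name], value)
	ix.mtx.Unlock()
}

// get reads only the slice header under the read lock and copies the
// elements after releasing it.
func (ix *labelValuesIndex) get(name string) []string {
	ix.mtx.RLock()
	vals := ix.values[name]
	ix.mtx.RUnlock()

	out := make([]string, len(vals))
	copy(out, vals)
	return out
}

func main() {
	ix := newLabelValuesIndex()
	ix.add("job", "api")
	ix.add("job", "db")
	fmt.Println(ix.get("job"))
}
```

The sketch does not deduplicate values or handle deletion. Both would need extra care in real code, because the safety argument depends on the slice staying append-only.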
Charles Korn
96adc410ba
tsdb/chunkenc: don't reuse custom value slices between histograms
Signed-off-by: Charles Korn <charles.korn@grafana.com>
2024-11-29 16:28:09 +11:00
Oleg Zaytsev
9ad93ba8df
Optimize l=~".+" matcher (#15474)
Since dot is matching newline now, `l=~".+"` is "any non empty label
value", and #14144 added a specific method in the index for that so we
don't need to run the matcher on each one of the label values.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-11-27 12:33:20 +01:00
Bryan Boreham
ca3119bd24 TSDB: eliminate one yolostring
When the only use of a []byte->string conversion is as a map key, Go
doesn't allocate.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-26 17:21:55 +00:00
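The point made in ca3119bd24 above can be illustrated with a short Go sketch. The function name `lookup` is hypothetical. When `string(raw)` appears directly inside the map index expression, the standard Go compiler recognises the pattern and performs the lookup without copying the bytes into a new string, so no unsafe conversion is needed for this use.

```go
package main

import "fmt"

// lookup finds the entry whose key equals the bytes in raw.
// The conversion string(raw) is used only as the map key, so the
// compiler can avoid allocating a temporary string for it.
func lookup(m map[string]int, raw []byte) (int, bool) {
	v, ok := m[string(raw)]
	return v, ok
}

func main() {
	symbols := map[string]int{"job": 1, "instance": 2}
	v, ok := lookup(symbols, []byte("instance"))
	fmt.Println(v, ok)
}
```

Assigning the converted string to a variable first, or storing it as a new key, would generally require a real copy. The optimisation applies to the lookup form shown here.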
Bryan Boreham
e98c19c1ce [PERF] TSDB: Cache all symbols for compaction
Trade a bit more memory for a lot less CPU spent looking up symbols.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-26 17:21:55 +00:00
Oleg Zaytsev
9aa6e041d3
MemPostings: allocate ListPostings once in PFALV (#15465)
Same as #15427 but for the new method added in #14144

Instead of allocating each ListPostings one by one, allocate them all in
one go.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-11-26 16:03:45 +01:00
DC
d535d501d1
[DOCS] Improve description of WAL record format (#14936)
Signed-off-by: DC <413331538@qq.com>
2024-11-26 11:48:17 +00:00
Bryan Boreham
dd0252a774
Merge pull request #15380 from bboreham/improve-loadwbl
[BUGFIX] TSDB: Apply fixes from loadWAL to loadWBL
2024-11-25 17:31:49 +00:00
Bryan Boreham
7996a13fdd
Merge pull request #15403 from bboreham/fix-rw-benchmark-startup
[TESTS] Remote-Write: Fix BenchmarkStartup
2024-11-25 17:31:24 +00:00
Oleg Zaytsev
cc390aab64
MemPostings: allocate ListPostings once in PFLM (#15427)
Instead of allocating ListPostings pointers one by one, allocate a slice
and take pointers from that. It's faster, and also generates less
garbage (NewListPostings is one of the top offenders in number of
allocations).

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-11-20 17:52:20 +01:00
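The allocation pattern described in cc390aab64 above can be sketched as follows. This is an illustration under assumptions, not the real code: `listPostings` is a hypothetical stand-in for the actual `ListPostings` type, and both constructor names are invented. The first function makes one heap allocation per element; the second allocates a single backing slice and hands out pointers into it.

```go
package main

import "fmt"

// listPostings is a hypothetical stand-in for the real ListPostings type.
type listPostings struct {
	list []uint64
}

// newOneByOne allocates every element separately.
func newOneByOne(lists [][]uint64) []*listPostings {
	out := make([]*listPostings, 0, len(lists))
	for _, l := range lists {
		out = append(out, &listPostings{list: l})
	}
	return out
}

// newBatched allocates one backing slice for all elements and takes
// pointers to its entries, so the element storage costs one allocation.
func newBatched(lists [][]uint64) []*listPostings {
	backing := make([]listPostings, len(lists))
	out := make([]*listPostings, 0, len(lists))
	for i, l := range lists {
		backing[i].list = l
		out = append(out, &backing[i])
	}
	return out
}

func main() {
	lists := [][]uint64{{1, 2}, {3}, {4, 5, 6}}
	a, b := newOneByOne(lists), newBatched(lists)
	fmt.Println(len(a), len(b), b[2].list)
}
```

One trade-off of the batched form is that the whole backing slice stays reachable for as long as any single pointer into it is still in use.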
Arve Knudsen
89bbb885e5
Upgrade to golangci-lint v1.62.0 (#15424)
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-20 17:22:20 +01:00
Björn Rabenstein
384c5951ef
Merge pull request #14489 from harry671003/implement_metadata_limit
storage: Implement limit in mergeGenericQuerier
2024-11-19 17:32:16 +01:00
Arve Knudsen
06d54fcc6c
[PERF] TSDB: Optimize inverse matching (#14144)
Simple follow-up to #13620. Modify `tsdb.PostingsForMatchers` to use the optimized tsdb.IndexReader.PostingsForLabelMatching method also for inverse matching.

Introduce method `PostingsForAllLabelValues`, to avoid changing the existing method.

The performance is much improved for a subset of the cases; there are up to
~60% CPU gains and ~12.5% reduction in memory usage. 

Remove `TestReader_InversePostingsForMatcherHonorsContextCancel` since
`inversePostingsForMatcher` only passes `ctx` to `IndexReader` implementations now.

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-11-19 15:49:01 +00:00
Bryan Boreham
0ef0b75a4f [TESTS] Remote-Write: Fix BenchmarkStartup
It was crashing due to uninitialized metrics, and not terminating due to
incorrectly reading segment names.

We need to export `SetMetrics` to avoid the first problem.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-15 11:22:07 +00:00
Fiona Liao
c599d37668
Always return unknown hint for first sample in non-gauge histogram chunk (#15343)
Always return unknown hint for first sample in non-gauge histogram chunk

---------

Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-11-12 15:14:06 +01:00
Bryan Boreham
5450e6d368 [BUGFIX] TSDB: Apply fixes from loadWAL to loadWBL
Move a couple of variables inside the scope of a goroutine, to avoid
data races.

Use `zeropool` to reduce garbage and avoid some lint warnings.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-11 18:41:33 +00:00
Ben Ye
140f4aa9ae
feat: Allow customizing TSDB postings decoder (#13567)
* allow customizing TSDB postings decoder

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-11-11 07:59:24 +01:00
Ben Ye
f9057544cb
Fix AllPostings added twice (#13893)
* handle all postings added twice

---------

Signed-off-by: Ben Ye <benye@amazon.com>
2024-11-10 18:17:21 +01:00
🌲 Harry 🌊 John 🏔
f9bc50b247 storage: Implement limit in mergeGenericQuerier
Signed-off-by: 🌲 Harry 🌊 John 🏔 <johrry@amazon.com>
2024-11-07 09:08:23 -08:00
Bryan Boreham
f42b37ff2f
[BUGFIX] TSDB: Fix race on stale values in headAppender (#15322)
* [BUGFIX] TSDB: Fix race on stale values in headAppender

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

* Simplify

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>

---------

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-06 16:51:39 +01:00
Matthieu MOREL
af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Bryan Boreham
02aa6d1de6
Merge pull request #15338 from bboreham/cosmetic-tsdb
[COMMENT] Remove duplicate line
2024-11-05 12:03:04 +00:00
Oleg Zaytsev
b1e4052682
MemPostings.Delete(): make pauses to unlock and let the readers read (#15242)
This introduces back some unlocking that was removed in #13286 but in a
more balanced way, as suggested by @pracucci.

For TSDBs with a lot of churn, Delete() can take a couple of seconds,
and while it's holding the mutex, reads and writes are blocked waiting
for that mutex, increasing the number of connections handled and memory
usage.

This implementation pauses every 4K labels processed (note that also
compared to #13286 we're not processing all the label-values anymore,
but only the affected ones, because of #14307), makes sure that it's
possible to get the read lock, and waits for a few milliseconds more.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
2024-11-05 12:59:57 +01:00
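The pausing scheme described in b1e4052682 above can be sketched roughly as below. All names here (`deleteWithPauses`, `pauseEvery`, `runDemo`) are hypothetical, the constant 4096 is an assumed reading of "every 4K labels", and the one-millisecond sleep is only a placeholder for "a few milliseconds". The sketch releases the write lock periodically, briefly takes the read lock so that it queues behind readers that were already waiting, sleeps, and then takes the write lock again.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// pauseEvery is an assumed value standing in for "every 4K labels".
const pauseEvery = 4096

// deleteWithPauses processes labels while holding the write lock, but
// periodically lets readers in so they are not starved for seconds.
func deleteWithPauses(mtx *sync.RWMutex, labels []string, process func(string)) {
	mtx.Lock()
	defer mtx.Unlock()

	for i, l := range labels {
		process(l)
		if (i+1)%pauseEvery == 0 {
			mtx.Unlock()
			// Taking the read lock here makes this goroutine wait
			// behind readers that are already queued.
			mtx.RLock()
			mtx.RUnlock()
			// Leave a short window for further readers.
			time.Sleep(time.Millisecond)
			mtx.Lock()
		}
	}
}

// runDemo processes n dummy labels and returns how many were handled.
func runDemo(n int) int {
	var mtx sync.RWMutex
	labels := make([]string, n)
	processed := 0
	deleteWithPauses(&mtx, labels, func(string) { processed++ })
	return processed
}

func main() {
	fmt.Println(runDemo(10000))
}
```

Because the lock is dropped in the middle of the loop, any state read before a pause may be stale afterwards. Real code has to tolerate concurrent changes between batches.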
Bryan Boreham
541c7fd9fe [COMMENT] Remove duplicate line
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-05 11:03:40 +00:00
Alban Hurtaud
4b56af7eb8
Add hidden flag for the delayed compaction random time window (#14919)
* Add hidden flag for the delayed compaction random time window

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>

* Update cmd/prometheus/main.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>

* Update cmd/prometheus/main.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>

* Update tsdb/db.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>

* Fix flag name according to review - add test for delay

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>

* Fix after main rebase

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>

* Implement review comments

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>

* Update generatedelaytest to try with limit values

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>

---------

Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2024-11-04 08:26:26 +01:00
Bryan Boreham
2fbbfc3da8 Revert "Fix MemPostings.Add and MemPostings.Get data race (#15141)"
This reverts commit 50ef0dc954.

Memory allocation goes so high in Prombench that the system is unusable.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-11-03 12:30:34 +00:00
Bryan Boreham
e2e01c1cff
Merge pull request #15216 from yeya24/log-last-series-labels
log last series labelset when hitting OOO series labels
2024-11-01 14:15:39 +00:00
Oleg Zaytsev
ba11a55df4
Revert "Process MemPostings.Delete() with GOMAXPROCS workers"
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-10-29 17:13:40 +01:00
Nicolas Takashi
b6c538972c
[REFACTOR] simplify appender commit (#15112)
* [REFACTOR] simplify appender commit

Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-10-29 12:34:02 +00:00
Arve Knudsen
706dcfeecf
tsdb.CircularExemplarStorage: Avoid racing (#15231)
* tsdb.CircularExemplarStorage: Avoid racing

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-10-29 10:40:46 +01:00
Pedro Tanaka
bab587b9dc
Agent: allow for ingestion of CT samples (#15124)
* Remove unused option from HeadOptions

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Improve docs for appendable() method in head appender

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Ingest CT (float) samples in Agent DB

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* allow for ingestion of CT native histogram

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding some verification for ct ts

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Validating CT histogram before append and add newly created series to pending series

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* checking the wal for written samples

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Checking for samples in test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding case for validations

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* fixing comparison when dedupelabels is enabled

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* unite tests, use table testing

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Implement CT related methods in timestampTracker for write storage

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* adding error case to test

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* removing unused fields

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* Updating lastTs for series when adding CT to invalidate duplicates

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

* making sure that updating the lastTS wont cause OOO later on in Commit();

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>

---------

Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
2024-10-27 01:06:34 +01:00
Ayoub Mrini
93db81dd3d
Merge pull request #14983 from machine424/dopp
fix(storage/mergeQuerier): fix a data race
2024-10-25 18:34:51 +02:00
Łukasz Mierzwa
b6e22cd346 Short-cut common memChunk operations
memChunk is a linked list, speed up some common operations when there's no need to iterate all elements on the list.

Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
2024-10-25 12:19:20 +01:00
Ben Ye
99882eec3b log last series labelset when hitting OOO series labels during compaction
Signed-off-by: Ben Ye <benye@amazon.com>
2024-10-24 09:27:15 -07:00
Vanshika
cccbe72514
TSDB: Fix some edge cases when OOO is enabled (#14710)
Fix some edge cases when OOO is enabled

Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
2024-10-23 17:34:28 +02:00
machine424
cebcdce78a
fix(storage/mergeQuerier): copy the matchers slice before passing it to queriers as
some of them may alter it.

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-22 14:08:47 +02:00
machine424
eb523a6b29
fix(storage/mergeQuerier): add a reproducer for data race that occurs when one of the queriers alters the passed matchers and propose a fix
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-22 14:08:46 +02:00
György Krajcsovits
a4083f14e8 Fix populateWithDelChunkSeriesIterator corrupting chunk meta
When handling recoded histogram chunks the min time of the chunk is
updated by mistake. It should only update when the chunk is completely new.
Otherwise the ongoing chunk's meta will be later than the previously
written samples in it.

Same bug as https://github.com/prometheus/prometheus/pull/14629

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-18 10:34:22 +02:00
György Krajcsovits
e6a682f046 Reproduce populateWithDelChunkSeriesIterator corrupting chunk meta
When handling recoded histogram chunks the min time of the chunk is
updated by mistake. It should only update when the chunk is completely
new.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-18 10:34:22 +02:00
machine424
ab2475c426
test(tsdb): add a reproducer for https://github.com/prometheus/prometheus/issues/14422
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-15 20:39:25 +02:00
Bryan Boreham
1e1f6ab9df
Merge pull request #15120 from bboreham/floor-ino-mint
[BUGFIX] TSDB: Don't read in-order chunks from before head MinTime
2024-10-15 10:27:38 +01:00
George Krajcsovits
b8867f8ead
Merge pull request #15142 from krajorama/fix-appendhistogram-race
bugfix: data race in head.Appender.AppendHistogram and Commit
2024-10-14 08:13:39 +02:00
Oleg Zaytsev
50ef0dc954
Fix MemPostings.Add and MemPostings.Get data race (#15141)
* Tests for Mempostings.{Add,Get} data race
* Fix MemPostings.{Add,Get} data race

We can't modify the postings list that are held in MemPostings as they
might already be in use by some readers.

* Modify BenchmarkHeadStripeSeriesCreate to have common labels

If there are no common labels on the series, we don't exercise the
ordering part of MemSeries, as we're just creating slices of one element
for each label value.

---------

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-10-11 15:21:15 +02:00
György Krajcsovits
bb70370d72 TSDB head: fix race between AppendHistogram and Commit
Move writing memSeries lastHistogramValue and lastFloatHistogramValue
after series creation under lock.

The resulting code isn't totally correct in the sense that we're setting
these values before Commit() , so they might be overwritten/rolled back
later.

Also Append of stale sample checks the values without lock, so there's
still a potential race.

The correct solution would be to set these only in Commit() which we
actually do, but then Commit() would also need to process samples in
order and not floats first, then histograms, then float histograms - which
leads to not knowing what stale marker to write for histograms.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-10 16:59:15 +02:00
György Krajcsovits
631fadc4ca Unit test for data race in head.Appender.AppendHistogram
Two Appenders race when creating a series with a native histogram
as the memSeries will be common and the lastHistogram field is written
without lock.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-10 14:10:07 +02:00
beorn7
12c39d5421 docs: Some nitpicking in chunks.md
- `float histogram` → `floathistogram`, as it is used in the code.
- Actual link encodings to the code (to find the actual numerical values).
- `<bytes>` → `<data>` for consistency.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-10-09 14:32:12 +02:00
beorn7
a4cb52ff15 docs: Update chunk layout for NHCB
Signed-off-by: beorn7 <beorn@grafana.com>
2024-10-09 14:19:20 +02:00
Björn Rabenstein
02d0de9987
Merge pull request #14997 from fionaliao/fl/update-format-docs
Update chunk format docs with native histograms and OOO
2024-10-09 13:29:01 +02:00
TJ Hoplock
6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Bryan Boreham
91de19fbef [BUGFIX] TSDB: Don't read in-order chunks from before head MinTime
Because we are reimplementing the `IndexReader` to fetch in-order and
out-of-order chunks together, we must reproduce the behaviour of
`Head.indexRange()`, which floors the minimum time queried at `head.MinTime()`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-10-07 13:50:03 +01:00
Matthieu MOREL
ab64966e9d
fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()" (#15094)
* fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()"

---------

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-10-06 16:35:29 +00:00
György Krajcsovits
44ebbb8458 Fix missing histogram copy in sampleRing
The specialized versions of adding a sample to the ring:
func addH(s hSample, buf []hSample, r *sampleRing) []hSample
func addFH(s fhSample, buf []fhSample, r *sampleRing) []fhSample
already correctly copy histogram samples from the reused hReader, fhReader
buffers, but the generic version does not. This means that the
data is overwritten on the next read if the sample ring has seen histogram
and float samples at the same time and switched to generic mode.

The `genericAdd` function (which was commented out anyway) is by now quite
different from the specialized functions so that this commit deletes
it.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2024-10-02 13:57:28 +02:00
Bryan Boreham
54de4fb780
Merge pull request #14975 from colega/process-mempostings-delete-with-gomaxprocs-workers
Process `MemPostings.Delete()` with `GOMAXPROCS` workers
2024-09-29 07:58:42 +01:00
Fiona Liao
fd62dbc291 Update chunk format docs with native histograms and OOO
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
2024-09-27 18:57:58 +01:00
Ayoub Mrini
105ab2e95a
fix(test): adjust defer invocations (#14996)
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-27 17:13:51 +01:00
Oleg Zaytsev
ada8a6ef10
Add some more tests for MemPostings_Delete
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-09-27 10:14:39 +02:00
Arthur Silva Sens
d5f65cfce0
Merge pull request #14694 from prometheus/ct-histogram
Histogram CT Zero ingestion
2024-09-26 12:48:46 -03:00
Arthur Silva Sens
95a53ef982
Join tests for appending float and histogram CTs
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-26 11:29:31 -03:00
Arthur Silva Sens
6bd9b1a7cc
Histogram CT Zero ingestion
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
2024-09-26 11:29:22 -03:00
Oleg Zaytsev
4fd2556baa
Extract processWithBoundedParallelismAndConsistentWorkers
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-09-26 15:43:19 +02:00
Oleg Zaytsev
ccd0308abc
Don't do anything if MemPostings are empty
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-09-25 15:00:10 +02:00
Oleg Zaytsev
9c417aa710
Fix deadlock with empty MemPostings
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-09-25 14:08:50 +02:00
Bryan Boreham
5d8f0ef0c2
Merge pull request #14721 from bboreham/exp-grow-postings
[PERF] TSDB: Grow postings by doubling
2024-09-25 10:47:55 +01:00
Oleg Zaytsev
e196b977af
Process MemPostings.Delete() with GOMAXPROCS workers
We are still seeing lock contention on MemPostings.mtx, and MemPostings.Delete() is by far the most expensive operation on that mutex.

This adds parallelism to that method, trying to reduce the amount of time we spend with the mutex held.

Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
2024-09-25 10:38:47 +02:00
Bryan Boreham
ca673eb749 Merge remote-tracking branch 'origin/release-2.55' into merge-2.55-into-main
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-22 17:49:34 +01:00
Bryan Boreham
31c5760551
Neater string vs byte-slice conversions (#14425)
unsafe.Slice and unsafe.StringData were added in Go 1.20

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-21 12:19:21 +02:00
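As a hedged illustration of the kind of conversion 31c5760551 above refers to, the sketch below uses `unsafe.String`, `unsafe.SliceData`, `unsafe.StringData` and `unsafe.Slice` from the standard `unsafe` package (it needs Go 1.20 or later). The helper names `yoloString` and `yoloBytes` are chosen for illustration and are not claimed to match the repository's own helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// yoloString returns a string that shares memory with b.
// The caller must not modify b while the string is in use.
func yoloString(b []byte) string {
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// yoloBytes returns a byte slice that shares memory with s.
// The result must never be written to, because strings are immutable.
func yoloBytes(s string) []byte {
	return unsafe.Slice(unsafe.StringData(s), len(s))
}

func main() {
	fmt.Println(yoloString([]byte("series")), len(yoloBytes("label")))
}
```

Both helpers avoid a copy at the price of giving up the usual safety guarantees, so they only suit short-lived values whose lifetime and mutability are under the caller's control.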
Bryan Boreham
d42232e178
Merge pull request #14932 from bboreham/chunk-xor-combine-writebits
[PERF] TSDB: Chunk encoding: shorten some write sequences
2024-09-20 17:53:54 +01:00
Bryan Boreham
6f0d6038b7 [BUGFIX] TSDB: Only query chunks up to truncation time (#14948)
If the query overlaps the range currently undergoing compaction, we
should only fetch chunks up to that time. Need to store that min time
in `HeadAndOOOIndexReader`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-20 17:44:04 +01:00
Bryan Boreham
9215252221
[BUGFIX] TSDB: Only query chunks up to truncation time (#14948)
If the query overlaps the range currently undergoing compaction, we
should only fetch chunks up to that time. Need to store that min time
in `HeadAndOOOIndexReader`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-20 18:40:17 +02:00
Ganesh Vernekar
5ccb069414 Backward compatibility with upcoming index v3
Signed-off-by: Ganesh Vernekar <ganeshvern@gmail.com>
2024-09-19 10:27:52 +01:00
George Krajcsovits
0d22a91267 Merge pull request #14874 from krajorama/fix-panic-in-ooo-query2
BUGFIX: TSDB: panic in chunk querier
2024-09-19 10:03:53 +01:00
Bryan Boreham
e8c2d916ec lint
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-18 15:23:46 +01:00
Bryan Boreham
648a668835 [PERF] Chunk encoding: combine timestamp writes
Instead of a 2-bit write followed by a 14-bit write, do two 8-bit
writes, which goes much faster since it avoids looping.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-18 13:19:21 +01:00
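The equivalence behind 648a668835 above can be checked with a small, self-contained bit stream. This is a simplified sketch and not the actual `chunkenc` implementation. The `bstream` type and the functions `writeSeparate`, `writeCombined` and `encode` are written for this example, and the 2-bit prefix `0b10` is an assumed control code. A 2-bit write followed by a 14-bit write emits the same 16 bits as two whole-byte writes.

```go
package main

import "fmt"

// bstream is a minimal bit stream; count is the number of free bits
// remaining in the last byte of stream.
type bstream struct {
	stream []byte
	count  uint8
}

func (b *bstream) writeBit(bit bool) {
	if b.count == 0 {
		b.stream = append(b.stream, 0)
		b.count = 8
	}
	if bit {
		b.stream[len(b.stream)-1] |= 1 << (b.count - 1)
	}
	b.count--
}

func (b *bstream) writeByte(v byte) {
	if b.count == 0 {
		b.stream = append(b.stream, v)
		return
	}
	// Fill the free bits of the last byte, then append the remainder.
	b.stream[len(b.stream)-1] |= v >> (8 - b.count)
	b.stream = append(b.stream, v<<b.count)
}

// writeBits writes the lowest nbits bits of u, most significant first.
func (b *bstream) writeBits(u uint64, nbits int) {
	u <<= 64 - uint(nbits)
	for nbits >= 8 {
		b.writeByte(byte(u >> 56))
		u <<= 8
		nbits -= 8
	}
	for nbits > 0 {
		b.writeBit(u>>63 == 1)
		u <<= 1
		nbits--
	}
}

// writeSeparate does a 2-bit prefix write followed by a 14-bit value write.
func writeSeparate(b *bstream, dod int64) {
	b.writeBits(0b10, 2)
	b.writeBits(uint64(dod), 14)
}

// writeCombined packs prefix and value into 16 bits and writes two bytes.
func writeCombined(b *bstream, dod int64) {
	v := uint64(0b10)<<14 | uint64(dod)&0x3FFF
	b.writeByte(byte(v >> 8))
	b.writeByte(byte(v))
}

// encode runs one of the writers on an empty stream and returns the bytes.
func encode(write func(*bstream, int64), dod int64) []byte {
	var b bstream
	write(&b, dod)
	return b.stream
}

func main() {
	fmt.Printf("%x %x\n", encode(writeSeparate, -100), encode(writeCombined, -100))
}
```

The combined form skips the per-bit loop that the 2-bit and trailing 6-bit writes go through, which is the loop the commit message says it avoids.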
Bryan Boreham
b9a9689aae [PERF] Chunk encoding: simplify writeByte
Rather than append a zero then set the value at that position, append the value.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-18 13:19:04 +01:00
Bryan Boreham
b65f1b6560 TSDB: Improve xor-chunk benchmarks
Benchmarks must do the same work N times.
Run 3 cases, where the values are constant, vary a bit, and vary a lot.

Also aim for 120 samples same as TSDB default.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-09-18 13:14:49 +01:00