No implementation yet. Just to test the shape of the interface.
AtST is implemented for trivial cases, anything else is hard coded
to return 0.
Ref: https://github.com/prometheus/prometheus/issues/17791
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
NHCB is native histograms with custom buckets.
prompb is used for both remote write 1.0 and remote read. We do not
support NHCB over remote write 1.0 , however we should absolutely
support it for remote read.
Prometheus remote write 1.0 client already refuses to send NHCB.
Prometheus remote write 1.0 server accepts NHCB, but doesn't store
custom values, corrupting the result. I'm now handling NHCB correctly,
instead of refusing or corrupting.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The original implementation in #9705 for native histograms included a
technical dept #15177 where samples were committed ordered by type
not by their append order. This was fixed in #17071, but this docstring
was not updated.
I've also took the liberty to mention that we do not order by timestamp
either, thus it is possible to append out of order samples.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
ReduceResolution is currently called before validation during
ingestion. This will cause a panic if there are not enough buckets in
the histogram. If there are too many buckets, the spurious buckets are
ignored, and therefore the error in the input histogram is masked.
Furthermore, invalid negative offsets might cause problems, too.
Therefore, we need to do some minimal validation in reduceResolution.
Fortunately, it is easy and shouldn't slow things down. Sadly, it
requires to return errors, which triggers a bunch of code changes.
Even here is a bright side, we can get rud of a few panics. (Remember:
Don't panic!)
In different news, we haven't done a full validation of histograms
read via remote-read. This is not so much a security concern (as you
can throw off Prometheus easily by feeding it bogus data via
remote-read) but more that remote-read sources might be makeshift and
could accidentally create invalid histograms. We really don't want to
panic in that case. So this commit does not only add a check of the
spans and buckets as needed for resolution reduction but also a full
validation during remote-read.
Signed-off-by: beorn7 <beorn@grafana.com>
* drop extra label from receiver
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* used constant
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
Partially fixes https://github.com/prometheus/prometheus/issues/17416 by
renaming all CT* names to ST* in the whole codebase except RW2 (this is
done in separate
[PR](https://github.com/prometheus/prometheus/pull/17411)) and
PrometheusProto exposition proto.
```
CreatedTimestamp -> StartTimestamp
CreatedTimeStamp -> StartTimestamp
created_timestamp -> start_timestamp
CT -> ST
ct -> st
```
Signed-off-by: bwplotka <bwplotka@gmail.com>
OTLP Receiver: Only update metadata to WAL when metadata-wal-records feature is enabled.
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
`histogram.Error` becomes the generic wrapper type for all histogram errors.
This makes it easier and less error prone when adding new errors to check if
an error is an histogram error as well as making it less error prone to convert
the errors.
This change the type of those specific sentinel errors from error to
`histogram.Error`, but it should almost never matter.
e.g., `errors.Is(err, ErrHistogram...)` would still work out of the box.
Signed-off-by: Laurent Dufresne <laurent.dufresne@grafana.com>
Add logic to the target_info metric generation in the OTLP endpoint, so that any samples with the same timestamp for the same (target_info) series are de-duplicated. It comes out of a user's bug report about duplicated target_info samples in Grafana Mimir (which uses the Prometheus target_info generation logic).
If I'm not mistaken, duplicate target_info samples should stem from multiple resources in the same OTLP request being translated to the same target_info label set. It shouldn't be caused by a Prometheus bug.
* Add WriteProto method and tests for promtool metrics
This commit adds:
1. WriteProto method to storage/remote/client.go that handles
marshaling and compression of protobuf messages
2. Updated parseAndPushMetrics in cmd/promtool/metrics.go to use
the new WriteProto method
3. Comprehensive tests for PushMetrics functionality
The WriteProto method provides a cleaner API for sending protobuf
messages without manually handling marshaling and compression.
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* use Write method from exp/api/remote
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix lint
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* nit fixed
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
* fix lint
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
---------
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
As a follow-up to #17344, add two configuration parameters for controlling label
name translation, both defaulting to on for backwards compatibility (currently
these behaviours are hardcoded as enabled):
* otlp.label_name_underscore_sanitization => Prefix label names starting with a
single underscore with key_ when translating OTel attribute names
* otlp.label_name_preserve_multiple_underscores => Keep multiple consecutive
underscores in label names when translating OTel attribute names
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
The upgrade to prometheus/otlptranslator@7f02967de0 fixes two label
name translation bugs, when in legacy name translation mode:
* 'key' is no longer prefixed when label names start with an underscore
* Multiple consecutive underscores are combined into one
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
The detailed plan for this is laid out in
https://github.com/prometheus/prometheus/issues/16572 .
This commit adds a global and local scrape config option
`scrape_native_histograms`, which has to be set to true to ingest
native histograms.
To ease the transition, the feature flag is changed to simply set the
default of `scrape_native_histograms` to true.
Further implications:
- The default scrape protocols now depend on the
`scrape_native_histograms` setting.
- Everywhere else, histograms are now "on by default".
Documentation beyond the one for the feature flag and the scrape
config are deliberately left out. See
https://github.com/prometheus/prometheus/pull/17232 for that.
Signed-off-by: beorn7 <beorn@grafana.com>
We have always validated that none of the bucket is negative. We
should do the same for the count of observations and the zero bucket.
Note that this was always implied in the protobuf exposition format
because a count or a zero bucket population is ignored if it is not
positive.
Signed-off-by: beorn7 <beorn@grafana.com>
If a sample read through remote read has too high resolution,
reduce it to the maximum allowed.
This is a slow path, but we only expect it to happen if the server
side is newer version that allows higher resolution.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When remote read returns chunks, the validation is in tsdb/chunkenc.
However when it returns samples, we need to modify the iterator to
validate.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
control over time
The test becomes flaky after it was asked to run on parallel
and "fight" for resources
let's hide all of that
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
It's not possible to store created timestamp at the same timestamp as
the current sample, so do not even try.
In OpenTelemetry spec, if the start time is unknown, it will be set to
the same timestamp as the first sample.
https://opentelemetry.io/docs/specs/otel/metrics/data-model/#cumulative-streams-handling-unknown-start-time
This means that we will get a lot of duplicate sample for timestamp
errors and we should not log those.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Histogram.Validate and FloatHistogram.Validate now return error on
unsupported schemas.
Scrape and remote-write handler reduces the schema to the maximum allowed
if it is above the maximum, but below theoretical maximum of 52.
For scrape the maximum is a configuration option, for remote-write it is 8.
Note: OTLP endpont already does the reduction, without checking that it is
below 52 as the spec does not specify a maximum.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* OTLP writer writes directly to appender
Do not convert to Remote-Write 1.0 protocol. Convert to TSDB Appender interface instead.
For downstream projects that still convert OTLP to something else (e.g. Mimir using
its own RW 1.0+2.0 compatible protocol), introduce a compatibility layer between
OTLP decoding and TSDB Appender. This is the CombinedAppender that hides the
implementation. Name is subject to change.
---------
Signed-off-by: David Ashpole <dashpole@google.com>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Jesus Vazquez <jesusvazquez@users.noreply.github.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
Remote Write one currently attempts to send native histograms with
custom buckets, but these are not actually supported in RW1 protocol.
Drop, measure and log instead.
Fixes: #17140
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Because of relabelling, an endpoint can only select a subset of series
that go through WriteStorage
Having a highestTimestamp at WriteStorage level yields wrong values
if the corresponding sample won't even make it to a remote queue.
Currently PrometheusRemoteWriteBehind is based on that, and would fire
if an endpoint is only interested in a subset of series that take time
to appear.
A "prometheus_remote_storage_queue_highest_timestamp_seconds" that only
takes into account samples in the queue is introduced, and used in
PrometheusRemoteWriteBehind and dashboards in documentation/prometheus-mixin
Same applies to samplesIn/dataIn, QueueManager should know more about
when to update those; when data is enqueued.
That makes dataDropped unnecessary, thus help simplify the logic
in QueueManager.calculateDesiredShards()
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
- The tool left an empty line behind that we don't need anymore, see
https://github.com/prometheus/prometheus/pull/17092. (Arguably not a
bug in the tool but just our stricter style about empty lines.)
- In tsdb/index/postings_test.go , our (admittedly somewhat
convoluted) code structure tricked the tool so it spit out something
that wouldn't even compile.
- storage/remote/queue_manager_test.go is just a minor formatting
nit.
Signed-off-by: beorn7 <beorn@grafana.com>
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.
This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.
Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.
Signed-off-by: beorn7 <beorn@grafana.com>
add metric to track unexpected metadata seen in populateV2TimeSeries, which would indicate metadata incorrectly routed in queue_manager code paths
---------
Signed-off-by: leegin <leegin.t@gmail.com>
Signed-off-by: Darkknight <leegin.t@gmail.com>
A race condition in TestSendSamplesWithBackoffWithSampleAgeLimit was
observed in CI where the sample age limit was too close to the backoff
time, causing samples to be dropped intermittently. Increasing the
SampleAgeLimit resolves the problem.
Signed-off-by: Adam Bernot <bernot@google.com>
* remote read: simplify ReadMultiple to return single SeriesSet
Changed ReadMultiple to return a single SeriesSet with interleaved
series from all queries instead of a slice of SeriesSets. This
simplifies the interface and removes the complex multiplexing
infrastructure while maintaining the ability to send multiple
queries in a single HTTP request.
Changes:
- Updated ReadClient interface: ReadMultiple now returns storage.SeriesSet
- Removed multiplexing infrastructure (MessageQueue, QueueConsumer, etc.)
- Simplified response handling to interleave series from all queries
- Updated tests to match new interface
- All existing tests pass
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Fix sorting behavior in ReadMultiple for samples responses
When sortSeries=false, the previous implementation incorrectly used
storage.NewMergeSeriesSet which requires sorted inputs, violating the
function's contract and potentially producing incorrect results.
Changes:
- When sortSeries=true: Use NewMergeSeriesSet for efficient merging and
deduplication of sorted series
- When sortSeries=false: Use simple concatenation to avoid the sorted
input requirement, preserving duplicates from overlapping queries
- Add comprehensive tests to verify both sorting behaviors
- Update existing test expectations to match correct sorted order
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Refactor to reduce code duplication in ReadMultiple implementation
Extract common query result combination logic into a shared
combineQueryResults function that handles both sorted and unsorted
cases. This eliminates duplication between the real client
implementation and the mock client used in tests.
Changes:
- Add combineQueryResults helper function in client.go
- Refactor handleSamplesResponseImpl to use the helper
- Simplify mockedRemoteClient.ReadMultiple to use the same helper
- Reduce code duplication by ~30 lines while maintaining same functionality
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
As mentioned in #16182, the BenchmarkStartup test for the queue manager
covers an old API and uses settings that will not occur in production
Signed-off-by: Adam Bernot <bernot@google.com>
convert.Timeseries() and converter.Metadata() is never nil, because
they are always initialized. It's better to assert on whether they are
empty or not.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Modify storage/remote.writeHandler.ServeHTTP to return after responding
with an error due to receiving an unrecognized protobuf message type.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
* Remove unused feature from prw translator
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>