Add support for promoting all OTel resource attributes via `promote_all_resource_attributes`,
except for those ignored using 'ignore_resource_attributes'.
---------
Signed-off-by: Antonio Jimenez <antonjim@thousandEyes.com>
Signed-off-by: Antonio Jimenez <123171955+antonjim-te@users.noreply.github.com>
The new docs site will have syntax highlighting, so this adds language tags
to code boxes that are currently missing them. I didn't add `promql` as a
language yet since the highlighter doesn't support it yet, plus a lot of
the PromQL codeboxes in our docs aren't strictly valid PromQL, they are
more like multiple expressions listed in the same code box on multiple
lines. So I'm leaving that for sometime later.
In the HTTP API page, I moved the curl examples from the JSON codeboxes to
their own ones above the JSON output. I considered putting an "Output:"
text between the curl + JSON output, but I think the way it currently looks
without it is probably fine.
I also fixed a number of headings which were at the wrong level relative to
their nesting in the document.
I also removed `go` as a language from the Go template language examples,
because the Go template language isn't Go at all.
I also adjusted the indent on one codebox to be more reasonable (2 spaces
instead of 8).
And then finally, my editor made a bunch of whitespace changes
automatically, like removing trailing spaces.
Signed-off-by: Julius Volz <julius.volz@gmail.com>
Signed-off-by: Julius Volz <julius.volz@gmail.com>
This is a fix for documentation update done in https://github.com/prometheus/prometheus/pull/16066, setting correct name for a configuration value.
Signed-off-by: Martin Danko <46035688+zepellin@users.noreply.github.com>
Addresses https://github.com/prometheus/prometheus/issues/16371
This will help with migrating to native histograms with `convert_classic_histograms_to_nhcb` since users may still need to keep the classic histograms during a migration
Signed-off-by: chardch <otwordsne@gmail.com>
State explicitely what kind of timestamps are expected for the
--min-time and --max-time options of promtool tsdb commands.
This is especially important for the dump-openmetrics command as users
could otherwise mistakenly think it would be in seconds, like the
OpenMetrics timestamps themselves.
Signed-off-by: Nicolas Peugnet <nicolas.peugnet@lip6.fr>
Allows to filter the servers when sending the listing request to the API. This feature is only available when using the `role=hcloud`.
See https://docs.hetzner.cloud/#label-selector for details on how to use the label selector.
Signed-off-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
* Add documentation for 'NoTranslation' mode
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Update docs/configuration/configuration.md
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
---------
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
* promql: histogram_fraction for bucket histograms
This PR extends the histogram_fraction function to also work with classic bucket histograms. This is beneficial because it allows expressions like sum(increase(my_bucket{le="0.5"}[10m]))/sum(increase(my_total[10m])) to be written without knowing the actual values of the "le" label, easing the transition to native histograms later on.
It also feels natural since histogram_quantile also can deal with classic histograms.
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
* promql: histogram_fraction for bucket histograms
* Add documentation and reduce code duplication
* Fix a bug in linear interpolation between bucket boundaries
* Add more PromQL tests
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
* Update docs/querying/functions.md
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
---------
Signed-off-by: Michael Hoffmann <mhoffmann@cloudflare.com>
Signed-off-by: Michael Hoffmann <mhoffm@posteo.de>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
These are supported in the main prometheus binary but the feature flags
weren't supported in promtool.
Fixes#16412.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
We implemented proper type handling (gauge vs. counter histogram) a
while ago. The note about it is obsolete.
Signed-off-by: beorn7 <beorn@grafana.com>
Updated the parser to allow calculations in PromQL durations.
This enables durations in the form of:
rate(http_requests_total[10m+2s])
The computation of the calculations is done directly at the parse level and does not hit the PromQL Engine.
The lexer has also been updated and improved, in particular for subqueries.
Buxfix: rate(http_requests_total[0]) is no longer allowed.
Signed-off-by: Julien Pivotto <291750+roidelapluie@users.noreply.github.com>
The new metric_name_escaping_scheme config option works in parallel with metric_name_validation_scheme and controls which escaping scheme is requested when scraping. When not specified, the scheme will request underscores if the validation scheme is set to legacy, and will request allow-utf-8 when the validation scheme is set to utf8. This setting allows users to allow utf8 names if they like, but explicitly request an escaping scheme rather than UTF-8.
Fixes https://github.com/prometheus/prometheus/issues/16034
Built on https://github.com/prometheus/prometheus/pull/16080
Signed-off-by: Owen Williams <owen.williams@grafana.com>
* ruler notifier: make batch size configurable
In Mimir we experimented with setting a higher value for the batch size.
A 4x increase in batch size decreased the time to process a single notification by about 2x.
This reduces the processing time of the notifications queue and increases the throughput of the queue.
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update cmd/prometheus/main.go
Co-authored-by: gotjosh <josue.abreu@gmail.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Update docs
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Use a string constant
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
* Add godoc comment on exported constant
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
---------
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Co-authored-by: gotjosh <josue.abreu@gmail.com>
doc: fix broken kuma.io link
I'm not actually familiar with kuma, but I noticed this link was broken, and I believe the one I've found here is equivalent.
Signed-off-by: Ian Kerins <git@isk.haus>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Rationales:
* metadata-wal-records might be deprecated and replaced going forward: https://github.com/prometheus/prometheus/issues/15911
* PRW 2.0 works without metadata just fine (although it sends untyped metrics as expected).
Signed-off-by: bwplotka <bwplotka@gmail.com>
This is also meant to document the actual implementation, but
see #13934 for the current state.
This also improves and streamlines some parts of the documentation
that are not strictly native histogram related, but are colocated with
them. In particular, the section about aggregation operators got
restructured quite a bit, including the removal of a quite verbose
example for `limit_ratio` (which was just too long an this location
and also a bit questionabl in its usefulness).
Signed-off-by: beorn7 <beorn@grafana.com>
This fixes a formatting problem (`__name__`) was rendered in boldface
without the underscores in the headline).
Furthermore, it explains the possible issues with the feature flag
(change of behavior of certain "weird" queries, problems when
disecting a query manually or in PromLens).
Signed-off-by: beorn7 <beorn@grafana.com>
Add --ignore-unknown-fields that ignores unknown fields in rule group
files. There are lots of tools in the ecosystem that "like" to extend
the rule group file structure but they are currently unreadable by
promtool if there's anything extra. The purpose of this flag is so that
we could use the "vanilla" promtool instead of rolling our own.
Some examples of tools/code:
https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rwrulefmt/rulefmt.go8898eb3cc5/pkg/rules/rules.go (L18-L25)
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is a OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
* for each sample, if not delta, leave unmodified
* if delta, do:
* state += sample, where state is the in-memory sum of all previous samples
* sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory
Performance
Delta performance is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked
if enabled, conversion is not invoked if request not actually contains delta samples. This leads to no measureable performance difference between default-cumulative to convert-cumulative (only cumulative, feature on/off)
Signed-off-by: sh0rez <me@shorez.de>
This commit introduced two field in `/status` endpoint:
- The node currently serving the request.
- The current server time for debugging time drift issues.
fixes#15394.
Signed-off-by: sujal shah <sujalshah28092004@gmail.com>
When a remote-write is executed towards a host name that is resolved to multiple IP addresses, this PR introduces a possibility to force creation of new connections used for the remote-write request to a randomly chosen IP address from the ones corresponding to the host name. The default behavior remains unchanged, i.s., the IP address used for the connection creation remains the one chosen by Go.
This is an experimental feature, it is disabled by default.
Signed-off-by: Yuri Nikolic <durica.nikolic@grafana.com>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Enable the `auto-gomemlimit` feature flag by default.
* Add command line flag `--no-auto-gomemlimit` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
* Enable auto-gomemlimit by default
Enable the `auto-gomemlimit` feature flag by default.
* Add command line flag `--no-auto-gomemlimit` to disable.
---------
Signed-off-by: SuperQ <superq@gmail.com>
Documented that WAL can still be written after memory-snapshot-on-shutdown - #10824
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Signed-off-by: gopi <gopi.singaravelan.k@gmail.com>
---------
Signed-off-by: Gopi-eng2202 <gopi.singaravelan.k@gmail.com>
Signed-off-by: gopi <gopi.singaravelan.k@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
* docs: 2 to 3 migration guide
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* docs/stability: add 3.0 section
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* docs/migration: details on enabling legacy name validation
Signed-off-by: Owen Williams <owen.williams@grafana.com>\
* migration: add log format and `le` normalization
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
* migration: add new enable_http2 default for remote write
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
---------
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Co-authored-by: Owen Williams <owen.williams@grafana.com>
Remote-write creates several shards to parallelise sending, each with
its own http connection. We do not want them all combined onto one
socket by http2.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* promtool: Add debug flag for rule tests
This makes it print out the tsdb state (both input_series and rules that
are run) at the end of a test, making reasoning about tests much easier.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* Reuse generated test name from junit testing
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: David Leadbeater <dgl@dgl.cx>
scrape: Remove implicit fallback to the Prometheus text format
Remove implicit fallback to the Prometheus text format in case of invalid/missing Content-Type and fail the scrape instead. Add ability to specify a `fallback_scrape_protocol` in the scrape config.
---------
Signed-off-by: alexgreenbank <alex.greenbank@grafana.com>
Signed-off-by: Alex Greenbank <alex.greenbank@grafana.com>
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
The `info` function is an experiment to improve UX
around including labels from info metrics.
`info` has to be enabled via the feature flag `--enable-feature=promql-experimental-functions`.
This MVP of info simplifies the implementation by assuming:
* Only support for the target_info metric
* That target_info's identifying labels are job and instance
Also:
* Encode info samples' original timestamp as sample value
* Deduce info series select hints from top-most VectorSelector
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Ying WANG <ying.wang@grafana.com>
Co-authored-by: Augustin Husson <augustin.husson@amadeus.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Bryan Boreham <bjboreham@gmail.com>
Put a scalar to query, it can be graphed.
So the doc says "an expression that returns an instant vector is the only type which can be graphed." is not correct?
And also, a query_range, which used for graph, always return a range vector <https://promlabs.com/blog/2020/06/18/the-anatomy-of-a-promql-query/#range-queries> , so it's confusing to read the above statement.
Signed-off-by: Viet Hung Nguyen <hvn@familug.org>
Extracted HTTP client options (e.g., authentication, proxy settings,
TLS configuration, and custom headers) into a dedicated section for
improved clarity and organization. This will centralize all HTTP-related
options from prometheus/common in one place within the documentation.
The remaining HTTP-related settings in sections (e.g. Service Discovery)
will be moved in a follow-up PR to further unify the documentation
structure.
Signed-off-by: Julien <roidelapluie@o11y.eu>
This unifies the documentation of float literals and time durations
and updates all references to the old definitions.
Signed-off-by: beorn7 <beorn@grafana.com>
The OTLP receiver can now considered stable. We've had it for longer
than a year in main and has received constant improvements.
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
The instant vector documentation does not explain which metric samples are selected - in particular, it makes no reference to staleness.
It's confusing when reading the docs to understand how exactly Prometheus selects the metrics to report: the most recent sample older than the search timestamp specified in the API request, so long as that metric is not "stale".
Signed-off-by: Craig Ringer <craig.ringer@enterprisedb.com>
In detail:
- Clarify that label name and value length limits are in byte,
not in UTF-8 data points.
- More consistent formatting to keep 80 characters line limet.
- Clarify various misleading specifications around "per sample",
"per scrape", "per scrape config", "per job"...
- Fix grammar.
Signed-off-by: beorn7 <beorn@grafana.com>
The linear interpolation (assuming that observations are uniformly
distributed within a bucket) is a solid and simple assumption in lack
of any other information. However, the exponential bucketing used by
standard schemas of native histograms has been chosen to cover the
whole range of observations in a way that bucket populations are
spread out over buckets in a reasonably way for typical distributions
encountered in real-world scenarios.
This is the origin of the idea implemented here: If we divide a given
bucket into two (or more) smaller exponential buckets, we "most
naturally" expect that the samples in the original buckets will split
among those smaller buckets in a more or less uniform fashion. With
this assumption, we end up with an "exponential interpolation", which
therefore appears to be a better match for histograms with exponential
bucketing.
This commit leaves the linear interpolation in place for NHCB, but
changes the interpolation for exponential native histograms to
exponential. This affects `histogram_quantile` and
`histogram_fraction` (because the latter is more or less the inverse
of the former).
The zero bucket has to be treated specially because the assumption
above would lead to an "interpolation to zero" (the bucket density
approaches infinity around zero, and with the postulated uniform usage
of buckets, we would end up with an estimate of zero for all quantiles
ending up in the zero bucket). We simply fall back to linear
interpolation within the zero bucket.
At the same time, this commit makes the call to stick with the
assumption that the zero bucket only contains positive observations
for native histograms without negative buckets (and vice versa). (This
is an assumption relevant for interpolation. It is a mostly academic
point, as the zero bucket is supposed to be very small anyway.
However, in cases where it _is_ relevantly broad, the assumption helps
a lot in practice.)
This commit also updates and completes the documentation to match both
details about interpolation.
As a more high level note: The approach here attempts to strike a
balance between a more simplistic approach without any assumption, and
a more involved approach with more sophisticated assumptions. I will
shortly describe both for reference:
The "zero assumption" approach would be to not interpolate at all, but
_always_ return the harmonic mean of the bucket boundaries of the
bucket the quantile ends up in. This has the advantage of minimizing
the maximum possible relative error of the quantile estimation.
(Depending on the exact definition of the relative error of an
estimation, there is also an argument to return the arithmetic mean of
the bucket boundaries.) While limiting the maximum possible relative
error is a good property, this approach would throw away the
information if a quantile is closer to the upper or lower end of the
population within a bucket. This can be valuable trending information
in a dashboard. With any kind of interpolation, the maximum possible
error of a quantile estimation increases to the full width of a bucket
(i.e. it more than doubles for the harmonic mean approach, and
precisely doubles for the arithmetic mean approach). However, in
return the _expectation value_ of the error decreases. The increase of
the theoretical maximum only has practical relevance for pathologic
distributions. For example, if there are thousand observations within
a bucket, they could _all_ be at the upper bound of the bucket. If the
quantile calculation picks the 1st observation in the bucket as the
relevant one, an interpolation will yield a value close to the lower
bucket boundary, while the true quantile value is close to the upper
boundary.
The "fancy interpolation" approach would be one that analyses the
_actual_ distribution of samples in the histogram. A lot of statistics
could be applied based on the information we have available in the
histogram. This would include the population of neighboring (or even
all) buckets in the histogram. In general, the resolution of a native
histogram should be quite high, and therefore, those "fancy"
approaches would increase the computational cost quite a bit with very
little practical benefits (i.e. just tiny corrections of the estimated
quantile value). The results are also much harder to reason with.
Signed-off-by: beorn7 <beorn@grafana.com>
Conflicts:
cmd/prometheus/main.go
docs/command-line/prometheus.md
docs/feature_flags.md
web/ui/build_ui.sh
web/web.go
Resolved by dropping the UTF-8 feature flag and adding the
`auto-reload-config` feature flag.
For the new web ui pick all changes from `main`.
This change causes Prometheus to allow all UTF-8 characters in metric and label names.
This means that names that were previously invalid and would have been previously rejected will be allowed through.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Fix call to newTestEngine(t) in promql/engine_test.go:3214.
`agent` feature-flag it's own cmdline flag now.
Remove `scrape.name-escaping-scheme` argument.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation
- This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
- The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
- The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If `__name__` is used as target label, it won't be dropped at the end of the query evaluation.
- Fixes https://github.com/prometheus/prometheus/issues/11397
- See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
- See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
- Note: a feature flag `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)
Example (this always fails, as `__name__` is being dropped by `count_over_time`):
```
count_over_time({__name__!=""}[1m])
=> Error executing query: vector cannot contain metrics with the same labelset
```
Before:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=> Error executing query: vector cannot contain metrics with the same labelset
```
After:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
---------
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>