Commit graph

437 commits

Author SHA1 Message Date
Callum Styan
8fd73b1d28
Add Exemplar Remote Write support (#8296)
* Write exemplars to the WAL and send them over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Update example for exemplars, print data in a more obvious format.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add metrics for remote write of exemplars.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix incorrect slices passed to send in remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* We need to unregister the new metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Order of exemplar append vs write exemplar to WAL needs to change.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Several fixes to prevent sending uninitialized or incorrect samples with an exemplar. Fix dropping exemplar for missing series. Add tests for queue_manager sending exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Store both samples and exemplars in the same timeseries buffer to remove the alloc when building final request, keep sub-slices in separate buffers for re-use

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Condense sample/exemplar delivery tests to parameterized sub-tests

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename test methods for clarity now that they also handle exemplars

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Rename counter variable. Fix instances where metrics were not updated correctly

Signed-off-by: Martin Disibio <mdisibio@gmail.com>

* Add exemplars to LoadWAL benchmark

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* last exemplars timestamp metric needs to convert value to seconds with
ms precision

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Process exemplar records in a separate go routine when loading the WAL.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address review comments related to clarifying comments and variable
names. Also refactor sample/exemplar to enqueue prompb types.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Regenerate types proto with comments, update protoc version again.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Put remote write of exemplars behind a feature flag.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some of Ganesh's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Move exemplar remote write feature flag to a config file field.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address Bartek's review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allocate exemplar buffers in queue_manager if we're not going to
send exemplars over remote write.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add ValidateExemplar function, validate exemplars when appending to head
and log them all to WAL before adding them to exemplar storage.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address more reivew comments from Ganesh.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Add exemplar total label length check.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address a few last review comments

Signed-off-by: Callum Styan <callumstyan@gmail.com>

Co-authored-by: Martin Disibio <mdisibio@gmail.com>
2021-05-06 13:53:52 -07:00
Damien Grisonnet
b50f9c1c84
Add label scrape limits (#8777)
* scrape: add label limits per scrape

Add three new limits to the scrape configuration to provide some
mechanism to defend against unbound number of labels and excessive
label lengths. If any of these limits are broken by a sample from a
scrape, the whole scrape will fail. For all of these configuration
options, a zero value means no limit.

The `label_limit` configuration will provide a mechanism to bound the
number of labels per-scrape of a certain sample to a user defined limit.
This limit will be tested against the sample labels plus the discovery
labels, but it will exclude the __name__ from the count since it is a
mandatory Prometheus label to which applying constraints isn't
meaningful.

The `label_name_length_limit` and `label_value_length_limit` will
prevent having labels of excessive lengths. These limits also skip the
__name__ label for the same reasons as the `label_limit` option and will
also make the scrape fail if any sample has a label name/value length
that exceed the predefined limits.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: add metrics and alert to label limits

Add three gauge, one for each label limit to easily access the
limit set by a certain scrape target.
Also add a counter to count the number of targets that exceeded the
label limits and thus were dropped. This is useful for the
`PrometheusLabelLimitHit` alert that will notify the users that scraping
some targets failed because they had samples exceeding the label limits
defined in the scrape configuration.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: apply label limits to __name__ label

Apply limits to the __name__ label that was previously skipped and
truncate the label names and values in the error messages as they can be
very very long.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>

* scrape: remove label limits gauges and refactor

Remove `prometheus_target_scrape_pool_label_limit`,
`prometheus_target_scrape_pool_label_name_length_limit`, and
`prometheus_target_scrape_pool_label_value_length_limit` as they are not
really useful since we don't have the information on the labels in it.

Signed-off-by: Damien Grisonnet <dgrisonn@redhat.com>
2021-05-06 09:56:21 +01:00
Julien Pivotto
f3b2d2a998
Fix config tests in main branch (#8767)
The merge of 8761 did not catch that the secrets were off by one
because it was not rebased on top of 8693.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-29 00:00:30 +02:00
Levi Harrison
fa184a5fc3
Add OAuth 2.0 Config (#8761)
* Introduced oauth2 config into the codebase

Signed-off-by: Levi Harrison <git@leviharrison.dev>
2021-04-28 14:47:52 +02:00
n888
7c028d59c2
Add lightsail service discovery (#8693)
Signed-off-by: N888 <drifto@gmail.com>
2021-04-28 11:29:12 +02:00
Julien Pivotto
5bce801a09
Rename discovery/dockerswarm to discovery/moby (#8691)
This makes it clear that the dockerswarm package does more than docker
swarm, but does also docker.

I have picked moby as it is the upstream name: https://mobyproject.org/

There is no user-facing change, except in the case of a bad
configuration. Previously, a user who would have a bad docker sd config
would see an error like:

> field xx not found in type dockerswarm.plain

Now that error would be turned into:

> field xx not found in type moby.plain

While not perfect, it should at not be confusing between docker and
dockerswarm.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-04-13 09:33:54 +02:00
Julien Pivotto
e635ca834b Add environment variable expansion in external label values
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-30 01:36:28 +02:00
Robert Jacob
b253056163
Implement Docker discovery (#8629)
* Implement Docker discovery

Signed-off-by: Robert Jacob <xperimental@solidproject.de>
2021-03-29 22:30:23 +02:00
Julien Pivotto
5a6d244b00 Scaleway SD: Add the ability to read token from file
Prometheus adds the ability to read secrets from files. This add
this feature for the scaleway service discovery.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-25 00:52:33 +01:00
Julien Pivotto
49016994ac Switch to alertmanager api v2
According to the 2.25 release notes, 2.26 should switch to alertmanager
api v2 by default.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-03-20 01:01:10 +01:00
Rémy Léone
f690b811c5
add support for scaleway service discovery (#8555)
Co-authored-by: Patrik <patrik@ptrk.io>
Co-authored-by: Julien Pivotto <roidelapluie@inuits.eu>

Signed-off-by: Rémy Léone <rleone@scaleway.com>
2021-03-10 15:10:17 +01:00
Robert Fratto
5b78aa0649
Contribute grafana/agent sigv4 code (#8509)
* Contribute grafana/agent sigv4 code
* address review feedback
  - move validation logic for RemoteWrite into unmarshal
  - copy configuration fields from ec2 SD config
  - remove enabled field, use pointer for enabling sigv4
* Update config/config.go
* Don't provide credentials if secret key / access key left blank
* Add SigV4 headers to the list of unchangeable headers.
* sigv4: don't include all headers in signature
* only test for equality in the authorization header, not the signed date
* address review feedback
  1. s/httpClientConfigEnabled/httpClientConfigAuthEnabled
  2. bearer_token tuples to "authorization"
  3. Un-export NewSigV4RoundTripper
* add x-amz-content-sha256 to list of unchangeable headers
* Document sigv4 configuration
* add suggestion for using default AWS SDK credentials

Signed-off-by: Robert Fratto <robertfratto@gmail.com>
Co-authored-by: Julien Pivotto <roidelapluie@gmail.com>
2021-03-08 12:20:09 -07:00
Julien Pivotto
93c6139bc1 Support follow_redirect
This PR introduces support for follow_redirect, to enable users to
disable following HTTP redirects.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-26 22:50:56 +01:00
Harkishen-Singh
79ba53a6c4 Custom headers on remote-read and refactor implementation to roundtripper.
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2021-02-26 17:20:29 +05:30
Julien Pivotto
8787f0aed7 Update common to support credentials type
Most of the backwards compat tests is done in common.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2021-02-18 23:28:22 +01:00
Harkishen-Singh
77c20fd2f8 Adds support to configure retry on Rate-Limiting from remote-write config.
Signed-off-by: Harkishen-Singh <harkishensingh@hotmail.com>
2021-02-16 14:52:49 +05:30
Nándor István Krácser
509000269a
remote_write: allow passing along custom HTTP headers (#8416)
* remote_write: allow passing along custom HTTP headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* add warning

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add header valadtion

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* extend tests for bad remote write headers

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>

* remote_write: add note about the authorization header

Signed-off-by: Nandor Kracser <bonifaido@gmail.com>
2021-02-04 14:18:13 -07:00
Alexey Shumkin
73ddf603af
discovery/kubernetes: Fix valid label selector causing config error
Label selector can be
"set-based"(https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/#set-based-requirement)
but such a selector causes Prometheus start failure with the "unexpected
error: parsing YAML file ...: invalid selector: 'foo in (bar,baz)';
can't understand 'baz)'"-like error.

This is caused by the `fields.ParseSelector(string)` function that
simply splits an expression as a CSV-list, so a comma confuses such a
parsing method and lead to the error.

Use `labels.Parse(string)` to use a valid lexer to parse a selector
expression.

Closes #8284.

Signed-off-by: Alexey Shumkin <Alex.Crezoff@gmail.com>
2020-12-16 10:56:01 +03:00
gotjosh
4eca4dffb8
Allow metric metadata to be propagated via Remote Write. (#6815)
* Introduce a metadata watcher

Similarly to the WAL watcher, its purpose is to observe the scrape manager and pull metadata. Then, send it to a remote storage.

Signed-off-by: gotjosh <josue@grafana.com>

* Additional fixes after rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Rework samples/metadata metrics.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Use more descriptive variable names in MetadataWatcher collect.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix issues caused during rebasing.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix missing metric add and unneeded config code.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Address some review comments.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Fix metrics and docs

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Replace assert with require

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Bring back max_samples_per_send metric

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

* Fix tests

Signed-off-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>

Co-authored-by: Callum Styan <callumstyan@gmail.com>
Co-authored-by: Ganesh Vernekar <cs15btech11018@iith.ac.in>
2020-11-19 20:53:03 +05:30
Julien Pivotto
3509647462
Docker swarm: add filtering of services (#8074)
* Docker swarm: add filtering of services

Add filters on all docker swarm roles (nodes, tasks and services).

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-11-09 12:41:02 +01:00
Julien Pivotto
6c56a1faaa
Testify: move to require (#8122)
* Testify: move to require

Moving testify to require to fail tests early in case of errors.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>

* More moves

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-29 09:43:23 +00:00
Julien Pivotto
1282d1b39c
Refactor test assertions (#8110)
* Refactor test assertions

This pull request gets rid of assert.True where possible to use
fine-grained assertions.

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-27 11:06:53 +01:00
Julien Pivotto
4e5b1722b3
Move away from testutil, refactor imports (#8087)
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-10-22 11:00:08 +02:00
Bryan Boreham
90fc6be70f
Default to bigger remote_write sends (#5267)
* Default to bigger remote_write sends

Raise the default MaxSamplesPerSend to amortise the cost of remote
calls across more samples. Lower MaxShards to keep the expected max
memory usage within reason.

Signed-off-by: Bryan Boreham <bryan@weave.works>

* Change default Capacity to 2500

To maintain ratio with MaxSamplesPerSend

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2020-09-09 14:00:23 -06:00
kangwoo
7c0d5ae4e7
Add Eureka Service Discovery (#3369)
Signed-off-by: kangwoo <kangwoo@gmail.com>
2020-08-26 17:36:59 +02:00
Lukas Kämmerling
b6955bf1ca
Add hetzner service discovery (#7822)
Signed-off-by: Lukas Kämmerling <lukas.kaemmerling@hetzner-cloud.de>
2020-08-21 15:49:19 +02:00
Andy Bursavich
4e6a94a27d
Invert service discovery dependencies (#7701)
This also fixes a bug in query_log_file, which now is relative to the config file like all other paths.

Signed-off-by: Andy Bursavich <abursavich@gmail.com>
2020-08-20 13:48:26 +01:00
Julien Pivotto
f482c7bdd7
Add per scrape-config targets limit (#7554)
* Add per scrape-config targets limit

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-30 14:20:24 +02:00
Julien Pivotto
610b622520
Merge pull request #7644 from prometheus/release-2.20
Merge release 2.20 into master
2020-07-23 09:47:47 +02:00
Julien Pivotto
f8ec72d730 Add digitalocean test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Julien Pivotto
a197508d09 Add docker swarm test
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:04:36 +02:00
Julien Pivotto
0cca23d3ed DigitalOcean, Docker Swarm: properly load files
Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-07-22 00:01:19 +02:00
Steffen Neubauer
9c9b872087
OpenStack SD: Add availability config option, to choose endpoint type (#7494)
* OpenStack SD: Add availability config option, to choose endpoint type

In some environments Prometheus must query OpenStack via an alternative
endpoint type (gophercloud calls this `availability`.

This commit implements this option.

Co-Authored-By: Dennis Kuhn <d.kuhn@syseleven.de>
Signed-off-by: Steffen Neubauer <s.neubauer@syseleven.de>
2020-07-02 15:17:56 +01:00
Jop Zinkweg
1f69c38ba4
Add discovery support for triton compute nodes (#7250)
Added optional configuration item role, defaults to 'container' (backwards-compatible).
Setting role to 'cn' will discover compute nodes instead.

Human-friendly compute node hostname discovery depends on cmon 1.7.0:
c1a2aeca36

Adjust testcases to use discovery config per case as two different types are now supported.

Updated documentation:
* new role setting
* clarify what the name 'container' covers as triton uses different names in different locations

Signed-off-by: jzinkweg <jzinkweg@gmail.com>
2020-05-22 16:19:21 +01:00
Aleksandra Gacek
8e53c19f9c discovery/kubernetes: expose label_selector and field_selector
Close #6807

Co-authored-by @shuttie
Signed-off-by: Aleksandra Gacek <algacek@google.com>
2020-02-15 14:57:56 +01:00
Grebennikov Roman
b4445ff03f discovery/kubernetes: expose label_selector and field_selector
Closes #6096

Signed-off-by: Grebennikov Roman <grv@dfdx.me>
2020-02-15 14:57:38 +01:00
Julien Pivotto
9d9bc524e5 Add query log (#6520)
* Add query log, make stats logged in JSON like in the API

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2020-01-08 13:28:43 +00:00
Callum Styan
67838643ee
Add config option for remote job name (#6043)
* Track remote write queues via a map so we don't care about index.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Support a job name for remote write/read so we can differentiate between
them using the name.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Remote write/read has Name to not confuse the meaning of the field with
scrape job names.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Split queue/client label into remote_name and url labels.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Don't allow for duplicate remote write/read configs.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Ensure we restart remote write queues if the hash of their config has
not changed, but the remote name has changed.

Signed-off-by: Callum Styan <callumstyan@gmail.com>

* Include name in remote read/write config hashes, simplify duplicates
check, update test accordingly.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-12-12 12:47:23 -08:00
Simon Pasquier
cccd542891
*: avoid missed Alertmanager targets (#6455)
This change makes sure that nearly-identical Alertmanager configurations
aren't merged together.

The config's identifier was the MD5 hash of the configuration serialized
to JSON but because `relabel.Regexp` has no public field and doesn't
implement the JSON.Marshaler interface, it was always serialized to
"{}".

In practice, the identifier can be based on the index of the
configuration in the list.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-12-12 17:00:19 +01:00
johncming
8d3083e256 config: add test case for scrape interval larger than timeout. (#6037)
Signed-off-by: johncming <johncming@yahoo.com>
2019-09-23 13:26:56 +02:00
Bartek Plotka
f0863a604e Removed extra tsdb/testutil after merge.
Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2019-08-14 10:12:32 +01:00
Chris Marchbanks
a6a55c433c Improve desired shards calculation (#5763)
The desired shards calculation now properly keeps track of the rate of
pending samples, and uses the previously unused integralAccumulator to
adjust for missing information in the desired shards calculation.

Also, configure more capacity for each shard.  The default 10 capacity
causes shards to block on each other while
sending remote requests. Default to a 500 sample capacity and explain in
the documentation that having more capacity will help throughput.

Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-08-13 10:10:21 +01:00
Chris Marchbanks
529ccff07b
Remove all usages of stretchr/testify
Signed-off-by: Chris Marchbanks <csmarchbanks@gmail.com>
2019-08-08 19:49:27 -06:00
Max Leonard Inden
41c22effbe
config&notifier: Add option to use Alertmanager API v2
With v0.16.0 Alertmanager introduced a new API (v2). This patch adds a
configuration option for Prometheus to send alerts to the v2 endpoint
instead of the defautl v1 endpoint.

Signed-off-by: Max Leonard Inden <IndenML@gmail.com>
2019-06-21 16:33:53 +02:00
Callum Styan
e9129abeff Remove max_retries from queue_config since it's not used in remote write
anymore.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-06-10 12:43:08 -07:00
Tariq Ibrahim
8fdfa8abea refine error handling in prometheus (#5388)
i) Uses the more idiomatic Wrap and Wrapf methods for creating nested errors.
ii) Fixes some incorrect usages of fmt.Errorf where the error messages don't have any formatting directives.
iii) Does away with the use of fmt package for errors in favour of pkg/errors

Signed-off-by: tariqibrahim <tariq181290@gmail.com>
2019-03-26 00:01:12 +01:00
Callum Styan
5603b857a9 Check if label value is valid when unmarhsaling external labels from
YAML, add a test to config_tests for valid/invalid external label
value.

Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-18 20:31:12 +00:00
Tom Wilkie
c7b3535997 Use pkg/relabelling in remote write.
- Unmarshall external_labels config as labels.Labels, add tests.
- Convert some more uses of model.LabelSet to labels.Labels.
- Remove old relabel pkg (fixes #3647).
- Validate external label names.

Signed-off-by: Tom Wilkie <tom.wilkie@gmail.com>
2019-03-18 20:31:12 +00:00
Julien Pivotto
4397916cb2 Add honor_timestamps (#5304)
Fixes #5302

Signed-off-by: Julien Pivotto <roidelapluie@inuits.eu>
2019-03-15 10:04:15 +00:00
Callum Styan
83c46fd549 update Consul vendor code so that catalog.ServiceMultipleTags can be (#5151)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-03-12 10:31:27 +00:00
Simon Pasquier
027d2ece14 config: resolve more file paths (#5284)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-03-12 10:24:15 +00:00
Simon Pasquier
e72c875e63
config: fix Kubernetes config with empty API server (#5256)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-22 15:51:47 +01:00
Simon Pasquier
c8a1a5a93c
discovery/kubernetes: fix support for password_file and bearer_token_file (#5211)
* discovery/kubernetes: fix support for password_file

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Create and pass custom RoundTripper to Kubernetes client

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use inline HTTPClientConfig

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-02-20 11:22:34 +01:00
Callum Styan
6f69e31398 Tail the TSDB WAL for remote_write
This change switches the remote_write API to use the TSDB WAL.  This should reduce memory usage and prevent sample loss when the remote end point is down.

We use the new LiveReader from TSDB to tail WAL segments.  Logic for finding the tracking segment is included in this PR.  The WAL is tailed once for each remote_write endpoint specified. Reading from the segment is based on a ticker rather than relying on fsnotify write events, which were found to be complicated and unreliable in early prototypes.

Enqueuing a sample for sending via remote_write can now block, to provide back pressure.  Queues are still required to acheive parallelism and batching.  We have updated the queue config based on new defaults for queue capacity and pending samples values - much smaller values are now possible.  The remote_write resharding code has been updated to prevent deadlocks, and extra tests have been added for these cases.

As part of this change, we attempt to guarantee that samples are not lost; however this initial version doesn't guarantee this across Prometheus restarts or non-retryable errors from the remote end (eg 400s).

This changes also includes the following optimisations:
- only marshal the proto request once, not once per retry
- maintain a single copy of the labels for given series to reduce GC pressure

Other minor tweaks:
- only reshard if we've also successfully sent recently
- add pending samples, latest sent timestamp, WAL events processed metrics

Co-authored-by: Chris Marchbanks <csmarchbanks.com> (initial prototype)
Co-authored-by: Tom Wilkie <tom.wilkie@gmail.com> (sharding changes)
Signed-off-by: Callum Styan <callumstyan@gmail.com>
2019-02-12 11:39:13 +00:00
Simon Pasquier
f678e27eb6
*: use latest release of staticcheck (#5057)
* *: use latest release of staticcheck

It also fixes a couple of things in the code flagged by the additional
checks.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>

* Use official release of staticcheck

Also run 'go list' before staticcheck to avoid failures when downloading packages.

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2019-01-04 14:47:38 +01:00
Marcel D. Juhnke
c7d83b2b6a discovery: add support for Managed Identity authentication in Azure SD (#4590)
Signed-off-by: Marcel Juhnke <marrat@marrat.de>
2018-12-19 10:03:33 +00:00
Bartek Płotka
62c8337e77 Moved configuration into relabel package. (#4955)
Adapted top dir relabel to use pkg relabel structs.

Removal of this in a separate tracked here: https://github.com/prometheus/prometheus/issues/3647

Signed-off-by: Bartek Plotka <bwplotka@gmail.com>
2018-12-18 11:26:36 +00:00
Ryota Arai
135d580ab2 Introduce min_shards for remote write to set minimum number of shards. (#4924)
Signed-off-by: Ryota Arai <ryota.arai@gmail.com>
2018-12-04 17:32:14 +00:00
Julius Volz
d28246e337
Fix config loading panics on nil pointer slice elements (#4942)
Fixes https://github.com/prometheus/prometheus/issues/4902
Fixes https://github.com/prometheus/prometheus/issues/4889

Signed-off-by: Julius Volz <julius.volz@gmail.com>
2018-12-03 18:09:02 +08:00
mengnan
a5d39361ab discovery/azure: Fail hard when Azure authentication parameters are missing (#4907)
* discovery/azure: fail hard when client_id/client_secret is empty

Signed-off-by: mengnan <supernan1994@gmail.com>

* discovery/azure: fail hard when authentication parameters are missing

Signed-off-by: mengnan <supernan1994@gmail.com>

* add unit test

Signed-off-by: mengnan <supernan1994@gmail.com>

* add unit test

Signed-off-by: mengnan <supernan1994@gmail.com>

* format code

Signed-off-by: mengnan <supernan1994@gmail.com>
2018-11-29 16:47:59 +01:00
Ben Kochie
c6399296dc
Fix spelling/typos (#4921)
* Fix spelling/typos

Fix spelling/typos reported by codespell/misspell.
* UK -> US spelling changes.

Signed-off-by: Ben Kochie <superq@gmail.com>
2018-11-27 17:44:29 +01:00
Simon Pasquier
ff08c40091 discovery/openstack: support tls_config
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-25 14:31:32 +02:00
Simon Pasquier
128ff546b8 config: add test for OpenStack SD (#4594)
Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2018-09-13 21:44:27 +05:30
Tariq Ibrahim
f708fd5c99 Adding support for multiple azure environments (#4569)
Signed-off-by: Tariq Ibrahim <tariq.ibrahim@microsoft.com>
2018-09-04 17:55:40 +02:00
Daisy T
7d01ead689 change time.duration to model.duration for standardization (#4479)
Signed-off-by: Daisy T <daisyts@gmx.com>
2018-08-24 16:55:21 +02:00
Goutham Veeramachaneni
c28cc5076c Saner defaults and metrics for remote-write (#4279)
* Rename queueCapacity to shardCapacity
* Saner defaults for remote write
* Reduce allocs on retries

Signed-off-by: Goutham Veeramachaneni <cs14btech11014@iith.ac.in>
2018-07-18 05:15:16 +01:00
Paul Gier
d24d2acd11 config: set target group source index during unmarshalling (#4245)
* config: set target group source index during unmarshalling

Fixes issue #4214 where the scrape pool is unnecessarily reloaded for a
config reload where the config hasn't changed.  Previously, the discovery
manager changed the static config after loading which caused the in-memory
config to differ from a freshly reloaded config.

Signed-off-by: Paul Gier <pgier@redhat.com>

* [issue #4214] Test that static targets are not modified by discovery manager

Signed-off-by: Paul Gier <pgier@redhat.com>
2018-06-13 16:34:59 +01:00
Philippe Laflamme
2aba238f31 Use common HTTPClientConfig for marathon_sd configuration (#4009)
This adds support for basic authentication which closes #3090

The support for specifying the client timeout was removed as discussed in https://github.com/prometheus/common/pull/123. Marathon was the only sd mechanism doing this and configuring the timeout is done through `Context`.

DC/OS uses a custom `Authorization` header for authenticating. This adds 2 new configuration properties to reflect this.

Existing configuration files that use the bearer token will no longer work. More work is required to make this backwards compatible.
2018-04-05 09:08:18 +01:00
Manos Fokas
25f929b772 Yaml UnmarshalStrict implementation. (#4033)
* Updated yaml vendor package.

* remove checkOverflow duplicate in rulefmt

* remove duplicated HTTPClientConfig.Validate()

* Added yaml static check.
2018-04-04 09:07:39 +01:00
Kristiyan Nikolov
be85ba3842 discovery/ec2: Support filtering instances in discovery (#4011) 2018-03-31 07:51:11 +01:00
Corentin Chary
60dafd425c consul: improve consul service discovery (#3814)
* consul: improve consul service discovery

Related to #3711

- Add the ability to filter by tag and node-meta in an efficient way (`/catalog/services`
  allow filtering by node-meta, and returns a `map[string]string` or `service`->`tags`).
  Tags and nore-meta are also used in `/catalog/service` requests.
- Do not require a call to the catalog if services are specified by name. This is important
  because on large cluster `/catalog/services` changes all the time.
- Add `allow_stale` configuration option to do stale reads. Non-stale
  reads can be costly, even more when you are doing them to a remote
  datacenter with 10k+ targets over WAN (which is common for federation).
- Add `refresh_interval` to minimize the strain on the catalog and on the
  service endpoint. This is needed because of that kind of behavior from
  consul: https://github.com/hashicorp/consul/issues/3712 and because a catalog
  on a large cluster would basically change *all* the time. No need to discover
  targets in 1sec if we scrape them every minute.
- Added plenty of unit tests.

Benchmarks
----------

```yaml
scrape_configs:

- job_name: prometheus
  scrape_interval: 60s
  static_configs:
    - targets: ["127.0.0.1:9090"]

- job_name: "observability-by-tag"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      tag: marathon-user-observability  # Used in After
      refresh_interval: 30s             # Used in After+delay
  relabel_configs:
    - source_labels: [__meta_consul_tags]
      regex: ^(.*,)?marathon-user-observability(,.*)?$
      action: keep

- job_name: "observability-by-name"
  scrape_interval: "60s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - observability-cerebro
        - observability-portal-web

- job_name: "fake-fake-fake"
  scrape_interval: "15s"
  metrics_path: "/metrics"
  consul_sd_configs:
    - server: consul.service.par.consul.prod.crto.in:8500
      services:
        - fake-fake-fake
```

Note: tested with ~1200 services, ~5000 nodes.

| Resource | Empty | Before | After | After + delay |
| -------- |:-----:|:------:|:-----:|:-------------:|
|/service-discovery size|5K|85MiB|27k|27k|27k|
|`go_memstats_heap_objects`|100k|1M|120k|110k|
|`go_memstats_heap_alloc_bytes`|24MB|150MB|28MB|27MB|
|`rate(go_memstats_alloc_bytes_total[5m])`|0.2MB/s|28MB/s|2MB/s|0.3MB/s|
|`rate(process_cpu_seconds_total[5m])`|0.1%|15%|2%|0.01%|
|`process_open_fds`|16|*1236*|22|22|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="services"}[5m])`|~0|1|1|*0.03*|
|`rate(prometheus_sd_consul_rpc_duration_seconds_count{call="service"}[5m])`|0.1|*80*|0.5|0.5|
|`prometheus_target_sync_length_seconds{quantile="0.9",scrape_job="observability-by-tag"}`|N/A|200ms|0.2ms|0.2ms|
|Network bandwidth|~10kbps|~2.8Mbps|~1.6Mbps|~10kbps|

Filtering by tag using relabel_configs uses **100kiB and 23kiB/s per service per job** and quite a lot of CPU. Also sends and additional *1Mbps* of traffic to consul.
Being a little bit smarter about this reduces the overhead quite a lot.
Limiting the number of `/catalog/services` queries per second almost removes the overhead of service discovery.

* consul: tweak `refresh_interval` behavior

`refresh_interval` now does what is advertised in the documentation,
there won't be more that one update per `refresh_interval`. It now
defaults to 30s (which was also the current waitTime in the consul query).

This also make sure we don't wait another 30s if we already waited 29s
in the blocking call by substracting the number of elapsed seconds.

Hopefully this will do what people expect it does and will be safer
for existing consul infrastructures.
2018-03-23 14:48:43 +00:00
pasquier-s
fc8cf08f42 Prevent invalid label names with labelmap (#3868)
This change ensures that the relabeling configurations using labelmap
can't generate invalid label names.
2018-02-21 10:02:22 +00:00
Shubheksha Jalan
0471e64ad1 Use shared types from the common repo (#3674)
* refactor: use shared types from common repo, remove util/config

* vendor: add common/config

* fix nit
2018-01-11 16:10:25 +01:00
Shubheksha Jalan
ec94df49d4 Refactor SD configuration to remove config dependency (#3629)
* refactor: move targetGroup struct and CheckOverflow() to their own package

* refactor: move auth and security related structs to a utility package, fix import error in utility package

* refactor: Azure SD, remove SD struct from config

* refactor: DNS SD, remove SD struct from config into dns package

* refactor: ec2 SD, move SD struct from config into the ec2 package

* refactor: file SD, move SD struct from config to file discovery package

* refactor: gce, move SD struct from config to gce discovery package

* refactor: move HTTPClientConfig and URL into util/config, fix import error in httputil

* refactor: consul, move SD struct from config into consul discovery package

* refactor: marathon, move SD struct from config into marathon discovery package

* refactor: triton, move SD struct from config to triton discovery package, fix test

* refactor: zookeeper, move SD structs from config to zookeeper discovery package

* refactor: openstack, remove SD struct from config, move into openstack discovery package

* refactor: kubernetes, move SD struct from config into kubernetes discovery package

* refactor: notifier, use targetgroup package instead of config

* refactor: tests for file, marathon, triton SD - use targetgroup package instead of config.TargetGroup

* refactor: retrieval, use targetgroup package instead of config.TargetGroup

* refactor: storage, use config util package

* refactor: discovery manager, use targetgroup package instead of config.TargetGroup

* refactor: use HTTPClient and TLS config from configUtil instead of config

* refactor: tests, use targetgroup package instead of config.TargetGroup

* refactor: fix tagetgroup.Group pointers that were removed by mistake

* refactor: openstack, kubernetes: drop prefixes

* refactor: remove import aliases forced due to vscode bug

* refactor: move main SD struct out of config into discovery/config

* refactor: rename configUtil to config_util

* refactor: rename yamlUtil to yaml_config

* refactor: kubernetes, remove prefixes

* refactor: move the TargetGroup package to discovery/

* refactor: fix order of imports
2017-12-29 21:01:34 +01:00
Brian Brazil
fba80da635
Fix default of read_recent to be false. (#3617)
This is what is documented in the migration guide, and the default settings
should make sense for a true long term storage.

Document the setting.
2017-12-23 17:21:38 +00:00
Krasi Georgiev
e405e2f1ea refactored discovery 2017-12-18 17:22:49 +00:00
Alberto Cortés
29da2fb9cd testutil: update to go1.9 testing.Helper 2017-12-08 19:06:53 +01:00
Alberto Cortés
8f6a9f7833 config: simplify tests by using testutil.NotOk (#3289)
Also include filename in all LoadFile errors

Also add mesage to testuitl.NotOk so we can identify failing tests when
using table driven tests.
2017-12-08 16:52:25 +00:00
Tobias Schmidt
7098c56474 Add remote read filter option
For special remote read endpoints which have only data for specific
queries, it is desired to limit the number of queries sent to the
configured remote read endpoint to reduce latency and performance
overhead.
2017-11-13 23:30:01 +01:00
Krasi Georgiev
e86d82ad2d Fix regression of alert rules state loss on config reload. (#3382)
* incorrect map name for the group prevented copying state from existing alert rules on config reload

* applyConfig test

* few nits

* nits 2
2017-11-01 12:58:00 +01:00
Thibault Chataigner
bf4a279a91 Remote storage reads based on oldest timestamp in primary storage (#3129)
Currently all read queries are simply pushed to remote read clients.
This is fine, except for remote storage for wich it unefficient and
make query slower even if remote read is unnecessary.
So we need instead to compare the oldest timestamp in primary/local
storage with the query range lower boundary. If the oldest timestamp
is older than the mint parameter, then there is no need for remote read.
This is an optionnal behavior per remote read client.

Signed-off-by: Thibault Chataigner <t.chataigner@criteo.com>
2017-10-18 12:08:14 +01:00
Alberto Cortés
6c67296423 config: fix error message for unexpected result of yaml marshal 2017-10-12 19:50:07 +02:00
Alberto Cortés
0f3d8ea075 config: use testutil package 2017-10-12 19:50:07 +02:00
Fabian Reinartz
2d0b8e8b94 Merge branch 'master' into dev-2.0 2017-10-05 13:09:18 +02:00
Alberto Cortés
bb3dad9cba config: simplify some returns 2017-09-26 16:57:56 +02:00
Bryan Boreham
9d6b945e41 Default HTTP keep-alive ON for remote read/write 2017-09-11 09:48:30 +00:00
Bryan Boreham
e0a4d18301 Allow http keep-alive setting to be overridden in config 2017-09-11 09:07:14 +00:00
Fabian Reinartz
e746282772 Merge branch 'master' into dev-2.0 2017-09-11 10:55:19 +02:00
Jamie Moore
7a135e0a1b Add the ability to assume a role for ec2 discovery 2017-09-10 00:36:43 +10:00
Fabian Reinartz
87918f3097 Merge branch 'master' into dev-2.0 2017-09-04 14:09:21 +02:00
Johannes 'fish' Ziemke
70f3d1e9f9 k8s: Support discovery of ingresses (#3111)
* k8s: Support discovery of ingresses

* Move additional labels below allocation

This makes it more obvious why the additional elements are allocated.
Also fix allocation for node where we only set a single label.

* k8s: Remove port from ingress discovery

* k8s: Add comment to ingress discovery example
2017-09-04 13:10:44 +02:00
CuiHaozhi
b1c18bf29b discovery openstack: support discovery hosts, add rule option.
Signed-off-by: CuiHaozhi <cuihz@wise2c.com>
2017-08-29 10:14:00 -04:00
Max Leonard Inden
1c96fbb992
Expose current Prometheus config via /status/config
This PR adds the `/status/config` endpoint which exposes the currently
loaded Prometheus config. This is the same config that is displayed on
`/config` in the UI in YAML format. The response payload looks like
such:
```
{
  "status": "success",
  "data": {
    "yaml": <CONFIG>
  }
}
```
2017-08-13 22:21:18 +02:00
Fabian Reinartz
25f3e1c424 Merge branch 'master' into mergemaster 2017-08-10 17:04:25 +02:00
Yuki Ito
1bf3b91ae0 Make sure that url for remote_read/write is not nil (#3024) 2017-08-07 08:49:45 +01:00
Tom Wilkie
5169f990f9 Review feedback: add yaml struct tags, don't embed queue config.
Also, rename QueueManageConfig to QueueConfig, for consistency with tags.
2017-08-01 14:43:56 +01:00
Tom Wilkie
454b661145 Make queue manager configurable. 2017-07-25 13:47:34 +01:00
Fabian Reinartz
dba7586671 Merge branch 'master' into dev-2.0 2017-07-11 17:22:14 +02:00
Fuente, Pablo Andres
9eb8c6e1d2 Renaming the config_notwin test to config_default 2017-07-10 11:08:16 -03:00
Fuente, Pablo Andres
fe73de9452 Renaming config test file to fix build tags
Renaming the name of a file of the config tests, in order to properly
use the Go build tags feature.
2017-07-10 00:02:08 -03:00
Fuente, Pablo Andres
193dc47230 Fixing code style to adhere gofmt 2017-07-09 02:43:33 -03:00
Fuente, Pablo Andres
902fafb8e7 Fixing tests for Windows
Fixing the config/config_test, the discovery/file/file_test and the
promql/promql_test tests for Windows. For most of the tests, the fix involved
correct handling of path separators. In the case of the promql tests, the
issue was related to the removal of the temporal directories used by the
storage. The issue is that the RemoveAll() call returns an error when it
tries to remove a directory which is not empty, which seems to be true due to
some kind of process that is still running after closing the storage. To fix
it I added some retries to the remove of the temporal directories.
Adding tags file from Universal Ctags to .gitignore
2017-07-09 01:59:30 -03:00
Goutham Veeramachaneni
98d20d5880 Make sure rendering config produces valid config
Fixes #2899

Signed-off-by: Goutham Veeramachaneni <goutham@boomerangcommerce.com>
2017-07-05 16:09:29 +02:00
Fabian Reinartz
65b087bcc1 config: resolve file SD paths relative to config 2017-07-04 11:40:26 +02:00
Fabian Reinartz
9ba61df45a Merge pull request #2789 from mtanda/aws_default_region
config: set default region for EC2 SD
2017-06-12 16:15:53 +02:00
Mitsuhiro Tanda
64cef5cd05 handle NewSession() error 2017-06-06 22:02:50 +09:00
Christian Groschupp
8f781e411c Openstack Service Discovery (#2701)
* Add openstack service discovery.

* Add gophercloud code for openstack service discovery.

* first changes for juliusv comments.

* add gophercloud code for floatingip.

* Add tests to openstack sd.

* Add testify suite vendor files.

* add copyright and make changes for code climate.

* Fixed typos in provider openstack.

* Renamed tenant to project in openstack sd.

* Change type of password to Secret in openstack sd.
2017-06-01 23:49:02 +02:00
Roman Vynar
dbe2eb2afc Hide consul token on UI. (#2797) 2017-06-01 22:14:23 +01:00
Mitsuhiro Tanda
a1ddab463e config: set default region for EC2 SD 2017-06-02 01:40:24 +09:00
Tobias Schmidt
287ec6e6cc Fix outdated target_group naming in error message
The target_groups config has been renamed to static_configs, the error
message for overflow attributes should reflect that.
2017-05-31 11:01:13 +02:00
Julius Volz
240bb671e2 config: Fix overflow checking in global config (#2783) 2017-05-30 20:58:06 +02:00
Conor Broderick
6766123f93 Replace regex with Secret type and remarshal config to hide secrets (#2775) 2017-05-29 12:46:23 +01:00
Brian Akins
27d66628a1 Allow limiting Kubernetes service discover to certain namespaces
Allow namespace discovery to be more easily extended in the future by using a struct rather than just a list.

Rename fields for kubernetes namespace discovery
2017-04-27 07:41:36 -04:00
Julius Volz
eb14678a25 Make remote read/write use config.HTTPClientConfig 2017-03-20 13:37:50 +01:00
Julius Volz
02395a224d [WIP] Remote Read 2017-03-20 13:13:44 +01:00
Julius Volz
525da88c35 Merge pull request #2479 from YKlausz/consul-tls
Adding consul capability to connect via tls
2017-03-20 11:40:18 +01:00
Goutham Veeramachaneni
5c89cec65c Stricter Relabel Config Checking for Labeldrop/keep (#2510)
* Minor code cleanup

* Labeldrop/Labelkeep Now *Only* Support Regex

Ref promtheus/prometheus#2368
2017-03-18 22:32:08 +01:00
yklausz
75880b594f Adding consul capability to connect via tls 2017-03-17 22:37:18 +01:00
Michael Kraus
04eadf6e20 Allow Marathon SD without bearer_token and bearer_token_file 2017-03-02 13:17:19 +01:00
Michael Kraus
47bdcf0f67 Allow the use of bearer_token or bearer_token_file for MarathonSD 2017-03-02 09:44:20 +01:00
Julius Volz
e9476b35d5 Re-add multiple remote writers
Each remote write endpoint gets its own set of relabeling rules.

This is based on the (yet-to-be-merged)
https://github.com/prometheus/prometheus/pull/2419, which removes legacy
remote write implementations.
2017-02-20 13:23:12 +01:00
Alex Somesan
b22eb65d0f Cleaner separation between ServiceAccount and custom authentication in K8S SD (#2348)
* Canonical usage of cluster service-account in K8S SD

* Early validation for opt-in custom auth in K8S SD

* Fix typo in condition
2017-01-19 10:52:52 +01:00
Fabian Reinartz
7eb849e6a8 Merge pull request #2307 from joyent/triton_discovery
Add Joyent Triton discovery
2017-01-18 05:08:11 +01:00
Richard Kiene
f3d9692d09 Add Joyent Triton discovery 2017-01-17 20:34:32 +00:00
Björn Rabenstein
ad40d0abbc Merge pull request #2288 from prometheus/limit-scrape
Add ability to limit scrape samples, and related metrics
2017-01-08 01:34:06 +01:00
Brian Brazil
30448286c7 Add sample_limit to scrape config.
This imposes a hard limit on the number of samples ingested from the
target. This is counted after metric relabelling, to allow dropping of
problemtic metrics.

This is intended as a very blunt tool to prevent overload due to
misbehaving targets that suddenly jump in sample count (e.g. adding
a label containing email addresses).

Add metric to track how often this happens.

Fixes #2137
2016-12-16 15:10:09 +00:00
Tristan Colgate-McFarlane
4d9134e6d8 Add labeldrop and labelkeep actions. (#2279)
Introduce two new relabel actions. labeldrop, and labelkeep.
These can be used to filter the set of labels by matching regex

- labeldrop: drops all labels that match the regex
- labelkeep: drops all labels that do not match the regex
2016-12-14 10:17:42 +00:00
Fabian Reinartz
cc35104504 config: fix naming and typo 2016-11-25 11:04:33 +01:00
Fabian Reinartz
3fb4d1191b config: rename AlertingConfig, resolve file paths 2016-11-24 15:19:37 +01:00
Fabian Reinartz
183c5749b9 config: add Alertmanager configuration 2016-11-23 18:23:37 +01:00
Fabian Reinartz
200bbe1bad config: extract SD and HTTPClient configurations 2016-11-23 18:23:37 +01:00
Fabian Reinartz
ec66082749 Merge branch 'ec2_sd_profile_support' of https://github.com/Ticketmaster/prometheus into Ticketmaster-ec2_sd_profile_support 2016-11-21 11:49:23 +01:00
Kraig Amador
bec6870ed4 ec2_sd_configs: Support profiles for configuring the ec2 service 2016-11-03 08:38:02 -07:00
beorn7
b2f28a9e82 Merge branch 'release-1.2' 2016-11-03 14:42:15 +01:00
Brian Brazil
d1ece12c70 Handle null Regex in the config as the empty regex. (#2150) 2016-11-03 13:34:15 +00:00
bekbulatov
2bc12fa2fb Set timeout for marathon_sd 2016-10-24 11:27:08 +01:00
bekbulatov
c689b35858 Merge branch 'master' into marathon_tls 2016-10-24 10:37:32 +01:00
Matti Savolainen
aabf4a419b use LabelNam.IsValid() instead of LabelNameRE and MatchString instead of Match 2016-10-19 16:30:52 +03:00
Matti Savolainen
ec6524ce74 test the labelTarget regex to make sure it properly validates pre-interpolated label names. 2016-10-19 13:32:42 +03:00
Matti Savolainen
f867c1fd58 formating and text fixes, adjust regexp 2016-10-19 13:31:55 +03:00
Matti Savolainen
56907ba6e3 Add interpolation to good test config. Fix regex 2016-10-19 01:19:19 +03:00
Matti Savolainen
7a36af1c85 add comment about interpolation 2016-10-19 00:42:49 +03:00
Matti Savolainen
3b8e7c1277 Merge branch 'master' of https://github.com/prometheus/prometheus into bug/target_label_unmarshal 2016-10-19 00:33:52 +03:00
Matti Savolainen
5a1e909b5d Make TargetLabel in RelabelConfig a string 2016-10-19 00:33:22 +03:00
Björn Rabenstein
d93f73874f Merge pull request #2093 from dominikschulz/spelling
Trivial spelling corrections
2016-10-18 22:46:03 +02:00
Dominik Schulz
182e17958a Trivial spelling corrections and a small comment. 2016-10-18 20:14:38 +02:00
bekbulatov
ac702f66eb Resolve merge conflicts 2016-10-18 14:14:24 +01:00
Fabian Reinartz
1b6dfa32a9 config: rename role 'endpoint' to 'endpoints' 2016-10-17 11:53:49 +02:00
Frederic Branczyk
2e18c81a00 config: adapt unit tests 2016-10-17 10:32:10 +02:00
Fabian Reinartz
b24602f713 kubernetes: merge back into single configuration 2016-10-17 10:32:10 +02:00
Fabian Reinartz
2331701b50 kubernetes: Add K8S v2 pod discovery
This adds plumbing for a parallel version of the new K8S SD
and adds pod discovery as the first role.
2016-10-17 10:32:10 +02:00
Dominik Schulz
72cbf8af6f Fix small copy and paste error 2016-10-08 08:49:00 +02:00
bekbulatov
01b53c1180 Add tls support 2016-10-07 13:40:22 +01:00
Brian Brazil
77605649a9 Add support for remote write relabelling.
Switch back to a single remote writer, as we were only ever meant to
have one and the relabel semantics are clearer that way.
2016-10-05 07:43:19 +01:00
Tom Wilkie
4520e12440 Add HTTP Basic Auth & TLS support to the generic write path. (#1957)
* Add config, HTTP Basic Auth and TLS support to the generic write path.

- Move generic write path configuration to the config file
- Factor out config.TLSConfig -> tlf.Config translation
- Support TLSConfig for generic remote storage
- Rename Run to Start, and make it non-blocking.
- Dedupe code in httputil for TLS config.
- Make remote queue metrics global.
2016-09-19 22:47:51 +02:00
Tobias Schmidt
874cb44bb6 Merge pull request #1996 from ton31337/Fix/allow_numbers_as_first_letter
Allow number to be the first letter as well for `job_name`
2016-09-16 11:08:52 -04:00
Donatas Abraitis
1aa8898b66 Allow number to be the first letter as well for job_name 2016-09-16 14:06:47 +03:00
Ingo Gottwald
3b546d061f Add support for GCE discovery 2016-09-16 08:55:33 +02:00
Alexey Miroshkin
e29d9394e5 Forbid invalid relabel configurations
This fix adds check if target_label value is set in case if action is replace or
hashmod
Issue [#1900]
2016-08-29 16:56:06 +02:00
Fabian Reinartz
be596f82b4 Merge pull request #1783 from knyar/json
Allow URLs in targets defined via a JSON file
2016-08-10 09:42:17 +02:00
Frederic Branczyk
b655aa002f introduce top level alerting config node 2016-08-09 14:19:25 +02:00
Frederic Branczyk
679d225c8d allow relabeling of alerts
in case of dropping don't even enqueue them
2016-08-09 14:18:31 +02:00
Fabian Reinartz
7a0b3af0b7 config: validate Kubernetes role correctly. 2016-07-18 22:24:41 +09:00
Fabian Reinartz
919558f601 config: remove deprecated target_groups configuration 2016-07-14 09:55:00 +09:00
Fabian Reinartz
7221228843 discovery/kubernetes: select between discovery role
This adds `role` field to the Kubernetes SD config, which indicates
which type of Kubernetes SD should be run.
This no longer allows discovering pods and nodes with the same SD
configuration for example.
2016-07-05 14:22:12 +02:00
Anton Tolchanov
772a3af38f Allow URLs in targets defined via a JSON file
This enables defining `blackbox_exporter` targets (which can be URLs,
because of relabeling) in a JSON file.

Not sure if this is the best approach, but current behaviour is
inconsistent (`UnmarshalYAML` does not have this check) and breaks
officially documented way to use `blackbox_exporter`.
2016-07-04 00:05:57 +03:00
Ali Reza
98c156c361 reorder config validation, move checkOverflow before any other validation 2016-06-13 10:02:20 +07:00
Fabian Reinartz
0f21bd31ca config: deprecate target_groups for static_configs
This change deprecates the `target_groups` option in favor
of `static_configs`. The old configuration is still accepted
but prints a warning.
Configuration loading errors if both options are set.
2016-06-08 15:55:25 +02:00
Jimmi Dyson
206bcfcdaa
Kubernetes SD: Remove kubeletPort config option 2016-06-07 12:34:55 +01:00
Ali Reza
c81b4e8a87 change config names to files for consistency 2016-05-30 07:47:58 +07:00
Gregory G. Tseng
7997c14b0d Add ServerName into TLS Config 2016-05-26 14:24:49 -07:00
Seth Miller
0988e3b937 Add support for Azure discovery
This change adds the ability to do target discovery with Microsoft's Azure platform.
2016-04-06 22:47:02 -05:00
Fabian Reinartz
37c709f917 Fix global config YAML issues 2016-02-15 14:08:25 +01:00
Fabian Reinartz
44a5e860ed Fix scrape timeout config checks 2016-02-15 12:07:46 +01:00
Julius Volz
829a029dda Update two more __meta_dns_srv_name references.
Although they are only in examples/tests and don't affect anything, they
could be confusing (the label has been renamed in the rest of the code a
while ago).
2016-02-14 22:20:39 +01:00
Fabian Reinartz
e26e4b6e89 Restrict scrape timeout to interval length 2016-02-12 12:52:22 +01:00
beorn7
a7408bfb47 Unify duration parsing
It's actually happening in several places (and for flags, we use the
standard Go time.Duration...). This at least reduces all our
home-grown parsing to one place (in model).
2016-01-29 15:41:50 +01:00
Julien Dehee
061fe2f364 Support AirBnB's Smartstack Nerve client for SD
nerve's registration format differs from serverset. With this commit
there is now a dedicated treecache file in util,
and two separate files for serverset and nerve.

Reference:
https://github.com/airbnb/nerve
2016-01-18 14:07:28 +01:00
Fabian Reinartz
4d1c9296d5 Add new defaults for relabel configurations 2015-11-16 13:16:13 +01:00
Brian Brazil
fd2bd81cd8 Allow all instance labels in target groups
With the blackbox exporter, the instance label will commonly
be used for things other than hostnames so remove this restriction.
https://example.com or https://example.com/probe/me are some examples.

To prevent user error, check that urls aren't provided as targets
when there's no relabelling that could potentically fix them.
2015-11-07 14:35:20 +00:00
Fabian Reinartz
f2a8261cdb Merge pull request #1177 from fabric8io/kubernetes-discovery
Kubernetes SD authentication options cleanup
2015-10-24 20:32:25 +02:00
Fabian Reinartz
180da1ba65 Add overflow check in TLS config 2015-10-24 17:12:34 +02:00
Jimmi Dyson
87940ec213 Kubernetes SD: Rename masters to api_servers in config 2015-10-24 14:41:14 +01:00
Jimmi Dyson
7ff5cc66ea Kubernetes SD authentication options cleanup 2015-10-23 16:47:52 +01:00
Brian Brazil
1ddf75240d config: Don't hide username, it's not secret.
Usernames are not generally considered to be secrets,
and treating them as secrets may lead to confusion
as to how secure they are. Obscuring them also makes
debugging harder.
2015-10-08 15:13:21 +01:00
Matt Jibson
dcb4856d72 Add SD for Amazon EC2 instances 2015-10-06 18:36:17 -04:00
Julius Volz
dac26cef71 Rename global "labels" config option to "external_labels". 2015-09-29 20:54:20 +02:00
Matt Jibson
0e99fa6c46 Allow labelmap action 2015-09-21 15:41:19 -04:00
Jimmi Dyson
ec04ba38a2 Kubernetes SD config check 2015-09-09 13:24:44 +01:00
Jimmi Dyson
a1574aa2b3 Move TLS options to scrape config
Fixes #1013, fixes #989
2015-09-09 09:52:21 +01:00
Fabian Reinartz
1ef5ed0cf2 Merge pull request #1048 from xperimental/fix/marathon-config
Fix parsing Marathon SD config
2015-09-06 20:09:46 +02:00
Robert Jacob
eb7416e71f Fix missing unmarshal for Marathon SD config. 2015-09-06 20:02:22 +02:00
Jimmi Dyson
d7a7fd4589 Kubernetes SD improvements
* Support multiple masters with retries against each master as required.
* Scrape masters' metrics.
* Add role meta label for node/service/master to make it easier for relabeling.
2015-09-04 11:31:20 +01:00
Julius Volz
f63a899744 Change config regexes to full-string matches.
This anchors all regular expressions entered via the config to match a
full string vs. a substring.

THIS IS A BREAKING CHANGE!

Fixes part of https://github.com/prometheus/prometheus/issues/996
2015-09-01 15:46:41 +02:00
Julius Volz
995d3b831d Fix most golint warnings.
This is with `golint -min_confidence=0.5`.

I left several lint warnings untouched because they were either
incorrect or I felt it was better not to change them at the moment.
2015-08-26 12:44:46 +02:00
Fabian Reinartz
438e232c9b Fix grouping of import blocks 2015-08-22 09:42:45 +02:00
Fabian Reinartz
306e8468a0 Switch from client_golang/model to common/model 2015-08-21 13:33:38 +02:00
Fabian Reinartz
ac0be60bb9 Add license headers 2015-08-20 13:03:56 +02:00
Fabian Reinartz
139f27bf8a Increase default retry interval for file SD
The automatic refresh is a safety mechanism in case
file watches fail. As they seem to be working well the
interval can be increased.
2015-08-16 15:06:26 +02:00
Fabian Reinartz
3c6dd161d7 Scrape all services on empty services list. 2015-08-14 17:39:41 +02:00