Commit graph

870 commits

Author SHA1 Message Date
Matthieu MOREL
cef219c31c chore: enable unused-receiver rule from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-08-04 09:43:33 +00:00
Joe Adams
cdfb67467f
Review feedback
Signed-off-by: Joe Adams <github@joeadams.io>
2025-07-31 21:22:43 -04:00
Joe Adams
56a3bbf5c5
Fix import formatting
Signed-off-by: Joe Adams <github@joeadams.io>
2025-07-29 23:17:23 -04:00
Joe Adams
eab9b696f2
Upgrade AWS SDK to v2
AWS SDK v1 is end of life soon, so migrate to the V2 SDK. The credential loading should work more consistently with other projects that use the SDK and load credentials from the appropriate locations including from environment variables. This affects the EC2 and Lightsail service discovery features.

Signed-off-by: Joe Adams <github@joeadams.io>
2025-07-29 23:06:05 -04:00
Ayoub Mrini
9dc274687b
Merge pull request #16831 from machine424/nsmeta
feat(discovery/kubernetes): allow attaching namespace metadata
2025-07-17 10:30:27 +01:00
machine424
a9f6fdd910
feat(discovery/kubernetes): allow attaching namespace metadata
to ingress and service roles.

with the help of claude-4-sonnet

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-07-17 09:53:16 +02:00
Yandi Lee
8eb445b8a4
Discovery.Manager: close sync ch after sender() is stopped (#14465)
* close sync ch after sender() is stopped
* break if chan is closed
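
The ordering described by these two bullets can be sketched as follows. This is a minimal illustration rather than the manager's actual code: `runSender`, `consume`, `demo` and the string payloads are hypothetical names. The goroutine that sends is the one that closes the channel, and only once it has stopped sending; the receiver breaks out when it sees the channel closed.

```go
package main

import (
	"context"
	"fmt"
)

// runSender is a hypothetical stand-in for the manager's sender loop.
// It owns syncCh and closes it only when it returns, so no send can
// ever happen on a closed channel.
func runSender(ctx context.Context, updates []string) <-chan string {
	syncCh := make(chan string)
	go func() {
		defer close(syncCh) // closed after the sender has stopped
		for _, u := range updates {
			select {
			case <-ctx.Done():
				return
			case syncCh <- u:
			}
		}
	}()
	return syncCh
}

// consume reads until the channel is closed and returns what it received.
func consume(ch <-chan string) []string {
	var got []string
	for {
		v, ok := <-ch
		if !ok {
			break // channel closed: the sender is gone
		}
		got = append(got, v)
	}
	return got
}

// demo wires the two halves together.
func demo() []string {
	return consume(runSender(context.Background(), []string{"a", "b"}))
}

func main() {
	fmt.Println(demo())
}
```

Letting the sending side close the channel is the usual Go convention, since a send on a closed channel panics while a receive from one simply reports `ok == false`.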

Signed-off-by: liyandi <littlepangdi@163.com>
Co-authored-by: liyandi <liyandi@xiaomi.com>
2025-07-11 17:15:01 +01:00
machine424
020e803ee0 chore(discovery): remove unused StaticProvider struct, library users can easily define it on their side
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-07-09 17:10:13 +01:00
chenlujjj
a2735494e1
chore: complete error message in RegisterSDMetrics function (#14635)
Signed-off-by: chenlujjj <953546398@qq.com>
2025-07-08 12:05:24 +00:00
machine424
c2d6e528e4
feat(discovery/kubernetes): allow attaching namespace metadata
to endpointslice, endpoints and pod roles

after injecting the labels for endpointslice, claude-4-sonnet
helped transpose the code and tests to endpoints and pod roles

fixes https://github.com/prometheus/prometheus/issues/9510
supersedes https://github.com/prometheus/prometheus/pull/13798

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
Co-authored-by: Paul BARRIE <paul.barrie.calmels@gmail.com>
2025-07-03 19:41:08 +02:00
Lukasz Mierzwa
b49d143595 Fix a race in discovery manager ApplyConfig & shutdown
If we call ApplyConfig() at the same time the manager is being stopped we might end up hanging forever.
This is because ApplyConfig() will try to cancel obsolete providers and wait until they are cancelled.
It's done by setting a done() function that calls Done() on a sync.WaitGroup:

```
if len(prov.newSubs) == 0 {
	wg.Add(1)
	prov.done = func() {
		wg.Done()
	}
}
```

then calling prov.cancel() and finally waiting until all providers have run their done() function,
by blocking on a wg.Wait() call.

For each provider there is a goroutine created by calling Manager.startProvider(*Provider):

```
func (m *Manager) startProvider(ctx context.Context, p *Provider) {
	m.logger.Debug("Starting provider", "provider", p.name, "subs", fmt.Sprintf("%v", p.subs))
	ctx, cancel := context.WithCancel(ctx)
	updates := make(chan []*targetgroup.Group)

	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()

	go p.d.Run(ctx, updates)
	go m.updater(ctx, p, updates)
}
```

It creates a context that can be cancelled and that cancel function becomes prov.cancel. This is what ApplyConfig will call.
If we look at the body of updater() method:

```
func (m *Manager) updater(ctx context.Context, p *Provider, updates chan []*targetgroup.Group) {
	// Ensure targets from this provider are cleaned up.
	defer m.cleaner(p)
	for {
		select {
		case <-ctx.Done():
			return
[...]
```

we can see that it will exit if that context is cancelled and that will trigger a call to Manager.cleaner().
That cleaner() is where done() is called.
So ApplyConfig() -> calls cancel() -> causes cleaner() to be executed -> calls done().

cancel() is also called from cancelDiscoverers() method that will be called by Manager.Run() when Manager is stopping:

```
func (m *Manager) Run() error {
	go m.sender()
	<-m.ctx.Done()
	m.cancelDiscoverers()
	return m.ctx.Err()
}
```

The problem is that if we call both ApplyConfig and stop the manager at the same time we might end up with:

- We call Manager.ApplyConfig()
- We stop the Manager
- Manager.cancelDiscoverers() is called
- Provider.cancel() is called for every Provider
- cancel() causes provider context to be cancelled which terminates updater() for given Provider
- cancelling context causes cleaner() method to be called for given Provider
- cleaner() calls done() and exits
- Provider is considered stopped at this point, there is no goroutine running that will call done() anymore
- ApplyConfig iterates providers and decides that one is obsolete and must be stopped
- It sets a custom done() function body with a WaitGroup.Done() call in it
- Then ApplyConfig waits until all Providers run done()
- But they are all stopped and no done() will be run
- We wait forever

This only happens if cancelDiscoverers() is run before ApplyConfig, if ApplyConfig runs first done() will be called,
if cancelDiscoverers() is called first it will stop updater() instances and so done() won't be called anymore.

Part of the problem is that there is no distinction between running and stopped providers. There is Provider.IsStarted() method
that returns a bool based on the value of cancel function but ApplyConfig doesn't check it.
The second problem is that although there is a mutex on a Provider it's not used much in the code, so two goroutines can try to read and/or write
provider.cancel and/or provider.done at the same time, making it all more likely to race.

The easiest way to fix it is to check if the provider is started inside ApplyConfig so we don't try to stop a provider that's already stopped.
For that we need to mark it as stopped after cancel() is called, by setting cancel to nil.
This also needs better lock usage to avoid different parts of the code trying to set cancel and done at the same time.
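
A rough sketch of that approach, under stated assumptions: the `provider` type, `start`, `stop`, `stopObsolete` and `demo` below are simplified, hypothetical stand-ins and not the code in this commit. A nil `cancel` marks a provider as stopped, every access to `cancel` and `done` happens under the provider's mutex, and the ApplyConfig-like step skips providers that are already stopped instead of waiting for a done() call that will never come.

```go
package main

import (
	"context"
	"fmt"
	"sync"
)

// provider is a simplified, hypothetical stand-in for a discovery provider.
type provider struct {
	mu     sync.Mutex
	cancel context.CancelFunc // nil means "not started" or "already stopped"
	done   func()
}

// start launches a goroutine that calls done() once the context is cancelled.
func (p *provider) start(ctx context.Context) {
	ctx, cancel := context.WithCancel(ctx)
	p.mu.Lock()
	p.cancel = cancel
	p.mu.Unlock()
	go func() {
		<-ctx.Done()
		p.mu.Lock()
		done := p.done
		p.mu.Unlock()
		if done != nil {
			done()
		}
	}()
}

// stop cancels the provider and marks it as stopped by clearing cancel.
func (p *provider) stop() {
	p.mu.Lock()
	defer p.mu.Unlock()
	if p.cancel != nil {
		p.cancel()
		p.cancel = nil
	}
}

// stopObsolete mimics the ApplyConfig step: it only waits for providers
// that are still started, and returns how many it waited for.
func stopObsolete(provs []*provider) int {
	var wg sync.WaitGroup
	waited := 0
	for _, p := range provs {
		p.mu.Lock()
		if p.cancel == nil { // already stopped: nobody would call done()
			p.mu.Unlock()
			continue
		}
		wg.Add(1)
		p.done = wg.Done
		p.cancel()
		p.cancel = nil
		p.mu.Unlock()
		waited++
	}
	wg.Wait()
	return waited
}

// demo stops one provider first (as a shutdown would) and then runs the
// ApplyConfig-like step on both.
func demo() (int, int) {
	stopped, running := &provider{}, &provider{}
	stopped.start(context.Background())
	running.start(context.Background())
	stopped.stop() // simulates cancelDiscoverers() winning the race
	return stopObsolete([]*provider{stopped}), stopObsolete([]*provider{running})
}

func main() {
	fmt.Println(demo())
}
```

Without the `p.cancel == nil` check, the first `stopObsolete` call in `demo` would register a WaitGroup entry that no goroutine is left to release, which is the hang described above.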

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-02 16:03:10 +01:00
Lukasz Mierzwa
357e652044 Add a test for a rare shutdown hang
Doing a config reload that needs to stop some providers while also sending SIGTERM to Prometheus at the same time can sometimes hang:

1: sync.WaitGroup.Wait [83 minutes] [Created by run.(*Group).Run in goroutine 1 @ group.go:37]
    sync         sema.go:110              runtime_SemacquireWaitGroup(*uint32(#166))
    sync         waitgroup.go:118         (*WaitGroup).Wait(*WaitGroup(#23))
    discovery    manager.go:276           (*Manager).ApplyConfig(#23, #167)
    main         main.go:964              main.func5(#120)
    main         main.go:1505             reloadConfig({#183, 0x1b}, 1, #40, #43, #50, {#31, 0xa, 0})
    main         main.go:1182             main.func22()
    run          group.go:38              (*Group).Run.func1(*Group(#26), #51)

Add a test for it.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-07-02 16:01:42 +01:00
Bryan Boreham
d6f9ba6310 [BUILD] Docker SD: Fix up deprecated types
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-06-23 16:15:58 +01:00
Jan-Otto Kröpke
ceaa3bd6f9
discovery: add STACKIT SD (#16401)
2025-06-17 15:41:14 +02:00
Ayoub Mrini
50ba25f273
chore(docs/kubernetes SD): add a note about Endpoints API being deprecated in kubernetes 1.33+ (#16684)
* chore(docs/kubernetes SD): add a note about Endpoints API being deprecated in kubernetes 1.33+

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

* chore(discovery/kubernetes): add Endpoints API deprecation comment

Signed-off-by: machine424 <ayoubmrini424@gmail.com>

---------

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-06-06 11:56:27 +02:00
Zhengke Zhou
45211dc72f
chore: Adjust test and add comment about DNS resolution issue for failing tests (#16200)
* chore: Add comment about DNS resolution issue for failing tests

Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>

* remove unexported-return

Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>

---------

Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
2025-05-27 14:40:09 +02:00
Ryan Wu
091e662f4d
refactor(endpointslice): use service cache.Indexer to achieve better iteration performance (#16365)
* refactor(endpointslice): use cache.Indexer to index endpointslices by LabelServiceName so not have to iterate over all endpoint objects.

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* check the type and error early and add 'TestEndpointSliceDiscoveryWithUnrelatedServiceUpdate' unit test to give a regression test

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* make service indexer namespaced

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* remove unneeded test func

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Apply suggestions from code review

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

---------

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-05-20 20:33:25 +02:00
Ayoub Mrini
eb8d34c2ad
Merge pull request #16587 from prymitive/discoveryLocks
discovery: Try fixing potential deadlocks in discovery
2025-05-19 11:09:49 +02:00
Ben Kochie
1eaf12e99b
Add golangci-lint fmt (#16602)
With golangci-lint v2, it now has "formatters" that can be configured.
Add `golangci-lint fmt` to the `make format` in Makefile.common.
* Enable goimports formatter.

Signed-off-by: SuperQ <superq@gmail.com>
2025-05-16 11:05:35 +02:00
Lukasz Mierzwa
59761f631b Move m.targetsMtx.Lock down into the loop
Make sure the order of locks is always the same in all functions. In ApplyConfig() we have m.targetsMtx.Lock() after provider is locked, so replicate the same in allGroups().

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-05-15 12:30:48 +01:00
Lukasz Mierzwa
7d55ee8cc8 Try fixing potential deadlocks in discovery
Manager.ApplyConfig() uses multiple locks:
- Provider.mu
- Manager.targetsMtx

Manager.cleaner() uses the same locks but in the opposite order:
- First it locks Manager.targetsMtx
- Then it locks Provider.mu

I've seen a few strange cases of Prometheus hanging on shutdown and never completing that shutdown.
From a few traces I was given it appears that while Prometheus is still running only discovery.Manager and notifier.Manager are running.
From that trace it also seems like they are stuck on a lock from two functions:
- cleaner waits on a RLock()
- ApplyConfig waits on a Lock()

I cannot reproduce it but I suspect this is a race between locks. Imagine this scenario:
- Manager.ApplyConfig() is called
- Manager.ApplyConfig locks Provider.mu.Lock()
- at the same time cleaner() is called on the same Provider instance and it calls Manager.targetsMtx.Lock()
- Manager.ApplyConfig() now calls Manager.targetsMtx.Lock() but that lock is already held by cleaner() function so ApplyConfig() hangs there
- at the same time cleaner() now wants to lock Provider.mu.RLock() but that lock is already held by Manager.ApplyConfig()
- we end up with both functions locking each other out without any way to break that lock

Re-order lock calls to try to avoid this scenario.
I tried writing a test case for it but couldn't hit this issue.
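
The general idea of a consistent lock order can be illustrated with a small sketch. The `manager` and `provider` types here are hypothetical, stripped-down stand-ins, and the order shown (provider lock first, then the targets lock) is an assumption made for the example, not necessarily the exact order chosen in the real code.

```go
package main

import (
	"fmt"
	"sync"
)

// provider and manager are hypothetical, stripped-down stand-ins.
type provider struct {
	mu   sync.RWMutex
	subs int
}

type manager struct {
	targetsMtx sync.Mutex
	targets    map[string]int
}

// applyConfig takes provider.mu first, then targetsMtx.
func (m *manager) applyConfig(p *provider, name string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.subs++
	m.targetsMtx.Lock()
	defer m.targetsMtx.Unlock()
	m.targets[name] = p.subs
}

// cleaner uses the same order: provider.mu (read lock) first, then targetsMtx.
func (m *manager) cleaner(p *provider, name string) {
	p.mu.RLock()
	defer p.mu.RUnlock()
	m.targetsMtx.Lock()
	defer m.targetsMtx.Unlock()
	delete(m.targets, name)
}

// demo runs both functions concurrently many times; with a single lock
// order it always terminates.
func demo() int {
	m := &manager{targets: map[string]int{}}
	p := &provider{}
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(2)
		go func() { defer wg.Done(); m.applyConfig(p, "job") }()
		go func() { defer wg.Done(); m.cleaner(p, "job") }()
	}
	wg.Wait()
	return p.subs
}

func main() {
	fmt.Println(demo())
}
```

Because whoever holds `targetsMtx` already holds `provider.mu` and needs nothing further, no goroutine can wait on a lock held by a goroutine that is in turn waiting on it.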

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-05-12 09:13:46 +01:00
hardlydearly
ba4b058b7a refactor: use slices.Contains to simplify code
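
For illustration only, the kind of change such a refactor makes looks like this; `containsLoop`, `containsStd` and the role names are hypothetical and not taken from the commit.

```go
package main

import (
	"fmt"
	"slices"
)

// containsLoop is the hand-written form that such refactors typically replace.
func containsLoop(haystack []string, needle string) bool {
	for _, s := range haystack {
		if s == needle {
			return true
		}
	}
	return false
}

// containsStd does the same with the standard library (Go 1.21 or newer).
func containsStd(haystack []string, needle string) bool {
	return slices.Contains(haystack, needle)
}

func main() {
	roles := []string{"pod", "service", "ingress"}
	fmt.Println(containsLoop(roles, "service") == containsStd(roles, "service"))
}
```
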
Signed-off-by: hardlydearly <799511800@qq.com>
2025-05-09 08:27:10 +02:00
Arve Knudsen
e7e3ab2824
Fix linting issues found by golangci-lint v2.0.2 (#16368)
* Fix linting issues found by golangci-lint v2.0.2

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-05-03 19:05:13 +02:00
Jonas Lammler
08982b177f
Add label_selector to hetzner service discovery
Allows filtering the servers when sending the listing request to the API. This feature is only available when using `role=hcloud`.

See https://docs.hetzner.cloud/#label-selector for details on how to use the label selector.

Signed-off-by: Jonas Lammler <jonas.lammler@hetzner-cloud.de>
2025-04-30 09:24:14 +02:00
Ryan Wu
b4d3c06acb
discovery: make endpointSlice discovery more efficient (#16433)
* discovery: a change to a service with the same name but from another namespace won't enqueue the endpointSlice

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Update discovery/kubernetes/endpointslice.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Update endpointslice.go

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

---------

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-04-16 16:43:30 +02:00
Zhengke Zhou
c884dd16ac
discovery: Remove ingress & endpoint slice adaptors (#16413)
* Remove ingress & endpoint slice adaptors
* fix ci

Signed-off-by: zhengkezhou1 <madzhou1@gmail.com>
2025-04-09 10:25:53 +01:00
Ryan Wu
7d73c1d3f8
refactor[discovery, tsdb]: simplify error handling and remove redundant checks (#16328)
* refactor: simplify error handling and remove redundant checks

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Add the comment for return of reloading blocks failure

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

* Add the comment for return of reloading blocks failure

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>

---------

Signed-off-by: Ryan Wu <rongjun0821@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-03-27 12:20:59 +01:00
Matthieu MOREL
5fa1146e21
chore: enable gci linter (#16245)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-03-22 15:46:13 +00:00
Matthieu MOREL
6719867196
test(kubernetes): replace equality check with JSON equality assertion (#16246)
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-03-22 13:55:30 +01:00
Patryk Prus
452fd42aeb
Disable additional test as flaky on windows
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-03-18 14:06:33 -04:00
machine424
b0227d1f16 chore(discovery): disable some file update tests as flaky
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-03-14 18:33:13 +01:00
Paulo Dias
9630dc656c
discovery(openstack): remove duplicated error handling for floatingips.List (#16205)
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-03-12 15:25:50 +01:00
dependabot[bot]
6f9f29542e
chore(deps): bump github.com/docker/docker (#16118)
Bumps [github.com/docker/docker](https://github.com/docker/docker) from 27.5.1+incompatible to 28.0.1+incompatible.
- [Release notes](https://github.com/docker/docker/releases)
- [Commits](https://github.com/docker/docker/compare/v27.5.1...v28.0.1)

---
updated-dependencies:
- dependency-name: github.com/docker/docker
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
2025-03-07 15:40:29 +01:00
co63oc
0e4e5a71bd
Fix typos (#16076)
Signed-off-by: co63oc <co63oc@users.noreply.github.com>
2025-02-28 11:24:25 +11:00
Matthieu MOREL
c7d4b53ec1 chore: enable unused-parameter from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-19 19:50:28 +01:00
Björn Rabenstein
13c05a385c
Merge pull request #16007 from mmorel-35/revive/early-return
chore: enable early-return from revive
2025-02-11 20:31:34 +01:00
Ayoub Mrini
de6add2c7d
Merge pull request #14228 from Codelax/sd-scaleway-routed-ips
feat(scaleway-sd): add labels for multiple public IPs
2025-02-11 17:21:29 +01:00
Matthieu MOREL
b472ce7010 chore: enable early-return from revive
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-02-10 22:08:43 +01:00
Pierre Prinetti
bb30a871ac
deps: Use Gophercloud v2
Signed-off-by: Pierre Prinetti <pierreprinetti@redhat.com>
2025-01-28 15:08:34 +01:00
Jan Fajerski
ffea9f005b
Merge pull request #15539 from paulojmdias/openstack-loadbalancer-discovery
discovery(openstack): add load balancer discovery
2025-01-28 14:10:06 +01:00
Jan Fajerski
7f37a008c4
Merge pull request #15540 from mmorel-35/prometheus/common@v0.61.0
chore(deps): use `version.PrometheusUserAgent`
2025-01-28 13:10:48 +01:00
Matthieu MOREL
dd5ab743ea chore(deps): use version.PrometheusUserAgent
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2025-01-22 07:31:02 +01:00
Paulo Dias
803b1565a5
fix: fix network endpoint id
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-01-21 11:20:15 +00:00
Paulo Dias
1d49d11786
fix: fix testing
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-01-21 11:18:34 +00:00
Paulo Dias
cddf729ca3
Merge branch 'main' of github.com:prometheus/prometheus into openstack-loadbalancer-discovery
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-01-21 11:16:52 +00:00
Paulo Dias
e8fab32ca2
discovery: move openstack floating ips function from deprecated Compute API /os-floating-ips to Network API /floatingips (#14367)
2025-01-21 11:40:15 +01:00
crystalstall
616914abe2 Signed-off-by: crystalstall <crystalruby@qq.com>
refactor: using slices.Contains to simplify the code

Signed-off-by: crystalstall <crystalruby@qq.com>
2025-01-11 00:41:51 +08:00
Paulo Dias
36ccf62692
Merge branch 'prometheus:main' into openstack-loadbalancer-discovery
2025-01-02 14:44:19 +00:00
Paulo Dias
d40e99c2ec
Merge branch 'openstack-loadbalancer-discovery' of github.com:paulojmdias/prometheus into openstack-loadbalancer-discovery
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-01-02 14:43:46 +00:00
Paulo Dias
cb7254158b
feat: rename status to provisioning_status and add operating_status
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2025-01-02 14:43:31 +00:00
pinglanlu
6a61efcfc3
discovery: use a more direct and less error-prone return value (#15347)
Signed-off-by: pinglanlu <pinglanlu@outlook.com>
2024-12-29 18:03:06 +01:00
Paulo Dias
a5c20713dc
Merge branch 'prometheus:main' into openstack-loadbalancer-discovery
2024-12-08 22:54:18 +00:00
Paulo Dias
713903fe48
fix: fix configuration and remove unneeded libs
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2024-12-06 17:58:21 +00:00
Ayoub Mrini
af2a1cb10c
Merge pull request #15227 from aniketnk/i15185_1
Run discovery/kubernetes tests in parallel
2024-12-05 10:48:26 +01:00
Paulo Dias
d136e43109
fix: fix comment
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2024-12-04 23:48:31 +00:00
Paulo Dias
9e9929c421
fix: remove new line
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2024-12-04 23:46:11 +00:00
Paulo Dias
fc0141aec2
discovery: add openstack load balancer discovery
Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
2024-12-04 23:34:29 +00:00
machine424
c9f3d9b47f
doc(nomad): adjust sections about nomad_sd_config's server
test(nomad): extend TestConfiguredService with more valid/invalid servers configs

fixes https://github.com/prometheus/prometheus/issues/12306

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-12-03 19:41:45 +01:00
hongmengning
2a1b940ae4 discovery: fix some function names in comment
Signed-off-by: hongmengning <go@before.tech>
2024-11-25 17:33:04 +08:00
Aniket Kaulavkar
f7685caf0d Parallelize discovery/kubernetes tests using t.Parallel()
Signed-off-by: Aniket Kaulavkar <aniket.kaulavkar@gmail.com>
2024-11-14 10:44:03 +05:30
Matthieu MOREL
af1a19fc78 enable errorf rule from perfsprint linter
Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
2024-11-06 16:50:36 +01:00
Giedrius Statkevičius
58fedb6b61 discovery/kubernetes: optimize more gets
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 17:17:37 +02:00
Giedrius Statkevičius
716fd5b11f discovery/kubernetes: use namespacedName
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 16:19:56 +02:00
Giedrius Statkevičius
e452308e37 discovery/kubernetes: optimize resolvePodRef
resolvePodRef is in a hot path:

```
ROUTINE ======================== github.com/prometheus/prometheus/discovery/kubernetes.(*Endpoints).resolvePodRef in discovery/kubernetes/endpoints.go
    2.50TB     2.66TB (flat, cum) 22.28% of Total
         .          .    447:func (e *Endpoints) resolvePodRef(ref *apiv1.ObjectReference) *apiv1.Pod {
         .          .    448:   if ref == nil || ref.Kind != "Pod" {
         .          .    449:           return nil
         .          .    450:   }
    2.50TB     2.50TB    451:   p := &apiv1.Pod{}
         .          .    452:   p.Namespace = ref.Namespace
         .          .    453:   p.Name = ref.Name
         .          .    454:
         .   156.31GB    455:   obj, exists, err := e.podStore.Get(p)
         .          .    456:   if err != nil {
         .          .    457:           level.Error(e.logger).Log("msg", "resolving pod ref failed", "err", err)
         .          .    458:           return nil
         .          .    459:   }
         .          .    460:   if !exists {
```

This is some low hanging fruit that we can easily optimize. The key of
an object has format "namespace/name" so generate that inside of
Prometheus itself and use pooling.

```
goos: linux
goarch: amd64
pkg: github.com/prometheus/prometheus/discovery/kubernetes
cpu: Intel(R) Core(TM) i9-10885H CPU @ 2.40GHz
                 │   olddisc    │               newdisc               │
                 │    sec/op    │   sec/op     vs base                │
ResolvePodRef-16   516.3n ± 17%   289.5n ± 7%  -43.92% (p=0.000 n=10)

                 │   olddisc    │              newdisc               │
                 │     B/op     │    B/op     vs base                │
ResolvePodRef-16   1168.00 ± 0%   24.00 ± 0%  -97.95% (p=0.000 n=10)

                 │  olddisc   │            newdisc             │
                 │ allocs/op  │ allocs/op   vs base            │
ResolvePodRef-16   2.000 ± 0%   2.000 ± 0%  ~ (p=1.000 n=10) ¹
¹ all samples are equal
```
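
A minimal sketch of the key-building idea, assuming a pooled `bytes.Buffer`; the names `keyBuffers` and `namespacedName` are chosen for this example and the real implementation may differ. Instead of allocating a whole Pod object just so the store can derive its key, the "namespace/name" key is built directly.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// keyBuffers pools buffers so repeated key construction can reuse the
// same backing storage instead of allocating it on every call.
var keyBuffers = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// namespacedName builds the "namespace/name" key used to look up an object.
func namespacedName(namespace, name string) string {
	buf := keyBuffers.Get().(*bytes.Buffer)
	defer keyBuffers.Put(buf)
	buf.Reset()
	buf.WriteString(namespace)
	buf.WriteByte('/')
	buf.WriteString(name)
	return buf.String() // copies the bytes, so the buffer is safe to reuse
}

func main() {
	// A plain map stands in for the pod store in this sketch.
	store := map[string]string{"default/web-0": "pod web-0"}
	pod, ok := store[namespacedName("default", "web-0")]
	fmt.Println(pod, ok)
}
```

Whether pooling beats plain string concatenation depends on the workload, so a benchmark like the one above is the way to decide.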

Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
2024-10-28 12:12:40 +02:00
3Juhwan
685d6d169f refactor: reorder fields in defaultSDConfig initialization for consistency
Signed-off-by: 3Juhwan <13selfesteem91@naver.com>
2024-10-28 10:40:49 +01:00
Ayoub Mrini
98dcd28b1a
Merge pull request #15170 from machine424/awldi
fix(discovery): Handle cache.DeletedFinalStateUnknown in node informers' DeleteFunc
2024-10-18 17:33:08 +02:00
akunszt
08a7162502
discovery: aws/ec2 unit tests (#14364)
* discovery: add aws/ec2 unit tests 

* discovery: initial skeleton for aws/ec2 unit tests

This is a - very likely - not too useful unit test for the AWS SD. It is
committed so other people can check the basic logic and the
implementation.

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: fix linter complaints about ec2_test.go

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: add basic unit test for aws

This tests only the basic labelling, not including the VPC related
information.

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: fix linter complaints about ec2_test.go

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: other linter fixes in aws/ec2_test.go

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: implement remaining tests for aws/ec2

The coverage is not 100% but I think it is a good starting point if
someone wants to improve that.

Currently it covers all the AWS API calls.

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: make linter happy in aws/ec2_test.go

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: make utility functions private

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discover: no global variable in the aws/ec2 test

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: common body for some tests in ec2

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: try to make golangci-lint happy

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: make every non-test function private

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: test for errors first in TestRefresh

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: move refresh tests into the function

This way people can find both the test cases and the execution of the
test at the same place.

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: fix copyright date

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: remove misleading comment

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: rename test for easier identification

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: use static values for the test cases

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discover: try to make the linter happy

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: drop redundant data from ec2 and use common ptr functions

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: use Error instead of Equal

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

* discovery: merge refreshAZIDs tests into one

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>

---------

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>
2024-10-16 14:36:37 +02:00
machine424
b1c356beea
fix(discovery): Handle cache.DeletedFinalStateUnknown in node informers' DeleteFunc
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-10-16 10:20:37 +02:00
M Viswanath Sai
16bba78f15
discovery: Improve Azure test coverage to 50% (#14586)
* azure sd: separate refresh and refreshAzure
* azure sd: create a client with mocked servers for tests
* add test for refresh function

---------

Signed-off-by: mviswanathsai <mviswanath.sai.met21@itbhu.ac.in>
2024-10-13 10:24:51 +02:00
Bryan Boreham
b87b88ddc2
Merge branch 'main' into consul-catalog-filter-support
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-10-08 12:20:31 +01:00
TJ Hoplock
6ebfbd2d54 chore!: adopt log/slog, remove go-kit/log
For: #14355

This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
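
The nil-logger guard mentioned in the last bullet can be sketched like this. The `component` type and `newComponent` constructor are hypothetical, and discarding output is just one possible fallback.

```go
package main

import (
	"fmt"
	"io"
	"log/slog"
)

// component is a hypothetical type that accepts an optional logger.
type component struct {
	logger *slog.Logger
}

// newComponent falls back to a no-op logger when none is provided, so later
// calls such as logger.With(...) never run on a nil *slog.Logger.
func newComponent(logger *slog.Logger) *component {
	if logger == nil {
		logger = slog.New(slog.NewTextHandler(io.Discard, nil))
	}
	return &component{logger: logger.With("component", "discovery")}
}

func main() {
	c := newComponent(nil)
	c.logger.Info("started") // discarded, but safe to call
	fmt.Println(c.logger != nil)
}
```

The guard matters because `*slog.Logger` is a concrete pointer type: calling `With()` on a nil one panics, so callers that used to pass no logger need a default.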

Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
2024-10-07 15:58:50 -04:00
Matthieu MOREL
ab64966e9d
fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()" (#15094)
* fix: use "ErrorContains" or "EqualError" instead of "Contains(t, err.Error()" and "Equal(t, err.Error()"

---------

Signed-off-by: Matthieu MOREL <matthieu.morel35@gmail.com>
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-10-06 16:35:29 +00:00
bas smit
73997289c3 tests: update discovery tests with new label
Previous commit added the pod_container_init label to discovery, so all
the tests need to reflect that.

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
bas smit
a10dc9298e sd k8s: support sidecar containers in endpoint discovery
Sidecar containers are a newish feature in k8s. They're implemented
similar to init containers but actually stay running and allow you to
delay startup of your application pod until the sidecar started (like
init containers always do).

This adds the ports of the sidecar container to the list of discovered
endpoint(slice), allowing you to target those containers as well.
The implementation is a copy of that of Pod discovery

fixes: #14927

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
bas smit
7a90d73fa6 sd k8s: test for sidecar container support in endpoints
This test is expected to fail, the followup will add the feature

Signed-off-by: bas smit <bsmit@bol.com>
2024-10-01 10:26:58 +02:00
machine424
b5569c4070 fix(discovery): adjust how type is retrieved in Configs' MarshalYAML/UnmarshalYAML
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-30 12:33:07 +02:00
machine424
97f3219157 test(discovery): add a Configs test showing that the custom unmarshalling/marshalling is broken.
This went under the radar because the utils are never called directly.

We usually marshal/unmarshal Configs as embedded in a struct using UnmarshalYAMLWithInlineConfigs/MarshalYAMLWithInlineConfigs
which bypasses Configs' custom UnmarshalYAML/MarshalYAML

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-30 12:33:07 +02:00
Nathan Baulch
50cd453c8f
chore: Fix typos (#14868)
* Fix typos

---------

Signed-off-by: Nathan Baulch <nathan.baulch@gmail.com>
2024-09-10 22:32:03 +02:00
machine424
d18fa62ae9
chore(discovery): enable new-service-discovery-manager by default and drop legacymanager package
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-09-05 12:46:03 +02:00
Jan Fajerski
fe4289b502 Merge branch 'main' into HEAD
2024-09-04 18:50:00 +02:00
Jan Fajerski
00315ce15e Merge branch 'main' into 3.0-main-sync-24-08-30
using -Xours

Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
2024-09-02 11:27:18 +02:00
machine424
d23d196db5 fix(discovery): prevent the manager from storing stale targetGroups
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-08-30 14:39:31 +02:00
machine424
c586c15ae6 fix(discovery): make discovery manager notify consumers of dropped targets for still defined jobs
scrape/manager_test.go: add a test to check that the manager gets notified
for targets that got dropped by discovery to reproduce: https://github.com/prometheus/prometheus/issues/12858#issuecomment-1732318102

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2024-08-28 17:39:02 +02:00
Bryan Boreham
4202be5e79 Merge branch 'release-2.54' into merge-2.54.1-into-main
2024-08-27 12:04:48 +01:00
beorn7
0f760f63dd lint: Revamp our linting rules, mostly around doc comments
Several things done here:

- Set `max-issues-per-linter` to 0 so that we actually see all linter
  warnings and not just 50 per linter. (As we also set
  `max-same-issues` to 0, I assume this was the intention from the
  beginning.)

- Stop using the golangci-lint default excludes (by setting
  `exclude-use-default: false`). Those are too generous and don't match
  our style conventions. (I have re-added some of the excludes
  explicitly in this commit. See below.)

- Re-add the `errcheck` exclusion we have used so far via the
  defaults.

- Exclude the signature requirement `govet` has for `Seek` methods
  because we use non-standard `Seek` methods a lot. (But we keep other
  requirements, while the default excludes completely disabled the
  check for common method signatures.)

- Exclude warnings about missing doc comments on exported symbols. (We
  used to be pretty adamant about doc comments, but stopped that at
  some point in the past. By now, we have about 500 missing doc
  comments. We may consider reintroducing this check, but that's
  outside of the scope of this commit. The default excludes of
  golangci-lint essentially ignore doc comments completely.)

- By no longer using the default excludes, we now get warnings back on
  malformed doc comments. That's the most impactful change in this
  commit. It does not enforce doc comments (again), but _if_ there is
  a doc comment, it has to have the recommended form. (Most of the
  changes in this commit are fixing this form.)

- Improve wording/spelling of some comments in .golangci.yml, and
  remove an outdated comment.

- Leave `package-comments` inactive, but add a TODO asking if we
  should change that.

- Add a new sub-linter `comment-spacings` (and fix corresponding
  comments), which avoids missing spaces after the leading `//`.

Signed-off-by: beorn7 <beorn@grafana.com>
2024-08-22 17:36:11 +02:00
Jan Fajerski
5138922b0d Merge branch 'main' into 3.0-main-sync-24-08-21 2024-08-21 09:09:36 +02:00
ouyang1204@gmail.com
89dee48cc8 fix the issue of failing to match the first network when the container is reconnected to a new network
Signed-off-by: ouyang1204@gmail.com <ouyang1204@gmail.com>
2024-08-19 21:26:25 +08:00
Arve Knudsen
3a78e76282 Upgrade golangci-lint to v1.60.1
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-08-18 12:13:25 +02:00
Julien Pivotto
7711cd5ab5 Remove deprecated storage.tsdb.retention flag
Signed-off-by: Julien Pivotto <roidelapluie@o11y.eu>
Signed-off-by: Julien <roidelapluie@o11y.eu>
2024-08-16 13:53:09 +02:00
cuiweiyuan
1800af54f0 chore: fix some function names
Signed-off-by: cuiweiyuan <cuiweiyuan@aliyun.com>
2024-08-15 13:57:21 +08:00
Björn Rabenstein
3f16a2e7de
Merge pull request #14543 from jan--f/3.0-main-sync-24-08-01
3.0 main sync 24 08 01
2024-08-13 15:54:13 +02:00
Julien
3933cba052
Merge pull request #14365 from simonpasquier/fix-12884
discovery(k8s): remove support for API versions no longer served
2024-08-09 12:48:54 +02:00
Bryan Boreham
79a0ba9d64
Merge pull request #13503 from tylitianrui/chore/remove_redundance
remove redundant code
2024-07-30 12:44:03 +01:00
Bryan Boreham
ce3bd4abea Update for Docker deprecation
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2024-07-17 17:03:32 +01:00
Simon Pasquier
145988d48f
discovery(k8s): remove support for API versions no longer served
This commit removes support for the following API versions:
* `discovery.k8s.io/v1beta1` API version of EndpointSlice (no longer
  served as of v1.25).
* `networking.k8s.io/v1beta1` API version of Ingress (no longer served
  as of v1.22).

Closes #12884

Signed-off-by: Simon Pasquier <spasquie@redhat.com>
2024-07-04 14:54:27 +02:00
Paulo Dias
f4b1fcb73e
discovery: add support for gathering flavor name in Openstack discovery (#14312)
* feat: add support for gathering flavor name in Openstack discovery

Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>

* Update instance.go

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Paulo Dias <44772900+paulojmdias@users.noreply.github.com>

* Update configuration.md

Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Paulo Dias <44772900+paulojmdias@users.noreply.github.com>

* fix: fix linting

Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>

* fix: fix instance type

Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>

* Update docs/configuration/configuration.md

Co-authored-by: Simon Pasquier <spasquie@redhat.com>
Signed-off-by: Paulo Dias <44772900+paulojmdias@users.noreply.github.com>

---------

Signed-off-by: Paulo Dias <paulodias.gm@gmail.com>
Signed-off-by: Paulo Dias <44772900+paulojmdias@users.noreply.github.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Co-authored-by: Simon Pasquier <spasquie@redhat.com>
2024-06-30 19:18:18 +02:00
Bryan Boreham
c5040c5ea9
Merge pull request #10490 from DrAuYueng/fix-docker-sd-service-missing
[ENHANCEMENT] Docker SD: add MatchFirstNetwork for containers with multiple networks

Fixes Docker SD service missing in shared mode and deduplicates targets by network
2024-06-26 12:33:50 +01:00
Arve Knudsen
d902116b41 Fix various linting errors
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2024-06-24 16:11:53 -07:00
unknown
0d25931049 rebase main and adjust the configuration
Signed-off-by: ouyang1204@gmail.com <ouyang1204@gmail.com>
2024-06-21 19:10:18 +08:00
akunszt
2aaf99dd0a
discovery: aws: expose Primary IPv6 addresses as label, partially fixes #7406 (#14156)
* discovery: aws: expose Primary IPv6 addresses as label

Add __meta_ec2_primary_ipv6_addresses label. This label contains the
Primary IPv6 address for every ENI attached to the EC2 instance. It is
ordered by the DeviceIndex and the missing elements (interface without
Primary IPv6 address) are kept in the list.

---------

Signed-off-by: Arpad Kunszt <akunszt@hiya.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2024-06-20 14:36:20 +01:00