* claude first pass - use new auth in search client constructor
* use in-process token exchanger when token exchange url is empty
* use unsafe authenticator in dev mode
* delete claudes useless tests
* add unified feature flag for unsafe grpc auth. Clean up some comments
* extract to helper func and add tests
* update readme
* fix comment
* adds regression tests
* fixes test
* has to also be dev for unsafe authenticator to work
* fixes dev check and adds test
* gate in process token exchanger behind dev mode
* Unified Storage: extract KV and resource DB construction into Wire providers
Add sql.ProvideKV(cfg, eDB) as the single instantiation point for the
unified-storage kv.KV (Badger for storage_type=file, SQL KV for
storage_type=unified when EnableSQLKVBackend=true, nil otherwise). Add
sql.ProvideResourceDB(cfg, grafanaDB) so the resource-DB wrapper is a
Wire output shared between ProvideKV and NewStorageBackend — keeping
one eDB instance per process.
NewStorageBackend and NewFileBackend now take eDB and kvStore as
parameters instead of constructing them internally. unified.Options
gains KV and EDB fields; populated by Wire. ModuleServer.NewModule
also takes both, stored on the struct and reused in the UnifiedBackend
module callback.
This enables future DI-time consumers (e.g. a KV-lease-based leader
elector) to depend on kv.KV directly without going through the
storage backend's deferred construction.
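
A minimal sketch of the selection rule described above, assuming simplified
stand-in types: the real sql.ProvideKV also takes eDB, and the KV interface,
Config struct, and both backend types below are illustrative, not the actual
Grafana definitions.

```go
package main

import "fmt"

// KV is a stand-in for the unified-storage kv.KV interface; its real
// method set is not shown in this log, so Name is purely illustrative.
type KV interface{ Name() string }

type badgerKV struct{}

func (badgerKV) Name() string { return "badger" }

type sqlKV struct{}

func (sqlKV) Name() string { return "sqlkv" }

// Config carries only the two settings named in the commit message.
type Config struct {
	StorageType        string
	EnableSQLKVBackend bool
}

// provideKV applies the rule from the commit message: Badger for "file",
// SQL KV for "unified" with the flag enabled, nil for everything else.
func provideKV(cfg Config) KV {
	switch {
	case cfg.StorageType == "file":
		return badgerKV{}
	case cfg.StorageType == "unified" && cfg.EnableSQLKVBackend:
		return sqlKV{}
	default:
		return nil
	}
}

func main() {
	for _, cfg := range []Config{
		{StorageType: "file"},
		{StorageType: "unified", EnableSQLKVBackend: true},
		{StorageType: "unified"},
	} {
		if kv := provideKV(cfg); kv != nil {
			fmt.Println(cfg.StorageType, "->", kv.Name())
		} else {
			fmt.Println(cfg.StorageType, "-> no KV")
		}
	}
}
```

Consumers of the provider have to tolerate a nil KV, since most storage types
do not produce one.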
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* make update-workspace
* regen wire
* Remove KV provider from module server wire
* Unified Storage: skip resource DB init when storage_type doesn't use it
ProvideResourceDB now returns nil for storage_type=file, unified-grpc,
and unified-kv-grpc so Wire DI doesn't open an unused xorm engine /
connection pool at startup. Only storage_type=unified consumes eDB
(legacy SQL backend or SQLKV).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Unified Storage: tighten ProvideResourceDB doc comment
Drop the "single eDB instance per process" wording — the guarantee is
per Wire graph, not per process.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Unified Storage: provide resource DB for legacy and etcd storage types
ProvideResourceDB previously returned nil for every storage_type other
than `unified`, but `legacy`, `etcd`, and unspecified types fall through
to the SQL backend in newClient. That left NewBackend with a nil
DBProvider and broke startup. Skip the resource DB only for the storage
types that genuinely do not consume it (file, unified-grpc,
unified-kv-grpc).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Unified Storage: satisfy exhaustive linter in ProvideResourceDB
Move the dbimpl.ProvideResourceDB call into a default branch so the
switch covers every StorageType variant.
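
The resulting shape can be sketched roughly as follows. The constant names and
the needsResourceDB helper are hypothetical; only the storage type strings and
the "skip three, default to the SQL path" rule come from the commits above.

```go
package main

import "fmt"

// StorageType and its constants are illustrative names for the
// storage_type values mentioned in the commit messages.
type StorageType string

const (
	StorageTypeFile          StorageType = "file"
	StorageTypeUnified       StorageType = "unified"
	StorageTypeUnifiedGrpc   StorageType = "unified-grpc"
	StorageTypeUnifiedKVGrpc StorageType = "unified-kv-grpc"
	StorageTypeLegacy        StorageType = "legacy"
	StorageTypeEtcd          StorageType = "etcd"
)

// needsResourceDB reports whether a storage type consumes the resource DB.
// Only the types that never touch it are listed; every other value,
// including unknown or empty ones, falls through to the default branch.
func needsResourceDB(t StorageType) bool {
	switch t {
	case StorageTypeFile, StorageTypeUnifiedGrpc, StorageTypeUnifiedKVGrpc:
		return false
	default:
		return true
	}
}

func main() {
	for _, t := range []StorageType{
		StorageTypeFile, StorageTypeUnified, StorageTypeUnifiedGrpc,
		StorageTypeUnifiedKVGrpc, StorageTypeLegacy, StorageTypeEtcd, "",
	} {
		fmt.Printf("%q needs resource DB: %v\n", t, needsResourceDB(t))
	}
}
```

Listing the exceptions and defaulting to "open the DB" fails safe: a new
storage type gets a working DBProvider until someone opts it out.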
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Dashboards: skip flaky tabs scroll Playwright test
Introduced in #123251 (merged the same day this branch was rebased onto
main). The "scroll tabs left" button intermittently stays visible after
the test scrolls all the way to the start, failing the toPass() block
that waits for it to hide.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Currently each of the operators listens for signals to shut down, but this is
not necessary as they are run from dskit services, which provide a Context that
will be cancelled on shutdown. Propagate this Context and use it instead.
* wire up vector search and authz - first pass
* fix comment
* memoize checker when subresource present
* search server dist routes to random search pod
* update local integration smoke tests
* make gofmt
* make gen-go
* add embedder to single binary to make running locally easier
* make update-workspace
* fix failing extractor tests
* fix linter errors
* fix linter errors
* remove old embedder smoke test
* tenant watcher use client side filtering
* Revert "tenant watcher use client side filtering"
This reverts commit f71e7f3b26.
* make update-workspace
* make update-workspace
* adds backfiller. First pass review done.
* fix comments and dont use module constant for vector backfiller
* remove stupid comments. Use builder iface instead of extractor iface
* cleans up more comments and some naming
* adds func for marking backfill job with error only so we dont overwrite last seen key
* make update-workspace
* fix broken tests from merge and fix bug with backfill pagination
* removes slop comment
* dont run backfiller as own service, just spawn goroutine to run it
* fix lint errors
* only run promoter goroutine when interval > 0
* Skip jobs whose model doesnt match configured model. Skip jobs when we dont have a registered builder for them. Fixes some bad tests.
* sort builders in constructor
* use max query length of 1000 for now
* use BatchCheck() for authz
* storage/unified: remove leftover empty retry test file
The merge of main into this branch hit a delete/modify conflict on
storage_backend_list_retry_test.go and resolved it by keeping an empty
file. Upstream had deleted it (along with the batchGetRetryPull /
BatchGetRetryPolicy / WithIndexBuildRetryBudget machinery it tested) in
4652d970b8. The 0-byte file broke the build with 'expected package,
found EOF'.
* adds test for query length
* make update-workspace
* feat(unified-storage): add vector-storage database config section
* feat(unified-storage): add VectorBackend interface and types
* feat(unified-storage): add pgvector schema DDL and init
* feat(unified-storage): add vector SQL templates and request structs
Add 5 SQL template files (upsert, delete, search, get_latest_rv,
create_partition) and queries.go with typed request/response structs
following the sqltemplate pattern used in unified storage.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(unified-storage): add vector SQL template snapshot tests
Add snapshot tests for all 5 SQL templates (upsert, delete, search,
get_latest_rv, create_partition) using PostgreSQL dialect only via the
mocks.CheckQuerySnapshots framework.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat(unified-storage): add pgvectorBackend implementation
Implements the VectorBackend interface using pgvector-go for half-vector
similarity search, the Grafana db.DB/db.Tx abstractions for dbutil
compatibility, and a sync.Map partition cache for idempotent DDL.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* test(unified-storage): add pgvectorBackend unit tests
Tests cover sanitizePartitionName, Upsert no-op with empty/nil slice,
Delete (all and stale), and GetLatestRV (value and no-rows cases)
using the existing sqlmock test infrastructure.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* refactor(unified-storage): make vector search metadata filters generic
Replace hardcoded datasource_uids/query_languages filters with a generic
MetadataFilterEntry that works for any JSONB key. The SQL template now
uses {{ range .MetadataFilters }} instead of dashboard-specific conditionals.
* refactor(unified-storage): use migrator pattern for vector schema
Replace raw schema.sql DDL with Grafana's migrator pattern (same as
MigrateResourceStore). Each DDL statement is a tracked migration against
the separate vector database, enabling incremental schema evolution.
* devenv: add pgvector docker block for vector search integration tests
PostgreSQL with pgvector extension on port 5433. Usage:
make devenv sources=pgvector
* feat(unified-storage): add vector database provider wiring
ProvideVectorBackend connects [database_vector] config to a working
pgvectorBackend: builds connection string, creates xorm engine, runs
migrations, wraps as db.DB. Returns (nil, nil) if not configured.
* test(unified-storage): add pgvector integration tests
Tests run against real PostgreSQL+pgvector via devenv block:
make devenv sources=pgvector
PGVECTOR_TEST_DB="..." go test -run TestIntegration ...
Covers: upsert, search with filters, GetLatestRV, delete stale,
upsert overwrite. Skips when PGVECTOR_TEST_DB is not set.
* chore: remove planning docs from repo
* chore: remove trivial vector config tests
* get tests passing
* partition on namespace + model so we dont mix embeddings
* wire up VectorBackend and feature flag
* fix wiring
* gofmt and provider func name change
* docs: spec for tenant watcher/deleter tracing
Performance-oriented tracing design: spans around labelling and
deletion to identify slow tenants/group-resources.
* chore: remove superpowers spec accidentally merged from local main
The docs/superpowers/specs/2026-04-20-tenant-tracing-design.md was
committed to local main and came along during a merge-back into this
branch. Per repo convention, superpowers planning docs stay outside
the repo.
* gofmt
* fix codeowners for pgvector docker block
* fix CI make gen-apps
* make update-workspace
* fix bad comment and modowners
* fix linter errors
* rename ff to vector_backend
* shard by table. Create tables on write. Remove rv dep for embeddings and track it globally in one table until we have a persistent queue
* use halfvec(1024) for embeddings to match what GA uses. Fix some comments.
* add some vector validation
* gofmt
* update partitioning strategy... again. One table per resource, tenants with many resources get their own dedicated partition created dynamically that has hnsw index, all other tenants share a single partition with no hnsw index
* add gin index to metadata when creating new partition
* move promoter into vector backend
* make gen-jsonnet
* use partitioning for resource tables instead. Makes things simpler and makes cross resource search easier
* remove top level default partition - not used
* add some comments
* promoter uses partial index instead of partitions
* shortens hnsw index name so theres more room for resource name since theres a 63 char max
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* IAM/Zanzana: inject MT reconciler GVR list via Wire
Promote the hardcoded `reconcileGVRs` package global to a `Config.GVRs`
field on the reconciler and wire it through `NewEmbeddedZanzanaServer` /
`newServer`. The OSS list (folders, resourcepermissions, teambindings,
users, serviceaccounts) is provided by `ProvideReconcileGVRs`; enterprise
can rebind this to include Role/RoleBinding (and any other
enterprise-only resources) without touching the reconciler package.
Split `authz.WireSet` into `authz.WireSetBase` (reusable) and
`authz.WireSet` (base + OSS `ProvideReconcileGVRs`) so enterprise's
wireset can swap in its own list by importing `WireSetBase` and binding
its own provider.
Why: Role and RoleBinding are noop-implemented in OSS
(pkg/registry/apis/iam/api_installer.go). Listing them in the reconciler
fails the whole namespace reconcile. Rather than teach the reconciler to
detect and skip noop APIs at runtime, pick the GVR list at wire time so
the reconciler only sees resources it can actually read.
This PR is pure DI plumbing — no behavior change for enterprise (which
will land the override on its matching branch) and no change to the set
of GVRs the OSS MT reconciler is currently expected to handle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* IAM/Zanzana: move ReconcileGVRs out of shared wireSet
The shared wireSet in wire.go previously bundled authz.WireSet, which
included authz.ProvideReconcileGVRs. Because wireSet is composed into
both OSS and enterprise graphs, enterprise would silently pull in the
OSS default provider and conflict with (or shadow) its own override.
- pkg/server/wire.go: use authz.WireSetBase instead of authz.WireSet in
the shared wireSet.
- pkg/server/wireexts_oss.go: add authz.ProvideReconcileGVRs to the OSS
wireExtsBasicSet so OSS still binds the default list.
Enterprise supplies extauthz.ProvideEnterpriseReconcileGVRs in its own
wireExtsBasicSet (lives in grafana-enterprise).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Rename from gvr to crd
* Fix lint
* Fix module server DI
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Zanzana: make datastore creation pluggable via StoreProvider interface
Introduce a StoreProvider interface that abstracts how OpenFGA datastores
are created for the Zanzana authorization server. This allows alternative
datastore backends to be injected via Wire dependency injection.
Changes:
- Add StoreProvider interface with NewEmbeddedStore and NewStandaloneStore
methods, plus a default SQL-based implementation
- Refactor NewEmbeddedZanzanaServer and NewZanzanaServer to accept a
pre-built storage.OpenFGADatastore instead of creating one internally
- Thread StoreProvider through ProvideEmbeddedZanzanaServer,
ProvideZanzanaService, and ModuleServer via Wire
- Add ZanzanaGRPCStoreSettings config and store_type setting to support
future alternative backends
- Regenerate wire_gen.go
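
A rough sketch of the shape these changes describe. Only the interface name
and its two method names come from the commit message; the method signatures,
the Datastore stand-in for storage.OpenFGADatastore, and every other
identifier below are assumptions.

```go
package main

import (
	"errors"
	"fmt"
)

// Datastore stands in for storage.OpenFGADatastore.
type Datastore interface{ Backend() string }

// StoreProvider abstracts datastore creation; the real method
// signatures are not shown in this log.
type StoreProvider interface {
	NewEmbeddedStore() (Datastore, error)
	NewStandaloneStore() (Datastore, error)
}

type sqlDatastore struct{ mode string }

func (d sqlDatastore) Backend() string { return "sql/" + d.mode }

// defaultStoreProvider plays the role of the default SQL-based provider.
type defaultStoreProvider struct{}

func (defaultStoreProvider) NewEmbeddedStore() (Datastore, error) {
	return sqlDatastore{mode: "embedded"}, nil
}

func (defaultStoreProvider) NewStandaloneStore() (Datastore, error) {
	return sqlDatastore{mode: "standalone"}, nil
}

type Server struct{ store Datastore }

// newServer accepts a pre-built datastore instead of creating one.
func newServer(store Datastore) (*Server, error) {
	if store == nil {
		return nil, errors.New("datastore is required")
	}
	return &Server{store: store}, nil
}

// provideEmbeddedServer is what a DI provider might do: ask the
// injected StoreProvider for a store, then hand it to the server.
func provideEmbeddedServer(p StoreProvider) (*Server, error) {
	store, err := p.NewEmbeddedStore()
	if err != nil {
		return nil, fmt.Errorf("create embedded store: %w", err)
	}
	return newServer(store)
}

func main() {
	srv, err := provideEmbeddedServer(defaultStoreProvider{})
	if err != nil {
		panic(err)
	}
	fmt.Println("server backed by", srv.store.Backend())
}
```

Swapping the backend then means binding a different StoreProvider in the
dependency graph, with no change to the server constructors.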
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Zanzana: fix enterprise wireexts to use OSS default StoreProvider
The wireexts_enterprise.go in the OSS repo must use the default OSS
StoreProvider since the enterprise extensions package is not available
in OSS CI. The enterprise repo overrides this with the gRPC-capable
provider via the oss-to-enterprise sync.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Zanzana: regenerate enterprise_wire_gen.go with enterprise StoreProvider
The enterprise_wire_gen.go must use ProvideEnterpriseStoreProvider (from
the enterprise wire source), while wireexts_enterprise.go in the OSS repo
uses ProvideDefaultStoreProvider as a placeholder.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Zanzana: remove enterprise wire files from OSS repo
These files are managed by the enterprise repo and copied via
enterprise-to-oss.sh. They should not be committed in the OSS repo.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Zanzana: Fix list test to use NewEmbeddedStore for OpenFGADatastore
The test was passing *sqlstore.SQLStore directly to NewEmbeddedZanzanaServer,
which expects a storage.OpenFGADatastore. Wrap it with zStore.NewEmbeddedStore
to match the pattern used in server_test.go and bench tests.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* operators: add liveness and readiness check to instrumentation
* operators: move notifier to instrumentation service
* avoid updating zanzana reconciler
* operators: refactor health probes and connect provisioning controllers
* provisioning: discard error in body response in tests
* Squashed commit of the following:
commit c89d5ea3ac9622b507ed0fcc77e8dd2869ef2c48
Author: Georges Chaudy <chaudyg@gmail.com>
Date: Wed Feb 18 22:30:56 2026 +0100
Squashed commit of the following:
commit b86de0d0e262bfdd0df77353aa1bc8c2e8c407f2
Author: Georges Chaudy <chaudyg@gmail.com>
Date: Mon Feb 9 18:55:09 2026 +0100
fix tests
commit f9e1970c117e9f81bf4ff2b74233999231e277a7
Author: Georges Chaudy <chaudyg@gmail.com>
Date: Mon Feb 9 17:08:35 2026 +0100
Refactor: Remove fallback logic
* Add StorageServiceOptions for SQL service injection in tests. Update Run method to utilize StorageServiceOptions for gRPC service registrations.
* Simplify unified storage client backend setup
* Gate storage backend creation by storage type
* Allow unified grpc storage without backend
* refactor(unified): ProvideStorageBackend before unified service
* refactor(unified): implement ProvideStorageBackend and provide backend before service
* Fix issues after merge
* simplify changes
* fix missing reference
* fix tests
* fix lint and add comment to NewStorageBackend
* start service in test
* separate module for unified backend
* Stop unified backend after grpc
* Fix tests
* Shutdown backend last
* Do not rely on ishealthy after shutdown started
* Cleanup a bit the code
* Init backend at register time
* Do not change health checks for now
* re-add storage metrics
* check for nil on testinfra sql.NewStorageBackend
* add tracer for backend and set max_open_conn in test
* address claude review
* make distributor an idle server
* ensure server is created in test
* refactor(unified): add separate target for search
* refactor(unified): add initializeBlobStorage, remove some comments, use noopService if diagnostics not set
* refactor(unified): add integration test and SearchClient
* chore(search): remove indexMetrics when search is disabled, adjust function names and struct fields
* Add enterprise hooks
* wip...
* undo
* update wire gen
* remove old hook thing
* move build info into separate func
* align fs context middleware with grafana, setting SignedInUser
* Call IndexDataHooks to get modified build info
* update tests
* go workspace
* idk, reset workspace files or whatever
* conditionally mount license
* support loading decoupled plugins from cdn
---------
Co-authored-by: Ashley Harrison <ashley.harrison@grafana.com>
* FrontendService: Add tracing and logging middleware
* tests!
* middleware tests
* context middleware test
* revert http_server back to previous version
* fix lint
* fix test
* use http.NotFound instead of custom http handler
* use existing tracer for package
* use otel/trace.Tracer in request_tracing middleware
* tidy up tracing in contextMiddleware
* fix 404 test
* remove spans from contextMiddleware
* comment
* setup distributor module
* move lifecycler into resource server provider
* remove ring/client pool setup from distributor module and use the same ring/client pool between storage server module and distributor module
* implement resourcestore server methods
* make healthcheck fail if ring is not running
* Updates the instrumentation_server service to use mux instead of the builtin router, and have it store the router in the module server: this is so we can register the /ring endpoint to check the status of the ring
* Create a new Ring service that depends on the instrumentation server and declares it as a dependency for the storage server
* Create standalone MemberlistKV service for Ring service to use
* Update the storage server Search and GetStats handler to distribute requests if applicable
* create the most basic frontend-server module
* expose prom metrics??
* add todo list
* move frontend-service to its own folder in services
* check error from writer.Write
* reword comment, add launch config
* refactor grafana_index_server_index_size to calculate in a goroutine instead of at scrape time and remove grafana_index_server_indexed_docs metric
* use wire to inject bleve index metrics
* remove sprinkles metrics from bleve index metrics
* log error when trying to calculate file index size and bump interval to 1m instead of 5s
* move prometheus.register for unified storage metrics into metrics.go and do most of the plumbing to get it to work
* convert StorageApiMetrics to pointer and check for nil before using it
* rename type and variables to something more sensible
---------
Co-authored-by: Jean-Philippe Quéméner <jeanphilippe.quemener@grafana.com>
* Wire up sprinkles to oss and enterprise. Fetching sprinkles not implemented yet.
* Adds wireset for initializing document builders. Had to init it when creating the service to avoid cyclical imports.
* updates to int64 for stats
* adds config for sprinklesApiServer and gets sprinkles from there when its present
* add comment for later
* adds feature toggle for sprinkles. returns empty results when flag not enabled.
* adds unified storage config setting for sprinkles apiserver page limit
* fixes bug where dashboard uid was not getting set
* when creating dashboard summary, use metadata.name as the dashboard uid
* cleans up wire. use existing oss and enterprise sets to generate doc builders
* remove old wireset
* fix linter - adds missing arg for doc builders
* update dashboard stats in tests
* updates test-data dashboards
* log a warning instead of returning an error if we can't get sprinkles for a namespace
* dont read uid from dashboard json
* make the resource store the default unified storage backend
* add integration tests
* fix failing test
* Update pkg/storage/unified/sql/test/integration_test.go
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* lint
* fix tests
* fix no rows
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* Zanzana: Initial work to run Zanzana as embedded or standalone
* Add addr settings for when remote client is used.
* sync dependencies
* Lock mysql driver version
---------
Co-authored-by: Dan Cech <dcech@grafana.com>
* Storage server runs own instrumentation server if its the sole target. Starts adding some sample metrics for now.
* adds metric for failed optimistic locks
* refactors metrics registration to own method on service for testability. Adds tests.
* Register sql storage server metrics from within the service
* fixes test
* troubleshooting drone test failures. Maybe timing when starting instrumentation server?
* Waits until instrumentation server has started. Updates tests.
* defer wont get called unless theres an error. removing.
* wait for instrumentation server to be running
* linter - close res body
* use port 3000 for metrics and removes test metric inc() call
* fixes test - updates port
* refactors module server to provide an instrumentation server module when there is no ALL or CORE target provided and running as single target
* make instrumentation server a dependency of all modules that do not run their own http server
* adds module server test
* adds tests for instrumentation service and removes old tests that aren't needed
* ignore error in test
* uses helper to start and run service
* when running wait on ctx done or http server err
* wait for http server
* removes println
* updates module server test to be integration test
* require no error in goroutine
* skips integration test when GRAFANA_TEST_DB not defined
* move http server start into start, verify returned content
* make test error when run fails
* try waiting longer and see if drone tests pass
* update integration test mysql creds to match drone
* go back to only waiting half second
* debug log drone mysql connection string
* use same db connection config as drone
* try using same hostname as drone
* cant use localhost as mysql hostname in drone tests. Need to parse it from the cfg db connection string
---------
Co-authored-by: Dan Cech <dcech@grafana.com>
* first round of entityapi updates
- quote column names and clean up insert/update queries
- replace grn with guid
- streamline table structure
fixes
streamline entity history
move EntitySummary into proto
remove EntitySummary
add guid to json
fix tests
change DB_Uuid to DB_NVarchar
fix folder test
convert interface to any
more cleanup
start entity store under grafana-apiserver dskit target
CRUD working, kind of
rough cut of wiring entity api to kube-apiserver
fake grafana user in context
add key to entity
list working
revert unnecessary changes
move entity storage files to their own package, clean up
use accessor to read/write grafana annotations
implement separate Create and Update functions
* go mod tidy
* switch from Kind to resource
* basic grpc storage server
* basic support for grpc entity store
* don't connect to database unless it's needed, pass user identity over grpc
* support getting user from k8s context, fix some mysql issues
* assign owner to snowflake dependency
* switch from ulid to uuid for guids
* cleanup, rename Search to List
* remove entityListResult
* EntityAPI: remove extra user abstraction (#79033)
* remove extra user abstraction
* add test stub (but
* move grpc context setup into client wrapper, fix lint issue
* remove unused constants
* remove custom json stuff
* basic list filtering, add todo
* change target to storage-server, allow entityStore flag in prod mode
* fix issue with Update
* EntityAPI: make test work, need to resolve expected differences (#79123)
* make test work, need to resolve expected differences
* remove the fields not supported by legacy
* sanitize out the bits legacy does not support
* sanitize out the bits legacy does not support
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>
* update feature toggle generated files
* remove unused http headers
* update feature flag strategy
* devmode
* update readme
* spelling
* readme
---------
Co-authored-by: Ryan McKinley <ryantxu@gmail.com>