* Remove build-opensearch-image.yml -- the published image is unused
* MM-68248: Support OpenSearch v3
* MM-68248: Add CI job to test OpenSearch v2 backwards compatibility
* MM-68248: Handle missing indexes gracefully before reindex
OpenSearch v3 rejects _update_by_query and _delete_by_query with no index
argument (405), and returns index_not_found_exception (404) when querying
an exact index name that hasn't been created yet. Both arise before any
reindex has run, since indexes are created on first document write.
Return nil/empty instead of an error from all affected operations, and add
test coverage for each in the no-indexes state.
* MM-68248: Fix copy-paste operation names in DeleteFilesBatch
* MM-68248: Add i18n string for delete_files_batch error
* Revert "MM-68248: Add i18n string for delete_files_batch error"
This reverts commit e885678088.
* Revert "MM-68248: Fix copy-paste operation names in DeleteFilesBatch"
This reverts commit 4b7caacf59.
* Revert "MM-68248: Handle missing indexes gracefully before reindex"
This reverts commit 2d2d522f86.
* allow workflow_dispatch trigger for Server CI (for plugins CI)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [MM-68402] MBE Phase 2: declare four generic plugin hooks (#36291)
* new hooks-only phase 2
* remove ChannelWillBeMoved
* remove RecapWillBeProcessed and MessageWillBeRewrittenByAI
Drop the AI/recap hooks from the new-hook surface; AI-LLM paths
remain uncovered in tech preview and are documented as residuals.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [MM-68403] MBE Phase 3: ChannelGuards primitive (storage + cache + plugin API) (#36365)
* phase 3
* phase 3: register ChannelGuard mock in test setup helper
NewChannels' startup-time call to reloadGuardCache invokes
s.ChannelGuard().GetAll(); without an expectation on the mock store,
every test that sets up the server with GetMockStoreForSetupFunctions
panics during init.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase 3: register ChannelGuard mock in retrylayer test
retrylayer.New walks every store getter to wrap it; without the mock
expectation on ChannelGuard, TestRetry panics during layer construction.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* use rctx properly in the store methods
* phase 3: match rctx arg in testlib ChannelGuard mock
GetAll now takes request.CTX, so the testify expectation must include
mock.Anything; otherwise the call panics under the mocked store.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* phase 3: set api.ctx in TestChannelGuardLowercaseNormalization
The test constructs PluginAPI directly without a ctx, which used to
work when App.RegisterChannelGuard built its own EmptyContext. Now
that the App methods take rctx from the caller, the nil ctx panics
inside RequestContextWithMaster.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [MM-68404] MBE Phase 4: App-layer plugin hook wiring (#36407)
* phase 4
* Fix nil rctx in TestChannelGuardLowercaseNormalization
The PluginAPI struct literal was missing ctx: rctx after a refactor
moved the rctx declaration below the struct construction, leaving
api.ctx as nil. This caused a nil pointer dereference in reloadGuardCache
when RegisterChannelGuard called store.RequestContextWithMaster(nil).
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* Remove ChannelWillBeMoved hook call from MoveChannel (phase 4)
The hook and its ID were removed from mbe-phase-2 but the call site in
MoveChannel and its i18n string were not cleaned up during the rebase.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Remove the ChannelWillBeMoved test
* Remove RecapWillBeProcessed and MessageWillBeRewrittenByAI hook calls (phase 4)
The hooks and their IDs were removed from mbe-phase-2 but the call sites
in ProcessRecapChannel and RewriteMessage, their i18n strings, and their
tests were not cleaned up during the rebase.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Revert channel_id plumbing on rewrite endpoint (phase 4)
The channel_id field on RewriteRequest was added in phase 4 to feed the
synthetic post passed to MessageWillBeRewrittenByAI. With that hook
removed from mbe-phase-2, channel_id has no consumer; revert the field,
the api4 validation, the app.RewriteMessage parameter, and the
corresponding webapp client + hook plumbing.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [MM-68555] MBE Phase 5: Channel-guard enforcement + two-phase dispatch (#36473)
* phase 5
* Bake plugin counter-file paths into source instead of env vars
t.Setenv panics when an ancestor test calls t.Parallel, so the two
channel-guard tests broke under ENABLE_FULLY_PARALLEL_TESTS in CI.
Build each plugin source per-subtest with its temp file path embedded
as a Go literal — same pattern as TestPluginUploadsAPI.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Remove guarded helpers and tests for dropped hooks (phase 5)
The runGuardedRecapWillBeProcessed and runGuardedMessageWillBeRewrittenByAI
helpers were never wired (their app-layer call sites were already removed
in the phase-4 cleanup), and the corresponding sub-tests across panic /
allow / reject / partial plugins reference hooks that no longer exist.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* [MM-68405] MBE Phase 6: fire MessagesWillBeConsumed on the edit path (#36475)
---------
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* rebase onto master
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Create config-change-checker.yml
* Create check_config_changes_ci.py
* Update config-change-checker.yml
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Update config-change-checker.yml
* Update check_config_changes_ci.py
* Update config-change-checker.yml
* Update config.go
* Fix check_api to detect multi-line and multi-method endpoints
The previous implementation matched the .Handle(...).Methods(...) regex
line-by-line against diff lines. This silently missed two real and
common patterns in api4/:
1. Multi-line .Handle(...) declarations — e.g. group.go has 18 of
them, where the path lives on one line and the wrapper/handler on
the next. The regex never matched, so PRs adding such endpoints
produced empty release-note entries.
2. Multi-method declarations like
.Methods(http.MethodGet, http.MethodHead) (4 instances in file.go)
— the old regex required a closing paren immediately after the
first method.
The fix:
- Add a file_at(ref, path) helper that snapshots a file at a git ref
via 'git show', so checkers can compare full file states instead of
pattern-matching diff text.
- Add _scan_endpoints() that whitespace-collapses the file before
matching, letting the regex span what were originally multiple
lines.
- Loosen _HANDLE_RE to capture the methods list as a substring and
extract individual HTTP verbs with a known-method allowlist, so
multi-method declarations produce one entry per verb.
- Switch check_api to set-diff (after - before) / (before - after)
on the parsed endpoint sets. This also cleanly handles routes
that move within a file (no fragile add/remove dedup needed).
- Anchor the new/deleted file detection to '^new file mode \d+' to
avoid false positives from stray text in source files.
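The scan-then-set-diff approach can be sketched roughly as follows. This is a minimal Python illustration, not the script's actual code: the regex, the allowlist contents, and the names HANDLE_RE, HTTP_METHODS, parse_methods, and scan_endpoints are assumptions made for the example, and file contents are passed in as strings rather than read via 'git show'.

```python
import re

# Hypothetical allowlist; the real _HTTP_METHODS may contain other verbs.
HTTP_METHODS = {"GET", "HEAD", "POST", "PUT", "PATCH", "DELETE", "OPTIONS"}

# Hypothetical pattern: a quoted path, then anything up to ").Methods(",
# then the raw methods list. It relies on the input being one collapsed line.
HANDLE_RE = re.compile(r'\.Handle\(\s*"([^"]*)"\s*,.*?\)\.Methods\(([^)]*)\)')


def parse_methods(raw):
    """Extract HTTP verbs from e.g. 'http.MethodGet, http.MethodHead' or '"GET"'."""
    verbs = []
    for token in re.findall(r"[A-Za-z]+", raw):
        if token.startswith("Method"):
            token = token[len("Method"):]
        verb = token.upper()
        if verb in HTTP_METHODS:  # drops stray identifiers such as "http"
            verbs.append(verb)
    return verbs


def scan_endpoints(source):
    """Return the set of (verb, path) pairs declared in a Go source string."""
    collapsed = " ".join(source.split())  # whitespace-collapse so matches can span lines
    return {
        (verb, path)
        for path, raw in HANDLE_RE.findall(collapsed)
        for verb in parse_methods(raw)
    }


BEFORE = """
api.BaseRoutes.Groups.Handle("/{group_id}",
    api.APISessionRequired(getGroup)).Methods(http.MethodGet)
"""

AFTER = """
api.BaseRoutes.File.Handle("/preview",
    api.APISessionRequired(getFilePreview)).Methods(http.MethodGet, http.MethodHead)
api.BaseRoutes.Groups.Handle("/{group_id}",
    api.APISessionRequired(getGroup)).Methods(http.MethodGet)
"""

added = scan_endpoints(AFTER) - scan_endpoints(BEFORE)
removed = scan_endpoints(BEFORE) - scan_endpoints(AFTER)
```

In this sketch the existing route moves below the new one, yet it appears in neither set, because only the parsed (verb, path) pairs are compared.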
Made-with: Cursor
* Track enclosing struct in check_config to avoid dedup collisions
The previous check_config keyed its add/remove dedup on the bare field
name. The dedup intent was to ignore fields that were merely reordered
within config.go (which appear in the diff as both '-Foo' and '+Foo').
But because the key was just the field name, an unrelated rename in one
struct could silently cancel out a real new field with the same name in
a different struct. For example, in a single PR:
- EnableFoo *bool // removed from ServiceSettings
+ EnableFooV2 *bool
- EnableBar *bool // removed from EmailSettings
+ EnableFoo *bool // newly added — but wrongly cancelled below
The dedup would see 'EnableFoo' in both lists and drop both entries,
hiding the brand-new EmailSettings.EnableFoo from the release-note
output.
The fix tracks each field's enclosing struct using a brace-depth stack
that walks the file at BASE_SHA and HEAD_SHA. Fields are keyed as
(struct_name, field_name) tuples, so identically-named fields in
different structs are distinct, and the dedup only collapses true
reorderings. As a side benefit the rendered output is now
'StructName.FieldName' which is much more useful to reviewers.
Switching to file-at-revision scanning + set diff also removes the
custom dedup logic entirely — set arithmetic handles "moved within
file" naturally.
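A rough Python sketch of keying fields by their enclosing struct, using the example above. It is illustrative only: it uses a simple brace-depth counter rather than a full stack, ignores braces inside comments or strings, and the names STRUCT_RE, FIELD_RE, and scan_fields are hypothetical rather than taken from the script.

```python
import re

# Hypothetical patterns for top-level Go struct headers and exported fields.
STRUCT_RE = re.compile(r"^type\s+(\w+)\s+struct\s*\{")
FIELD_RE = re.compile(r"^\s*([A-Z]\w*)\s+\S")


def scan_fields(source):
    """Return a set of (struct_name, field_name) tuples found in Go source."""
    fields = set()
    current, depth = None, 0
    for line in source.splitlines():
        if current is None:
            opened = STRUCT_RE.match(line)
            if opened:
                current, depth = opened.group(1), 1
            continue
        if depth == 1:  # only direct members of the struct, not nested blocks
            field = FIELD_RE.match(line)
            if field:
                fields.add((current, field.group(1)))
        depth += line.count("{") - line.count("}")
        if depth <= 0:  # closing brace of the struct
            current, depth = None, 0
    return fields


BEFORE = """\
type ServiceSettings struct {
    EnableFoo *bool
}
type EmailSettings struct {
    EnableBar *bool
}
"""

AFTER = """\
type ServiceSettings struct {
    EnableFooV2 *bool
}
type EmailSettings struct {
    EnableFoo *bool
}
"""

added = scan_fields(AFTER) - scan_fields(BEFORE)
removed = scan_fields(BEFORE) - scan_fields(AFTER)
```

Here EmailSettings.EnableFoo stays in the added set even though ServiceSettings.EnableFoo is in the removed set. Keying on the bare field name would have put 'EnableFoo' on both sides and cancelled it.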
Made-with: Cursor
* Switch remaining checkers to file-at-revision style; drop lines_by_sign
check_audit_events and check_go_version still parsed +/- diff lines
directly, with the same brittle dedup-and-cancel logic that was used in
the previous check_config. After the previous two commits the rest of
the file uses the file_at(ref, path) helper to compare full file
states between BASE_SHA and HEAD_SHA, which:
- removes the entire moved-within-file dedup dance (set arithmetic
handles it for free),
- aligns all four checkers on a single, easy-to-reason-about pattern,
- is robust to whitespace-only or reordering edits in the watched
files.
For Dockerfile.buildenv the helper also avoids a subtle case where the
old code only inspected +/- lines: an edit to an unrelated RUN line
that didn't touch the FROM line could in theory leave both old_ver and
new_ver as None even though the version was effectively unchanged.
Reading the file at each revision compares the actual current and
previous FROM line directly.
The lines_by_sign helper now has no callers, so remove it.
Made-with: Cursor
* Update config.go
* Update config.go
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Update check_config_changes_ci.py
* Tighten check_config_changes_ci.py: regex coverage + idempotency
- Restore tolerant `_HANDLE_RE` so 2-arg wrappers (e.g. `api.APISessionRequired(handler, handlerParamFileAPI)`)
are not silently dropped from the api4 endpoint scan; broaden the `.Methods(...)`
capture so string-literal variants (`Methods("GET")`) work too. Filtering moves
back to the `_HTTP_METHODS` allowlist in `_parse_methods` to keep stray
identifiers from being treated as HTTP verbs.
- Make `strip_old_note` also remove auto-generated lines that landed outside
the `release-note` fence (the inject_note fallback paths) so reruns no
longer accumulate duplicates when a PR has no fence.
- Skip the GitHub PATCH when the PR description is already up to date, so
every commit no longer triggers an unconditional write.
- Wire up `check_go_version`'s `additions` path in `_format_lines` and
`_AUTO_LINE_RE` so a freshly-added Dockerfile.buildenv emits a note.
- Remove the now-dead `CheckResult.to_markdown` method (replaced by
`_format_lines`).
Made-with: Cursor
* Restore ExperimentalSettings.EnableWatermark
The field was removed in f71527f0b1 but `server/config/client.go`,
`server/config/client_test.go`, and `server/public/model/config_test.go`
still reference it (added on master in #36025). Restoring the field
makes the branch compile again so CI can go green.
Made-with: Cursor
* Replace placeholder release-note content (NONE / N/A) on injection
The script previously appended its auto-detected lines INSIDE the
`release-note` fence but never displaced template placeholders, so PRs
that only had `NONE` ended up with output like:
NONE
Added `Foo.Bar` configuration setting.
Go runtime updated from 1.25.8 to 1.25.9.
When the existing fence content is empty or consists only of placeholder
tokens (NONE, N/A, NA, dashes — case-insensitive), replace it entirely
with the auto-detected entries. User-written human content is still
preserved by appending instead.
Idempotent: stripping followed by re-injection keeps the placeholder
visible when there's nothing to inject, and replaces it again when there
is.
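The replace-or-append rule can be sketched as below. This is a minimal Python illustration that works on the text between the fence markers only. The names PLACEHOLDER_RE, is_placeholder_only, and inject are hypothetical, and the exact placeholder pattern in the script may differ.

```python
import re

# Hypothetical pattern for placeholder tokens: NONE, N/A, NA, or a run of dashes.
PLACEHOLDER_RE = re.compile(r"^(none|n/?a|-+)$", re.IGNORECASE)


def is_placeholder_only(body):
    """True when the fence body is empty or holds nothing but placeholder tokens."""
    lines = [line.strip() for line in body.splitlines() if line.strip()]
    return all(PLACEHOLDER_RE.match(line) for line in lines)


def inject(body, auto_lines):
    """Return the new fence body for the given auto-detected lines."""
    if not auto_lines:
        return body  # nothing to inject: the placeholder stays visible
    if is_placeholder_only(body):
        return "\n".join(auto_lines)  # displace NONE / N/A entirely
    # Human-written content is preserved; auto lines are appended after it.
    return body.rstrip("\n") + "\n" + "\n".join(auto_lines)


note = "Added `Foo.Bar` configuration setting."
replaced = inject("NONE", [note])
appended = inject("Fixed a login bug.", [note])
untouched = inject("NONE", [])
```

Under these assumptions, `replaced` holds only the auto-detected line, `appended` keeps the human-written sentence first, and `untouched` is still `NONE`.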
Made-with: Cursor
* Update config-change-checker.yml
* Update check_config_changes_ci.py
---------
Co-authored-by: Your Name <eva.sarafianou@gmail.com>
Co-authored-by: Mattermost Build <build@mattermost.com>
Workers no longer run `npm ci` — `node_modules` and framework binaries
are restored from actions/cache populated once by a new `prep-deps` job.
This closes the intermittent EEXIST/ENOENT failure inside npm's own
cacache writer that occasionally fails `npm ci` on a runner. Removing
`npm ci` from workers also cuts ~5 min of duplicated install work per worker.
dispatch-begin now runs as its own job after prep-deps, so by the time it fires
the per-worker test-server setup is the only work left before dispatch-run.
* MM-68149: upgrade to Go 1.26.2
Update go directive in go.mod and .go-version.
* MM-68149: replace pointer helpers with Go 1.26 new()
Go 1.26 extends the built-in new() to accept an initial value expression,
making typed-pointer helpers like model.NewPointer(x), bToP(x), and boolPtr(x)
redundant. Replace every call site with new(x) and remove the now-unused
helper functions and their //go:fix inline directives.
* MM-68149: apply go fix for reflect API and format-string changes
- reflect.Ptr → reflect.Pointer (reflect.Pointer was added in Go 1.18; reflect.Ptr is the old name for the same kind)
- reflect range-over-struct: for i := 0; i < t.NumField(); i++ → for field := range t.Fields()
and the equivalent for Methods() and interface types
- Fix format-string concatenation and variadic-arg mismatches flagged by go vet
* MM-68149: update JPEG fixtures and test infrastructure for Go 1.26 encoder
Go 1.26 ships a new image/jpeg encoder that produces slightly different output.
Regenerate all JPEG fixture files and switch the comparison helpers from
byte-equality to pixel-level comparison with a small per-channel tolerance,
so minor encoder drift across patch versions is handled automatically.
Add -update-fixtures flag to make it easy to regenerate fixtures after future
major Go upgrades. Document the update procedure in tests/README.md.
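The idea behind a tolerance-based comparison can be sketched as follows. The real helpers are Go test code. This Python version only illustrates the technique: the function name images_match, the nested-list image representation, and the default tolerance of 2 are assumptions.

```python
def images_match(a, b, tolerance=2):
    """Compare two images given as rows of (r, g, b) tuples.

    Returns True when dimensions agree and every channel of every pixel
    differs by at most `tolerance`. The default tolerance is a made-up value.
    """
    if len(a) != len(b):
        return False
    for row_a, row_b in zip(a, b):
        if len(row_a) != len(row_b):
            return False
        for pixel_a, pixel_b in zip(row_a, row_b):
            if any(abs(ca - cb) > tolerance for ca, cb in zip(pixel_a, pixel_b)):
                return False
    return True


fixture = [[(10, 20, 30), (200, 100, 50)]]
re_encoded = [[(11, 19, 30), (201, 100, 52)]]   # small encoder drift
corrupted = [[(10, 20, 30), (200, 100, 90)]]    # one channel far off

drift_ok = images_match(fixture, re_encoded)
corruption_caught = not images_match(fixture, corrupted)
```

Byte equality would reject `re_encoded` even though no channel moved by more than two levels. The tolerance check accepts it and still rejects `corrupted`.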
* MM-68149: CI check that go fix ./... produces no changes
* Fix real bugs flagged by CodeRabbit review
- group.go: set newGroup.MemberCount not group.MemberCount (member count
was populated on the wrong variable and lost before publish/return)
- file_test.go: guard compareImage(GetFilePreview) on the preview slice
length, not the thumbnail slice length (copy-paste error)
- config_test.go: remove duplicate MinimumLength assignment
* fixup! Fix real bugs flagged by CodeRabbit review
Only comment when action-junit-report reports no failing tests after retry
merging, so all-failure retries are not labeled flaky.
Remove skip/JIRA guidance from the comment body and use neutral wording
for the workflow run link.
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* Update E2E test workflows to use context names and server images and bump playwright workers to 10
* refactor: update branch naming conventions in E2E test workflows for better aggregation
* change retry to 1; fix or disable failing tests
* add v2 templates for Cypress and Playwright E2E tests with test system io integration
* add PR commenting
* identify more Playwright tests to fix separately
* disable deletion-report.spec for separate fix
---------
Co-authored-by: Mattermost Build <build@mattermost.com>
Restore the `fullyparallel: false` override for the unsharded
`Postgres with binary parameters` and `Postgres FIPS` jobs in the
weekly workflow. The override was originally added to the binary
parameters job in #35995 to prevent resource exhaustion on a single
runner, but was dropped when both jobs moved into
server-ci-weekly.yml in #36036, leaving them on the template default
of `true`.
Without it, the hosted runner is overwhelmed (too many server
instances, WebSocket hubs, and DB connections) and the runner agent
itself loses communication with GitHub mid-run, surfacing as
"hosted runner lost communication with the server" at ~55-60 min
into the Run Tests step. Both runs on April 27 and May 4 failed
this way; the sharded FIPS variant retained for FIPS-touching PRs
in server-ci.yml is unaffected because each shard handles only a
fraction of the packages.
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Coverage job ran the same tests as Postgres with only ENABLE_COVERAGE=true
and a Codecov upload as differences. Enable coverage directly on the Postgres
job under the same release-branch skip condition, eliminating 4x 8-core runner
hours per PR.
* ci: fix startup_failure in nightly race and weekly workflows
Add id-token: write permission to both server-ci-nightly-race.yml and
server-ci-weekly.yml. The reusable server-test-template.yml declares
id-token: write in its permissions block (needed for FIPS Docker Hub
login via OIDC). GitHub requires that caller workflows grant at least
the permissions declared by any reusable workflow they invoke —
regardless of whether the steps using those permissions are skipped at
runtime. Both new workflows only declared contents: read, causing
immediate startup_failure with zero jobs created.
Release Note
```release-note
NONE
```
Co-authored-by: Claude <claude@anthropic.com>
* ci: remove unused id-token: write from server-test-template
The template declared id-token: write but nothing in the workflow uses
OIDC token exchange — the FIPS Docker Hub login uses plain secrets
(DOCKERHUB_USERNAME/DOCKERHUB_TOKEN), not an OIDC identity token.
Removing it from the template means caller workflows (nightly race,
weekly, and the main server-ci) no longer need to grant id-token: write
either, following the principle of least privilege.
This is the actual root cause fix: the previous commit added id-token
to the callers as a workaround, but the real issue was the template
requesting a permission it never uses.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
* ci: add yamllint workflow to detect duplicate YAML keys
Add a yamllint check for workflow files to catch duplicate keys that
YAML parsers silently accept but GitHub Actions rejects on the default
branch.
Release Note
NONE
Co-authored-by: Claude <claude@anthropic.com>
* ci: address review feedback on yamllint workflow
- Add permissions: contents: read (least privilege)
- Bump checkout to v6.0.2 with persist-credentials: false
- Remove pip install step (yamllint is pre-installed on ubuntu-22.04)
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
* ci: move FIPS and binary params tests to weekly schedule
Move low-regression-risk test suites out of the per-push/PR Server CI
workflow into a new weekly scheduled workflow (Monday ~1am ET / 5am UTC):
- Postgres with binary parameters (1x 8-core runner)
- Postgres FIPS sharded tests (4x 8-core runners + merge job)
- mmctl FIPS tests (1x 8-core runner)
This reduces the per-push 8-core runner demand from 14 concurrent jobs
to 5 (4 Postgres shards + 1 ES), which should significantly reduce
queue times that currently reach 90+ minutes during peak hours.
The weekly workflow also supports workflow_dispatch for manual triggering
when urgent FIPS or binary parameter verification is needed.
#### Release Note
```release-note
NONE
```
Co-authored-by: Claude <claude@anthropic.com>
* ci: move coverage shards to 2-core runners
Add a 'runner' input to server-test-template.yml (defaults to
ubuntu-latest-8-cores for backward compatibility) and set coverage
shards to ubuntu-22.04 (2-core).
Coverage is non-blocking (allow-failure: true) so longer runtime
doesn't impact PR feedback. Estimated ~20-30 min per shard on 2-core
vs ~7-9 min on 8-core, but frees 4 more 8-core slots per push.
Combined with the FIPS/binary-params weekly move, per-push 8-core
demand drops from 14 → 4 (just the Postgres test shards + ES v8).
Co-authored-by: Claude <claude@anthropic.com>
* ci: decouple race detector from binary params, add nightly race job
The race detector was accidentally bundled with binary params via
the fullyparallel=false → RACE_MODE coupling in the test template.
These test different things:
- Binary params: Postgres driver binary encoding mode
- Race detector: Go data race detection
Changes:
- Add explicit 'race-enabled' input to server-test-template.yml
- Remove implicit fullyparallel→race coupling from template
- Binary params now runs with fullyparallel: true (default)
- New server-ci-nightly-race.yml runs -race nightly at 2am EST
on ubuntu-22.04 (2-core) to avoid 8-core contention
Co-authored-by: Claude <claude@anthropic.com>
* ci: add push trigger for release-* branches to weekly workflow
FIPS and binary params validation must run automatically on release
branch pushes, not just on the weekly schedule. Without this trigger,
release branches would lose FIPS/binary coverage entirely.
Co-authored-by: Claude <claude@anthropic.com>
* ci: use ET instead of EST in schedule comments
Cron runs at fixed UTC times regardless of DST. Use ~ET to avoid
implying exact EST/EDT correspondence.
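A small Python check of why a fixed UTC cron time cannot be labeled with a single EST or EDT hour, taking the 05:00 UTC weekly schedule mentioned earlier as the example. The fixed-offset zones and the helper name local_time are illustrative. The date used is arbitrary because the offsets are fixed.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for US Eastern standard and daylight time.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")


def local_time(utc_hour, tz):
    """Render a run scheduled at `utc_hour`:00 UTC in the given fixed-offset zone."""
    run = datetime(2026, 1, 5, utc_hour, 0, tzinfo=timezone.utc)
    return run.astimezone(tz).strftime("%H:%M %Z")


winter = local_time(5, EST)  # "00:00 EST"
summer = local_time(5, EDT)  # "01:00 EDT"
```

The same cron entry lands an hour apart on the local clock depending on the season, which is what the approximate ~ET wording acknowledges.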
Co-authored-by: Claude <claude@anthropic.com>
* ci: restore conditional FIPS on per-push, unshard weekly FIPS
Per review feedback from @lieut-data:
1. Restore FIPS jobs in server-ci.yml with conditional execution:
run on all pushes (master/release) and on PRs when go.mod changed
or branch name contains 'fips'. This ensures Go upgrades and
explicit FIPS work get immediate feedback.
2. Remove sharding from weekly FIPS — no speed pressure on a weekly
schedule, so a single unsharded job is simpler (eliminates the
4-shard matrix + merge job).
3. Restore gomod-changed detection step in the go job.
Both per-push (conditional, unsharded) and weekly (unconditional,
unsharded) FIPS runs use single jobs now, reducing complexity.
Co-authored-by: Claude <claude@anthropic.com>
* ci: restore FIPS sharding for PR runs, remove from push events
FIPS tests in server-ci.yml now only trigger on PRs where the branch
name contains 'fips' or go.mod changed. Sharding (4 shards + merge)
restored for fast iteration on FIPS-related PRs. Regular FIPS coverage
provided by the weekly workflow (unsharded).
This addresses lieut-data's review feedback to restore sharding where
it matters most: during active PR iteration.
Co-authored-by: Claude <claude@anthropic.com>
* ci: add explicit permissions to weekly and nightly workflows
Set minimum required permissions (contents: read) on both new workflow
files per review feedback. Reusable workflows called via 'uses' inherit
the caller's permissions.
Co-authored-by: Claude <claude@anthropic.com>
* ci: keep coverage shards on 8-core runners
Comment out the 2-core runner override for coverage shards per
Eva's feedback. Coverage stays on the default 8-core runners.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Replace the unquoted heredoc (which embedded GITHUB_HEAD_REF into a
generated script) with a cp of the existing run-shard-tests.sh, which
already handles the light-only case. Pass BUILD_NUMBER and TEST_TARGET
as explicit docker env vars instead of interpolating them into script
content.
The allow-failure input was defined twice in the workflow_call inputs,
causing GitHub Actions to reject the workflow with 0 jobs on master push.
Duplicate was introduced in #35743 merge.
Release Note
NONE
Co-authored-by: Claude <claude@anthropic.com>
* ci: re-enable server test coverage with 4-shard parallelism
The test-coverage job was disabled due to OOM failures when running all
tests with coverage instrumentation in a single process. Re-enable it
by distributing the workload across 4 parallel runners using the shard
infrastructure from the sharding PRs.
Changes:
- Replace disabled single-runner test-coverage with 4-shard matrix
- Add merge-coverage job to combine per-shard cover.out files
- Upload merged coverage to Codecov with server flag
- Skip per-shard Codecov upload when sharding is active
- Add coverage profile merging to run-shard-tests.sh for multi-run shards
- Restore original condition: skip coverage on release branch PRs
- Keep fullyparallel=true (fast within each shard)
- Keep continue-on-error=true (coverage never blocks PRs)
Co-authored-by: Claude <claude@anthropic.com>
* fix: disable fullyparallel for coverage shards
t.Parallel() + t.Setenv() panics kill entire test binaries under
fullyparallel mode. With 4-shard splitting, serial execution within
each shard should still be fast enough (~15 min). We can re-enable
fullyparallel once the incompatible tests are fixed.
Co-authored-by: Claude <claude@anthropic.com>
* fix: add checkout to coverage merge job for Codecov file mapping
Codecov needs the source tree to map coverage data to files.
Without checkout, the upload succeeds but reports 0% coverage
because it can't associate cover.out lines with source files.
Co-authored-by: Claude <claude@anthropic.com>
* ci: add codecov.yml and retain merged coverage artifact
Add codecov.yml with:
- Project coverage: track against parent commit, 1% threshold, advisory
- Patch coverage: 50% target for new code, advisory (warns, doesn't block)
- Ignore generated code (retrylayer, timerlayer, serial_gen, mocks,
storetest, plugintest, searchtest); excluding it shrinks the denominator
from 146K to 100K statements, rebasing coverage from 36% to 53%
- PR comments on coverage changes with condensed layout
Save merged cover.out as artifact with 30-day retention (~3.5MB/run).
90-day retention was considered (~6.3GB total vs ~2.1GB at 30 days)
but deferred to keep storage costs low.
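The rebasing figure can be checked with a few lines of Python. This assumes the number of covered statements is unchanged by the exclusion, which is a simplification: some of the excluded code may itself have been covered.

```python
total_before = 146_000   # statements counted before the ignore list
total_after = 100_000    # statements counted after excluding generated code
coverage_before = 0.36

covered = coverage_before * total_before   # roughly 52.6K covered statements
coverage_after = covered / total_after

print(f"{coverage_after:.0%}")  # prints 53%
```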
#### Release Note
```release-note
NONE
```
Co-authored-by: Claude <claude@anthropic.com>
* ci: add codecov.yml to exclude generated code and enable PR comments (#35748)
* ci: add codecov.yml to exclude generated code and enable PR comments
Add Codecov configuration to improve coverage signal quality:
- Exclude generated code from coverage denominator:
- store/retrylayer (~10k stmts, auto-generated retry wrappers)
- store/timerlayer (~14k lines, auto-generated timing wrappers)
- *_serial_gen.go (serialization codegen)
- **/mocks (mockery-generated mocks)
- Exclude test infrastructure:
- store/storetest (~63k lines, test helpers not production code)
- plugin/plugintest (plugin test helpers)
- Exclude thin wrappers:
- model/client4.go (~4k stmts, HTTP client methods tested via integration)
- Enable PR comments with condensed layout
- Set project threshold at 0.5% drop tolerance
- Set patch target at 60% for new/changed lines
This rebases the effective coverage metric from ~33.8% to ~43% by
removing ~50k non-production statements from the denominator, giving
a more accurate picture of actual test coverage.
Co-authored-by: Claude <claude@anthropic.com>
* Update codecov.yml
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Jesse Hallam <jesse.hallam@gmail.com>
* fix: bump upload-artifact to v7 and add client4.go to codecov ignore
- Align upload-artifact pin with the rest of the workflow (v4 → v7)
- Add model/client4.go to codecov.yml ignore list as documented in PR description
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): address Jesse review feedback on coverage sharding
- Remove client4.go from codecov ignore list (coverage is meaningful)
- Remove historical comment block above test-coverage job
- Set fullyparallel back to true (safe per-shard since each runs
different packages; parallel test fixes tracked in #35751)
- Replace merge-coverage job with per-shard Codecov uploads using
flags parameter; configure after_n_builds: 4 so Codecov waits for
all shards before reporting status
- Add clarifying comment in run-shard-tests.sh explaining intra-shard
coverage merge (multiple gotestsum runs) vs cross-shard merge
(handled natively by Codecov)
- Simplify codecov.yml: remove verbose comments, use informational
status checks, streamlined ignore list
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): set fullyparallel back to false for coverage shards
Coverage shards 1-3 failed with hundreds of test failures because
fullyparallel: true causes panics and races in tests that use
t.Setenv, os.Setenv, and os.Chdir without parallel-safe alternatives.
The parallel-safety fixes are tracked in a separate PR chain:
- #35746: t.Setenv → test hooks
- #35749: os.Setenv → parallel-safe alternatives
- #35750: os.Chdir → t.Chdir
- #35751: flip fullyparallel: true (final step)
Once that chain merges, fullyparallel can be enabled for coverage too.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): split fullyparallel and allow-failure into separate inputs
Previously fullyparallel controlled both parallel test execution AND
continue-on-error, meaning disabling parallelism also made coverage
failures blocking. Split into two independent inputs:
- fullyparallel: controls ENABLE_FULLY_PARALLEL_TESTS (test execution)
- allow-failure: controls continue-on-error (advisory vs blocking)
Coverage shards now run with fullyparallel: true (Claudio's original
approach) and allow-failure: true (failures don't block PRs until
parallel-safety fixes land in #35746 → #35751).
Co-authored-by: Claude <claude@anthropic.com>
* ci: use per-flag after_n_builds for server and webapp coverage
Replace the global after_n_builds: 2 with per-flag values:
- server: after_n_builds: 4 (one per shard)
- webapp: after_n_builds: 1 (single merged upload)
Tag the webapp Codecov upload with flags: webapp so each flag
independently waits for its expected upload count. This prevents
Codecov from firing notifications with incomplete data when the
webapp upload arrives before all server shards complete.
Addresses review feedback from @esarafianou.
Co-authored-by: Claude <claude@anthropic.com>
* fix: consolidate codecov config into .github/codecov.yml
Move all codecov configuration into the existing .github/codecov.yml
instead of introducing a duplicate file at the repo root. Merges
improvements from the root file (broader ignore list, informational
statuses, require_ci_to_pass: false) while preserving the webapp flag
from the original config. Updates after_n_builds to 5 (4 server + 1
webapp).
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Jesse Hallam <jesse.hallam@gmail.com>
Binary parameters tests run unsharded on a single runner. With
fullyparallel enabled, all ~755 api4 tests run concurrently, causing
resource exhaustion (too many server instances, WebSocket hubs, and DB
connections). The test binary gets killed after 11 minutes with no
individual test failures — just overwhelmed resources.
Disabling fullyparallel for this specific job lets binary parameters
tests pass while we evaluate moving them to a nightly/weekly schedule.
Co-authored-by: Claude <claude@anthropic.com>
* ci: enable fullyparallel mode for server tests
Replace os.Setenv, os.Chdir, and global state mutations with
parallel-safe alternatives (t.Setenv, t.Chdir, test hooks) across
37 files. Refactor GetLogRootPath and MM_INSTALL_TYPE to use
package-level test hooks instead of environment variables.
This enables gotestsum --fullparallel, allowing all test packages
to run with maximum parallelism within each shard.
Co-authored-by: Claude <claude@anthropic.com>
* ci: split fullyparallel from continue-on-error in workflow template
- Add new boolean input 'allow-failure' separate from 'fullyparallel'
- Change continue-on-error to use allow-failure instead of fullyparallel
- Update server-ci.yml to pass allow-failure: true for test coverage job
- Allows independent control of parallel execution and failure tolerance
Co-authored-by: Claude <claude@anthropic.com>
* fix: protect TestOverrideLogRootPath with sync.Mutex for parallel tests
- Replace global var TestOverrideLogRootPath with mutex-protected functions
- Add SetTestOverrideLogRootPath() and getTestOverrideLogRootPath() functions
- Update GetLogRootPath() to use thread-safe getter
- Update all test files to use SetTestOverrideLogRootPath() with t.Cleanup()
- Fixes race condition when running tests with t.Parallel()
Co-authored-by: Claude <claude@anthropic.com>
* fix: configure audit settings before server setup in tests
- Move ExperimentalAuditSettings from UpdateConfig() to config defaults
- Pass audit config via app.Config() option in SetupWithServerOptions()
- Fixes audit test setup ordering to configure BEFORE server initialization
- Resolves CodeRabbit's audit config timing issue in api4 tests
Co-authored-by: Claude <claude@anthropic.com>
* fix: implement SetTestOverrideLogRootPath mutex in logger.go
The previous commit updated test callers to use SetTestOverrideLogRootPath()
but didn't actually create the function in config/logger.go, causing build
failures across all CI shards. This commit:
- Replaces the exported var TestOverrideLogRootPath with mutex-protected
unexported state (testOverrideLogRootPath + testOverrideLogRootMu)
- Adds exported SetTestOverrideLogRootPath() setter
- Adds unexported getTestOverrideLogRootPath() getter
- Updates GetLogRootPath() to use the thread-safe getter
- Fixes log_test.go callers that were missed in the previous commit
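For illustration, a minimal sketch of the accessor pair listed above. Only
the identifier names come from this commit; the fallback path and the body
of GetLogRootPath are assumptions made to keep the example self-contained.

```go
package main

import "sync"

// defaultLogRootPath is a placeholder for this sketch, not the real default.
const defaultLogRootPath = "./logs"

// Unexported state guarded by a mutex, replacing the former exported var.
var (
	testOverrideLogRootMu   sync.RWMutex
	testOverrideLogRootPath string
)

// SetTestOverrideLogRootPath sets the override; pass "" to clear it.
func SetTestOverrideLogRootPath(path string) {
	testOverrideLogRootMu.Lock()
	defer testOverrideLogRootMu.Unlock()
	testOverrideLogRootPath = path
}

// getTestOverrideLogRootPath reads the override under a read lock.
func getTestOverrideLogRootPath() string {
	testOverrideLogRootMu.RLock()
	defer testOverrideLogRootMu.RUnlock()
	return testOverrideLogRootPath
}

// GetLogRootPath prefers the test override and otherwise falls back to the
// placeholder default (assumed behaviour for this sketch).
func GetLogRootPath() string {
	if p := getTestOverrideLogRootPath(); p != "" {
		return p
	}
	return defaultLogRootPath
}
```

The mutex removes the data race on the variable itself, but the override
is still process-wide, so parallel tests that set different paths can
still observe each other's values.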
Co-authored-by: Claude <claude@anthropic.com>
* fix(test): use SetupConfig for access_control feature flag registration
InitAccessControlPolicy() checks FeatureFlags.AttributeBasedAccessControl
at route registration time during server startup. Setting the flag via
UpdateConfig after Setup() is too late — routes are never registered
and API calls return 404.
Use SetupConfig() to pass the feature flag in the initial config before
server startup, ensuring routes are properly registered.
Co-authored-by: Claude <claude@anthropic.com>
* fix(test): restore BurnOnRead flag state in TestRevealPost subtest
The 'feature not enabled' subtest disables BurnOnRead without restoring
it via t.Cleanup. Subsequent subtests inherit the disabled state, which
can cause 501 errors when they expect the feature to be available.
Add t.Cleanup to restore FeatureFlags.BurnOnRead = true after the
subtest completes.
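The restore-on-cleanup pattern can be sketched generically as below.
setFlagForTest is a hypothetical helper, not part of the test suite, and it
restores the previous value rather than hardcoding true as this commit
does.

```go
package main

// setFlagForTest sets *flag to value and registers a cleanup that restores
// the previous value. In a real test, register would be t.Cleanup.
// Hypothetical helper for illustration only.
func setFlagForTest(register func(func()), flag *bool, value bool) {
	prev := *flag
	*flag = value
	register(func() { *flag = prev })
}
```

Inside a subtest this would be called as
setFlagForTest(t.Cleanup, &enabled, false), where enabled is a stand-in
for the feature flag. Because the restore is registered through t.Cleanup,
it runs even if the subtest exits early on a require failure.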
Co-authored-by: Claude <claude@anthropic.com>
* fix(test): restore EnableSharedChannelsMemberSync flag via t.Cleanup
The test disables EnableSharedChannelsMemberSync without restoring it.
If the subtest exits early (e.g., require failure), later sibling
subtests inherit a disabled flag and become flaky.
Add t.Cleanup to restore the flag after the subtest completes.
Co-authored-by: Claude <claude@anthropic.com>
* Fix test parallelism: use instance-scoped overrides and init-time audit config
Replace package-level test globals (TestOverrideInstallType,
SetTestOverrideLogRootPath) with fields on PlatformService so each test
gets its own instance without process-wide mutation. Fix three audit
tests (TestUserLoginAudit, TestLogoutAuditAuthStatus,
TestUpdatePasswordAudit) that configured the audit logger after server
init — the audit logger only reads config at startup, so pass audit
settings via app.Config() at init time instead.
Also revert the Go 1.24.13 downgrade and bump mattermost-govet to
v2.0.2 for Go 1.25.8 compatibility.
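A rough sketch of the instance-scoped approach, under stated assumptions:
platformService is a simplified stand-in for PlatformService, and the
lookup order and "./logs" fallback are illustrative. SetLogRootPathOverride
and MM_LOG_PATH are names that appear elsewhere in this log.

```go
package main

import "os"

// platformService is a simplified stand-in for the real service type.
type platformService struct {
	logRootPathOverride string
}

// SetLogRootPathOverride scopes the override to this instance only.
func (ps *platformService) SetLogRootPathOverride(path string) {
	ps.logRootPathOverride = path
}

// logRootPath checks the instance override first, then MM_LOG_PATH, then a
// placeholder default (lookup order is an assumption of this sketch).
func (ps *platformService) logRootPath() string {
	if ps.logRootPathOverride != "" {
		return ps.logRootPathOverride
	}
	if p := os.Getenv("MM_LOG_PATH"); p != "" {
		return p
	}
	return "./logs"
}
```

Each test that builds its own instance gets its own override, so no
process-wide state is mutated and no lock is needed.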
* Fix audit unit tests
* Fix MMCLOUDURL unit tests
* Fixed unit tests using MM_NOTIFY_ADMIN_COOL_OFF_DAYS
* Make app migrations idempotent for parallel test safety
Change System().Save() to System().SaveOrUpdate() in all migration
completion markers. When two parallel tests share a database pool entry,
both may race through the check-then-insert migration pattern. With
Save(), the losing writer hits a duplicate-key error and crashes fatally;
with SaveOrUpdate(), the second write is a harmless no-op.
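The difference can be shown with a small in-memory stand-in. systemStore
here is hypothetical and only mimics insert versus upsert semantics; it is
not the real System store.

```go
package main

import (
	"errors"
	"sync"
)

var errDuplicateKey = errors.New("duplicate key")

// systemStore is a hypothetical in-memory stand-in for the System store.
type systemStore struct {
	mu   sync.Mutex
	rows map[string]string
}

func newSystemStore() *systemStore {
	return &systemStore{rows: make(map[string]string)}
}

// Save behaves like a plain INSERT: it fails if the key already exists.
func (s *systemStore) Save(name, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	if _, ok := s.rows[name]; ok {
		return errDuplicateKey
	}
	s.rows[name] = value
	return nil
}

// SaveOrUpdate behaves like an upsert: writing the same marker twice is safe.
func (s *systemStore) SaveOrUpdate(name, value string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rows[name] = value
	return nil
}
```

When two callers both pass the "marker missing" check, the second Save
returns errDuplicateKey while the second SaveOrUpdate simply rewrites the
same value.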
* test: address review feedback on fullyparallel PR
- Use SetLogRootPathOverride() setter instead of direct field access
in platform/support_packet_test.go and platform/log_test.go (pvev)
- Restore TestGetLogRootPath in config/logger_test.go to keep
MM_LOG_PATH env var coverage; the test uses t.Setenv, so it runs
serially, which is fine (pvev)
- Fix misleading comment in config_test.go: code uses t.Setenv,
not os.Setenv (jgheithcock)
Co-authored-by: Claude <claude@anthropic.com>
* fix: add missing os import in post_test.go
The os import was dropped during a merge conflict resolution while
burn-on-read shared channel tests from master still use os.Setenv.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: wiggin77 <wiggin77@warpmail.net>
Co-authored-by: Mattermost Build <build@mattermost.com>
* Replace hardcoded test passwords with model.NewTestPassword()
Add model.NewTestPassword() utility that generates 14+ character
passwords meeting complexity requirements for FIPS compliance. Replace
all short hardcoded test passwords across the test suite with calls to
this function.
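A sketch of what such a generator might look like. This is not the actual
model.NewTestPassword() implementation: the character sets, the symbol
requirement, and the helper names are assumptions for illustration.

```go
package main

import (
	"crypto/rand"
	"math/big"
	"strings"
)

// Character classes and minimum length; values are assumptions of this sketch.
const (
	lowercaseLetters      = "abcdefghijklmnopqrstuvwxyz"
	uppercaseLetters      = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
	numbers               = "0123456789"
	symbols               = "!@#$%^&*"
	minTestPasswordLength = 14
)

// pick returns one random byte from charset.
func pick(charset string) byte {
	n, err := rand.Int(rand.Reader, big.NewInt(int64(len(charset))))
	if err != nil {
		panic(err)
	}
	return charset[n.Int64()]
}

// newTestPasswordSketch starts with one character from each class, then
// pads with random characters up to the minimum length.
func newTestPasswordSketch() string {
	pw := []byte{pick(lowercaseLetters), pick(uppercaseLetters), pick(numbers), pick(symbols)}
	all := lowercaseLetters + uppercaseLetters + numbers + symbols
	for len(pw) < minTestPasswordLength {
		pw = append(pw, pick(all))
	}
	return string(pw)
}

// meetsComplexity reports whether pw contains every character class.
func meetsComplexity(pw string) bool {
	return strings.ContainsAny(pw, lowercaseLetters) &&
		strings.ContainsAny(pw, uppercaseLetters) &&
		strings.ContainsAny(pw, numbers) &&
		strings.ContainsAny(pw, symbols)
}
```

Seeding one character per class guarantees the complexity check passes
regardless of what the random padding produces.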
* Enforce FIPS compliance for passwords and HMAC keys
FIPS OpenSSL requires HMAC keys to be at least 14 bytes. PBKDF2 uses
the password as the HMAC key internally, so short passwords cause
PKCS5_PBKDF2_HMAC to fail.
- Add FIPSEnabled and PasswordFIPSMinimumLength build-tag constants
- Raise the password minimum length floor to 14 when compiled with
requirefips, applied in SetDefaults only when unset and validated
independently in IsValid
- Return ErrMismatchedHashAndPassword for too-short passwords in
PBKDF2 CompareHashAndPassword rather than a cryptic OpenSSL error
- Validate atmos/camo HMAC key length under FIPS and lengthen test
keys accordingly
- Adjust password validation tests to use PasswordFIPSMinimumLength
so they work under both FIPS and non-FIPS builds
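The SetDefaults/IsValid split described above can be sketched as follows.
The real code uses build-tag constants; this sketch passes fipsEnabled as
a parameter so it fits in one file, and the type name, the non-FIPS floor
of 5, and the error value are assumptions.

```go
package main

import "errors"

const (
	passwordMinimumLength     = 5  // assumed non-FIPS floor for this sketch
	passwordFIPSMinimumLength = 14 // FIPS floor named in this commit
)

var errPasswordMinimumLength = errors.New("password minimum length below floor")

// passwordSettings is a simplified stand-in for the real settings struct.
type passwordSettings struct {
	MinimumLength *int
}

// minimumLengthFloor returns the floor for the current build flavour.
func minimumLengthFloor(fipsEnabled bool) int {
	if fipsEnabled {
		return passwordFIPSMinimumLength
	}
	return passwordMinimumLength
}

// SetDefaults applies the floor only when the value is unset; an explicit
// value is left untouched, even if it is too small.
func (s *passwordSettings) SetDefaults(fipsEnabled bool) {
	if s.MinimumLength == nil {
		v := minimumLengthFloor(fipsEnabled)
		s.MinimumLength = &v
	}
}

// IsValid checks the floor independently of SetDefaults.
func (s *passwordSettings) IsValid(fipsEnabled bool) error {
	if s.MinimumLength == nil || *s.MinimumLength < minimumLengthFloor(fipsEnabled) {
		return errPasswordMinimumLength
	}
	return nil
}
```

Keeping the two steps separate means an explicitly configured value below
14 is reported by validation under FIPS rather than silently rewritten.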
* CI: shard FIPS test suite and extract merge template
Run FIPS tests on PRs that touch go.mod or have 'fips' in the branch
name. Shard FIPS tests across 4 runners matching the normal Postgres
suite. Extract the test result merge logic into a reusable workflow
template to deduplicate the normal and FIPS merge jobs.
* more
* Fix email test helper to respect FIPS minimum password length
* Fix test helpers to respect FIPS minimum password length
* Remove unnecessary "disable strict password requirements" blocks from test helpers
* Fix CodeRabbit review comments on PR #35905
- Add server-test-merge-template.yml to server-ci.yml pull_request.paths
so changes to the reusable merge workflow trigger Server CI validation
- Skip merge-postgres-fips-test-results job when test-postgres-normal-fips
was skipped, preventing failures due to missing artifacts
- Set guest.Password on returned guest in CreateGuestAndClient helper
to keep contract consistent with CreateUserWithClient
- Use shared LowercaseLetters/UppercaseLetters/NUMBERS/PasswordFIPSMinimumLength
constants in NewTestPassword() to avoid drift if FIPS floor changes
https://claude.ai/code/session_01HmE9QkZM3cAoXn2J7XrK2f
* Rename FIPS test artifact to match server-ci-report pattern
The server-ci-report job searches for artifacts matching "*-test-logs",
so rename from postgres-server-test-logs-fips to
postgres-server-fips-test-logs to be included in the report.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* Update docs-impact workflow to keep stale comment instead of deleting
When a re-run of the docs impact analysis determines that documentation
updates are no longer needed, the previous bot comment was deleted while
the Docs/Needed label was kept. This left the label without context.
Instead of deleting the comment, update it to explain that a previous
analysis had flagged the PR but the latest run found no docs impact.
This preserves the audit trail and gives maintainers context to decide
whether to remove the label.
Made-with: Cursor
* Update .github/workflows/docs-impact-review.yml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
---------
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com>
* Use repo checkout for Dockerfile in server-ci-artifacts build-docker job
The build-docker job was downloading server-build-artifact to get the
Dockerfile and supporting files for every PR. Switch to checking out
server/build/ directly from the repo for external PRs, while keeping
the artifact-based flow for same-repo PRs so that Dockerfile changes
can be tested before merge.
Only upload server-build-artifact when the PR comes from the same repo,
since external PRs no longer use it.
Made-with: Cursor
* retrigger pipelines
* undo previous commit
---------
Co-authored-by: Mattermost Build <build@mattermost.com>
* ci: post correct skip status from within cypress/playwright reusable workflows
The 'Required Status Checks' ruleset requires e2e-test/cypress-full/enterprise
and e2e-test/playwright-full/enterprise on master and release-*.* branches.
When a PR has no E2E-relevant changes, the jobs were silently skipped, leaving
required statuses unset and the PR permanently blocked.
Architecture fix: instead of a separate skip-e2e job in the caller that
hardcodes status context names, the skip logic now lives inside the reusable
workflows that already own and compute those context names.
Changes:
- e2e-tests-cypress.yml: add should_run input (default 'true') + skip job
that uses the dynamically-computed context_name when should_run == 'false'
- e2e-tests-playwright.yml: same pattern
- e2e-tests-ci.yml: change e2e-cypress/e2e-playwright job conditions from
should_run == 'true' to PR_NUMBER != '' (always run when there's a PR),
pass should_run as input to both reusable workflows