* ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
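The timing-based balancing described above can be illustrated in Go. This is a minimal sketch and not the contents of shard-split.js (which is Node.js and not reproduced here): it assumes a greedy longest-first heuristic, and the `item` type, the `assignShards` and `shardTotal` names, and the package durations are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// item is one unit of work: a whole package or a single test.
// The type and its field names are illustrative, not from shard-split.js.
type item struct {
	Name    string
	Seconds float64
}

// assignShards places items on n shards, longest first, always choosing
// the shard with the smallest running total (greedy heuristic).
func assignShards(items []item, n int) [][]item {
	sorted := append([]item(nil), items...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return sorted[i].Seconds > sorted[j].Seconds
	})

	shards := make([][]item, n)
	totals := make([]float64, n)
	for _, it := range sorted {
		best := 0
		for s := 1; s < n; s++ {
			if totals[s] < totals[best] {
				best = s
			}
		}
		shards[best] = append(shards[best], it)
		totals[best] += it.Seconds
	}
	return shards
}

// shardTotal sums the durations assigned to one shard.
func shardTotal(shard []item) float64 {
	sum := 0.0
	for _, it := range shard {
		sum += it.Seconds
	}
	return sum
}

func main() {
	// Made-up durations, used only to show the balancing behaviour.
	items := []item{
		{"pkg/a", 90}, {"pkg/b", 60}, {"pkg/c", 45}, {"pkg/d", 30}, {"pkg/e", 15},
	}
	for i, shard := range assignShards(items, 2) {
		fmt.Printf("shard %d: %.0fs %v\n", i, shardTotal(shard), shard)
	}
}
```

With these sample durations both shards end up with 120s of work. Splitting heavy packages at test level would feed individual tests into the same routine as separate items.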
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
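The effect of the threshold change can be checked with the durations quoted above. This is a small Go sketch; `isHeavy` is a hypothetical helper, and whether the real solver compares with `>` or `>=` is an assumption.

```go
package main

import "fmt"

// isHeavy reports whether a package's runtime exceeds the split threshold.
// Hypothetical helper; the strict comparison is an assumption.
func isHeavy(seconds, thresholdSeconds float64) bool {
	return seconds > thresholdSeconds
}

func main() {
	// Durations from the commit message: sqlstore 182s, app ~15 min, api4 ~38 min.
	packages := []struct {
		name    string
		seconds float64
	}{
		{"sqlstore", 182},
		{"app", 15 * 60},
		{"api4", 38 * 60},
	}
	for _, threshold := range []float64{120, 300} {
		for _, p := range packages {
			fmt.Printf("threshold=%.0fs %-8s heavy=%t\n",
				threshold, p.name, isHeavy(p.seconds, threshold))
		}
	}
}
```

At 120s all three packages count as heavy. At 300s only app and api4 do, so sqlstore runs whole.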
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
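The discovery step can be sketched in Go as follows. The sketch assumes the cache is a map from test name to seconds and that each shard's filter is an anchored alternation. `missingFromCache`, `runRegex`, the test names, and the 1-second default are hypothetical and not taken from shard-split.js.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// missingFromCache returns the tests that exist now but have no timing entry.
func missingFromCache(current []string, cache map[string]float64) []string {
	var missing []string
	for _, name := range current {
		if _, ok := cache[name]; !ok {
			missing = append(missing, name)
		}
	}
	sort.Strings(missing)
	return missing
}

// runRegex builds an anchored alternation suitable for go test -run, so that
// "TestGetUser" does not also select "TestGetUserByEmail".
func runRegex(names []string) string {
	quoted := make([]string, len(names))
	for i, n := range names {
		quoted[i] = regexp.QuoteMeta(n)
	}
	return "^(" + strings.Join(quoted, "|") + ")$"
}

func main() {
	// Hypothetical cache contents and test names.
	cache := map[string]float64{"TestCreatePost": 12.5, "TestGetUser": 3.0}
	current := []string{"TestCreatePost", "TestGetUser", "TestNewFeature"}

	const defaultSeconds = 1.0 // assumed small default for unseen tests
	for _, name := range missingFromCache(current, cache) {
		cache[name] = defaultSeconds
	}
	fmt.Println(runRegex(current))
}
```

Once a missing test has a default duration it can go through the same shard assignment as cached tests, so it appears in exactly one shard's `-run` filter.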
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
* updated go to version 1.25.8 (#35817)
* updated go to version 1.25.8
* updated gotestsum version to work with go 1.25.8
Go 1.25 does not work with the indirect tools 0.11 dependency pulled in by
gotestsum.
* Use sync.WaitGroup.Go to simplify goroutine creation
Replace the wg.Add(1) + go func() { defer wg.Done() }() pattern with
wg.Go(), which was introduced in Go 1.25.
* pushes fips image on workflow dispatch to allow fips test to run on go version update
* fix new requirements for FIPS compliance imposed on updating to go 1.25.8
* updates openssl symbol check for the library shipped with newer FIPS versions
go-openssl v2, shipped with FIPS versions starting from 1.25, uses mkcgo to generate
bindings, which changes the symbol names.
* removes temp workflow-dispatch condition
* keep versions out of agents md file
* upgrade golangci-lint (#35845)
* test: clean up channel store data after TestChannelStore (#36066)
TestChannelStore sub-tests create channels, members, and team members
using fake TeamIds and UserIds (model.NewId() for non-existent rows).
These records are left in the database and cause integrity tests
(TestCheck*) running in the same binary to fail their full-table scans.
Register a t.Cleanup on TestChannelStore that purges the affected
tables entirely. A blanket purge is safe: the schema enforces no FK
constraints, and every test suite creates its own data independently.
* Fix command injection in server-test-template workflow (#36080)
Replace the unquoted heredoc (which embedded GITHUB_HEAD_REF into a
generated script) with a cp of the existing run-shard-tests.sh, which
already handles the light-only case. Pass BUILD_NUMBER and TEST_TARGET
as explicit docker env vars instead of interpolating them into script
content.
* fix(ci): restore testname format in sharded gotestsum runs (#36078)
run-shard-tests.sh called gotestsum directly without --format, so it
fell back to gotestsum's default (pkgname) instead of the testname
format set by the Makefile. Pass --format "${GOTESTSUM_FORMAT:-testname}"
to match the Makefile default.
Co-authored-by: Mattermost Build <build@mattermost.com>
* fix(lint): fix pre-existing golangci-lint v2.11.4 issues
Fix misspelling in comment and redundant nil check flagged by the
upgraded linter.
* ci: use golang image for test runner on release-10.11
mattermost-build-server images are not built for release branches.
Use the official golang image which is always available for any Go version.
* ci: use mattermost/mattermost-build-server for release-10.11
The mattermostdevelopment/ images are only built for master.
The production mattermost/ images are built for release branches.
* ci: use mattermost/mattermost-build-server in mmctl test template
The mattermostdevelopment/ images are only built for master.
The production mattermost/ images are built for release branches.
---------
Co-authored-by: Pavel Zeman <pavel.zeman@mattermost.com>
Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Carlos Garcia <carlos.garcia@mattermost.com>
Co-authored-by: Mattermost Build <build@mattermost.com>
MySQL 5.7 is at end of life.
https://mattermost.atlassian.net/browse/MM-55589
```release-note
Bumped the minimum supported MySQL version to 8.0.0.
```
Co-authored-by: Mattermost Build <build@mattermost.com>
Co-authored-by: Ibrahim Serdar Acikgoz <serdaracikgoz86@gmail.com>
* fix openApi vetting
The underlying mattermost-govet tool effectively hasn't been called for some time, as we weren't checking out and building the spec files to pass to it. Now that the API is in the monorepo, build it locally and check against it.
Unfortunately, our API documentation isn't up-to-date, and this PR isn't fixing that. For now, add a discrete `make vet-api` and workflow that won't block the build until the API documentation is back in sync and can be merged into the existing `make vet` directive.
* mattermost-govet: use upstream@new
* fix missing /api/v4 prefix for commands autocomplete suggestion
* document /api/v4/ldap/users/{user_id}/group_sync_memberships
* document /api/v4/groups/{group_id}/restore
* fix /files/{file_id}/public actually at root
* document /api/v4/users/invalid_emails
* fix SetThreadUnreadByPostId
* Revert "fix SetThreadUnreadByPostId"
This reverts commit b16bcc8044.
* Revert "Revert "fix SetThreadUnreadByPostId""
This reverts commit 8bda05dc8a.
* workaround undocumented API endpoints
* remove unnecessary blank line
* ignore go tool output
---------
Co-authored-by: Mattermost Build <build@mattermost.com>
* disable coverage
This reduces runtime of the server test suite from ~30m to ~10m, and as far as I can see: we discarded the coverage output anyway.
* allow morph 60s to migrate when running tests
* scripts/test.sh: drop COVERMODE
Stop generating coverage data when running unit tests. It's likely we'll want this data back at some point, but for now it's unused and removing simplifies invoking tests for developers.
* scripts/test.sh: remove cleanup steps
* scripts/test.sh: drop TESTS parameter
* scripts/test.sh: drop TESTFLAGS parameter
* switch to gotestsum
It was a good decision in hindsight to keep the public module as 0.x
because this would have been a breaking change again.
https://mattermost.atlassian.net/browse/MM-53032
```release-note
Changed the Go module path from github.com/mattermost/mattermost-server/server/v8 to github.com/mattermost/mattermost/server/v8.
For the public-facing module, its path has also changed from github.com/mattermost/mattermost-server/server/public to github.com/mattermost/mattermost/server/public
```
* Includes mmctl into the mono-repo
* Update to use the new public module paths
* Adds docs check to the mmctl CI
* Fix public utils import path
* Tidy up modules
* Fix linter
* Update CI tasks to use the new file structure
* Update CI references
Going from `release-7.9` to `release-7.10` exposed a bug in the download_mmctl_release.sh script: it could not match version numbers with multi-digit components.
Specifically, `release-7.10` was matched as `release-7.1`, so the wrong mmctl binary was downloaded.
We are refactoring the regular expression to match version numbers with multiple digits.
Ticket: https://mattermost.atlassian.net/browse/CLD-5682
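The failure mode can be reproduced with a small Go sketch. The script's actual regular expression is not shown in this message, so both patterns below are hypothetical stand-ins: one allows a single digit after the dot, the other allows one or more.

```go
package main

import (
	"fmt"
	"regexp"
)

var (
	// singleDigit mimics the buggy behaviour: only one digit after the dot.
	singleDigit = regexp.MustCompile(`release-[0-9]+\.[0-9]`)
	// multiDigit accepts one or more digits in each version component.
	multiDigit = regexp.MustCompile(`release-[0-9]+\.[0-9]+`)
)

// extract returns the first match of re in branch, or "" if there is none.
func extract(re *regexp.Regexp, branch string) string {
	return re.FindString(branch)
}

func main() {
	branch := "release-7.10"
	fmt.Println(extract(singleDigit, branch)) // release-7.1
	fmt.Println(extract(multiDigit, branch))  // release-7.10
}
```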
* Add ESR upgrade migration and CI job to verify it
The script was generated as a simple concatenation of migrations in the
interval [54, 101] through:
files=`for i in $(seq 54 101); do ls mysql/$(printf "%06d" $i)*up.sql; done`
tail -n +1 $files > ../esrupgrades/esr.5.37-7.8.mysql.up.sql
The CI job runs the migration both through the server and the script,
and for now uploads the dumps generated for manual inspection. An
automatic check for differences is still needed.
* Remove debug print in script
* Fix idx_uploadsessions_type creation
* Ignore tables db_lock and db_migration on dump
* Split workflow in two parallel jobs
* Diff dumps and upload the result
* Add cleanup script
* Use DELIMITER in the script to use mysql CLI
This allows us to remove the complexity of using a different Go script
inside a Docker image.
* Standardize Roles between migrations
Document and clean up code.
* Upload diff only if it is not empty
* Trigger action only when related files change
* Add a global timeout to the job
* Generalize ESR to ESR upgrade action (#22573)
* Generalize action
* Use logs to ensure migrations are finished
* Add migrations from 5.37 to 6.3
* Remove tables in cleanup script, not through dump
* Add initial-version input to common action
* Add migration from 6.3 to 7.8
* Remove action debug line
* ESR Upgrade: One procedure per table in the v5.37 > v7.8 upgrade script (#22590)
* Squash Users-related migrations in one query
* Squash Drafts-related migrations in one query
* Squash UploadSessions-related migrations in one query
* Squash Threads-related migrations in one query
* Squash Channels-related migrations in one query
* Squash ChannelMembers-related migrations in one query
* Squash Jobs-related migrations in one query
* Squash Sessions-related migrations in one query
* Squash Status-related migrations in one query
* Squash Posts-related migrations in one query
* Squash TeamMembers-related migrations in one query
* Squash Schemes-related migrations in one query
* Squash CommandWebhooks-related migrations in one query
* Squash OAuthApps-related migrations in one query
* Squash Teams-related migrations in one query
* Squash Reactions-related migrations in one query
* Squash PostReminders-related migrations in one query
* Adapt ThreadMemberships migration to unified style
* Adapt LinkMetadata migrations to unified style
* Adapt GroupChannels migration to unified style
* Adapt PluginKVStore migration to unified style
* Adapt UserGroups migration to unified style
* Adapt FileInfo migration to unified style
* Adapt SidebarCategories migration to unified style
* Remove blank line
* Use tabs everywhere
* Wrap every procedure with log statements
* Remove space before parentheses in procedure call
* Remove spurious extra line
* Merge two equal consecutive conditionals
* Avoid the double list of conditions/queries
* Fix variable name
* Remove outdated comment
* Add a preprocess phase with corresponding scripts
* Join all preprocess scripts setting ExpiresAt to 0
This preprocessing is something we should always do, no matter the input
DB, so we can use a common script for all cases instead of repeating the
same code in multiple files.
* Add system-bot if it does not exist
* Cleanup the ProductNoticeViewState table
* Fix SQL
* Move esrupgrades directory under server/
* Update paths in GitHub action
* Fix trigger path for CI
https://mattermost.atlassian.net/browse/MM-52079
```release-note
Upgraded the module version to 8.0. The new module path is github.com/mattermost/mattermost-server/server/v8.
```
Co-authored-by: Doug Lauder <wiggin77@warpmail.net>