Commit graph

3 commits

Author SHA1 Message Date
Jesse Hallam
fff3820ce4
fix(ci): restore testname format in sharded gotestsum runs (#36078)
run-shard-tests.sh called gotestsum directly without --format, so it
fell back to gotestsum's default (pkgname) instead of the testname
format set by the Makefile. Pass --format "${GOTESTSUM_FORMAT:-testname}"
to match the Makefile default.
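
The `${GOTESTSUM_FORMAT:-testname}` expansion falls back to `testname` when the variable is unset or empty. A minimal Python sketch of the same fallback when assembling a gotestsum command line; the helper name and the package argument are illustrative and not taken from run-shard-tests.sh:

```python
import os


def gotestsum_args(packages, env=None):
    """Build a gotestsum argv with an explicit --format.

    Mirrors the shell expansion ${GOTESTSUM_FORMAT:-testname}: the default
    applies when the variable is unset OR set to an empty string.
    """
    env = os.environ if env is None else env
    fmt = env.get("GOTESTSUM_FORMAT") or "testname"
    # Everything after "--" is handed to `go test` by gotestsum.
    return ["gotestsum", "--format", fmt, "--", *packages]
```

Passing the format explicitly means the sharded path no longer depends on gotestsum's built-in default.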

Co-authored-by: Mattermost Build <build@mattermost.com>
2026-04-14 15:13:39 -03:00
Pavel Zeman
860df69621
ci: re-enable server test coverage with 4-shard parallelism (#35743)
* ci: re-enable server test coverage with 4-shard parallelism

The test-coverage job was disabled due to OOM failures when running all
tests with coverage instrumentation in a single process. Re-enable it
by distributing the workload across 4 parallel runners using the shard
infrastructure from the sharding PRs.

Changes:
- Replace disabled single-runner test-coverage with 4-shard matrix
- Add merge-coverage job to combine per-shard cover.out files
- Upload merged coverage to Codecov with server flag
- Skip per-shard Codecov upload when sharding is active
- Add coverage profile merging to run-shard-tests.sh for multi-run shards
- Restore original condition: skip coverage on release branch PRs
- Keep fullyparallel=true (fast within each shard)
- Keep continue-on-error=true (coverage never blocks PRs)
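
Merging per-run cover.out files within a shard can be sketched as follows. This is a Python illustration under stated assumptions, not the logic in run-shard-tests.sh: it assumes the standard Go cover profile text format (a `mode:` header, then `<file>:<start>,<end> <numStmts> <count>` lines) and sums hit counts for blocks that appear in more than one profile.

```python
def merge_cover_profiles(profiles):
    """Merge the text of several Go cover profiles into one profile."""
    mode = None
    blocks = {}  # (position, num_stmts) -> accumulated hit count
    for text in profiles:
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            if line.startswith("mode:"):
                this_mode = line.split(":", 1)[1].strip()
                if mode is None:
                    mode = this_mode
                elif mode != this_mode:
                    raise ValueError(f"mixed cover modes: {mode} vs {this_mode}")
                continue
            # Split from the right so the file position stays intact.
            position, num_stmts, count = line.rsplit(" ", 2)
            key = (position, int(num_stmts))
            blocks[key] = blocks.get(key, 0) + int(count)
    if mode is None:
        raise ValueError("no cover profile header found")
    out = [f"mode: {mode}"]
    for (position, num_stmts), count in blocks.items():
        if mode == "set":
            count = min(count, 1)  # "set" mode only records hit / not hit
        out.append(f"{position} {num_stmts} {count}")
    return "\n".join(out) + "\n"
```

Keeping a single `mode:` header matters because a profile with repeated headers is not valid input for Go's coverage tooling.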

Co-authored-by: Claude <claude@anthropic.com>

* fix: disable fullyparallel for coverage shards

t.Parallel() + t.Setenv() panics kill entire test binaries under
fullyparallel mode. With 4-shard splitting, serial execution within
each shard should still be fast enough (~15 min). We can re-enable
fullyparallel once the incompatible tests are fixed.

Co-authored-by: Claude <claude@anthropic.com>

* fix: add checkout to coverage merge job for Codecov file mapping

Codecov needs the source tree to map coverage data to files.
Without checkout, the upload succeeds but reports 0% coverage
because it can't associate cover.out lines with source files.

Co-authored-by: Claude <claude@anthropic.com>

* ci: add codecov.yml and retain merged coverage artifact

Add codecov.yml with:
- Project coverage: track against parent commit, 1% threshold, advisory
- Patch coverage: 50% target for new code, advisory (warns, doesn't block)
- Ignore generated code and test helpers (retrylayer, timerlayer,
  serial_gen, mocks, storetest, plugintest, searchtest); excluding these
  shrinks the denominator from 146K to 100K statements, rebasing
  coverage from 36% to 53%
- PR comments on coverage changes with condensed layout
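
The 36% to 53% rebasing quoted above can be checked with a few lines of arithmetic. The sketch assumes that essentially none of the covered statements live in the ignored paths, which is an assumption made here for illustration and not something the commit states:

```python
def rebased_coverage(coverage, total_stmts, kept_stmts):
    """Coverage after shrinking the denominator, assuming every covered
    statement lies in the code that is kept."""
    covered = coverage * total_stmts
    return covered / kept_stmts


# 36% of 146K statements covered, denominator reduced to 100K.
print(round(rebased_coverage(0.36, 146_000, 100_000) * 100))  # prints 53
```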

Save merged cover.out as artifact with 30-day retention (~3.5MB/run).
90-day retention was considered (~6.3GB total vs ~2.1GB at 30 days)
but deferred to keep storage costs low.

#### Release Note
```release-note
NONE
```

Co-authored-by: Claude <claude@anthropic.com>

* ci: add codecov.yml to exclude generated code and enable PR comments (#35748)

* ci: add codecov.yml to exclude generated code and enable PR comments

Add Codecov configuration to improve coverage signal quality:

- Exclude generated code from coverage denominator:
  - store/retrylayer (~10k stmts, auto-generated retry wrappers)
  - store/timerlayer (~14k lines, auto-generated timing wrappers)
  - *_serial_gen.go (serialization codegen)
  - **/mocks (mockery-generated mocks)
- Exclude test infrastructure:
  - store/storetest (~63k lines, test helpers not production code)
  - plugin/plugintest (plugin test helpers)
- Exclude thin wrappers:
  - model/client4.go (~4k stmts, HTTP client methods tested via integration)
- Enable PR comments with condensed layout
- Set project threshold at 0.5% drop tolerance
- Set patch target at 60% for new/changed lines

This rebases the effective coverage metric from ~33.8% to ~43% by
removing ~50k non-production statements from the denominator, giving
a more accurate picture of actual test coverage.

Co-authored-by: Claude <claude@anthropic.com>

* Update codecov.yml

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Jesse Hallam <jesse.hallam@gmail.com>

* fix: bump upload-artifact to v7 and add client4.go to codecov ignore

- Align upload-artifact pin with the rest of the workflow (v4 → v7)
- Add model/client4.go to codecov.yml ignore list as documented in PR description

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): address Jesse review feedback on coverage sharding

- Remove client4.go from codecov ignore list (coverage is meaningful)
- Remove historical comment block above test-coverage job
- Set fullyparallel back to true (safe per-shard since each runs
  different packages; parallel test fixes tracked in #35751)
- Replace merge-coverage job with per-shard Codecov uploads using
  flags parameter; configure after_n_builds: 4 so Codecov waits for
  all shards before reporting status
- Add clarifying comment in run-shard-tests.sh explaining intra-shard
  coverage merge (multiple gotestsum runs) vs cross-shard merge
  (handled natively by Codecov)
- Simplify codecov.yml: remove verbose comments, use informational
  status checks, streamlined ignore list

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): set fullyparallel back to false for coverage shards

Coverage shards 1-3 failed with hundreds of test failures because
fullyparallel: true causes panics and races in tests that use
t.Setenv, os.Setenv, and os.Chdir without parallel-safe alternatives.

The parallel-safety fixes are tracked in a separate PR chain:
- #35746: t.Setenv → test hooks
- #35749: os.Setenv → parallel-safe alternatives
- #35750: os.Chdir → t.Chdir
- #35751: flip fullyparallel: true (final step)

Once that chain merges, fullyparallel can be enabled for coverage too.

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): split fullyparallel and allow-failure into separate inputs

Previously fullyparallel controlled both parallel test execution AND
continue-on-error, meaning disabling parallelism also made coverage
failures blocking. Split into two independent inputs:

- fullyparallel: controls ENABLE_FULLY_PARALLEL_TESTS (test execution)
- allow-failure: controls continue-on-error (advisory vs blocking)

Coverage shards now run with fullyparallel: true (Claudio's original
approach) and allow-failure: true (failures don't block PRs until
parallel-safety fixes land in #35746 through #35751).

Co-authored-by: Claude <claude@anthropic.com>

* ci: use per-flag after_n_builds for server and webapp coverage

Replace the global after_n_builds: 2 with per-flag values:
- server: after_n_builds: 4 (one per shard)
- webapp: after_n_builds: 1 (single merged upload)

Tag the webapp Codecov upload with flags: webapp so each flag
independently waits for its expected upload count. This prevents
Codecov from firing notifications with incomplete data when the
webapp upload arrives before all server shards complete.
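
The difference between a global upload count and per-flag counts can be modelled in a few lines. This is a toy model of the gating described above, written in Python for illustration; it is not Codecov's implementation, and the function names are hypothetical:

```python
from collections import Counter


def ready_global(uploads, after_n_builds):
    """Global gate: ready as soon as any N uploads have arrived."""
    return len(uploads) >= after_n_builds


def ready_per_flag(uploads, expected):
    """Per-flag gate: each flag is ready only after its own expected count."""
    seen = Counter(uploads)
    return {flag: seen[flag] >= n for flag, n in expected.items()}
```

With uploads `["webapp", "server"]`, the global gate with a count of 2 already reports ready, while the per-flag gate with `{"server": 4, "webapp": 1}` keeps `server` pending until all four shards have uploaded.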

Addresses review feedback from @esarafianou.

Co-authored-by: Claude <claude@anthropic.com>

* fix: consolidate codecov config into .github/codecov.yml

Move all codecov configuration into the existing .github/codecov.yml
instead of introducing a duplicate file at the repo root. Merges
improvements from the root file (broader ignore list, informational
statuses, require_ci_to_pass: false) while preserving the webapp flag
from the original config. Updates after_n_builds to 5 (4 server + 1
webapp).

Co-authored-by: Claude <claude@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
Co-authored-by: Jesse Hallam <jesse.hallam@gmail.com>
2026-04-09 15:27:50 -04:00
Pavel Zeman
51232b58ef
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI

Add infrastructure for upcoming test sharding without changing behavior:

- Add shard-index and shard-total inputs to server-test-template.yml
  (defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
  - Merges JUnit XML reports from shard artifacts
  - Saves timing data cache for future shard balancing
  - Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files

Co-authored-by: Claude <claude@anthropic.com>

* ci: shard server Postgres tests into 4 parallel runners

Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:

Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
  assigns test packages to shards using timing data from previous runs.
  Two-tier strategy: light packages (<2min) whole, heavy packages
  (api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
  gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
  fallback, timing-based balancing, heavy package splitting, JUnit XML
  fallback, and enterprise package separation.

Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
  packages and runs the solver. Modified Run Tests step to use wrapper
  script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
  merge job artifact patterns for shard-specific names.

Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.

Co-authored-by: Claude <claude@anthropic.com>

* fix: raise heavy package threshold to 5 min to preserve test isolation

sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
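
The effect of the threshold change on the packages named above can be sketched as follows. The durations come from this commit message; whether the real comparison is strict or inclusive is not stated, so the strict `>` here is an assumption:

```python
def split_tiers(package_seconds, heavy_threshold):
    """Partition packages into (light, heavy) by total run time in seconds."""
    light = sorted(p for p, s in package_seconds.items() if s <= heavy_threshold)
    heavy = sorted(p for p, s in package_seconds.items() if s > heavy_threshold)
    return light, heavy


# Durations quoted above: sqlstore 182s, api4 ~38 min, app ~15 min.
timings = {"sqlstore": 182, "api4": 38 * 60, "app": 15 * 60}
```

At a 120s threshold `split_tiers(timings, 120)` classifies all three as heavy; at 300s only `api4` and `app` remain heavy, and `sqlstore` runs whole.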

Co-authored-by: Claude <claude@anthropic.com>

* ci: only save test timing cache on default branch

PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.

Co-authored-by: Claude <claude@anthropic.com>

* ci: skip FIPS tests on PRs (enterprise CI handles compile check)

Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.

Full FIPS tests continue to run on every push to master.

Co-authored-by: Claude <claude@anthropic.com>

* fix: address review feedback on run-shard-tests.sh

- Remove set -e so all test runs execute even if earlier ones fail;
  track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
  share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
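
The "run everything, fail at the end" pattern and the shared run index can be sketched in Python. This is an illustration of the control flow only; the file names are hypothetical and the commands stand in for the gotestsum invocations:

```python
import subprocess
import sys


def output_names(run_idx):
    """Derive all per-run file names from one index so they stay aligned."""
    return {
        "report": f"report-{run_idx}.xml",
        "json": f"gotestsum-{run_idx}.json",
        "coverage": f"cover-{run_idx}.out",
    }


def python_cmd(code):
    """Stand-in for a test command: run a Python snippet in a child process."""
    return [sys.executable, "-c", code]


def run_all(commands):
    """Run every command even after a failure; return the failed run indexes."""
    failed = []
    for run_idx, argv in enumerate(commands):
        if subprocess.run(argv).returncode != 0:
            failed.append(run_idx)
    return failed
```

A caller would exit non-zero only after the loop finishes, for example when `run_all(...)` returns a non-empty list, so a failure in an early run does not hide results from later ones.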

Co-authored-by: Claude <claude@anthropic.com>

* style: use node: prefix for built-in fs module in shard-split.js

Co-authored-by: Claude <claude@anthropic.com>

* fix: avoid interpolating file paths into generated shell script

Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
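
The idea of reading paths as data at runtime, so that no shell ever parses them, can be sketched in Python. This is an analogy under stated assumptions and not the shell code from the fix; the child process simply echoes its arguments:

```python
import subprocess
import sys
from pathlib import Path


def read_packages(list_file):
    """Read one package path per line; blank lines are ignored."""
    return [ln for ln in Path(list_file).read_text().splitlines() if ln.strip()]


def echo_args(packages):
    """Pass the paths as separate argv entries, so no shell interprets them."""
    code = "import sys; print('\\n'.join(sys.argv[1:]))"
    out = subprocess.run(
        [sys.executable, "-c", code, *packages],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()
```

A directory name such as `./weird$(touch pwned);dir` arrives in the child process as a literal string, because it is never substituted into script text.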

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): rename merged artifact to match server-ci-report glob

The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.

Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
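
The mismatch is easy to reproduce with a glob check. The sketch uses Python's `fnmatch` as an approximation of the workflow's glob matching, which is adequate for a simple trailing-suffix pattern like this one; the second artifact name in the usage note is hypothetical:

```python
from fnmatch import fnmatchcase

PATTERN = "*-test-logs"  # glob quoted from server-ci-report.yml above


def matches_report_glob(artifact_name):
    """True when the artifact name would be picked up by the report glob."""
    return fnmatchcase(artifact_name, PATTERN)
```

`matches_report_glob("postgres-server-test-logs-merged")` is false because the pattern requires the name to end in `-test-logs`, while a name such as `postgres-server-merged-test-logs` would match.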

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): pass RACE_MODE env into Docker container

RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): discover new tests in heavy packages not in timing cache

Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
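
The discovery step can be sketched as a merge of cached timings with the currently listed test names. This is a Python illustration, not the code in shard-split.js; the default of one second and the choice to drop stale cache entries are assumptions:

```python
def add_missing_tests(cached, listed, default_secs=1.0):
    """Return durations for every currently listed test.

    cached: test name -> seconds from the timing cache.
    listed: test names reported by `go test -list`.
    Tests missing from the cache get default_secs; cache entries for tests
    that no longer exist are dropped (an assumption of this sketch).
    """
    return {name: cached.get(name, default_secs) for name in listed}


def run_regex(tests):
    """Anchored alternation suitable for `go test -run`."""
    return "^(" + "|".join(sorted(tests)) + ")$"
```

With `cached = {"TestA": 12.0, "TestOld": 3.0}` and `listed = ["TestA", "TestNew"]`, the result covers `TestA` and `TestNew`, so the new test lands in some shard's `-run` filter instead of being silently skipped.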

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): add missing line continuation backslash in docker run

The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".

Co-authored-by: Claude <claude@anthropic.com>

* fix(ci): add setup-go step for shard test discovery

go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.

Co-authored-by: Claude <claude@anthropic.com>

---------

Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00