ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
#!/usr/bin/env node
|
|
|
|
|
/**
|
|
|
|
|
* shard-split.js — Test shard assignment solver
|
|
|
|
|
*
|
|
|
|
|
* Splits Go test packages across N parallel CI runners using timing data
|
|
|
|
|
* from previous runs. Uses a two-tier strategy:
|
|
|
|
|
*
|
|
|
|
|
* 1. "Light" packages (< HEAVY_MS total runtime): assigned whole to a shard
|
|
|
|
|
* 2. "Heavy" packages (>= HEAVY_MS): individual tests distributed across
|
|
|
|
|
* shards using -run regex filters
|
|
|
|
|
*
|
|
|
|
|
* Timing data sources (in priority order):
|
|
|
|
|
* - gotestsum.json (JSONL): per-test elapsed times from previous run
|
|
|
|
|
* - prev-report.xml (JUnit XML): package-level timing (fallback)
|
|
|
|
|
* - Round-robin: when no timing data exists at all
|
|
|
|
|
*
|
|
|
|
|
* Assignment algorithm: greedy bin-packing (sort by duration desc, assign
|
|
|
|
|
* each item to the shard with lowest current load). Simple and effective
|
|
|
|
|
* for our distribution where 2 packages dominate 84% of runtime.
|
|
|
|
|
*
|
|
|
|
|
* Environment variables:
|
|
|
|
|
* SHARD_INDEX — this runner's index (0-based)
|
|
|
|
|
* SHARD_TOTAL — total number of shards
|
|
|
|
|
*
|
|
|
|
|
* Input files (in working directory):
|
|
|
|
|
* all-packages.txt — newline-separated list of all test packages
|
|
|
|
|
* prev-gotestsum.json — (optional) JSONL timing data from previous run
|
|
|
|
|
* prev-report.xml — (optional) JUnit XML from previous run
|
|
|
|
|
*
|
|
|
|
|
* Output files (in working directory):
|
|
|
|
|
* shard-te-packages.txt — space-separated TE packages for this shard
|
|
|
|
|
* shard-ee-packages.txt — space-separated EE packages for this shard
|
|
|
|
|
* shard-heavy-runs.txt — heavy package runs, one per line: "pkg REGEX"
|
|
|
|
|
*/
|
|
|
|
|
|
|
|
|
|
const fs = require("node:fs");
|
|
|
|
|
const { execSync } = require("node:child_process");
|
|
|
|
|
|
|
|
|
|
const SHARD_INDEX = parseInt(process.env.SHARD_INDEX);
|
|
|
|
|
const SHARD_TOTAL = parseInt(process.env.SHARD_TOTAL);
|
2026-05-08 17:04:32 -04:00
|
|
|
const HEAVY_MS = 600000; // 600s (10 min): packages above this get test-level splitting
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
|
2026-05-14 12:01:49 -04:00
|
|
|
// Packages that should always be split test-by-test, even on a cold cache.
|
|
|
|
|
// Without timing data the splitter falls through to alphabetical round-robin,
|
|
|
|
|
// which places these adjacent on the same runner and overwhelms postgres.
|
|
|
|
|
// Forcing them heavy lets `go test -list` enumerate their tests so the
|
|
|
|
|
// bin-packer can spread them across all shards.
|
|
|
|
|
const KNOWN_HEAVY_PKGS = new Set([
|
|
|
|
|
"github.com/mattermost/mattermost/server/v8/channels/api4",
|
|
|
|
|
"github.com/mattermost/mattermost/server/v8/channels/app",
|
|
|
|
|
]);
|
|
|
|
|
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
if (isNaN(SHARD_INDEX) || isNaN(SHARD_TOTAL) || SHARD_TOTAL < 1) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.error("ERROR: SHARD_INDEX and SHARD_TOTAL must be set");
|
|
|
|
|
process.exit(1);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
2026-05-08 17:04:32 -04:00
|
|
|
const allPkgs = fs
|
|
|
|
|
.readFileSync("all-packages.txt", "utf8")
|
|
|
|
|
.trim()
|
|
|
|
|
.split("\n")
|
|
|
|
|
.filter(Boolean);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
if (allPkgs.length === 0) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.error("WARNING: No test packages found in all-packages.txt");
|
|
|
|
|
process.exit(0);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
const pkgTimes = {};
|
|
|
|
|
const testTimes = {}; // "pkg::TestName" -> ms
|
|
|
|
|
|
|
|
|
|
// ── Parse gotestsum.json (JSONL) for per-test timing ──
|
|
|
|
|
// Each line is a JSON event; we want "pass" events with Elapsed times.
|
|
|
|
|
if (fs.existsSync("prev-gotestsum.json")) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("::group::Parsing gotestsum.json timing data");
|
|
|
|
|
const lines = fs.readFileSync("prev-gotestsum.json", "utf8").split("\n");
|
|
|
|
|
for (const line of lines) {
|
|
|
|
|
if (!line.includes('"pass"')) continue;
|
|
|
|
|
try {
|
|
|
|
|
const d = JSON.parse(line);
|
|
|
|
|
if (!d.Test || !d.Package) continue;
|
|
|
|
|
const elapsed = Math.round((d.Elapsed || 0) * 1000);
|
|
|
|
|
// Aggregate package time from test pass events
|
|
|
|
|
pkgTimes[d.Package] = (pkgTimes[d.Package] || 0) + elapsed;
|
|
|
|
|
// Top-level test name (use max elapsed for parent vs subtests)
|
|
|
|
|
const top = d.Test.split("/")[0];
|
|
|
|
|
const key = d.Package + "::" + top;
|
|
|
|
|
testTimes[key] = Math.max(testTimes[key] || 0, elapsed);
|
|
|
|
|
} catch (e) {
|
|
|
|
|
// Skip malformed lines
|
|
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log(
|
|
|
|
|
`gotestsum.json: ${Object.keys(pkgTimes).length} packages, ${Object.keys(testTimes).length} tests`,
|
|
|
|
|
);
|
|
|
|
|
console.log("::endgroup::");
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// ── Fallback: parse JUnit XML for package-level timing ──
|
|
|
|
|
if (Object.keys(pkgTimes).length === 0 && fs.existsSync("prev-report.xml")) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("::group::Parsing JUnit XML timing data (fallback)");
|
|
|
|
|
const xml = fs.readFileSync("prev-report.xml", "utf8");
|
|
|
|
|
for (const m of xml.matchAll(/<testsuite[^>]*>/g)) {
|
|
|
|
|
const name = m[0].match(/name="([^"]+)"/)?.[1];
|
|
|
|
|
const time = m[0].match(/\btime="([^"]+)"/)?.[1];
|
|
|
|
|
if (name && time) {
|
|
|
|
|
pkgTimes[name] =
|
|
|
|
|
(pkgTimes[name] || 0) + Math.round(parseFloat(time) * 1000);
|
|
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log(
|
|
|
|
|
`JUnit XML: ${Object.keys(pkgTimes).length} packages (no per-test data)`,
|
|
|
|
|
);
|
|
|
|
|
console.log("::endgroup::");
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
const hasTimingData = Object.keys(pkgTimes).length > 0;
|
|
|
|
|
const hasTestTiming = Object.keys(testTimes).length > 0;
|
|
|
|
|
|
|
|
|
|
// ── Identify heavy packages ──
|
2026-05-14 12:01:49 -04:00
|
|
|
// Split at test level for packages above HEAVY_MS (requires per-test timing)
|
|
|
|
|
// AND for the KNOWN_HEAVY_PKGS list (which uses go test -list discovery
|
|
|
|
|
// to enumerate tests when no timing cache exists).
|
|
|
|
|
//
|
|
|
|
|
// Both checks gate on allPkgs membership so stale entries from the cached
|
|
|
|
|
// pkgTimes (renamed/deleted packages from a prior run) can't end up in
|
|
|
|
|
// heavyPkgs — otherwise the post-discovery fallback would emit them as
|
|
|
|
|
// whole-package items for nonexistent packages.
|
|
|
|
|
const allPkgsSet = new Set(allPkgs);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
const heavyPkgs = new Set();
|
|
|
|
|
if (hasTestTiming) {
|
2026-05-08 17:04:32 -04:00
|
|
|
for (const [pkg, ms] of Object.entries(pkgTimes)) {
|
2026-05-14 12:01:49 -04:00
|
|
|
if (ms > HEAVY_MS && allPkgsSet.has(pkg)) heavyPkgs.add(pkg);
|
2026-05-08 17:04:32 -04:00
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
2026-05-14 12:01:49 -04:00
|
|
|
for (const pkg of allPkgs) {
|
|
|
|
|
if (KNOWN_HEAVY_PKGS.has(pkg)) heavyPkgs.add(pkg);
|
|
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
if (heavyPkgs.size > 0) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("Heavy packages (test-level splitting):");
|
|
|
|
|
for (const p of heavyPkgs) {
|
2026-05-14 12:01:49 -04:00
|
|
|
const t = pkgTimes[p];
|
|
|
|
|
const label = t ? `${(t / 1000).toFixed(0)}s` : "no-timing";
|
|
|
|
|
console.log(` ${label} ${p.split("/").pop()}`);
|
2026-05-08 17:04:32 -04:00
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// ── Build work items ──
|
|
|
|
|
// Each item is either a whole package ("P") or a single test from a heavy package ("T")
|
|
|
|
|
const items = [];
|
|
|
|
|
for (const pkg of allPkgs) {
|
2026-05-08 17:04:32 -04:00
|
|
|
if (heavyPkgs.has(pkg)) {
|
|
|
|
|
// Split into individual test items
|
|
|
|
|
const tests = Object.entries(testTimes)
|
|
|
|
|
.filter(([k]) => k.startsWith(pkg + "::"))
|
|
|
|
|
.map(([k, ms]) => ({ ms, type: "T", pkg, test: k.split("::")[1] }));
|
|
|
|
|
if (tests.length > 0) {
|
|
|
|
|
items.push(...tests);
|
|
|
|
|
}
|
2026-05-14 12:01:49 -04:00
|
|
|
// If no per-test timing exists, the discovery step below enumerates
|
|
|
|
|
// tests via `go test -list`. A final fallback to whole-package is
|
|
|
|
|
// added after discovery for packages where both lookups failed.
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
} else {
|
2026-05-08 17:04:32 -04:00
|
|
|
items.push({ ms: pkgTimes[pkg] || 1, type: "P", pkg });
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
// ── Discover new/renamed tests in heavy packages ──
|
|
|
|
|
// Tests not in the timing cache won't appear in any shard's -run regex,
|
|
|
|
|
// silently skipping them. Discover current test names at runtime and
|
|
|
|
|
// assign any cache-missing tests to the least-loaded shard.
|
|
|
|
|
if (heavyPkgs.size > 0) {
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("::group::Discovering new tests in heavy packages");
|
|
|
|
|
for (const pkg of heavyPkgs) {
|
|
|
|
|
const cachedTests = new Set(
|
|
|
|
|
Object.keys(testTimes)
|
|
|
|
|
.filter((k) => k.startsWith(pkg + "::"))
|
|
|
|
|
.map((k) => k.split("::")[1]),
|
|
|
|
|
);
|
|
|
|
|
try {
|
|
|
|
|
const out = execSync(`go test -list '.*' ${pkg} 2>/dev/null`, {
|
|
|
|
|
encoding: "utf8",
|
|
|
|
|
timeout: 300000,
|
|
|
|
|
});
|
|
|
|
|
const currentTests = out
|
|
|
|
|
.split("\n")
|
|
|
|
|
.map((l) => l.trim())
|
|
|
|
|
.filter((l) => /^Test[A-Z]/.test(l));
|
|
|
|
|
let newCount = 0;
|
|
|
|
|
for (const t of currentTests) {
|
|
|
|
|
if (!cachedTests.has(t)) {
|
|
|
|
|
// Assign a small default duration so it gets picked up
|
|
|
|
|
items.push({ ms: 1000, type: "T", pkg, test: t });
|
|
|
|
|
newCount++;
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
if (newCount > 0) {
|
|
|
|
|
console.log(
|
|
|
|
|
` ${pkg.split("/").pop()}: ${newCount} new test(s) not in cache`,
|
|
|
|
|
);
|
|
|
|
|
}
|
|
|
|
|
} catch (e) {
|
|
|
|
|
// go test -list can fail for packages whose TestMain requires a DB
|
|
|
|
|
// connection (e.g. sqlstore) because the GitHub runner cannot reach the
|
|
|
|
|
// docker-compose postgres network. Log a warning and fall back to
|
|
|
|
|
// treating the package as a whole unit rather than failing all shards.
|
|
|
|
|
console.error(
|
|
|
|
|
`::warning::${pkg.split("/").pop()}: go test -list failed — treating as whole package (new tests may be skipped this run). ${e.message}`,
|
|
|
|
|
);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
}
|
2026-05-14 12:01:49 -04:00
|
|
|
// Ensure every heavy package has at least one item. A package can reach
|
|
|
|
|
// this point with zero items if it has no per-test timing AND `go test
|
|
|
|
|
// -list` failed (e.g. sqlstore on a cold cache).
|
|
|
|
|
for (const pkg of heavyPkgs) {
|
|
|
|
|
const hasItems = items.some((it) => it.pkg === pkg);
|
|
|
|
|
if (!hasItems) {
|
|
|
|
|
console.log(
|
|
|
|
|
` ${pkg.split("/").pop()}: no per-test data, running as whole package`,
|
|
|
|
|
);
|
|
|
|
|
items.push({ ms: pkgTimes[pkg] || 1, type: "P", pkg });
|
|
|
|
|
}
|
|
|
|
|
}
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("::endgroup::");
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// Sort descending by duration for greedy bin-packing
|
|
|
|
|
items.sort((a, b) => b.ms - a.ms);
|
|
|
|
|
|
|
|
|
|
// ── Greedy bin-packing assignment ──
|
|
|
|
|
const shards = Array.from({ length: SHARD_TOTAL }, () => ({
|
2026-05-08 17:04:32 -04:00
|
|
|
load: 0,
|
|
|
|
|
whole: [],
|
|
|
|
|
heavy: {},
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}));
|
|
|
|
|
|
2026-05-14 12:01:49 -04:00
|
|
|
if (!hasTimingData && heavyPkgs.size === 0) {
|
|
|
|
|
// Round-robin fallback only when we have *no* signal — no timing cache
|
|
|
|
|
// and no known-heavy packages to test-level-split. With heavyPkgs we
|
|
|
|
|
// can still bin-pack: discovered tests (ms=1000 each) drive the
|
|
|
|
|
// distribution and whole-package items (ms=1) fill in evenly.
|
2026-05-08 17:04:32 -04:00
|
|
|
console.log("No timing data — using round-robin");
|
|
|
|
|
allPkgs.forEach((pkg, i) => {
|
|
|
|
|
shards[i % SHARD_TOTAL].whole.push(pkg);
|
|
|
|
|
});
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
} else {
|
2026-05-08 17:04:32 -04:00
|
|
|
for (const item of items) {
|
|
|
|
|
// Find shard with minimum current load
|
|
|
|
|
const min = shards.reduce(
|
|
|
|
|
(m, s, i) => (s.load < shards[m].load ? i : m),
|
|
|
|
|
0,
|
|
|
|
|
);
|
|
|
|
|
shards[min].load += item.ms;
|
|
|
|
|
if (item.type === "P") {
|
|
|
|
|
shards[min].whole.push(item.pkg);
|
|
|
|
|
} else {
|
|
|
|
|
if (!shards[min].heavy[item.pkg]) shards[min].heavy[item.pkg] = [];
|
|
|
|
|
shards[min].heavy[item.pkg].push(item.test);
|
|
|
|
|
}
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
// ── Report shard assignments ──
|
|
|
|
|
console.log("::group::Shard assignment");
|
|
|
|
|
for (let i = 0; i < SHARD_TOTAL; i++) {
|
2026-05-08 17:04:32 -04:00
|
|
|
const s = shards[i];
|
|
|
|
|
const hRuns = Object.keys(s.heavy).length;
|
|
|
|
|
const hTests = Object.values(s.heavy).reduce((n, a) => n + a.length, 0);
|
|
|
|
|
const marker = i === SHARD_INDEX ? " ← THIS SHARD" : "";
|
|
|
|
|
console.log(
|
|
|
|
|
`Shard ${i}: ${(s.load / 1000).toFixed(1)}s | ${s.whole.length} pkgs` +
|
|
|
|
|
(hRuns > 0 ? `, ${hRuns} heavy splits (${hTests} tests)` : "") +
|
|
|
|
|
marker,
|
|
|
|
|
);
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
}
|
|
|
|
|
console.log("::endgroup::");
|
|
|
|
|
|
|
|
|
|
// ── Write output for this shard ──
|
|
|
|
|
const myShard = shards[SHARD_INDEX];
|
|
|
|
|
const te = myShard.whole.filter((p) => !p.includes("/enterprise/")).join(" ");
|
|
|
|
|
const ee = myShard.whole.filter((p) => p.includes("/enterprise/")).join(" ");
|
|
|
|
|
|
|
|
|
|
fs.writeFileSync("shard-te-packages.txt", te);
|
|
|
|
|
fs.writeFileSync("shard-ee-packages.txt", ee);
|
|
|
|
|
|
|
|
|
|
// Heavy package runs: one line per run as "pkg REGEX"
|
|
|
|
|
const heavyRuns = Object.entries(myShard.heavy).map(([pkg, tests]) => {
|
2026-05-08 17:04:32 -04:00
|
|
|
const regex = tests.map((t) => "^" + t + "$").join("|");
|
|
|
|
|
return pkg + " " + regex;
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
});
|
|
|
|
|
fs.writeFileSync("shard-heavy-runs.txt", heavyRuns.join("\n"));
|
|
|
|
|
|
|
|
|
|
console.log(
|
2026-05-08 17:04:32 -04:00
|
|
|
`Light packages: ${myShard.whole.length} (${te.split(" ").filter(Boolean).length} TE, ${ee.split(" ").filter(Boolean).length} EE)`,
|
ci: shard server Postgres tests into 4 parallel runners (#35739)
* ci: add test sharding plumbing to server CI
Add infrastructure for upcoming test sharding without changing behavior:
- Add shard-index and shard-total inputs to server-test-template.yml
(defaults preserve existing single-runner behavior)
- Add timing cache restore step (activates only when shard-total > 1)
- Add merge-postgres-test-results job to server-ci.yml that:
- Merges JUnit XML reports from shard artifacts
- Saves timing data cache for future shard balancing
- Handles both single-artifact and multi-shard scenarios
- Add .gitignore entries for timing cache and shard work files
Co-authored-by: Claude <claude@anthropic.com>
* ci: shard server Postgres tests into 4 parallel runners
Extract sharding logic into standalone, tested scripts and enable
4-shard parallel test execution for server Postgres CI:
Scripts:
- server/scripts/shard-split.js: Node.js bin-packing solver that
assigns test packages to shards using timing data from previous runs.
Two-tier strategy: light packages (<2min) whole, heavy packages
(api4, app) split at individual test level.
- server/scripts/run-shard-tests.sh: Multi-run wrapper that calls
gotestsum directly for each package group with -run regex filters.
- server/scripts/shard-split.test.js: 8 test cases covering round-robin
fallback, timing-based balancing, heavy package splitting, JUnit XML
fallback, and enterprise package separation.
Workflow changes:
- server-test-template.yml: Add shard splitting step that discovers test
packages and runs the solver. Modified Run Tests step to use wrapper
script when sharding is active.
- server-ci.yml: Add 4-shard matrix to test-postgres-normal. Update
merge job artifact patterns for shard-specific names.
Performance: 7.2 min with timing cache vs 62.5 min baseline = 88%
wall-time improvement. First run without cache uses JUnit XML fallback
or round-robin, then populates the cache for subsequent runs.
Co-authored-by: Claude <claude@anthropic.com>
* fix: raise heavy package threshold to 5 min to preserve test isolation
sqlstore integrity tests scan the entire database and fail when other
packages' test data is present. At 182s, sqlstore was just over the
120s threshold and getting split at test level. Raising to 300s keeps
only api4 (~38 min) and app (~15 min) as heavy — where the real
sharding gains are — while sqlstore, elasticsearch, etc. stay whole
and maintain their test isolation guarantees.
Co-authored-by: Claude <claude@anthropic.com>
* ci: only save test timing cache on default branch
PR branches always restore from master's timing cache via restore-keys
prefix matching. Timing data is stable day-to-day so this eliminates
cache misses on first PR runs and reduces cache storage.
Co-authored-by: Claude <claude@anthropic.com>
* ci: skip FIPS tests on PRs (enterprise CI handles compile check)
Per review feedback: the enterprise CI already runs a FIPS compile
check on every PR. Running the full FIPS test suite on PRs is redundant
since it uses the identical test suite as non-FIPS — the only
FIPS-specific failure mode is a build failure from non-approved crypto
imports, which the enterprise compile check catches.
Full FIPS tests continue to run on every push to master.
Co-authored-by: Claude <claude@anthropic.com>
* fix: address review feedback on run-shard-tests.sh
- Remove set -e so all test runs execute even if earlier ones fail;
track failures and exit with error at the end (wiggin77)
- Remove unused top-level COVERAGE_FLAG variable (wiggin77)
- Fix RUN_IDX increment position so report, json, and coverage files
share the same index (wiggin77)
- Update workflow comment: heavy threshold is 5 min, not 2 min (wiggin77)
Co-authored-by: Claude <claude@anthropic.com>
* style: use node: prefix for built-in fs module in shard-split.js
Co-authored-by: Claude <claude@anthropic.com>
* fix: avoid interpolating file paths into generated shell script
Read shard package lists from files at runtime instead of interpolating
them into the generated script via printf. This prevents theoretical
shell metacharacter injection from directory names, as flagged by
DryRun Security.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): rename merged artifact to match server-ci-report glob
The merged artifact was named postgres-server-test-logs-merged which
does not match the *-test-logs pattern in server-ci-report.yml,
causing Postgres test results to be missing from PR/commit reports.
Also pins junit-report-merger to exact version 7.0.0 for supply chain
safety.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): pass RACE_MODE env into Docker container
RACE_MODE was set on the host runner but never included in the docker
run --env list. The light-package path worked because the heredoc
expanded on the host, but run-shard-tests.sh reads RACE_MODE at
runtime inside the container where it was unset. This caused heavy
packages (api4, app) to silently lose -race detection.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): discover new tests in heavy packages not in timing cache
Tests not present in the timing cache (newly added or renamed) would
not appear in any shard -run regex, causing them to silently skip.
After building items from the cache, run go test -list to discover
current test names and assign any cache-missing tests to shards via
the normal bin-packing algorithm with a small default duration.
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add missing line continuation backslash in docker run
The previous --env FIPS_ENABLED line was missing a trailing backslash
after adding --env RACE_MODE, causing docker run to see a truncated
command and fail with "requires at least 1 argument".
Co-authored-by: Claude <claude@anthropic.com>
* fix(ci): add setup-go step for shard test discovery
go test -list in shard-split.js runs on the host runner via execSync,
but Go is only available inside the Docker container. Without this
step, every invocation fails silently and new-test discovery is a
no-op. Adding actions/setup-go before the shard split step ensures
the Go toolchain is available on the host.
Co-authored-by: Claude <claude@anthropic.com>
---------
Co-authored-by: Claude <claude@anthropic.com>
2026-03-26 15:07:40 -04:00
|
|
|
);
|
|
|
|
|
console.log(`Heavy package runs: ${heavyRuns.length}`);
|