Adjust cost model for HashAgg that spills to disk.

Tomas Vondra observed that the IO behavior for HashAgg tends to be worse than for Sort. Penalize HashAgg IO costs accordingly. Also, account for the CPU effort of spilling the tuples and reading them back.

Discussion: https://postgr.es/m/20200906212112.nzoy5ytrzjjodpfh@development
Reviewed-by: Tomas Vondra
Backpatch-through: 13
2026-04-26 08:37:12 -04:00 · 2020-09-07 13:31:59 -07:00 · 2020-09-07 13:31:59 -07:00 · b61d048e0d
commit b61d048e0d
parent e02c99bff6
1 changed file with 13 additions and 0 deletions
--- a/src/backend/optimizer/path/costsize.c
+++ b/src/backend/optimizer/path/costsize.c
@@ -2416,6 +2416,7 @@ cost_agg(Path *path, PlannerInfo *root,
 		double		pages;
 		double		pages_written = 0.0;
 		double		pages_read = 0.0;
+		double		spill_cost;
 		double		hashentrysize;
 		double		nbatches;
 		Size		mem_limit;
@@ -2453,9 +2454,21 @@ cost_agg(Path *path, PlannerInfo *root,
 		pages = relation_byte_size(input_tuples, input_width) / BLCKSZ;
 		pages_written = pages_read = pages * depth;

+		/*
+		 * HashAgg has somewhat worse IO behavior than Sort on typical
+		 * hardware/OS combinations. Account for this with a generic penalty.
+		 */
+		pages_read *= 2.0;
+		pages_written *= 2.0;
+
 		startup_cost += pages_written * random_page_cost;
 		total_cost += pages_written * random_page_cost;
 		total_cost += pages_read * seq_page_cost;
+
+		/* account for CPU cost of spilling a tuple and reading it back */
+		spill_cost = depth * input_tuples * 2.0 * cpu_tuple_cost;
+		startup_cost += spill_cost;
+		total_cost += spill_cost;
 	}

 	/*
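
To make the effect of the two new terms concrete, the following is a minimal standalone sketch of the spill-related arithmetic from the second hunk, outside the planner. It is not the PostgreSQL code itself: the names `SpillCostParams`, `SpillCost`, and `hashagg_spill_cost` are hypothetical, the function returns only the amounts added to the startup and total costs, and `pages` and `depth` are passed in directly instead of being derived from `relation_byte_size()` and the batch count.

```c
#include <assert.h>

/* Hypothetical holder for the three planner cost settings used below. */
typedef struct SpillCostParams
{
	double		seq_page_cost;
	double		random_page_cost;
	double		cpu_tuple_cost;
} SpillCostParams;

/* Hypothetical result: the amounts added to startup_cost and total_cost. */
typedef struct SpillCost
{
	double		startup;
	double		total;
} SpillCost;

/*
 * Sketch of the spill cost terms after this commit. "pages" is the size of
 * the input in blocks and "depth" is the estimated recursion depth of the
 * spill; both are assumed to have been computed by the caller.
 */
SpillCost
hashagg_spill_cost(double pages, double depth, double input_tuples,
				   const SpillCostParams *p)
{
	SpillCost	cost = {0.0, 0.0};
	double		pages_written;
	double		pages_read;
	double		spill_cost;

	assert(p != 0);
	assert(pages >= 0.0 && depth >= 0.0 && input_tuples >= 0.0);

	pages_written = pages_read = pages * depth;

	/* generic 2x IO penalty relative to Sort */
	pages_read *= 2.0;
	pages_written *= 2.0;

	/* writes are charged as random IO, reads as sequential IO */
	cost.startup += pages_written * p->random_page_cost;
	cost.total += pages_written * p->random_page_cost;
	cost.total += pages_read * p->seq_page_cost;

	/* CPU cost of spilling each tuple and reading it back, per level */
	spill_cost = depth * input_tuples * 2.0 * p->cpu_tuple_cost;
	cost.startup += spill_cost;
	cost.total += spill_cost;

	return cost;
}
```

As a worked case, take 1000 pages, a depth of 1, 100000 input tuples, and the default settings `seq_page_cost = 1.0`, `random_page_cost = 4.0`, `cpu_tuple_cost = 0.01`. The sketch adds about 10000 to the startup cost (8000 for writes plus 2000 for CPU) and about 12000 to the total cost (a further 2000 for reads). Before this commit the same inputs would have added 4000 and 5000 respectively.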