postgresql/src
Tom Lane 8b0b6303e9 Try to ensure that stats collector's receive buffer size is at least 100KB.
Since commit 4e37b3e15, buildfarm member frogmouth has been failing
occasionally with symptoms indicating that some expected stats data is
getting dropped.  The reason that that commit changed the behavior seems
probably to be that more data is getting shoved at the collector in a short
span of time.  In current sources, the stats test's first session sends
about 9KB of data while exiting, which is probably the same as what was
sent just before wait_for_stats() in the previous test design.  But now,
the test's second session is starting up concurrently, and it sends another
2KB (presumably reflecting its initial catalog accesses).  Since frogmouth
is running on Windows XP, which reputedly has a default socket receive
buffer size of only 8KB, it is not very surprising if this has put us over
the threshold where the receive buffer can overflow and drop messages.

The same mechanism could very easily explain the intermittent stats test
failures we've been seeing for years, since background processes such
as the bgwriter will sometimes send data concurrently with all this, and
could thus cause occasional buffer overflows.

Hence, insert some code into pgstat_init() to increase the stats socket's
receive buffer size to 100KB if it's less than that.  (On failure, emit a
LOG message, but keep going.)  Modern systems seem to have default sizes
in the range of 100KB-250KB, but older platforms don't.  I couldn't find
any platforms that wouldn't accept 100KB, so in theory this won't cause
any portability problems.

If this is successful at reducing the buildfarm failure rate in HEAD,
we should back-patch it, because it's certain that similar buffer overflows
happen in the field on platforms with small buffer sizes.  Going forward,
there might be an argument for trying to increase the buffer size even
more, but let's take a baby step first.

Discussion: https://postgr.es/m/22173.1494788088@sss.pgh.pa.us
2017-05-16 15:24:52 -04:00
..
backend Try to ensure that stats collector's receive buffer size is at least 100KB. 2017-05-16 15:24:52 -04:00
bin Translation updates 2017-05-15 12:19:54 -04:00
common Add PQencryptPasswordConn function to libpq, use it in psql and createuser. 2017-05-03 11:19:07 +03:00
fe_utils Allow psql variable substitution to occur in backtick command strings. 2017-04-01 21:44:54 -04:00
include Fix relcache leak when row triggers on partitions are fired by COPY. 2017-05-16 12:46:32 -04:00
interfaces Translation updates 2017-05-15 12:19:54 -04:00
makefiles Try to fix non-MSVC Windows builds in the wake of logical replication. 2017-01-20 12:51:31 -05:00
pl Translation updates 2017-05-15 12:19:54 -04:00
port Run the postmaster's signal handlers without SA_RESTART. 2017-04-24 13:00:30 -04:00
template Remove "sco" and "unixware" ports. 2016-10-11 11:26:04 -04:00
test Fix relcache leak when row triggers on partitions are fired by COPY. 2017-05-16 12:46:32 -04:00
timezone Guard against null t->tm_zone in strftime.c. 2017-05-07 12:33:12 -04:00
tools Update oidjoins regression test for v10. 2017-05-15 14:04:11 -04:00
tutorial Recommend wrappers of PG_DETOAST_DATUM_PACKED(). 2017-03-12 19:35:33 -04:00
.gitignore
DEVELOPERS
Makefile Remove redundant coverage target 2017-02-17 08:56:57 -05:00
Makefile.global.in Specify --outputdir for isolation install check, not just plain check. 2017-05-13 15:13:17 -07:00
Makefile.shlib Remove support for bcc and msvc standalone libpq builds 2017-04-11 15:22:21 +02:00
nls-global.mk nls-global.mk: search build dir for source files, too 2016-06-07 18:55:18 -04:00