postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-02-28 12:20:43 -05:00

Author	SHA1	Message	Date
Tom Lane	48c7d9f6ff	Improve GIN indexscan cost estimation. The better estimate requires more statistics than we previously stored: in particular, counts of "entry" versus "data" pages within the index, as well as knowledge of the number of distinct key values. We collect this information during initial index build and update it during VACUUM, storing the info in new fields on the index metapage. No initdb is required because these fields will read as zeroes in a pre-existing index, and the new gincostestimate code is coded to behave (reasonably) sanely if they are zeroes. Teodor Sigaev, reviewed by Jan Urbanski, Tom Lane, and Itagaki Takahiro.	2010-10-17 20:52:32 -04:00
Tom Lane	9cc8c84e73	Improve logging in VACUUM FULL VERBOSE and CLUSTER VERBOSE. This patch resurrects some of the information that could be logged by the old, now-dead implementation of VACUUM FULL, in particular counts of live and dead tuples and the time taken for the table rebuild proper. There's still no logging about the ensuing index rebuilds, though. Itagaki Takahiro	2010-10-07 21:46:46 -04:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Heikki Linnakangas	723d0184e2	Use a latch to make startup process wake up and replay immediately when new WAL arrives via streaming replication. This reduces the latency, and also allows us to use a longer polling interval, which is good for energy efficiency. We still need to poll to check for the appearance of a trigger file, but the interval is now 5 seconds (instead of 100ms), like when waiting for a new WAL segment to appear in WAL archive.	2010-09-15 10:35:05 +00:00
Joe Conway	5eb15c9942	SERIALIZABLE transactions are actually implemented beneath the covers with transaction snapshots, i.e. a snapshot registered at the beginning of a transaction. Change variable naming and comments to reflect this reality in preparation for a future, truly serializable mode, e.g. Serializable Snapshot Isolation (SSI). For the moment transaction snapshots are still used to implement SERIALIZABLE, but hopefully not for too much longer. Patch by Kevin Grittner and Dan Ports with review and some minor wording changes by me.	2010-09-11 18:38:58 +00:00
Robert Haas	debcec7dc3	Include the backend ID in the relpath of temporary relations. This allows us to reliably remove all leftover temporary relation files on cluster startup without reference to system catalogs or WAL; therefore, we no longer include temporary relations in XLOG_XACT_COMMIT and XLOG_XACT_ABORT WAL records. Since these changes require including a backend ID in each SharedInvalSmgrMsg, the size of the SharedInvalidationMessage.id field has been reduced from two bytes to one, and the maximum number of connections has been reduced from INT_MAX / 4 to 2^23-1. It would be possible to remove these restrictions by increasing the size of SharedInvalidationMessage by 4 bytes, but right now that doesn't seem like a good trade-off. Review by Jaime Casanova and Tom Lane.	2010-08-13 20:10:54 +00:00
Robert Haas	30c22eb8fc	Correct sundry errors in Hot Standby-related comments. Fujii Masao	2010-08-12 23:24:54 +00:00
Tom Lane	d4fe61b083	Fix an additional set of problems in GIN's handling of lossy page pointers. Although the key-combining code claimed to work correctly if its input contained both lossy and exact pointers for a single page in a single TID stream, in fact this did not work, and could not work without pretty fundamental redesign. Modify keyGetItem so that it will not return such a stream, by handling lossy-pointer cases a bit more explicitly than we did before. Per followup investigation of a gripe from Artur Dabrowski. An example of a query that failed given his data set is select count(*) from search_tab where (to_tsvector('german', keywords ) @@ to_tsquery('german', 'ee:* | dd:*')) and (to_tsvector('german', keywords ) @@ to_tsquery('german', 'aa:*')); Back-patch to 8.4 where the lossy pointer code was introduced.	2010-08-01 19:16:39 +00:00
Tom Lane	0454f13161	Rewrite the rbtree routines so that an RBNode is the first field of the struct representing a tree entry, rather than being a separately allocated piece of storage. This API is at least as clean as the old one (if not more so --- there were some bizarre choices in there) and it permits a very substantial memory savings, on the order of 2X in ginbulk.c's usage. Also, fix minor memory leaks in code called by ginEntryInsert, in particular in ginInsertValue and entryFillRoot, as well as ginEntryInsert itself. These leaks resulted in the GIN index build context continuing to bloat even after we'd filled it to maintenance_work_mem and started to dump data out to the index. In combination these fixes restore the GIN index build code to honoring the maintenance_work_mem limit about as well as it did in 8.4. Speed seems on par with 8.4 too, maybe even a bit faster, for a non-pathological case in which HEAD was formerly slower. Back-patch to 9.0 so we don't have a performance regression from 8.4.	2010-08-01 02:12:42 +00:00
Tom Lane	2ab57e089b	Rewrite the key-combination logic in GIN's keyGetItem() and scanGetItem() routines to make them behave better in the presence of "lossy" index pointers. The previous coding was outright incorrect for some cases, as recently reported by Artur Dabrowski: scanGetItem would fail to return index entries in cases where one index key had multiple exact pointers on the same page as another key had a lossy pointer. Also, keyGetItem was extremely inefficient for cases where a single index key generates multiple "entry" streams, such as an @@ operator with a multiple-clause tsquery. The presence of a lossy page pointer in any one stream defeated its ability to use the opclass consistentFn, resulting in probing many heap pages that didn't really need to be visited. In Artur's example case, a query like WHERE tsvector @@ to_tsquery('a & b') was about 50X slower than the theoretically equivalent WHERE tsvector @@ to_tsquery('a') AND tsvector @@ to_tsquery('b'). The way that I chose to fix this was to have GIN call the consistentFn twice with both TRUE and FALSE values for the in-doubt entry stream, returning a hit if either call produces TRUE, but not if they both return FALSE. The code handles this for the case of a single in-doubt entry stream, but punts (falling back to the stupid behavior) if there's more than one lossy reference to the same page. The idea could be scaled up to deal with multiple lossy references, but I think that would probably be wasted complexity. At least to judge by Artur's example, such cases don't occur often enough to be worth trying to optimize. Back-patch to 8.4. 8.3 did not have lossy GIN index pointers, so not subject to these problems.	2010-07-31 00:30:54 +00:00
Simon Riggs	5b8bd0529e	Rename asyncCommitLSN to asyncXactLSN to reflect changed role in 9.0. Transaction aborts now record their LSN to avoid corner case behaviour in SR/HS, hence change of name of variables and functions. As pointed out by Fujii Masao. Cosmetic changes only.	2010-07-29 22:27:27 +00:00
Bruce Momjian	239d769e7e	pgindent run for 9.0, second run	2010-07-06 19:19:02 +00:00
Tom Lane	e76c1a0f4d	Replace max_standby_delay with two parameters, max_standby_archive_delay and max_standby_streaming_delay, and revise the implementation to avoid assuming that timestamps found in WAL records can meaningfully be compared to clock time on the standby server. Instead, the delay limits are compared to the elapsed time since we last obtained a new WAL segment from archive or since we were last "caught up" to WAL data arriving via streaming replication. This avoids problems with clock skew between primary and standby, as well as other corner cases that the original coding would misbehave in, such as the primary server having significant idle time between transactions. Per my complaint some time ago and considerable ensuing discussion. Do some desultory editing on the hot standby documentation, too.	2010-07-03 20:43:58 +00:00
Tom Lane	07e8b6aabc	Don't allow walsender to send WAL data until it's been safely fsync'd on the master. Otherwise a subsequent crash could cause the master to lose WAL that has already been applied on the slave, resulting in the slave being out of sync and soon corrupt. Per recent discussion and an example from Robert Haas. Fujii Masao	2010-06-17 16:41:25 +00:00
Heikki Linnakangas	0a7cb85531	Make TriggerFile variable static. It's not used outside xlog.c. Fujii Masao	2010-06-10 07:49:23 +00:00
Tom Lane	f0488bd57c	Rename the parameter recovery_connections to hot_standby, to reduce possible confusion with streaming-replication settings. Also, change its default value to "off", because of concern about executing new and poorly-tested code during ordinary non-replicating operation. Per discussion. In passing do some minor editing of related documentation.	2010-04-29 21:36:19 +00:00
Heikki Linnakangas	9b8a73326e	Introduce wal_level GUC to explicitly control if information needed for archival or hot standby should be WAL-logged, instead of deducing that from other options like archive_mode. This replaces recovery_connections GUC in the primary, where it now has no effect, but it's still used in the standby to enable/disable hot standby. Remove the WAL-logging of "unlogged operations", like creating an index without WAL-logging and fsyncing it at the end. Instead, we keep a copy of the wal_mode setting and the settings that affect how much shared memory a hot standby server needs to track master transactions (max_connections, max_prepared_xacts, max_locks_per_xact) in pg_control. Whenever the settings change, at server restart, write a WAL record noting the new settings and update pg_control. This allows us to notice the change in those settings in the standby at the right moment, they used to be included in checkpoint records, but that meant that a changed value was not reflected in the standby until the first checkpoint after the change. Bump PG_CONTROL_VERSION and XLOG_PAGE_MAGIC. Whack XLOG_PAGE_MAGIC back to the sequence it used to follow, before hot standby and subsequent patches changed it to 0x9003.	2010-04-28 16:10:43 +00:00
Simon Riggs	bc2b85d904	Fix oversight in collecting values for cleanup_info records. vacuum_log_cleanup_info() now generates log records with a valid latestRemovedXid set in all cases. Also be careful not to zero the value when we do a round of vacuuming part-way through lazy_scan_heap(). Incidentally, this reduces frequency of conflicts in Hot Standby.	2010-04-21 17:20:56 +00:00
Robert Haas	481cb5d9b5	Rename standby_keep_segments to wal_keep_segments. Also, make the name of the GUC and the name of the backing variable match. Along the way, clean up a couple of slight typographical errors in the related docs.	2010-04-20 11:15:06 +00:00
Simon Riggs	2847de9df2	Remove some additional changes in previous commit that belong elsewhere.	2010-04-18 18:17:12 +00:00
Simon Riggs	21d6a6a128	Tune GetSnapshotData() during Hot Standby by avoiding loop through normal backends. Makes code clearer also, since we avoid various Assert()s. Performance of snapshots taken during recovery no longer depends upon number of read-only backends.	2010-04-18 18:06:07 +00:00
Heikki Linnakangas	361bd1662e	Allow Hot Standby to begin from a shutdown checkpoint. Patch by Simon Riggs & me	2010-04-13 14:17:46 +00:00
Heikki Linnakangas	30556568f5	Update the location of last removed WAL segment in shared memory only after actually removing one, so that if we can't remove segments because WAL archiving is lagging behind, we don't unnecessarily forbid streaming the old not-yet-archived segments that are still perfectly valid. Per suggestion from Fujii Masao.	2010-04-12 10:40:43 +00:00
Heikki Linnakangas	e57cd7f0a1	Change the logic to decide when to delete old WAL segments, so that it doesn't take into account how far the WAL senders are. This way a hung WAL sender doesn't prevent old WAL segments from being recycled/removed in the primary, ultimately causing the disk to fill up. Instead add standby_keep_segments setting to control how many old WAL segments are kept in the primary. This also makes it more reliable to use streaming replication without WAL archiving, assuming that you set standby_keep_segments high enough.	2010-04-12 09:52:29 +00:00
Robert Haas	54943734f8	Refer to max_wal_senders in a more consistent fashion. The error message now makes explicit reference to the GUC that must be changed to fix the problem, using wording suggested by Tom Lane. Along the way, rename the GUC from MaxWalSenders to max_wal_senders for consistency and grep-ability.	2010-04-01 00:43:29 +00:00
Simon Riggs	a760893dbd	Derive latestRemovedXid for btree deletes by reading heap pages. The WAL record for btree delete contains a list of tids, even when backup blocks are present. We follow the tids to their heap tuples, taking care to follow LP_REDIRECT tuples. We ignore LP_DEAD tuples on the understanding that they will always have xmin/xmax earlier than any LP_NORMAL tuples referred to by killed index tuples. Iff all tuples are LP_DEAD we return InvalidTransactionId. The heap relfilenode is added to the WAL record, requiring API changes to pass down the heap Relation. XLOG_PAGE_MAGIC updated.	2010-03-28 09:27:02 +00:00
Simon Riggs	bf6285b3a7	Further corrections of mismatching struct and btree SizeOf macros. In this case, correction is to remove now unused fields from struct. Since these were unused and full of garbage anyway, no version change.	2010-03-20 07:49:48 +00:00
Tom Lane	865b29540e	Fix oversight in btpo.xact patch; it was in fact installing garbage in the xact field on replay, due to not writing out all the data in the wal log struct.	2010-03-19 20:51:30 +00:00
Simon Riggs	aa36bd2039	Update XLOG_PAGE_MAGIC to recognise WAL format changes.	2010-03-19 17:42:10 +00:00
Simon Riggs	3cdafe40e7	Adjust comment in .history file to match recovery target specified. Comment present since 8.0 was never fully meaningful, since two recovery targets cannot be specified. Refactor recovery target type to make this change and associated code easier to understand. No change in function. Bug report arising from internal support question.	2010-03-19 11:05:15 +00:00
Simon Riggs	5c73ae17d1	Reset btpo.xact following recovery of btree delete page. Add btpo_xact field into WAL record and reset it from there, rather than using FrozenTransactionId which can lead to some corner case bugs. Problem report and suggested route to a fix from Heikki, details by me.	2010-03-19 10:41:22 +00:00
Bruce Momjian	65e806cba1	pgindent run for 9.0	2010-02-26 02:01:40 +00:00
Heikki Linnakangas	ad458cfe81	Don't use O_DIRECT when writing WAL files if archiving or streaming is enabled. Bypassing the kernel cache is counter-productive in that case, because the archiver/walsender process will read from the WAL file soon after it's written, and if it's not cached the read will cause a physical read, eating I/O bandwidth available on the WAL drive. Also, walreceiver process does unaligned writes, so disable O_DIRECT in walreceiver process for that reason too.	2010-02-19 10:51:04 +00:00
Tom Lane	d1e027221d	Replace the pg_listener-based LISTEN/NOTIFY mechanism with an in-memory queue. In addition, add support for a "payload" string to be passed along with each notify event. This implementation should be significantly more efficient than the old one, and is also more compatible with Hot Standby usage. There is not yet any facility for HS slaves to receive notifications generated on the master, although such a thing is possible in future. Joachim Wieland, reviewed by Jeff Davis; also hacked on by me.	2010-02-16 22:34:57 +00:00
Simon Riggs	dd428c79a4	Fix relcache init file invalidation during Hot Standby for the case where a database has a non-default tablespaceid. Pass thru MyDatabaseId and MyDatabaseTableSpace to allow file path to be re-created in standby and correct invalidation to take place in all cases. Update and rework xact_commit_desc() debug messages. Bug report from Tom by code inspection. Fix by me.	2010-02-13 16:15:48 +00:00
Simon Riggs	fafa374f2d	Introduce WAL records to log reuse of btree pages, allowing conflict resolution during Hot Standby. Page reuse interlock requested by Tom. Analysis and patch by me.	2010-02-13 00:59:58 +00:00
Teodor Sigaev	5209c084a6	Generic implementation of red-black binary tree. It's planned to use in several places, but for now only GIN uses it during index creation. Using self-balanced tree greatly speeds up index creation in corner cases with preordered data.	2010-02-11 14:29:50 +00:00
Bruce Momjian	dfc902854a	Add C comments that HEAP_MOVED_* define usage is only for pre-9.0 binary upgrades.	2010-02-08 14:10:21 +00:00
Tom Lane	0a469c8769	Remove old-style VACUUM FULL (which was known for a little while as VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity. Per discussion, the use case for this method of vacuuming is no longer large enough to justify maintaining it; not to mention that we don't wish to invest the work that would be needed to make it play nicely with Hot Standby. Aside from the code directly related to old-style VACUUM FULL, this commit removes support for certain WAL record types that could only be generated within VACUUM FULL, redirect-pointer removal in heap_page_prune, and nontransactional generation of cache invalidation sinval messages (the last being the sticking point for Hot Standby). We still have to retain all code that copes with finding HEAP_MOVED_OFF and HEAP_MOVED_IN flag bits on existing tuples. This can't be removed as long as we want to support in-place update from pre-9.0 databases.	2010-02-08 04:33:55 +00:00
Tom Lane	b9b8831ad6	Create a "relation mapping" infrastructure to support changing the relfilenodes of shared or nailed system catalogs. This has two key benefits: * The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs. * We no longer have to use an unsafe reindex-in-place approach for reindexing shared catalogs. CLUSTER on nailed catalogs now works too, although I left it disabled on shared catalogs because the resulting pg_index.indisclustered update would only be visible in one database. Since reindexing shared system catalogs is now fully transactional and crash-safe, the former special cases in REINDEX behavior have been removed; shared catalogs are treated the same as non-shared. This commit does not do anything about the recently-discussed problem of deadlocks between VACUUM FULL/CLUSTER on a system catalog and other concurrent queries; will address that in a separate patch. As a stopgap, parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid such failures during the regression tests.	2010-02-07 20:48:13 +00:00
Simon Riggs	296578feb4	Revoke augmentation of WAL records for btree delete, per discussion.	2010-02-01 13:40:28 +00:00
Simon Riggs	6d2bc0a6cf	Augment WAL records for btree delete with GetOldestXmin() to reduce false positives during Hot Standby conflict processing. Simple patch to enhance conflict processing, following previous discussions. Controlled by parameter minimize_standby_conflicts = on | off, with default off allows measurement of performance impact to see whether it should be set on all the time.	2010-01-29 18:39:05 +00:00
Heikki Linnakangas	e0e8b96345	Change a few remaining calls of XLogArchivingActive() to use XLogIsNeeded() instead, to determine if an otherwise non-logged operation needs to be logged in WAL for standby servers. Fujii Masao	2010-01-28 07:31:42 +00:00
Robert Haas	76a47c0e74	Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism. Attributes can now have options, just as relations and tablespaces do, and the reloptions code is used to parse, validate, and store them. For simplicity and because these options are not performance critical, we store them in a separate cache rather than the main relcache. Thanks to Alex Hunsaker for the review.	2010-01-22 16:40:19 +00:00
Heikki Linnakangas	09b115f706	Write a WAL record whenever we perform an operation without WAL-logging that would've been WAL-logged if archiving was enabled. If we encounter such records in archive recovery anyway, we know that some data is missing from the log. A WARNING is emitted in that case. Original patch by Fujii Masao, with changes by me.	2010-01-20 19:43:40 +00:00
Tom Lane	47a09eda89	PGDLLIMPORT-ize the remaining variables needed by walreceiver.	2010-01-16 00:04:41 +00:00
Heikki Linnakangas	40f908bdcd	Introduce Streaming Replication. This includes two new kinds of postmaster processes, walsenders and walreceiver. Walreceiver is responsible for connecting to the primary server and streaming WAL to disk, while walsender runs in the primary server and streams WAL from disk to the client. Documentation still needs work, but the basics are there. We will probably pull the replication section to a new chapter later on, as well as the sections describing file-based replication. But let's do that as a separate patch, so that it's easier to see what has been added/changed. This patch also adds a new section to the chapter about FE/BE protocol, documenting the protocol used by walsender/walreceiver. Bump catalog version because of two new functions, pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for monitoring the progress of replication. Fujii Masao, with additional hacking by me	2010-01-15 09:19:10 +00:00
Robert Haas	84b6d5f359	Remove partial, broken support for NULL pointers when fetching attributes. Previously, fastgetattr() and heap_getattr() tested their fourth argument against a null pointer, but any attempt to use them with a literal-NULL fourth argument evaluated to (void *)0, resulting in a compiler error. Remove these NULL tests to avoid leading future readers of this code to believe that this has a chance of working. Also clean up related legacy code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr(). The new coding standard is that any code which calls a getattr-type function or macro which takes an isnull argument MUST pass a valid boolean pointer. Per discussion with Bruce Momjian, Tom Lane, Alvaro Herrera.	2010-01-10 04:26:36 +00:00
Robert Haas	d86d51a958	Support ALTER TABLESPACE name SET/RESET ( tablespace_options ). This patch only supports seq_page_cost and random_page_cost as parameters, but it provides the infrastructure to scalably support many more. In particular, we may want to add support for effective_io_concurrency, but I'm leaving that as future work for now. Thanks to Tom Lane for design help and Alvaro Herrera for the review.	2010-01-05 21:54:00 +00:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
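
Commit 2ab57e089b above describes calling the opclass consistentFn twice, once with TRUE and once with FALSE for a single in-doubt entry stream, and reporting a hit if either call returns TRUE. The following is a minimal Python model of that idea, not the GIN code itself: `key_matches`, the `ConsistentFn` type, and the use of `None` to mark an in-doubt (lossy) stream are all hypothetical names chosen for illustration. The fallback for more than one in-doubt stream is assumed here to be "report a possible match", which is one reading of the commit's "punts".

```python
from typing import Callable, List, Optional

# Hypothetical stand-in for an opclass consistent function: it receives one
# boolean per entry stream and says whether the indexed key can match.
ConsistentFn = Callable[[List[bool]], bool]


def key_matches(consistent: ConsistentFn, known: List[Optional[bool]]) -> bool:
    """Decide whether a key may match when some entry streams are in doubt.

    known[i] is True/False when stream i gave an exact answer, and None when
    only a lossy page pointer is available (the in-doubt case).
    """
    doubtful = [i for i, value in enumerate(known) if value is None]

    if not doubtful:
        # Everything is exact: a single call settles the question.
        return consistent([bool(value) for value in known])

    if len(doubtful) > 1:
        # Assumption: with several in-doubt streams, give up on pruning and
        # conservatively report a possible match.
        return True

    # Exactly one in-doubt stream: try both truth values for it.
    target = doubtful[0]
    for guess in (True, False):
        trial = [guess if i == target else bool(value)
                 for i, value in enumerate(known)]
        if consistent(trial):
            return True
    return False
```

With an AND-style consistent function, `key_matches(all, [False, None])` can rule the page out without visiting it, which is the saving the commit message attributes to the rewrite.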

701 commits
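
Commit e76c1a0f4d above replaces comparisons against WAL-record timestamps with a comparison against time elapsed on the standby since it last obtained a WAL segment from archive, or was last caught up with streamed WAL. A rough sketch of that check follows. The function and parameter names are hypothetical, and treating a negative limit as "wait forever" is an assumption made for this example.

```python
def delay_limit_exceeded(now_s: float, last_progress_s: float,
                         max_delay_s: float) -> bool:
    """Return True when the standby has waited longer than the delay limit.

    now_s           -- current time on the standby's own clock, in seconds
    last_progress_s -- standby-clock time of the last new archive segment,
                       or of the last moment streaming was caught up
    max_delay_s     -- the delay limit; negative means no limit (assumption)
    """
    if max_delay_s < 0:
        # No limit configured: never report the delay as exceeded.
        return False
    # Both timestamps come from the same clock, so skew between primary and
    # standby cannot affect the result.
    return (now_s - last_progress_s) > max_delay_s
```

Because no timestamp from the primary enters the calculation, an idle primary or a skewed clock does not by itself push the standby over the limit.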