postgresql

mirror of https://github.com/postgres/postgres.git synced 2026-03-15 07:04:10 -04:00

Author	SHA1	Message	Date
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Tom Lane	4f335a3d7f	Repair two related errors in heap_lock_tuple: it was failing to recognize cases where we already hold the desired lock "indirectly", either via membership in a MultiXact or because the lock was originally taken by a different subtransaction of the current transaction. These cases must be accounted for to avoid needless deadlocks and/or inappropriate replacement of an exclusive lock with a shared lock. Per report from Clarence Gardner and subsequent investigation.	2006-11-17 18:00:15 +00:00
Tom Lane	48188e1621	Fix recently-understood problems with handling of XID freezing, particularly in PITR scenarios. We now WAL-log the replacement of old XIDs with FrozenTransactionId, so that such replacement is guaranteed to propagate to PITR slave databases. Also, rather than relying on hint-bit updates to be preserved, pg_clog is not truncated until all instances of an XID are known to have been replaced by FrozenTransactionId. Add new GUC variables and pg_autovacuum columns to allow management of the freezing policy, so that users can trade off the size of pg_clog against the amount of freezing work done. Revise the already-existing code that forces autovacuum of tables approaching the wraparound point to make it more bulletproof; also, revise the autovacuum logic so that anti-wraparound vacuuming is done per-table rather than per-database. initdb forced because of changes in pg_class, pg_database, and pg_autovacuum catalogs. Heikki Linnakangas, Simon Riggs, and Tom Lane.	2006-11-05 22:42:10 +00:00
Tom Lane	70ce5c9082	Fix "failed to re-find parent key" btree VACUUM failure by revising page deletion code to avoid the case where an upper-level btree page remains "half dead" for a significant period of time, and to block insertions into a key range that is in process of being re-assigned to the right sibling of the deleted page's parent. This prevents the scenario reported by Ed L. wherein index keys could become out-of-order in the grandparent index level. Since this is a moderately invasive fix, I'm applying it only to HEAD. The bug exists back to 7.4, but the back branches will get a different patch.	2006-11-01 19:43:17 +00:00
Tom Lane	e378f82e00	Make use of qsort_arg in several places that were formerly using klugy static variables. This avoids any risk of potential non-reentrancy, and in particular offers a much cleaner workaround for the Intel compiler bug that was affecting ginutil.c.	2006-10-05 17:57:40 +00:00
Bruce Momjian	f99a569a2e	pgindent run for 8.2.	2006-10-04 00:30:14 +00:00
Tom Lane	f5b4d9a9e0	If we're going to advertise the array overlap/containment operators, we probably should make them work reliably for all arrays. Fix code to handle NULLs and multidimensional arrays, move it into arrayfuncs.c. GIN is still restricted to indexing arrays with no null elements, however.	2006-09-10 20:14:20 +00:00
Tom Lane	ba920e1c91	Rename contains/contained-by operators to @> and <@, per discussion that agreed these symbols are less easily confused. I made new pg_operator entries (with new OIDs) for the old names, so as to provide backward compatibility while making it pretty easy to remove the old names in some future release cycle. This commit only touches the core datatypes, contrib will be fixed separately.	2006-09-10 00:29:35 +00:00
Tom Lane	08ae5edc5c	Optimize the case where a btree indexscan has current and mark positions on the same index page; we can avoid data copying as well as buffer refcount manipulations in this common case. Makes for a small but noticeable improvement in mergejoin speed. Heikki Linnakangas	2006-08-24 01:18:34 +00:00
Tom Lane	35af5422f6	Make the server track an 'XID epoch', that is, maintain higher-order bits of the transaction ID counter. Nothing is done with the epoch except to store it in checkpoint records, but this provides a foundation with which add-on code can pretend that XIDs never wrap around. This is a severely trimmed and rewritten version of the xxid patch submitted by Marko Kreen. Per discussion, the epoch counter seems the only part of xxid that really needs to be in the core server.	2006-08-21 16:16:31 +00:00
Tom Lane	7aa772f03e	Now that we've rearranged relation open to get a lock before touching the rel, it's easy to get rid of the narrow race-condition window that used to exist in VACUUM and CLUSTER. Did some minor code-beautification work in the same area, too.	2006-08-18 16:09:13 +00:00
Tom Lane	e8ea9e9587	Implement archive_timeout feature to force xlog file switches to occur no more than N seconds apart. This allows a simple, if not very high performance, means of guaranteeing that a PITR archive is no more than N seconds behind real time. Also make pg_current_xlog_location return the WAL Write pointer, add pg_current_xlog_insert_location to return the Insert pointer, and fix pg_xlogfile_name_offset to return its results as a two-element record instead of a smashed-together string, as per recent discussion. Simon Riggs	2006-08-17 23:04:10 +00:00
Tom Lane	e002836913	Make recovery from WAL be restartable, by executing a checkpoint-like operation every so often. This improves the usefulness of PITR log shipping for hot standby: formerly, if the standby server crashed, it was necessary to restart it from the last base backup and replay all the WAL since then. Now it will only need to reread about the same amount of WAL as the master server would. The behavior might also come in handy during a long PITR replay sequence. Simon Riggs, with some editorialization by Tom Lane.	2006-08-07 16:57:57 +00:00
Tom Lane	704ddaaa09	Add support for forcing a switch to a new xlog file; cause such a switch to happen automatically during pg_stop_backup(). Add some functions for interrogating the current xlog insertion point and for easily extracting WAL filenames from the hex WAL locations displayed by pg_stop_backup and friends. Simon Riggs with some editorialization by Tom Lane.	2006-08-06 03:53:44 +00:00
Tom Lane	09d3670df3	Change the relation_open protocol so that we obtain lock on a relation (table or index) before trying to open its relcache entry. This fixes race conditions in which someone else commits a change to the relation's catalog entries while we are in process of doing relcache load. Problems of that ilk have been reported sporadically for years, but it was not really practical to fix until recently --- for instance, the recent addition of WAL-log support for in-place updates helped. Along the way, remove pg_am.amconcurrent: all AMs are now expected to support concurrent update.	2006-07-31 20:09:10 +00:00
Tom Lane	e6284649b9	Modify btree to delete known-dead index entries without an actual VACUUM. When we are about to split an index page to do an insertion, first look to see if any entries marked LP_DELETE exist on the page, and if so remove them to try to make enough space for the desired insert. This should reduce index bloat in heavily-updated tables, although of course you still need VACUUM eventually to clean up the heap. Junji Teramoto	2006-07-25 19:13:00 +00:00
Bruce Momjian	b43ebe5f83	More include file adjustments.	2006-07-13 18:01:02 +00:00
Bruce Momjian	b844dd3f9e	More include file adjustments.	2006-07-13 17:47:02 +00:00
Bruce Momjian	a22d76d96a	Allow include files to compile on their own. Strip unused include files out of include files, and add needed includes to C files. The next step is to remove unused include files in C files.	2006-07-13 16:49:20 +00:00
Tom Lane	d29b66882a	Tweak fillfactor code as per my recent proposal. Fix nbtsort.c so that it can handle small fillfactors for ordinary-sized index entries without failing on large ones; fix nbtinsert.c to distinguish leaf and nonleaf pages; change the minimum fillfactor to 10% for all index types.	2006-07-11 21:05:57 +00:00
Bruce Momjian	ac230e7431	Alphabetically order reference to include files, "S"-"Z".	2006-07-11 18:26:11 +00:00
Bruce Momjian	3a534ade39	Alphabetically order reference to include files, "G" - "M".	2006-07-11 17:04:13 +00:00
Teodor Sigaev	234163649e	GIN improvements - Replace the sorted array of entries in maintenance_work_mem with a binary tree; this should improve creation performance. - More precisely calculate allocated memory, eliminate leaks with user-defined extractValue() - Improve wording in tsearch2	2006-07-11 16:55:34 +00:00
Bruce Momjian	b85a965f5f	Allow each C include file to compile on its own by including any needed header files.	2006-07-11 13:54:25 +00:00
Alvaro Herrera	d4cef0aa2a	Improve vacuum code to track minimum Xids per table instead of per database. To this end, add a couple of columns to pg_class, relminxid and relvacuumxid, based on which we calculate the pg_database columns after each vacuum. We now force all databases to be vacuumed, even template ones. A backend noticing too old a database (meaning pg_database.datminxid is in danger of falling behind Xid wraparound) will signal the postmaster, which in turn will start an autovacuum iteration to process the offending database. In principle this is only there to cope with frozen (non-connectable) databases without forcing users to set them to connectable, but it could force regular user database to go through a database-wide vacuum at any time. Maybe we should warn users about this somehow. Of course the real solution will be to use autovacuum all the time ;-) There are some additional improvements we could have in this area: for example the vacuum code could be smarter about not updating pg_database for each table when called by autovacuum, and do it only once the whole autovacuum iteration is done. I updated the system catalogs documentation, but I didn't modify the maintenance section. Also having some regression tests for this would be nice but it's not really a very straightforward thing to do. Catalog version bumped due to system catalog changes.	2006-07-10 16:20:52 +00:00
Tom Lane	b7b78d24f7	Code review for FILLFACTOR patch. Change WITH grammar as per earlier discussion (including making def_arg allow reserved words), add missed opt_definition for UNIQUE case. Put the reloptions support code in a less random place (I chose to make a new file access/common/reloptions.c). Eliminate header inclusion creep. Make the index options functions safely user-callable (seems like client apps might like to be able to test validity of options before trying to make an index). Reduce overhead for normal case with no options by allowing rd_options to be NULL. Fix some unmaintainably klugy code, including getting rid of Natts_pg_class_fixed at long last. Some stylistic cleanup too, and pay attention to keeping comments in sync with code. Documentation still needs work, though I did fix the omissions in catalogs.sgml and indexam.sgml.	2006-07-03 22:45:41 +00:00
Bruce Momjian	277807bd9e	Add FILLFACTOR to CREATE INDEX. ITAGAKI Takahiro	2006-07-02 02:23:23 +00:00
Teodor Sigaev	1f7ef548ec	Changes * new split algorithm (as proposed in http://archives.postgresql.org/pgsql-hackers/2006-06/msg00254.php) * possible call of pickSplit() for second and below columns * add spl_(l|r)datum_exists to GIST_SPLITVEC - pickSplit should check its values to use already defined spl_(l|r)datum for splitting. pickSplit should set spl_(l|r)datum_exists to 'false' (if they were 'true') to signal to the caller that spl_(l|r)datum was used. * support for old pickSplit(): not very optimal but correct split * remove 'bytes' field from GISTENTRY: in any case the size of a value is defined by its type. * split GIST_SPLITVEC into two structures: one for use in picksplit and a second for internal use. * some code refactoring * support of subsplit in rtree opclasses TODO: add support of subsplit to contrib modules	2006-06-28 12:00:14 +00:00
Tom Lane	3f50ba27cf	Create infrastructure for 'MinimalTuple' representation of in-memory tuples with less header overhead than a regular HeapTuple, per my recent proposal. Teach TupleTableSlot code how to deal with these. As proof of concept, change tuplestore.c to store MinimalTuples instead of HeapTuples. Future patches will expand the concept to other places where it is useful.	2006-06-27 02:51:40 +00:00
Bruce Momjian	199f8f2858	Fix GEVHDRSZ for Win32. Magnus Hagander	2006-06-25 01:02:12 +00:00
Tom Lane	06e10abc0b	Fix problems with cached tuple descriptors disappearing while still in use by creating a reference-count mechanism, similar to what we did a long time ago for catcache entries. The back branches have an ugly solution involving lots of extra copies, but this way is more efficient. Reference counting is only applied to tupdescs that are actually in caches --- there seems no need to use it for tupdescs that are generated in the executor, since they'll go away during plan shutdown by virtue of being in the per-query memory context. Neil Conway and Tom Lane	2006-06-16 18:42:24 +00:00
Teodor Sigaev	b32000eda4	Some improvements to page split in multicolumn GiST index. If the user-defined picksplit on the n-th column generates equal left and right unions, then picksplit is called on the (n+1)-th column.	2006-05-29 12:50:06 +00:00
Teodor Sigaev	d2158b0281	* Add support for NULLs to GiST. * Refactor and simplify some code in gistutil.c and gist.c. * In some cases the user-defined picksplit method can now be called for a non-first column in the index, but there is room to do more here. * Small fix of docs related to NULL support.	2006-05-24 11:01:39 +00:00
Teodor Sigaev	420cbff881	Simplify gistSplit() and refactor some related code.	2006-05-19 16:15:17 +00:00
Teodor Sigaev	8876e37d07	Reduce size of critical section during vacuum full; critical sections are no longer nested. All user-defined functions are now called outside critical sections. Small improvements in WAL protocol. TODO: improve XLOG replay	2006-05-17 16:34:59 +00:00
Tom Lane	3fdeb189e9	Clean up code associated with updating pg_class statistics columns (relpages/reltuples). To do this, create formal support in heapam.c for "overwrite" tuple updates (including xlog replay capability) and use that instead of the ad-hoc overwrites we'd been using in VACUUM and CREATE INDEX. Take the responsibility for updating stats during CREATE INDEX out of the individual index AMs, and do it where it belongs, in catalog/index.c. Aside from being more modular, this avoids having to update the same tuple twice in some paths through CREATE INDEX. It's probably not measurably faster, but for sure it's a lot cleaner than before.	2006-05-10 23:18:39 +00:00
Teodor Sigaev	10dd8df68e	Reduce size of critical section and remove calls of user-defined functions in insertion and deletion; modify gistSplit() to not use buffers. TODO: gistvacuumcleanup and XLOG	2006-05-10 09:19:54 +00:00
Tom Lane	5749f6ef0c	Rewrite btree vacuuming to fold the former bulkdelete and cleanup operations into a single mostly-physical-order scan of the index. This requires some ticklish interlocking considerations, but should create no material performance impact on normal index operations (at least given the already-committed changes to make scans work a page at a time). VACUUM itself should get significantly faster in any index that's degenerated to a very nonlinear page order. Also, we save one pass over the index entirely, except in the case where there were no deletions to do and so only one pass happened anyway. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-08 00:00:17 +00:00
Tom Lane	09cb5c0e7d	Rewrite btree index scans to work a page at a time in all cases (both btgettuple and btgetmulti). This eliminates the problem of "re-finding" the exact stopping point, since the stopping point is effectively always a page boundary, and index items are never moved across pre-existing page boundaries. A small penalty is that the keys_are_unique optimization is effectively disabled (and, therefore, is removed in this patch), causing us to apply _bt_checkkeys() to at least one more tuple than necessary when looking up a unique key. However, the advantages for non-unique cases seem great enough to accept this tradeoff. Aside from simplifying and (sometimes) speeding up the indexscan code, this will allow us to reimplement btbulkdelete as a largely sequential scan instead of index-order traversal, thereby significantly reducing the cost of VACUUM. Those changes will come in a separate patch. Original patch by Heikki Linnakangas, rework by Tom Lane.	2006-05-07 01:21:30 +00:00
Tom Lane	e57345975c	Clean up API for ambulkdelete/amvacuumcleanup as per today's discussion. This formulation requires every AM to provide amvacuumcleanup, unlike before, but it's surely a whole lot cleaner. Also, add an 'amstorage' column to pg_am so that we can get rid of hardwired knowledge in DefineOpClass().	2006-05-02 22:25:10 +00:00
Teodor Sigaev	8a3631f8d8	GIN: Generalized Inverted iNdex. text[], int4[], Tsearch2 support for GIN.	2006-05-02 11:28:56 +00:00
Bruce Momjian	e6004f0151	Add statement_timestamp(), clock_timestamp(), and transaction_timestamp() (just like now()). Also update statement_timeout() to mention it is statement arrival time that is measured. Catalog version updated.	2006-04-25 00:25:22 +00:00
Bruce Momjian	5bbea03f3b	Suppress more compiler warnings caused by macro tests.	2006-04-24 22:24:58 +00:00
Bruce Momjian	7384e95b0c	Add one more paren to macro.	2006-04-24 22:17:04 +00:00
Bruce Momjian	88fc941355	Suppress compiler warning in gcc 4.2. Report by Kris Jurka	2006-04-24 22:06:32 +00:00
Tom Lane	defe93463c	Make the world safe for full_page_writes. Allow XLOG records that try to update no-longer-existing pages to fall through as no-ops, but make a note of each page number referenced by such records. If we don't see a later XLOG entry dropping the table or truncating away the page, complain at the end of XLOG replay. Since this fixes the known failure mode for full_page_writes = off, revert my previous band-aid patch that disabled that GUC variable.	2006-04-14 20:27:24 +00:00
Tom Lane	49a7610c36	Fix an ancient oversight in btree xlog replay. When trying to determine if an upper-level insertion completes a previously-seen split, we cannot simply grab the downlink block number out of the buffer, because the buffer could contain a later state of the page --- or perhaps the page doesn't even exist at all any more, due to relation truncation. These possibilities have been masked up to now because the use of full_page_writes effectively ensured that no xlog replay routine ever actually saw a page state newer than its own change. Since we're deprecating full_page_writes in 8.1.*, there's no need to fix this in existing release branches, but we need a fix in HEAD if we want to have any hope of re-allowing full_page_writes. Accordingly, adjust the contents of btree WAL records so that we can always get the downlink block number from the WAL record rather than having to depend on buffer contents. Per report from Kevin Grittner and Peter Brant. Improve a few comments in related code while at it.	2006-04-13 03:53:05 +00:00
Tom Lane	09b5271ebd	Add a field to the first page of each WAL file to indicate the XLOG_BLCKSZ. This ought to help in preventing configuration mismatch problems if anyone tries to ship PITR files between servers compiled with different XLOG_BLCKSZ settings. Simon Riggs	2006-04-05 03:34:05 +00:00
Tom Lane	eaef111396	Define a separately configurable XLOG_BLCKSZ symbol for the page size used within WAL files. Historically this was the same as the data file BLCKSZ, but there's no necessary connection, and it's possible that performance gains might ensue from reducing XLOG_BLCKSZ. In any case distinguishing two symbols should improve code clarity. This commit does not actually change the page size, only provide the infrastructure to make it possible to do so. initdb forced because of addition of a field to pg_control. Mark Wong, with some help from Simon Riggs and Tom Lane.	2006-04-03 23:35:05 +00:00
Teodor Sigaev	8d02b15e33	Eliminate adjust-scan code. Since the introduction of concurrent GiST it does no real work; this was missed during concurrency development.	2006-04-03 13:44:33 +00:00
Tom Lane	89bda95d82	Remove the 'slow' path for btree index build, which built the btree incrementally by successive inserts rather than by sorting the data. We were only using the slow path during bootstrap, apparently because when first written it failed during bootstrap --- but it works fine now AFAICT. Removing it saves a hundred or so lines of code and produces noticeably (~10%) smaller initial states of the system catalog indexes. While that won't make much difference for heavily-modified catalogs, for the more static ones there may be a useful long-term performance improvement.	2006-04-01 03:03:37 +00:00
Tom Lane	a8b8f4db23	Clean up WAL/buffer interactions as per my recent proposal. Get rid of the misleadingly-named WriteBuffer routine, and instead require routines that change buffer pages to call MarkBufferDirty (which does exactly what it says). We also require that they do so before calling XLogInsert; this takes care of the synchronization requirement documented in SyncOneBuffer. Note that because bufmgr takes the buffer content lock (in shared mode) while writing out any buffer, it doesn't matter whether MarkBufferDirty is executed before the buffer content change is complete, so long as the content change is completed before releasing exclusive lock on the buffer. So it's OK to set the dirtybit before we fill in the LSN. This eliminates the former kluge of needing to set the dirtybit in LockBuffer. Aside from making the code more transparent, we can also add some new debugging assertions, in particular that the caller of MarkBufferDirty must hold the buffer content lock, not merely a pin.	2006-03-31 23:32:07 +00:00
Tom Lane	89395bfa6f	Improve gist XLOG code to follow the coding rules needed to prevent torn-page problems. This introduces some issues of its own, mainly that there are now some critical sections of unreasonably broad scope, but it's a step forward anyway. Further cleanup will require some code refactoring that I'd prefer to get Oleg and Teodor involved in.	2006-03-30 23:03:10 +00:00
Tom Lane	6d61cdec07	Clean up and document the API for XLogOpenRelation and XLogReadBuffer. This commit doesn't make much functional change, but it does eliminate some duplicated code --- for instance, PageIsNew tests are now done inside XLogReadBuffer rather than by each caller. The GIST xlog code still needs a lot of love, but I'll worry about that separately.	2006-03-29 21:17:39 +00:00
Tom Lane	0a20207060	Arrange to emit a description of the current XLOG record as error context when an error occurs during xlog replay. Also, replace the former risky 'write into a fixed-size buffer with no overflow detection' API for XLOG record description routines; use an expansible StringInfo instead. (The latter accounts for most of the patch bulk.) Qingqing Zhou	2006-03-24 04:32:13 +00:00
Tom Lane	2316013961	Clean up representation of function RTEs for functions returning RECORD. The original coding stored the raw parser output (ColumnDef and TypeName nodes) which was ugly, bulky, and wrong because it failed to create any dependency on the referenced datatype --- and in fact would not track type renamings and suchlike. Instead store a list of column type OIDs in the RTE. Also fix up general failure of recordDependencyOnExpr to do anything sane about recording dependencies on datatypes. While there are many cases where there will be an indirect dependency (eg if an operator returns a datatype, the dependency on the operator is enough), we do have to record the datatype as a separate dependency in examples like CoerceToDomain. initdb forced because of change of stored rules.	2006-03-16 00:31:55 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Tom Lane	fd267c1ebc	Skip ambulkdelete scan if there's nothing to delete and the index is not partial. None of the existing AMs do anything useful except counting tuples when there's nothing to delete, and we can get a tuple count from the heap as long as it's not a partial index. (hash actually can skip anyway because it maintains a tuple count in the index metapage.) GIST is not currently able to exploit this optimization because, due to failure to index NULLs, GIST is always effectively partial. Possibly we should fix that sometime. Simon Riggs w/ some review by Tom Lane.	2006-02-11 23:31:34 +00:00
Bruce Momjian	77bb65d3fc	Revert based on Tom's recommendation: > Allow VACUUM to complete faster by avoiding scanning the indexes when no > rows were removed from the heap by the VACUUM.	2006-02-11 17:14:09 +00:00
Bruce Momjian	bf324946b3	Allow VACUUM to complete faster by avoiding scanning the indexes when no rows were removed from the heap by the VACUUM. Simon Riggs	2006-02-11 16:59:09 +00:00
Tom Lane	5997386a0a	Remove the no-longer-useful HashItem/HashItemData level of structure. Same motivation as for BTItem.	2006-01-25 23:26:11 +00:00
Tom Lane	c389760c32	Remove the no-longer-useful BTItem/BTItemData level of structure, and just refer to btree index entries as plain IndexTuples, which is what they have been for a very long time. This is mostly just an exercise in removing extraneous notation, but it does save a palloc/pfree cycle per index insertion.	2006-01-25 23:04:21 +00:00
Tom Lane	3a0a16cb7e	Allow row comparisons to be used as indexscan qualifications. This completes the project to upgrade our handling of row comparisons.	2006-01-25 20:29:24 +00:00
Tom Lane	7ccaf13a06	Instead of using a numberOfRequiredKeys count to distinguish required and non-required keys in a btree index scan, mark the required scankeys with private flag bits SK_BT_REQFWD and/or SK_BT_REQBKWD. This seems at least marginally clearer to me, and it eliminates a wired-into-the- data-structure assumption that required keys are consecutive. Even though that assumption will remain true for the foreseeable future, having it in there makes the code seem more complex than necessary.	2006-01-23 22:31:41 +00:00
Tom Lane	f7ea931287	Some minor code cleanup, falling out from the removal of rtree. SK_NEGATE isn't being used anywhere anymore, and there seems no point in a generic index_keytest() routine when two out of three remaining access methods aren't using it. Also, add a comment documenting a convention for letting access methods define private flag bits in ScanKey sk_flags. There are no such flags at the moment but I'm thinking about changing btree's handling of "required keys" to use flag bits in the keys rather than a count of required key positions. Also, if some AM did still want SK_NEGATE then it would be reasonable to treat it as a private flag bit.	2006-01-14 22:03:35 +00:00
Tom Lane	cefcbbf1fd	Push the responsibility for handling ignore_killed_tuples down into _bt_checkkeys(), instead of checking it in the top-level nbtree.c routines as formerly. This saves a little bit of loop overhead, but more importantly it lets us skip performing the index key comparisons for dead tuples.	2005-12-07 19:37:53 +00:00
Tom Lane	887a7c61f6	Get rid of slru.c's hardwired insistence on a fixed number of slots per SLRU area. The number of slots is still a compile-time constant (someday we might want to change that), but at least it's a different constant for each SLRU area. Increase number of subtrans buffers to 32 based on experimentation with a heavily subtrans-bashing test case, and increase number of multixact member buffers to 16, since it's obviously silly for it not to be at least twice the number of multixact offset buffers.	2005-12-06 23:08:34 +00:00
Tom Lane	a615acf555	Arrange for read-only accesses to SLRU page buffers to take only a shared lock, not exclusive, if the desired page is already in memory. This can be demonstrated to be a significant win on the pg_subtrans cache when there is a large window of open transactions. It should be useful for pg_clog as well. I didn't try to make GetMultiXactIdMembers() use the code, as that would have taken some restructuring, and what with the local cache for multixact contents it probably wouldn't really make a difference. Per my recent proposal.	2005-12-06 18:10:06 +00:00
Tom Lane	a98871b7ac	Tweak indexscan machinery to avoid taking an AccessShareLock on an index if we already have a stronger lock due to the index's table being the update target table of the query. Same optimization I applied earlier at the table level. There doesn't seem to be much interest in the more radical idea of not locking indexes at all, so do what we can ...	2005-12-03 05:51:03 +00:00
Tom Lane	70f1482de3	Change seqscan logic so that we check visibility of all tuples on a page when we first read the page, rather than checking them one at a time. This allows us to take and release the buffer content lock just once per page, instead of once per tuple. Since it's a shared lock the contention penalty for holding the lock longer shouldn't be too bad. We can safely do this only when using an MVCC snapshot; else the assumption that visibility won't change over time is uncool. Therefore there are now two code paths depending on the snapshot type. I also made the same change in nodeBitmapHeapscan.c, where it can be done always because we only support MVCC snapshots for bitmap scans anyway. Also make some incidental cleanups in the APIs of these functions. Per a suggestion from Qingqing Zhou.	2005-11-26 03:03:07 +00:00
Bruce Momjian	436a2956d8	Re-run pgindent, fixing a problem where comment lines after a blank comment line were output as too long, and update typedefs for /lib directory. Also fix case where identifiers were used as variable names in the backend, but as typedefs in ecpg (favor the backend for indenting). Backpatch to 8.1.X.	2005-11-22 18:17:34 +00:00
Tom Lane	dd218ae7b0	Remove the t_datamcxt field of HeapTupleData. This was introduced for the convenience of tuptoaster.c and is no longer needed, so may as well get rid of some small amount of overhead.	2005-11-20 19:49:08 +00:00
Tom Lane	40314f2dac	Modify tuptoaster's API so that it does not try to modify the passed tuple in-place, but instead passes back an all-new tuple structure if any changes are needed. This is a much cleaner and more robust solution for the bug discovered by Alexey Beschiokov; accordingly, revert the quick hack I installed yesterday. With this change, HeapTupleData.t_datamcxt is no longer needed; will remove it in a separate commit in HEAD only.	2005-11-20 18:38:20 +00:00
Tom Lane	2a8d3d83ef	R-tree is dead ... long live GiST.	2005-11-07 17:36:47 +00:00
Tom Lane	6236991143	Add simple sanity checks on newly-read pages to GiST, too.	2005-11-06 22:39:21 +00:00
Tom Lane	766dc45d9f	Add defenses to btree and hash index AMs to do simple sanity checks on every index page they read; in particular to catch the case of an all-zero page, which PageHeaderIsValid allows to pass. It turns out hash already had this idea, but it was just Assert()ing things rather than doing a straight error check, and the Asserts were partially redundant with PageHeaderIsValid anyway. Per recent failure example from Jim Nasby. (gist still needs the same treatment.)	2005-11-06 19:29:01 +00:00
Tom Lane	18691d8ee3	Clean up representation of SLRU page state. This is the cleaner fix for the SLRU race condition that I posted a few days ago, but we decided not to use in 8.1 and older branches.	2005-11-05 21:19:47 +00:00
Bruce Momjian	1dc3498251	Standard pgindent run for 8.1.	2005-10-15 02:49:52 +00:00
Alvaro Herrera	a84429a1aa	Remove an unused typedef.	2005-10-07 14:55:36 +00:00
Tom Lane	35e9b1cc1e	Clean up a couple of ad-hoc computations of the maximum number of tuples on a page, as suggested by ITAGAKI Takahiro. Also, change a few places that were using some other estimates of max-items-per-page to consistently use MaxOffsetNumber. This is conservatively large --- we could have used the new MaxHeapTuplesPerPage macro, or a similar one for index tuples --- but those places are simply declaring a fixed-size buffer and assuming it will work, rather than actively testing for overrun. It seems safer to size these buffers in a way that can't overflow even if the page is corrupt.	2005-09-02 19:02:20 +00:00
Tom Lane	0007490e09	Convert the arithmetic for shared memory size calculation from 'int' to 'Size' (that is, size_t), and install overflow detection checks in it. This allows us to remove the former arbitrary restrictions on NBuffers etc. It won't make any difference in a 32-bit machine, but in a 64-bit machine you could theoretically have terabytes of shared buffers. (How efficiently we could manage 'em remains to be seen.) Similarly, num_temp_buffers, work_mem, and maintenance_work_mem can be set above 2Gb on a 64-bit machine. Original patch from Koichi Suzuki, additional work by moi.	2005-08-20 23:26:37 +00:00
Tatsuo Ishii	ba2fc7eb4b	Make GetMultiXactIdMembers() a public function.	2005-08-20 01:29:27 +00:00
Tom Lane	f57e3f4cf3	Repair problems with VACUUM destroying t_ctid chains too soon, and with insufficient paranoia in code that follows t_ctid links. (We must do both because even with VACUUM doing it properly, the intermediate state with a dangling t_ctid link is visible concurrently during lazy VACUUM, and could be seen afterwards if either type of VACUUM crashes partway through.) Also try to improve documentation about what's going on. Patch is a bit bulky because passing the XMAX information around required changing the APIs of some low-level heapam.c routines, but it's not conceptually very complicated. Per trouble report from Teodor and subsequent analysis. This needs to be back-patched, but I'll do that after 8.1 beta is out.	2005-08-20 00:40:32 +00:00
Tom Lane	721e53785d	Solve the problem of OID collisions by probing for duplicate OIDs whenever we generate a new OID. This prevents occasional duplicate-OID errors that can otherwise occur once the OID counter has wrapped around. Duplicate relfilenode values are also checked for when creating new physical files. Per my recent proposal.	2005-08-12 01:36:05 +00:00
Tom Lane	2a4fad1a0e	Add NOWAIT option to SELECT FOR UPDATE/SHARE. Original patch by Hans-Juergen Schoenig, revisions by Karel Zak and Tom Lane.	2005-08-01 20:31:16 +00:00
Tom Lane	5d5f1a79e6	Clean up a number of autovacuum loose ends. Make the stats collector track shared relations in a separate hashtable, so that operations done from different databases are counted correctly. Add proper support for anti-XID-wraparound vacuuming, even in databases that are never connected to and so have no stats entries. Miscellaneous other bug fixes. Alvaro Herrera, some additional fixes by Tom Lane.	2005-07-29 19:30:09 +00:00
Bruce Momjian	a923602855	Add pg_column_size() to return storage size of a column, including possible compression. Mark Kirkwood	2005-07-06 19:02:54 +00:00
Tom Lane	eb5949d190	Arrange for the postmaster (and standalone backends, initdb, etc) to chdir into PGDATA and subsequently use relative paths instead of absolute paths to access all files under PGDATA. This seems to give a small performance improvement, and it should make the system more robust against naive DBAs doing things like moving a database directory that has a live postmaster in it. Per recent discussion.	2005-07-04 04:51:52 +00:00
Teodor Sigaev	898a7bd13b	Bug fixes for GiST crash recovery. - add forgotten check of lsn for insert completion - remove level of pages: hard to check in recovery - some cleanups	2005-06-30 17:52:14 +00:00
Tom Lane	b5f7cff84f	Clean up the rather historically encumbered interface to now() and current time: provide a GetCurrentTimestamp() function that returns current time in the form of a TimestampTz, instead of separate time_t and microseconds fields. This is what all the callers really want anyway, and it eliminates low-level dependencies on AbsoluteTime, which is a deprecated datatype that will have to disappear eventually.	2005-06-29 22:51:57 +00:00
Tom Lane	7762619e95	Replace pg_shadow and pg_group by new role-capable catalogs pg_authid and pg_auth_members. There are still many loose ends to finish in this patch (no documentation, no regression tests, no pg_dump support for instance). But I'm going to commit it now anyway so that Alvaro can make some progress on shared dependencies. The catalog changes should be pretty much done.	2005-06-28 05:09:14 +00:00
Teodor Sigaev	e8cab5fe49	Concurrency for GiST - full concurrency for insert/update/select/vacuum: - select and vacuum never locks more than one page simultaneously - select (gettuple) hasn't any lock across it's calls - insert never locks more than two page simultaneously: - during search of leaf to insert it locks only one page simultaneously - while walk upward to the root it locked only parent (may be non-direct parent) and child. One of them X-lock, another may be S- or X-lock - 'vacuum full' locks index - improve gistgetmulti - simplify XLOG records Fix bug in index_beginscan_internal: LockRelation may clean rd_aminfo structure, so move GET_REL_PROCEDURE after LockRelation	2005-06-27 12:45:23 +00:00
Tom Lane	b90f8f20f0	Extend r-tree operator classes to handle Y-direction tests equivalent to the existing X-direction tests. An rtree class now includes 4 actual 2-D tests, 4 1-D X-direction tests, and 4 1-D Y-direction tests. This involved adding four new Y-direction test operators for each of box and polygon; I followed the PostGIS project's lead as to the names of these operators. NON BACKWARDS COMPATIBLE CHANGE: the poly_overleft (&<) and poly_overright (&>) operators now have semantics comparable to box_overleft and box_overright. This is necessary to make r-tree indexes work correctly on polygons. Also, I changed circle_left and circle_right to agree with box_left and box_right --- formerly they allowed the boundaries to touch. This isn't actually essential given the lack of any r-tree opclass for circles, but it seems best to sync all the definitions while we are at it.	2005-06-24 20:53:34 +00:00
Tom Lane	9a09248edd	Fix rtree and contrib/rtree_gist search behavior for the 1-D box and polygon operators (<<, &<, >>, &>). Per ideas originally put forward by andrew@supernews and later rediscovered by moi. This patch just fixes the existing opclasses, and does not add any new behavior as I proposed earlier; that can be sorted out later. In principle this could be back-patched, since it changes only search behavior and not system catalog entries nor rtree index contents. I'm not currently planning to do that, though, since I think it could use more testing.	2005-06-24 00:18:52 +00:00
Tom Lane	b95ae32b41	Avoid WAL-logging individual tuple insertions during CREATE TABLE AS (a/k/a SELECT INTO). Instead, flush and fsync the whole relation before committing. We do still need the WAL log when PITR is active, however. Simon Riggs and Tom Lane.	2005-06-20 18:37:02 +00:00
Teodor Sigaev	1bfdd1a893	fix founded hole in recovery after crash, add vacuum_delay_point()	2005-06-20 15:22:38 +00:00
Teodor Sigaev	d544ec8bbd	1. full functional WAL for GiST 2. improve vacuum for gist - use FSM - full vacuum: - reforms parent tuple if it's needed ( tuples was deleted on child page or parent tuple remains invalid after crash recovery ) - truncate index file if possible 3. fixes bugs and mistakes	2005-06-20 10:29:37 +00:00
Tom Lane	e26b0abda3	Arrange to fsync two-phase-commit state files only during checkpoints; given reasonably short lifespans for prepared transactions, this should mean that only a small minority of state files ever need to be fsynced at all. Per discussion with Heikki Linnakangas.	2005-06-19 20:00:39 +00:00
Tom Lane	a8d1075f27	Add a time-of-preparation column to the pg_prepared_xacts view, per an old suggestion by Oliver Jowett. Also, add a transaction column to the pg_locks view to show the xid of each transaction holding or awaiting locks; this allows prepared transactions to be properly associated with the locks they own. There was already a column named 'transaction', and I chose to rename it to 'transactionid' --- since this column is new in the current devel cycle there should be no backwards compatibility issue to worry about.	2005-06-18 19:33:42 +00:00
Tom Lane	d0a89683a3	Two-phase commit. Original patch by Heikki Linnakangas, with additional hacking by Alvaro Herrera and Tom Lane.	2005-06-17 22:32:51 +00:00

560 commits