/*-------------------------------------------------------------------------
 *
 * ginbtree.c
 *	  page utility routines for the postgres inverted index access method.
 *
 * Portions Copyright (c) 1996-2020, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * IDENTIFICATION
 *	  src/backend/access/gin/ginbtree.c
 *
 *-------------------------------------------------------------------------
 */
#include "postgres.h"

#include "access/gin_private.h"
#include "access/ginxlog.h"
#include "access/xloginsert.h"
#include "miscadmin.h"
#include "storage/predicate.h"
#include "utils/memutils.h"
#include "utils/rel.h"

static void ginFindParents(GinBtree btree, GinBtreeStack *stack);
static bool ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
						   void *insertdata, BlockNumber updateblkno,
						   Buffer childbuf, GinStatsData *buildStats);
static void ginFinishSplit(GinBtree btree, GinBtreeStack *stack,
						   bool freestack, GinStatsData *buildStats);

/*
 * Lock the buffer in the mode needed for the search: shared normally, but
 * exclusive on a leaf page that we might modify.
 */
int
ginTraverseLock(Buffer buffer, bool searchMode)
{
	Page		page;
	int			access = GIN_SHARE;

	LockBuffer(buffer, GIN_SHARE);
	page = BufferGetPage(buffer);
	if (GinPageIsLeaf(page))
	{
		if (searchMode == false)
		{
			/* we should relock our page */
			LockBuffer(buffer, GIN_UNLOCK);
			LockBuffer(buffer, GIN_EXCLUSIVE);

			/* But root can become non-leaf during relock */
			if (!GinPageIsLeaf(page))
			{
				/* restore old lock type (very rare) */
				LockBuffer(buffer, GIN_UNLOCK);
				LockBuffer(buffer, GIN_SHARE);
			}
			else
				access = GIN_EXCLUSIVE;
		}
	}

	return access;
}

/*
 * Descend the tree to the leaf page that contains or would contain the key
 * we're searching for. The key should already be filled in 'btree', in a
 * tree-type specific manner. If btree->fullScan is true, descends to the
 * leftmost leaf page.
 *
 * If 'searchMode' is false, on return stack->buffer is exclusively locked,
 * and the stack represents the full path to the root. Otherwise stack->buffer
 * is share-locked, and stack->parent is NULL.
 *
 * If 'rootConflictCheck' is true, the tree root is checked for a
 * serialization conflict.
 */
GinBtreeStack *
ginFindLeafPage(GinBtree btree, bool searchMode,
				bool rootConflictCheck, Snapshot snapshot)
{
	GinBtreeStack *stack;

	stack = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));
	stack->blkno = btree->rootBlkno;
	stack->buffer = ReadBuffer(btree->index, btree->rootBlkno);
	stack->parent = NULL;
	stack->predictNumber = 1;

	if (rootConflictCheck)
		CheckForSerializableConflictIn(btree->index, NULL, stack->buffer);

	for (;;)
	{
		Page		page;
		BlockNumber child;
		int			access;

		stack->off = InvalidOffsetNumber;

		page = BufferGetPage(stack->buffer);
		TestForOldSnapshot(snapshot, btree->index, page);

		access = ginTraverseLock(stack->buffer, searchMode);

		/*
		 * If we're going to modify the tree, finish any incomplete splits we
		 * encounter on the way.
		 */
		if (!searchMode && GinPageIsIncompleteSplit(page))
			ginFinishSplit(btree, stack, false, NULL);

		/*
		 * The page is now correctly locked, so check whether we need to move
		 * right. The root never has a right link, so we can skip the check
		 * for it (a small optimization).
		 */
		while (btree->fullScan == false && stack->blkno != btree->rootBlkno &&
			   btree->isMoveRight(btree, page))
		{
			BlockNumber rightlink = GinPageGetOpaque(page)->rightlink;

			if (rightlink == InvalidBlockNumber)
				/* rightmost page */
				break;

			stack->buffer = ginStepRight(stack->buffer, btree->index, access);
			stack->blkno = rightlink;
			page = BufferGetPage(stack->buffer);
			TestForOldSnapshot(snapshot, btree->index, page);

			if (!searchMode && GinPageIsIncompleteSplit(page))
				ginFinishSplit(btree, stack, false, NULL);
		}

		if (GinPageIsLeaf(page))	/* we found it; return the locked page */
			return stack;

		/* now we have the correct buffer, try to find the child */
		child = btree->findChildPage(btree, stack);

		LockBuffer(stack->buffer, GIN_UNLOCK);
		Assert(child != InvalidBlockNumber);
		Assert(stack->blkno != child);

		if (searchMode)
		{
			/* in search mode we may forget the path to the leaf */
			stack->blkno = child;
			stack->buffer = ReleaseAndReadBuffer(stack->buffer, btree->index,
												 stack->blkno);
		}
		else
		{
			GinBtreeStack *ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));

			ptr->parent = stack;
			stack = ptr;
			stack->blkno = child;
			stack->buffer = ReadBuffer(btree->index, stack->blkno);
			stack->predictNumber = 1;
		}
	}
}

/*
 * Step right from the current page.
 *
 * The next page is locked first, before releasing the current page. This is
 * crucial to protect from concurrent page deletion (see comment in
 * ginDeletePage).
 */
Buffer
ginStepRight(Buffer buffer, Relation index, int lockmode)
{
	Buffer		nextbuffer;
	Page		page = BufferGetPage(buffer);
	bool		isLeaf = GinPageIsLeaf(page);
	bool		isData = GinPageIsData(page);
	BlockNumber blkno = GinPageGetOpaque(page)->rightlink;

	nextbuffer = ReadBuffer(index, blkno);
	LockBuffer(nextbuffer, lockmode);
	UnlockReleaseBuffer(buffer);

	/* Sanity check that the page we stepped to is of a similar kind. */
	page = BufferGetPage(nextbuffer);
	if (isLeaf != GinPageIsLeaf(page) || isData != GinPageIsData(page))
		elog(ERROR, "right sibling of GIN page is of different type");

	return nextbuffer;
}

void
freeGinBtreeStack(GinBtreeStack *stack)
{
	while (stack)
	{
		GinBtreeStack *tmp = stack->parent;

		if (stack->buffer != InvalidBuffer)
			ReleaseBuffer(stack->buffer);

		pfree(stack);
		stack = tmp;
	}
}

/*
 * Try to find the parent for the current stack position. Returns the correct
 * parent and the child's offset in stack->parent. The root page is never
 * released, to prevent a conflict with the vacuum process.
 */
static void
ginFindParents(GinBtree btree, GinBtreeStack *stack)
{
	Page		page;
	Buffer		buffer;
	BlockNumber blkno,
				leftmostBlkno;
	OffsetNumber offset;
	GinBtreeStack *root;
	GinBtreeStack *ptr;

	/*
	 * Unwind the stack all the way up to the root, leaving only the root
	 * item.
	 *
	 * Be careful not to release the pin on the root page! The pin on the
	 * root page is required to lock out concurrent vacuums on the tree.
	 */
	root = stack->parent;
	while (root->parent)
	{
		ReleaseBuffer(root->buffer);
		root = root->parent;
	}

	Assert(root->blkno == btree->rootBlkno);
	Assert(BufferGetBlockNumber(root->buffer) == btree->rootBlkno);
	root->off = InvalidOffsetNumber;

	blkno = root->blkno;
	buffer = root->buffer;
	offset = InvalidOffsetNumber;

	ptr = (GinBtreeStack *) palloc(sizeof(GinBtreeStack));

	for (;;)
	{
		LockBuffer(buffer, GIN_EXCLUSIVE);
		page = BufferGetPage(buffer);
		if (GinPageIsLeaf(page))
			elog(ERROR, "Lost path");

		if (GinPageIsIncompleteSplit(page))
		{
			Assert(blkno != btree->rootBlkno);
			ptr->blkno = blkno;
			ptr->buffer = buffer;

			/*
			 * parent may be wrong, but if so, the ginFinishSplit call will
			 * recurse to call ginFindParents again to fix it.
			 */
			ptr->parent = root;
			ptr->off = InvalidOffsetNumber;

			ginFinishSplit(btree, ptr, false, NULL);
		}

		leftmostBlkno = btree->getLeftMostChild(btree, page);

		while ((offset = btree->findChildPtr(btree, page, stack->blkno,
											 InvalidOffsetNumber)) == InvalidOffsetNumber)
		{
			blkno = GinPageGetOpaque(page)->rightlink;
			if (blkno == InvalidBlockNumber)
			{
				UnlockReleaseBuffer(buffer);
				break;
			}
			buffer = ginStepRight(buffer, btree->index, GIN_EXCLUSIVE);
			page = BufferGetPage(buffer);

			/* finish any incomplete splits, as above */
			if (GinPageIsIncompleteSplit(page))
			{
				Assert(blkno != btree->rootBlkno);
				ptr->blkno = blkno;
				ptr->buffer = buffer;
				ptr->parent = root;
				ptr->off = InvalidOffsetNumber;

				ginFinishSplit(btree, ptr, false, NULL);
			}
		}

		if (blkno != InvalidBlockNumber)
		{
			ptr->blkno = blkno;
			ptr->buffer = buffer;
			ptr->parent = root; /* it may be wrong, but the next call will
								 * correct it */
			ptr->off = offset;
			stack->parent = ptr;
			return;
		}

		/* Descend down to the next level */
		blkno = leftmostBlkno;
		buffer = ReadBuffer(btree->index, blkno);
	}
}

/*
|
2013-11-27 14:43:05 +01:00
|
|
|
* Insert a new item to a page.
|
|
|
|
*
|
|
|
|
* Returns true if the insertion was finished. On false, the page was split and
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not. Subsequent patches
band-aided over some of the problems with this design by making things
even messier.
One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud). This would not typically
be noticeable during retail index updates. It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.
Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state. There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.
To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts. The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page. The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path. The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.
Report: <571276DD.5050303@dalibo.com>
2016-04-20 20:25:15 +02:00
|
|
|
* the parent needs to be updated. (A root split returns true as it doesn't
|
|
|
|
* need any further action by the caller to complete.)
|
2013-11-27 14:43:05 +01:00
|
|
|
*
|
2015-05-20 15:18:11 +02:00
|
|
|
* When inserting a downlink to an internal page, 'childbuf' contains the
|
2013-11-27 18:21:23 +01:00
|
|
|
* child page that was split. Its GIN_INCOMPLETE_SPLIT flag will be cleared
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not. Subsequent patches
band-aided over some of the problems with this design by making things
even messier.
One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud). This would not typically
be noticeable during retail index updates. It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.
Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state. There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.
To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts. The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page. The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path. The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.
Report: <571276DD.5050303@dalibo.com>
2016-04-20 20:25:15 +02:00
|
|
|
* atomically with the insert. Also, the existing item at offset stack->off
|
|
|
|
* in the target page is updated to point to updateblkno.
|
2010-10-18 02:52:32 +02:00
|
|
|
*
|
2013-11-20 16:00:53 +01:00
|
|
|
* stack->buffer is locked on entry, and is kept locked.
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not. Subsequent patches
band-aided over some of the problems with this design by making things
even messier.
One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud). This would not typically
be noticeable during retail index updates. It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.
Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state. There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.
To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts. The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page. The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path. The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.
Report: <571276DD.5050303@dalibo.com>
2016-04-20 20:25:15 +02:00
|
|
|
* Likewise for childbuf, if given.
|
2006-05-02 13:28:56 +02:00
|
|
|
*/
|
2013-11-20 16:00:53 +01:00
|
|
|
static bool
|
2013-11-27 14:43:05 +01:00
|
|
|
ginPlaceToPage(GinBtree btree, GinBtreeStack *stack,
|
|
|
|
void *insertdata, BlockNumber updateblkno,
|
2013-11-27 18:21:23 +01:00
|
|
|
Buffer childbuf, GinStatsData *buildStats)
|
2006-10-04 02:30:14 +02:00
|
|
|
{
|
2016-04-20 15:31:19 +02:00
|
|
|
Page page = BufferGetPage(stack->buffer);
    bool        result;
    GinPlaceToPageRC rc;
    uint16      xlflags = 0;
    Page        childpage = NULL;
    Page        newlpage = NULL,
                newrpage = NULL;
    void       *ptp_workspace = NULL;
    MemoryContext tmpCxt;
    MemoryContext oldCxt;

    /*
     * We do all the work of this function and its subfunctions in a temporary
     * memory context.  This avoids leakages and simplifies APIs, since some
     * subfunctions allocate storage that has to survive until we've finished
     * the WAL insertion.
     */
    tmpCxt = AllocSetContextCreate(CurrentMemoryContext,
                                   "ginPlaceToPage temporary context",
                                   ALLOCSET_DEFAULT_SIZES);
    oldCxt = MemoryContextSwitchTo(tmpCxt);

    if (GinPageIsData(page))
        xlflags |= GIN_INSERT_ISDATA;
    if (GinPageIsLeaf(page))
    {
        xlflags |= GIN_INSERT_ISLEAF;
        Assert(!BufferIsValid(childbuf));
        Assert(updateblkno == InvalidBlockNumber);
    }
    else
    {
        Assert(BufferIsValid(childbuf));
        Assert(updateblkno != InvalidBlockNumber);
        childpage = BufferGetPage(childbuf);
    }

    /*
     * See if the incoming tuple will fit on the page.  beginPlaceToPage will
     * decide if the page needs to be split, and will compute the split
     * contents if so.  See comments for beginPlaceToPage and execPlaceToPage
     * functions for more details of the API here.
     */
    rc = btree->beginPlaceToPage(btree, stack->buffer, stack,
                                 insertdata, updateblkno,
                                 &ptp_workspace,
                                 &newlpage, &newrpage);

    if (rc == GPTP_NO_WORK)
    {
        /* Nothing to do */
        result = true;
    }
    else if (rc == GPTP_INSERT)
    {
        /* It will fit, perform the insertion */
        START_CRIT_SECTION();

        if (RelationNeedsWAL(btree->index) && !btree->isBuild)
        {
            XLogBeginInsert();
            XLogRegisterBuffer(0, stack->buffer, REGBUF_STANDARD);
            if (BufferIsValid(childbuf))
                XLogRegisterBuffer(1, childbuf, REGBUF_STANDARD);
        }

        /* Perform the page update, and register any extra WAL data */
        btree->execPlaceToPage(btree, stack->buffer, stack,
                               insertdata, updateblkno, ptp_workspace);

        MarkBufferDirty(stack->buffer);

        /* An insert to an internal page finishes the split of the child. */
2016-04-20 20:25:15 +02:00
		if (BufferIsValid(childbuf))
		{
			GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
			MarkBufferDirty(childbuf);
		}
Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.
For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.
Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 16:03:15 +02:00
		if (RelationNeedsWAL(btree->index) && !btree->isBuild)
		{
			XLogRecPtr	recptr;
			ginxlogInsert xlrec;
			BlockIdData childblknos[2];

			xlrec.flags = xlflags;
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
			XLogRegisterData((char *) &xlrec, sizeof(ginxlogInsert));

			/*
			 * Log information about child if this was an insertion of a
			 * downlink.
			 */
			if (BufferIsValid(childbuf))
			{
				BlockIdSet(&childblknos[0], BufferGetBlockNumber(childbuf));
				BlockIdSet(&childblknos[1],
						   GinPageGetOpaque(childpage)->rightlink);
				XLogRegisterData((char *) childblknos,
								 sizeof(BlockIdData) * 2);
			}
			recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_INSERT);
			PageSetLSN(page, recptr);
			if (BufferIsValid(childbuf))
				PageSetLSN(childpage, recptr);
		}

		END_CRIT_SECTION();
		/* Insertion is complete. */
		result = true;
	}
	else if (rc == GPTP_SPLIT)
	{
		/*
		 * Didn't fit, need to split. The split has been computed in newlpage
		 * and newrpage, which are pointers to palloc'd pages, not associated
		 * with buffers. stack->buffer is not touched yet.
		 */
		Buffer		rbuffer;
		BlockNumber savedRightLink;
		ginxlogSplit data;
		Buffer		lbuffer = InvalidBuffer;
		Page		newrootpg = NULL;
		/* Get a new index page to become the right page */
		rbuffer = GinNewBuffer(btree->index);

		/* During index build, count the new page */
		if (buildStats)
		{
			if (btree->isData)
				buildStats->nDataPages++;
			else
				buildStats->nEntryPages++;
		}

		savedRightLink = GinPageGetOpaque(page)->rightlink;
		/* Begin setting up WAL record */
		data.node = btree->index->rd_node;
		data.flags = xlflags;
		if (BufferIsValid(childbuf))
		{
			data.leftChildBlkno = BufferGetBlockNumber(childbuf);
			data.rightChildBlkno = GinPageGetOpaque(childpage)->rightlink;
		}
		else
			data.leftChildBlkno = data.rightChildBlkno = InvalidBlockNumber;

		if (stack->parent == NULL)
		{
			/*
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
In passing, fix a bunch of comments that had gone unmaintained throughout
all this klugery.
Report: <571276DD.5050303@dalibo.com>
2016-04-20 20:25:15 +02:00
|
|
|
	 * splitting the root, so we need to allocate new left page and
	 * place pointers to left and right page on root page.
	 */
	lbuffer = GinNewBuffer(btree->index);

	/* During index build, count the new left page */
	if (buildStats)
	{
		if (btree->isData)
			buildStats->nDataPages++;
		else
			buildStats->nEntryPages++;
	}

Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
	data.rrlink = InvalidBlockNumber;
	data.flags |= GIN_SPLIT_ROOT;

	GinPageGetOpaque(newrpage)->rightlink = InvalidBlockNumber;
	GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);

	/*
	 * Construct a new root page containing downlinks to the new left
	 * and right pages. (Do this in a temporary copy rather than
	 * overwriting the original page directly, since we're not in the
	 * critical section yet.)
	 */
	newrootpg = PageGetTempPage(newrpage);
	GinInitPage(newrootpg,
				GinPageGetOpaque(newlpage)->flags & ~(GIN_LEAF | GIN_COMPRESSED),
				BLCKSZ);

	btree->fillRoot(btree, newrootpg,
					BufferGetBlockNumber(lbuffer), newlpage,
					BufferGetBlockNumber(rbuffer), newrpage);

	if (GinPageIsLeaf(BufferGetPage(stack->buffer)))
	{
		PredicateLockPageSplit(btree->index,
							   BufferGetBlockNumber(stack->buffer),
							   BufferGetBlockNumber(lbuffer));
		PredicateLockPageSplit(btree->index,
							   BufferGetBlockNumber(stack->buffer),
							   BufferGetBlockNumber(rbuffer));
	}
}
else
{
	/* splitting a non-root page */
	data.rrlink = savedRightLink;

	GinPageGetOpaque(newrpage)->rightlink = savedRightLink;
	GinPageGetOpaque(newlpage)->flags |= GIN_INCOMPLETE_SPLIT;
	GinPageGetOpaque(newlpage)->rightlink = BufferGetBlockNumber(rbuffer);

	if (GinPageIsLeaf(BufferGetPage(stack->buffer)))
	{
		PredicateLockPageSplit(btree->index,
							   BufferGetBlockNumber(stack->buffer),
							   BufferGetBlockNumber(rbuffer));
	}
}

/*
 * OK, we have the new contents of the left page in a temporary copy
 * now (newlpage), and likewise for the new contents of the
 * newly-allocated right block. The original page is still unchanged.
 *
 * If this is a root split, we also have a temporary page containing
 * the new contents of the root.
 */
START_CRIT_SECTION();

MarkBufferDirty(rbuffer);
MarkBufferDirty(stack->buffer);

/*
 * Restore the temporary copies over the real buffers.
 */
if (stack->parent == NULL)
{
	/* Splitting the root, three pages to update */
	MarkBufferDirty(lbuffer);
	memcpy(page, newrootpg, BLCKSZ);
	memcpy(BufferGetPage(lbuffer), newlpage, BLCKSZ);
	memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
}
else
{
	/* Normal split, only two pages to update */
	memcpy(page, newlpage, BLCKSZ);
	memcpy(BufferGetPage(rbuffer), newrpage, BLCKSZ);
}

Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
Commit 36a35c550ac114ca turned the interface between ginPlaceToPage and
its subroutines in gindatapage.c and ginentrypage.c into a royal mess:
page-update critical sections were started in one place and finished in
another place not even in the same file, and the very same subroutine
might return having started a critical section or not. Subsequent patches
band-aided over some of the problems with this design by making things
even messier.
One user-visible resulting problem is memory leaks caused by the need for
the subroutines to allocate storage that would survive until ginPlaceToPage
calls XLogInsert (as reported by Julien Rouhaud). This would not typically
be noticeable during retail index updates. It could be visible in a GIN
index build, in the form of memory consumption swelling to several times
the commanded maintenance_work_mem.
Another rather nasty problem is that in the internal-page-splitting code
path, we would clear the child page's GIN_INCOMPLETE_SPLIT flag well before
entering the critical section that it's supposed to be cleared in; a
failure in between would leave the index in a corrupt state. There were
also assorted coding-rule violations with little immediate consequence but
possible long-term hazards, such as beginning an XLogInsert sequence before
entering a critical section, or calling elog(DEBUG) inside a critical
section.
To fix, redefine the API between ginPlaceToPage() and its subroutines
by splitting the subroutines into two parts. The "beginPlaceToPage"
subroutine does what can be done outside a critical section, including
full computation of the result pages into temporary storage when we're
going to split the target page. The "execPlaceToPage" subroutine is called
within a critical section established by ginPlaceToPage(), and it handles
the actual page update in the non-split code path. The critical section,
as well as the XLOG insertion call sequence, are both now always started
and finished in ginPlaceToPage(). Also, make ginPlaceToPage() create and
work in a short-lived memory context to eliminate the leakage problem.
(Since a short-lived memory context had been getting created in the most
common code path in the subroutines, this shouldn't cause any noticeable
performance penalty; we're just moving the overhead up one call level.)
In passing, fix a bunch of comments that had gone unmaintained throughout
all this kludgery.
Report: <571276DD.5050303@dalibo.com>
2016-04-20 20:25:15 +02:00
|
|
|
/* We also clear childbuf's INCOMPLETE_SPLIT flag, if passed */
|
|
|
|
if (BufferIsValid(childbuf))
|
|
|
|
{
|
|
|
|
GinPageGetOpaque(childpage)->flags &= ~GIN_INCOMPLETE_SPLIT;
|
|
|
|
MarkBufferDirty(childbuf);
|
|
|
|
}
|
|
|
|
|
2013-11-27 18:21:23 +01:00
|
|
|
/* write WAL record */
|
Generate less WAL during GiST, GIN and SP-GiST index build.
Instead of WAL-logging every modification during the build separately,
first build the index without any WAL-logging, and make a separate pass
through the index at the end, to write all pages to the WAL. This
significantly reduces the amount of WAL generated, and is usually also
faster, despite the extra I/O needed for the extra scan through the index.
WAL generated this way is also faster to replay.
For GiST, the LSN-NSN interlock makes this a little tricky. All pages must
be marked with a valid (i.e. non-zero) LSN, so that the parent-child
LSN-NSN interlock works correctly. We now use magic value 1 for that during
index build. Change the fake LSN counter to begin from 1000, so that 1 is
safely smaller than any real or fake LSN. 2 would've been enough for our
purposes, but let's reserve a bigger range, in case we need more special
values in the future.
Author: Anastasia Lubennikova, Andrey V. Lepikhov
Reviewed-by: Heikki Linnakangas, Dmitry Dolgov
2019-04-03 16:03:15 +02:00
|
|
|
if (RelationNeedsWAL(btree->index) && !btree->isBuild)
|
2013-11-27 18:21:23 +01:00
|
|
|
{
|
|
|
|
XLogRecPtr recptr;
|
|
|
|
|
2015-06-28 20:59:29 +02:00
|
|
|
XLogBeginInsert();
|
|
|
|
|
Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.
There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.
This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.
For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.
The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.
Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 16:56:26 +01:00
|
|
|
/*
|
|
|
|
* We just take full page images of all the split pages. Splits
|
|
|
|
* are uncommon enough that it's not worth complicating the code
|
|
|
|
* to be more efficient.
|
|
|
|
*/
|
|
|
|
if (stack->parent == NULL)
|
|
|
|
{
|
|
|
|
XLogRegisterBuffer(0, lbuffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
|
|
|
|
XLogRegisterBuffer(1, rbuffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
|
|
|
|
XLogRegisterBuffer(2, stack->buffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
|
|
|
|
}
|
|
|
|
else
|
|
|
|
{
|
|
|
|
XLogRegisterBuffer(0, stack->buffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
|
|
|
|
XLogRegisterBuffer(1, rbuffer, REGBUF_FORCE_IMAGE | REGBUF_STANDARD);
|
|
|
|
}
|
|
|
|
if (BufferIsValid(childbuf))
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
XLogRegisterBuffer(3, childbuf, REGBUF_STANDARD);
|
Revamp the WAL record format.
2014-11-20 16:56:26 +01:00
|
|
|
|
|
|
|
XLogRegisterData((char *) &data, sizeof(ginxlogSplit));
|
|
|
|
|
|
|
|
recptr = XLogInsert(RM_GIN_ID, XLOG_GIN_SPLIT);
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
|
|
|
|
PageSetLSN(page, recptr);
|
2016-04-20 15:31:19 +02:00
|
|
|
PageSetLSN(BufferGetPage(rbuffer), recptr);
|
2013-11-27 18:21:23 +01:00
|
|
|
if (stack->parent == NULL)
|
2016-04-20 15:31:19 +02:00
|
|
|
PageSetLSN(BufferGetPage(lbuffer), recptr);
|
2014-04-01 21:45:10 +02:00
|
|
|
if (BufferIsValid(childbuf))
|
|
|
|
PageSetLSN(childpage, recptr);
|
2013-11-20 16:00:53 +01:00
|
|
|
}
|
2013-11-27 18:21:23 +01:00
|
|
|
END_CRIT_SECTION();
|
|
|
|
|
|
|
|
/*
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
* We can release the locks/pins on the new pages now, but keep
|
|
|
|
* stack->buffer locked. childbuf doesn't get unlocked either.
|
2013-11-27 18:21:23 +01:00
|
|
|
*/
|
|
|
|
UnlockReleaseBuffer(rbuffer);
|
|
|
|
if (stack->parent == NULL)
|
|
|
|
UnlockReleaseBuffer(lbuffer);
|
|
|
|
|
|
|
|
/*
|
|
|
|
* If we split the root, we're done. Otherwise the split is not
|
|
|
|
* complete until the downlink for the new page has been inserted to
|
|
|
|
* the parent.
|
|
|
|
*/
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
result = (stack->parent == NULL);
|
2013-11-20 16:00:53 +01:00
|
|
|
}
|
2014-01-22 17:51:48 +01:00
|
|
|
else
|
2014-01-23 21:14:20 +01:00
|
|
|
{
|
2019-08-05 05:14:58 +02:00
|
|
|
elog(ERROR, "invalid return code from GIN beginPlaceToPage method: %d", rc);
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
result = false; /* keep compiler quiet */
|
2014-01-23 21:14:20 +01:00
|
|
|
}
|
Fix memory leak and other bugs in ginPlaceToPage() & subroutines.
2016-04-20 20:25:15 +02:00
|
|
|
|
|
|
|
/* Clean up temp context */
|
|
|
|
MemoryContextSwitchTo(oldCxt);
|
|
|
|
MemoryContextDelete(tmpCxt);
|
|
|
|
|
|
|
|
return result;
|
2013-11-20 16:00:53 +01:00
|
|
|
}
|
2006-05-02 13:28:56 +02:00
|
|
|
|
2013-11-20 16:00:53 +01:00
|
|
|
/*
|
2013-11-27 14:43:05 +01:00
|
|
|
* Finish a split by inserting the downlink for the new page to parent.
|
2013-11-20 16:00:53 +01:00
|
|
|
*
|
2013-11-27 14:43:05 +01:00
|
|
|
* On entry, stack->buffer is exclusively locked.
|
2013-11-20 16:00:53 +01:00
|
|
|
*
|
2013-11-27 18:21:23 +01:00
|
|
|
* If freestack is true, all the buffers are released and unlocked as we
|
|
|
|
* crawl up the tree, and 'stack' is freed. Otherwise stack->buffer is kept
|
|
|
|
* locked, and stack is unmodified, except for possibly moving right to find
|
|
|
|
* the correct parent of page.
|
2013-11-20 16:00:53 +01:00
|
|
|
*/
|
2013-11-27 18:21:23 +01:00
|
|
|
static void
|
|
|
|
ginFinishSplit(GinBtree btree, GinBtreeStack *stack, bool freestack,
|
|
|
|
GinStatsData *buildStats)
|
2013-11-20 16:00:53 +01:00
|
|
|
{
|
|
|
|
Page page;
|
2013-11-27 14:43:05 +01:00
|
|
|
bool done;
|
2013-11-27 18:21:23 +01:00
|
|
|
bool first = true;
|
|
|
|
|
|
|
|
/*
|
2014-05-06 18:12:18 +02:00
|
|
|
* freestack == false when we encounter an incompletely split page during
|
|
|
|
* a scan, while freestack == true is used in the normal scenario that a
|
2013-11-27 18:21:23 +01:00
|
|
|
* split is finished right after the initial insert.
|
|
|
|
*/
|
|
|
|
if (!freestack)
|
|
|
|
elog(DEBUG1, "finishing incomplete split of block %u in gin index \"%s\"",
|
|
|
|
stack->blkno, RelationGetRelationName(btree->index));
|
2006-05-02 13:28:56 +02:00
|
|
|
|
2013-11-20 16:00:53 +01:00
|
|
|
/* this loop crawls up the stack until the insertion is complete */
|
2013-11-27 14:43:05 +01:00
|
|
|
do
|
2013-11-20 16:00:53 +01:00
|
|
|
{
|
2013-11-27 14:43:05 +01:00
|
|
|
GinBtreeStack *parent = stack->parent;
|
|
|
|
void *insertdata;
|
|
|
|
BlockNumber updateblkno;
|
2006-05-02 13:28:56 +02:00
|
|
|
|
|
|
|
/* search parent to lock */
|
|
|
|
LockBuffer(parent->buffer, GIN_EXCLUSIVE);
|
|
|
|
|
2013-11-27 18:21:23 +01:00
|
|
|
/*
|
|
|
|
* If the parent page was incompletely split, finish that split first,
|
|
|
|
* then continue with the current one.
|
|
|
|
*
|
|
|
|
* Note: we have to finish *all* incomplete splits we encounter, even
|
2014-05-06 18:12:18 +02:00
|
|
|
* if we have to move right. Otherwise we might choose as the target a
|
|
|
|
* page that has no downlink in the parent, and splitting it further
|
2013-11-27 18:21:23 +01:00
|
|
|
* would fail.
|
|
|
|
*/
|
2016-04-20 15:31:19 +02:00
|
|
|
if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
|
2013-11-27 18:21:23 +01:00
|
|
|
ginFinishSplit(btree, parent, false, buildStats);
|
|
|
|
|
2006-05-02 13:28:56 +02:00
|
|
|
/* move right if it's needed */
|
2016-04-20 15:31:19 +02:00
|
|
|
page = BufferGetPage(parent->buffer);
|
2006-10-04 02:30:14 +02:00
|
|
|
while ((parent->off = btree->findChildPtr(btree, page, stack->blkno, parent->off)) == InvalidOffsetNumber)
|
|
|
|
{
|
2013-11-27 18:21:23 +01:00
|
|
|
if (GinPageRightMost(page))
|
2006-10-04 02:30:14 +02:00
|
|
|
{
|
|
|
|
/*
|
|
|
|
* We reached the rightmost page without finding the parent;
|
|
|
|
* fall back to a plain search from the root...
|
|
|
|
*/
|
2013-11-08 21:21:42 +01:00
|
|
|
LockBuffer(parent->buffer, GIN_UNLOCK);
|
2013-11-27 14:43:05 +01:00
|
|
|
ginFindParents(btree, stack);
|
2006-10-04 02:30:14 +02:00
|
|
|
parent = stack->parent;
|
2012-07-13 17:37:39 +02:00
|
|
|
Assert(parent != NULL);
|
2006-05-02 13:28:56 +02:00
|
|
|
break;
|
|
|
|
}
|
|
|
|
|
2013-11-08 21:21:42 +01:00
|
|
|
parent->buffer = ginStepRight(parent->buffer, btree->index, GIN_EXCLUSIVE);
|
2013-11-27 18:21:23 +01:00
|
|
|
parent->blkno = BufferGetBlockNumber(parent->buffer);
|
2016-04-20 15:31:19 +02:00
|
|
|
page = BufferGetPage(parent->buffer);
|
2006-05-02 13:28:56 +02:00
|
|
|
|
2016-04-20 15:31:19 +02:00
|
|
|
if (GinPageIsIncompleteSplit(BufferGetPage(parent->buffer)))
|
2013-11-27 18:21:23 +01:00
|
|
|
ginFinishSplit(btree, parent, false, buildStats);
|
|
|
|
}
|
2013-11-27 14:43:05 +01:00
|
|
|
|
2013-11-27 18:21:23 +01:00
|
|
|
/* insert the downlink */
|
|
|
|
insertdata = btree->prepareDownlink(btree, stack->buffer);
|
2016-04-20 15:31:19 +02:00
|
|
|
updateblkno = GinPageGetOpaque(BufferGetPage(stack->buffer))->rightlink;
|
2013-11-27 18:21:23 +01:00
|
|
|
done = ginPlaceToPage(btree, parent,
|
2013-11-27 14:43:05 +01:00
|
|
|
insertdata, updateblkno,
|
2013-11-27 18:21:23 +01:00
|
|
|
stack->buffer, buildStats);
|
2013-11-27 14:43:05 +01:00
|
|
|
pfree(insertdata);
|
2013-11-27 18:21:23 +01:00
|
|
|
|
|
|
|
/*
|
|
|
|
* If the caller requested to free the stack, unlock and release the
|
|
|
|
* child buffer now. Otherwise keep it pinned and locked, but if we
|
|
|
|
* have to recurse up the tree, we can unlock the upper pages, only
|
|
|
|
* keeping the page at the bottom of the stack locked.
|
|
|
|
*/
|
|
|
|
if (!first || freestack)
|
|
|
|
LockBuffer(stack->buffer, GIN_UNLOCK);
|
|
|
|
if (freestack)
|
|
|
|
{
|
|
|
|
ReleaseBuffer(stack->buffer);
|
|
|
|
pfree(stack);
|
|
|
|
}
|
|
|
|
stack = parent;
|
|
|
|
|
|
|
|
first = false;
|
2013-11-27 14:43:05 +01:00
|
|
|
} while (!done);
|
2013-11-27 18:21:23 +01:00
|
|
|
|
|
|
|
/* unlock the parent */
|
2013-11-27 14:43:05 +01:00
|
|
|
LockBuffer(stack->buffer, GIN_UNLOCK);
|
|
|
|
|
2013-11-27 18:21:23 +01:00
|
|
|
if (freestack)
|
|
|
|
freeGinBtreeStack(stack);
|
2013-11-27 14:43:05 +01:00
|
|
|
}
|
|
|
|
|
|
|
|
/*
|
|
|
|
* Insert a value to tree described by stack.
|
|
|
|
*
|
|
|
|
* The value to be inserted is given in 'insertdata'. Its format depends
|
|
|
|
* on whether this is an entry or data tree, ginInsertValue just passes it
|
|
|
|
* through to the tree-specific callback function.
|
|
|
|
*
|
|
|
|
* During an index build, buildStats is non-null and the counters it contains
|
|
|
|
* are incremented as needed.
|
|
|
|
*
|
|
|
|
* NB: the passed-in stack is freed, as though by freeGinBtreeStack.
|
|
|
|
*/
|
|
|
|
void
|
|
|
|
ginInsertValue(GinBtree btree, GinBtreeStack *stack, void *insertdata,
|
|
|
|
GinStatsData *buildStats)
|
|
|
|
{
|
|
|
|
bool done;
|
|
|
|
|
2013-11-27 18:21:23 +01:00
|
|
|
/* If the leaf page was incompletely split, finish the split first */
|
2016-04-20 15:31:19 +02:00
|
|
|
if (GinPageIsIncompleteSplit(BufferGetPage(stack->buffer)))
|
2013-11-27 18:21:23 +01:00
|
|
|
ginFinishSplit(btree, stack, false, buildStats);
|
|
|
|
|
2013-11-27 14:43:05 +01:00
|
|
|
done = ginPlaceToPage(btree, stack,
|
|
|
|
insertdata, InvalidBlockNumber,
|
2013-11-27 18:21:23 +01:00
|
|
|
InvalidBuffer, buildStats);
|
2013-11-27 14:43:05 +01:00
|
|
|
if (done)
|
|
|
|
{
|
|
|
|
LockBuffer(stack->buffer, GIN_UNLOCK);
|
|
|
|
freeGinBtreeStack(stack);
|
2006-05-02 13:28:56 +02:00
|
|
|
}
|
2013-11-27 14:43:05 +01:00
|
|
|
else
|
2013-11-27 18:21:23 +01:00
|
|
|
ginFinishSplit(btree, stack, true, buildStats);
|
2006-05-02 13:28:56 +02:00
|
|
|
}
|