Fix failure to delete spill files of aborted transactions

Logical decoding's reorderbuffer.c may spill transaction files to disk
when transactions are large.  These files are supposed to be removed once
the transaction becomes "too old" by xid, but file removal requires the
boundary LSNs of the transaction to be known.  The final_lsn is only set
when we see the commit or abort record for the transaction, and nothing
sets the value for transactions that crash, so the removal code
misbehaves; in assertion-enabled builds, it fails an assertion.

To fix, set the final_lsn of any transaction that has no value set to
the LSN of the last change in the transaction.  This causes the spilled
files to be removed appropriately.

Author: Atsushi Torikoshi
Reviewed-by: Kyotaro HORIGUCHI, Craig Ringer, Masahiko Sawada
Discussion: https://postgr.es/m/54e4e488-186b-a056-6628-50628e4e4ebc@lab.ntt.co.jp
Alvaro Herrera 2018-01-05 12:17:10 -03:00
parent 054e8c6cdb
commit df9f682c7b
2 changed files with 19 additions and 2 deletions

@@ -1670,8 +1670,8 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid)
 	 * Iterate through all (potential) toplevel TXNs and abort all that are
 	 * older than what possibly can be running. Once we've found the first
 	 * that is alive we stop, there might be some that acquired an xid earlier
-	 * but started writing later, but it's unlikely and they will cleaned up
-	 * in a later call to ReorderBufferAbortOld().
+	 * but started writing later, but it's unlikely and they will be cleaned
+	 * up in a later call to this function.
 	 */
 	dlist_foreach_modify(it, &rb->toplevel_by_lsn)
 	{
@@ -1681,6 +1681,21 @@ ReorderBufferAbortOld(ReorderBuffer *rb, TransactionId oldestRunningXid)
 		if (TransactionIdPrecedes(txn->xid, oldestRunningXid))
 		{
+			/*
+			 * We set final_lsn on a transaction when we decode its commit or
+			 * abort record, but we never see those records for crashed
+			 * transactions.  To ensure cleanup of these transactions, set
+			 * final_lsn to that of their last change; this causes
+			 * ReorderBufferRestoreCleanup to do the right thing.
+			 */
+			if (txn->serialized && txn->final_lsn == 0)
+			{
+				ReorderBufferChange *last =
+				dlist_tail_element(ReorderBufferChange, node, &txn->changes);
+				txn->final_lsn = last->lsn;
+			}
 			elog(DEBUG2, "aborting old transaction %u", txn->xid);
 			/* remove potential on-disk data, and deallocate this tx */

@@ -168,6 +168,8 @@ typedef struct ReorderBufferTXN
 	 * * plain abort record
 	 * * prepared transaction abort
 	 * * error during decoding
+	 * * for a crashed transaction, the LSN of the last change, regardless of
+	 *   what it was.
 	 * ----
 	 */
 	XLogRecPtr	final_lsn;
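
To illustrate why cleanup depends on final_lsn, here is a minimal standalone sketch of the fallback, not the PostgreSQL code itself.  ToyTxn, toy_fix_final_lsn, toy_segments_to_remove and TOY_SEGMENT_SIZE are hypothetical names invented for this example, the change list is modelled as a plain array rather than a dlist, and the assumption that spill files are grouped by fixed-size LSN segment is a simplification.  The nchanges > 0 guard is an addition of the sketch and is not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins; the real definitions live in PostgreSQL headers. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical segment size, used only to group spill files in this toy. */
#define TOY_SEGMENT_SIZE ((XLogRecPtr) 16 * 1024 * 1024)

/* Toy model of the few ReorderBufferTXN fields the fix touches. */
typedef struct ToyTxn
{
	int			serialized;		/* has any change been spilled to disk? */
	XLogRecPtr	first_lsn;		/* LSN of the first change */
	XLogRecPtr	final_lsn;		/* 0 until a commit/abort record is seen */
	const XLogRecPtr *change_lsns;	/* LSNs of the changes, oldest first */
	size_t		nchanges;
} ToyTxn;

/*
 * Mirror of the fallback: a spilled transaction with no final_lsn takes
 * the LSN of its last change.  The nchanges check is an assumption added
 * here so the sketch never reads past an empty array.
 */
void
toy_fix_final_lsn(ToyTxn *txn)
{
	if (txn->serialized &&
		txn->final_lsn == InvalidXLogRecPtr &&
		txn->nchanges > 0)
		txn->final_lsn = txn->change_lsns[txn->nchanges - 1];
}

/*
 * Number of segment-sized spill files a cleanup pass would visit.  Both
 * boundaries must be known, which is the assertion a crashed transaction
 * would trip without the fallback above.
 */
unsigned
toy_segments_to_remove(const ToyTxn *txn)
{
	assert(txn->first_lsn != InvalidXLogRecPtr);
	assert(txn->final_lsn != InvalidXLogRecPtr);

	return (unsigned) (txn->final_lsn / TOY_SEGMENT_SIZE -
					   txn->first_lsn / TOY_SEGMENT_SIZE + 1);
}
```

In this model, calling toy_segments_to_remove() on a crashed transaction before toy_fix_final_lsn() trips the second assertion; calling it afterwards yields a bounded segment range to delete.  A transaction whose final_lsn was already set by a commit or abort record is left untouched.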