Keep replication slot statistics on invalidation

The code path in charge of invalidating a replication slot includes a
call to pgstat_drop_replslot(), which would result in removing the
statistics of the slot once invalidated.  However, there is no need to
remove the statistics of an invalidated slot as one could still be
interested in looking at them to understand the activity of the slot
until its actual removal.

The initial design of the feature committed in be87200efd used the
approach to drop the slots, which is likely why the statistics were
still removed during the invalidation.

Another problem with this operation is that it was done without holding
ReplicationSlotAllocationLock, leaving it unprotected on concurrent
activity.  This part is arguably a bug, but that's a limited problem in
practice so no backpatch is done.

In passing, this commit adds a test to check this behavior.  The only
remaining code path where slot statistics are dropped now related to the
slot getting dropped.

Author: Bertrand Drouvot
Discussion: https://postgr.es/m/ZermH08Eq6YydHpO@ip-10-97-1-34.eu-west-3.compute.internal
This commit is contained in:
Michael Paquier 2024-03-12 14:22:31 +09:00
parent 397cd0b3c7
commit d6e171fed6
2 changed files with 20 additions and 1 deletions

View File

@ -1726,7 +1726,6 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlotInvalidationCause cause,
ReplicationSlotMarkDirty();
ReplicationSlotSave();
ReplicationSlotRelease();
pgstat_drop_replslot(s);
ReportSlotInvalidation(conflict, false, active_pid,
slotname, restart_lsn,

View File

@ -494,6 +494,9 @@ $node_subscriber->stop;
##################################################
# Recovery conflict: Invalidate conflicting slots, including in-use slots
# Scenario 1: hot_standby_feedback off and vacuum FULL
#
# In passing, ensure that replication slot stats are not removed when the
# active slot is invalidated.
##################################################
# One way to produce recovery conflict is to create/drop a relation and
@ -502,6 +505,15 @@ $node_subscriber->stop;
reactive_slots_change_hfs_and_wait_for_xmins('behaves_ok_', 'vacuum_full_',
0, 1);
# Ensure that replication slot stats are not empty before triggering the
# conflict.
$node_primary->safe_psql('testdb',
qq[INSERT INTO decoding_test(x,y) SELECT 100,'100';]);
$node_standby->poll_query_until('testdb',
qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']
) or die "replication slot stats of vacuum_full_activeslot not updated";
# This should trigger the conflict
wait_until_vacuum_can_remove(
'full', 'CREATE TABLE conflict_test(x integer, y text);
@ -515,6 +527,14 @@ check_for_invalidation('vacuum_full_', 1, 'with vacuum FULL on pg_class');
# Verify conflict_reason is 'rows_removed' in pg_replication_slots
check_slots_conflict_reason('vacuum_full_', 'rows_removed');
# Ensure that replication slot stats are not removed after invalidation.
is( $node_standby->safe_psql(
'testdb',
qq[SELECT total_txns > 0 FROM pg_stat_replication_slots WHERE slot_name = 'vacuum_full_activeslot']
),
't',
'replication slot stats not removed after invalidation');
$handle =
make_slot_active($node_standby, 'vacuum_full_', 0, \$stdout, \$stderr);