Avoid unnecessary recursion to child tables in ALTER TABLE SET NOT NULL.

If a partitioned table's column is already marked NOT NULL, there is
no need to examine its partitions, because we can rely on previous
DDL to have enforced that the child columns are NOT NULL as well.
(Unfortunately, the same cannot be said for traditional inheritance,
so for now we have to restrict the optimization to partitioned tables.)
Hence, we may skip recursing to child tables in this situation.

The reason this case is worth worrying about is that when pg_dump dumps
a partitioned table having a primary key, it will include the requisite
NOT NULL markings in the CREATE TABLE commands, and then add the
primary key as a separate step.  The primary key addition generates a
SET NOT NULL as a subcommand, just to be sure.  So the situation where
a SET NOT NULL is redundant does arise in the real world.

Skipping the recursion does more than just save a few cycles: it means
that a command such as "ALTER TABLE ONLY partition_parent ADD PRIMARY
KEY" will take locks only on the partition parent table, not on the
partitions.  It turns out that parallel pg_restore is effectively
assuming that that's true, and has little choice but to do so because
the dependencies listed for such a TOC entry don't include the
partitions.  pg_restore could thus issue this ALTER while data restores
on the partitions are still in progress.  Taking unnecessary locks on
the partitions not only hurts concurrency, but can lead to actual
deadlock failures, as reported by Domagoj Smoljanovic.

(A contributing factor in the deadlock is that TRUNCATE on a child
partition wants a non-exclusive lock on the parent.  This seems
likewise unnecessary, but the fix for it is more invasive so we
won't consider back-patching it.  Fortunately, getting rid of one
of these two poor behaviors is enough to remove the deadlock.)

Although support for partitioned primary keys came in with v11,
this patch is dependent on the SET NOT NULL refactoring done by
commit f4a3fdfbd, so we can only patch back to v12.

Patch by me; thanks to Alvaro Herrera and Amit Langote for review.

Discussion: https://postgr.es/m/VI1PR03MB31670CA1BD9625C3A8C5DD05EB230@VI1PR03MB3167.eurprd03.prod.outlook.com
Tom Lane 2020-09-16 13:38:26 -04:00
parent 3d65b0593c
commit e5fac1cb19
1 changed file with 38 additions and 7 deletions

@@ -5681,14 +5681,10 @@ ATSimpleRecursion(List **wqueue, Relation rel,
 				  AlterTableUtilityContext *context)
 {
 	/*
-	 * Propagate to children if desired. Only plain tables, foreign tables
-	 * and partitioned tables have children, so no need to search for other
-	 * relkinds.
+	 * Propagate to children, if desired and if there are (or might be) any
+	 * children.
 	 */
-	if (recurse &&
-		(rel->rd_rel->relkind == RELKIND_RELATION ||
-		 rel->rd_rel->relkind == RELKIND_FOREIGN_TABLE ||
-		 rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE))
+	if (recurse && rel->rd_rel->relhassubclass)
 	{
 		Oid			relid = RelationGetRelid(rel);
 		ListCell   *child;
@@ -6698,6 +6694,41 @@ ATPrepSetNotNull(List **wqueue, Relation rel,
 	if (recursing)
 		return;
 
+	/*
+	 * If the target column is already marked NOT NULL, we can skip recursing
+	 * to children, because their columns should already be marked NOT NULL as
+	 * well. But there's no point in checking here unless the relation has
+	 * some children; else we can just wait till execution to check. (If it
+	 * does have children, however, this can save taking per-child locks
+	 * unnecessarily. This greatly improves concurrency in some parallel
+	 * restore scenarios.)
+	 *
+	 * Unfortunately, we can only apply this optimization to partitioned
+	 * tables, because traditional inheritance doesn't enforce that child
+	 * columns be NOT NULL when their parent is. (That's a bug that should
+	 * get fixed someday.)
+	 */
+	if (rel->rd_rel->relhassubclass &&
+		rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		HeapTuple	tuple;
+		bool		attnotnull;
+
+		tuple = SearchSysCacheAttName(RelationGetRelid(rel), cmd->name);
+
+		/* Might as well throw the error now, if name is bad */
+		if (!HeapTupleIsValid(tuple))
+			ereport(ERROR,
+					(errcode(ERRCODE_UNDEFINED_COLUMN),
+					 errmsg("column \"%s\" of relation \"%s\" does not exist",
+							cmd->name, RelationGetRelationName(rel))));
+
+		attnotnull = ((Form_pg_attribute) GETSTRUCT(tuple))->attnotnull;
+		ReleaseSysCache(tuple);
+		if (attnotnull)
+			return;
+	}
+
 	/*
 	 * If we have ALTER TABLE ONLY ... SET NOT NULL on a partitioned table,
 	 * apply ALTER TABLE ... CHECK NOT NULL to every child. Otherwise, use