Doc: show how to get the equivalent of LIMIT for UPDATE/DELETE.

Add examples showing use of a CTE and a self-join to perform
partial UPDATEs and DELETEs.

Corey Huinker, reviewed by Laurenz Albe

Discussion: https://postgr.es/m/CADkLM=caNEQsUwPWnfi2jR4ix99E0EJM_3jtcE-YjnEQC7Rssw@mail.gmail.com
This commit is contained in:
Tom Lane 2024-04-07 16:26:47 -04:00
parent 1973d9fb31
commit 2daeba6a4e
2 changed files with 61 additions and 3 deletions

View File

@ -260,12 +260,32 @@ DELETE FROM tasks WHERE status = 'DONE' RETURNING *;
</programlisting>
</para>
<para>
<para>
Delete the row of <structname>tasks</structname> on which the cursor
<literal>c_tasks</literal> is currently positioned:
<programlisting>
DELETE FROM tasks WHERE CURRENT OF c_tasks;
</programlisting></para>
</programlisting>
</para>
<para>
While there is no <literal>LIMIT</literal> clause
for <command>DELETE</command>, it is possible to get a similar effect
using the same method described in <link linkend="update-limit">the
documentation of <command>UPDATE</command></link>:
<programlisting>
WITH delete_batch AS (
SELECT l.ctid FROM user_logs AS l
WHERE l.status = 'archived'
ORDER BY l.creation_date
FOR UPDATE
LIMIT 10000
)
DELETE FROM user_logs AS dl
USING delete_batch AS del
WHERE dl.ctid = del.ctid;
</programlisting>
</para>
</refsect1>
<refsect1>

View File

@ -441,7 +441,45 @@ COMMIT;
<literal>c_films</literal> is currently positioned:
<programlisting>
UPDATE films SET kind = 'Dramatic' WHERE CURRENT OF c_films;
</programlisting></para>
</programlisting>
</para>
<para id="update-limit">
Updates affecting many rows can have negative effects on system
performance, such as table bloat, increased replica lag, and increased
lock contention. In such situations it can make sense to perform the
operation in smaller batches, possibly with a <command>VACUUM</command>
operation on the table between batches. While there is
no <literal>LIMIT</literal> clause for <command>UPDATE</command>, it is
possible to get a similar effect through the use of
a <link linkend="queries-with">Common Table Expression</link> and a
self-join. With the standard <productname>PostgreSQL</productname>
table access method, a self-join on the system
column <link linkend="ddl-system-columns-ctid">ctid</link> is very
efficient:
<programlisting>
WITH exceeded_max_retries AS (
SELECT w.ctid FROM work_item AS w
WHERE w.status = 'active' AND w.num_retries &gt; 10
ORDER BY w.retry_timestamp
FOR UPDATE
LIMIT 5000
)
UPDATE work_item SET status = 'failed'
FROM exceeded_max_retries AS emr
WHERE work_item.ctid = emr.ctid;
</programlisting>
This command will need to be repeated until no rows remain to be updated.
Use of an <literal>ORDER BY</literal> clause allows the command to
prioritize which rows will be updated; it can also prevent deadlock
with other update operations if they use the same ordering.
If lock contention is a concern, then <literal>SKIP LOCKED</literal>
can be added to the <acronym>CTE</acronym> to prevent multiple commands
from updating the same row. However, then a
final <command>UPDATE</command> without <literal>SKIP LOCKED</literal>
or <literal>LIMIT</literal> will be needed to ensure that no matching
rows were overlooked.
</para>
</refsect1>
<refsect1>