Improve documentation about behavior of multi-statement Query messages.

We've long done our best to sweep this topic under the rug, but in view
of recent work it seems like it's time to explain it more precisely.
Here's an initial cut at doing that.

Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
This commit is contained in:
Tom Lane 2017-09-07 14:04:41 -04:00
parent 1356f78ea9
commit b976499480
3 changed files with 168 additions and 4 deletions

View File

@ -2223,7 +2223,9 @@ PGresult *PQexec(PGconn *conn, const char *command);
<function>PQexec</> call are processed in a single transaction, unless
there are explicit <command>BEGIN</command>/<command>COMMIT</command>
commands included in the query string to divide it into multiple
transactions. Note however that the returned
transactions. (See <xref linkend="protocol-flow-multi-statement">
for more details about how the server handles multi-query strings.)
Note however that the returned
<structname>PGresult</structname> structure describes only the result
of the last command executed from the string. Should one of the
commands fail, processing of the string stops with it and the returned

View File

@ -675,6 +675,125 @@
that will accept any message type at any time that it could make sense,
rather than wiring in assumptions about the exact sequence of messages.
</para>
<sect3 id="protocol-flow-multi-statement">
<title>Multiple Statements in a Simple Query</title>
<para>
When a simple Query message contains more than one SQL statement
(separated by semicolons), those statements are executed as a single
transaction, unless explicit transaction control commands are included
to force a different behavior. For example, if the message contains
<programlisting>
INSERT INTO mytable VALUES(1);
SELECT 1/0;
INSERT INTO mytable VALUES(2);
</programlisting>
then the divide-by-zero failure in the <command>SELECT</> will force
rollback of the first <command>INSERT</>. Furthermore, because
execution of the message is abandoned at the first error, the second
<command>INSERT</> is never attempted at all.
</para>
<para>
If instead the message contains
<programlisting>
BEGIN;
INSERT INTO mytable VALUES(1);
COMMIT;
INSERT INTO mytable VALUES(2);
SELECT 1/0;
</programlisting>
then the first <command>INSERT</> is committed by the
explicit <command>COMMIT</> command. The second <command>INSERT</>
and the <command>SELECT</> are still treated as a single transaction,
so that the divide-by-zero failure will roll back the
second <command>INSERT</>, but not the first one.
</para>
<para>
This behavior is implemented by running the statements in a
multi-statement Query message in an <firstterm>implicit transaction
block</> unless there is some explicit transaction block for them to
run in. The main difference between an implicit transaction block and
a regular one is that an implicit block is closed automatically at the
end of the Query message, either by an implicit commit if there was no
error, or an implicit rollback if there was an error. This is similar
to the implicit commit or rollback that happens for a statement
executed by itself (when not in a transaction block).
</para>
<para>
If the session is already in a transaction block, as a result of
a <command>BEGIN</> in some previous message, then the Query message
simply continues that transaction block, whether the message contains
one statement or several. However, if the Query message contains
a <command>COMMIT</> or <command>ROLLBACK</> closing the existing
transaction block, then any following statements are executed in an
implicit transaction block.
Conversely, if a <command>BEGIN</> appears in a multi-statement Query
message, then it starts a regular transaction block that will only be
terminated by an explicit <command>COMMIT</> or <command>ROLLBACK</>,
whether that appears in this Query message or a later one.
If the <command>BEGIN</> follows some statements that were executed as
an implicit transaction block, those statements are not immediately
committed; in effect, they are retroactively included into the new
regular transaction block.
</para>
<para>
A <command>COMMIT</> or <command>ROLLBACK</> appearing in an implicit
transaction block is executed as normal, closing the implicit block;
however, a warning will be issued since a <command>COMMIT</>
or <command>ROLLBACK</> without a previous <command>BEGIN</> might
represent a mistake. If more statements follow, a new implicit
transaction block will be started for them.
</para>
<para>
Savepoints are not allowed in an implicit transaction block, since
they would conflict with the behavior of automatically closing the
block upon any error.
</para>
<para>
Remember that, regardless of any transaction control commands that may
be present, execution of the Query message stops at the first error.
Thus for example given
<programlisting>
BEGIN;
SELECT 1/0;
ROLLBACK;
</programlisting>
in a single Query message, the session will be left inside a failed
regular transaction block, since the <command>ROLLBACK</> is not
reached after the divide-by-zero error. Another <command>ROLLBACK</>
will be needed to restore the session to a usable state.
</para>
<para>
Another behavior of note is that initial lexical and syntactic
analysis is done on the entire query string before any of it is
executed. Thus simple errors (such as a misspelled keyword) in later
statements can prevent execution of any of the statements. This
is normally invisible to users since the statements would all roll
back anyway when done as an implicit transaction block. However,
it can be visible when attempting to do multiple transactions within a
multi-statement Query. For instance, if a typo turned our previous
example into
<programlisting>
BEGIN;
INSERT INTO mytable VALUES(1);
COMMIT;
INSERT INTO mytable VALUES(2);
SELCT 1/0;
</programlisting>
then none of the statements would get run, resulting in the visible
difference that the first <command>INSERT</> is not committed.
Errors detected at semantic analysis or later, such as a misspelled
table or column name, do not have this effect.
</para>
</sect3>
</sect2>
<sect2 id="protocol-flow-ext-query">

View File

@ -120,12 +120,14 @@ echo '\x \\ SELECT * FROM foo;' | psql
</para>
<para>
Each <acronym>SQL</acronym> command string passed
to <option>-c</option> is sent to the server as a single query.
to <option>-c</option> is sent to the server as a single request.
Because of this, the server executes it as a single transaction even
if the string contains multiple <acronym>SQL</acronym> commands,
unless there are explicit <command>BEGIN</>/<command>COMMIT</>
commands included in the string to divide it into multiple
transactions. Also, <application>psql</application> only prints the
transactions. (See <xref linkend="protocol-flow-multi-statement">
for more details about how the server handles multi-query strings.)
Also, <application>psql</application> only prints the
result of the last <acronym>SQL</acronym> command in the string.
This is different from the behavior when the same string is read from
a file or fed to <application>psql</application>'s standard input,
@ -133,7 +135,7 @@ echo '\x \\ SELECT * FROM foo;' | psql
each <acronym>SQL</acronym> command separately.
</para>
<para>
Because of this behavior, putting more than one command in a
Because of this behavior, putting more than one SQL command in a
single <option>-c</option> string often has unexpected results.
It's better to use repeated <option>-c</option> commands or feed
multiple commands to <application>psql</application>'s standard input,
@ -3179,6 +3181,47 @@ testdb=&gt; <userinput>\setenv LESS -imx4F</userinput>
</listitem>
</varlistentry>
<varlistentry>
<term><literal>\;</literal></term>
<listitem>
<para>
Backslash-semicolon is not a meta-command in the same way as the
preceding commands; rather, it simply causes a semicolon to be
added to the query buffer without any further processing.
</para>
<para>
Normally, <application>psql</> will dispatch a SQL command to the
server as soon as it reaches the command-ending semicolon, even if
more input remains on the current line. Thus for example entering
<programlisting>
select 1; select 2; select 3;
</programlisting>
will result in the three SQL commands being individually sent to
the server, with each one's results being displayed before
continuing to the next command. However, a semicolon entered
as <literal>\;</> will not trigger command processing, so that the
command before it and the one after are effectively combined and
sent to the server in one request. So for example
<programlisting>
select 1\; select 2\; select 3;
</programlisting>
results in sending the three SQL commands to the server in a single
request, when the non-backslashed semicolon is reached.
The server executes such a request as a single transaction,
unless there are explicit <command>BEGIN</>/<command>COMMIT</>
commands included in the string to divide it into multiple
transactions. (See <xref linkend="protocol-flow-multi-statement">
for more details about how the server handles multi-query strings.)
<application>psql</application> prints only the last query result
it receives for each request; in this example, although all
three <command>SELECT</>s are indeed executed, <application>psql</>
only prints the <literal>3</>.
</para>
</listitem>
</varlistentry>
</variablelist>
</para>