Improve documentation about behavior of multi-statement Query messages.

We've long done our best to sweep this topic under the rug, but in view of recent work it seems like it's time to explain it more precisely. Here's an initial cut at doing that. Discussion: https://postgr.es/m/0A3221C70F24FB45833433255569204D1F6BE40D@G01JPEXMBYT05
2017-09-07 14:04:41 -04:00 · 2017-09-07 14:04:41 -04:00 · b976499480
parent 1356f78ea9
commit b976499480
3 changed files with 168 additions and 4 deletions
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@ -2223,7 +2223,9 @@ PGresult *PQexec(PGconn *conn, const char *command);
    <function>PQexec</> call are processed in a single transaction, unless
    there are explicit <command>BEGIN</command>/<command>COMMIT</command>
    commands included in the query string to divide it into multiple
-    transactions.  Note however that the returned
+    transactions.  (See <xref linkend="protocol-flow-multi-statement">
+    for more details about how the server handles multi-query strings.)
+    Note however that the returned
    <structname>PGresult</structname> structure describes only the result
    of the last command executed from the string.  Should one of the
    commands fail, processing of the string stops with it and the returned
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@ -675,6 +675,125 @@
    that will accept any message type at any time that it could make sense,
    rather than wiring in assumptions about the exact sequence of messages.
   </para>
+
+   <sect3 id="protocol-flow-multi-statement">
+    <title>Multiple Statements in a Simple Query</title>
+
+    <para>
+     When a simple Query message contains more than one SQL statement
+     (separated by semicolons), those statements are executed as a single
+     transaction, unless explicit transaction control commands are included
+     to force a different behavior.  For example, if the message contains
+<programlisting>
+INSERT INTO mytable VALUES(1);
+SELECT 1/0;
+INSERT INTO mytable VALUES(2);
+</programlisting>
+     then the divide-by-zero failure in the <command>SELECT</> will force
+     rollback of the first <command>INSERT</>.  Furthermore, because
+     execution of the message is abandoned at the first error, the second
+     <command>INSERT</> is never attempted at all.
+    </para>
+
+    <para>
+     If instead the message contains
+<programlisting>
+BEGIN;
+INSERT INTO mytable VALUES(1);
+COMMIT;
+INSERT INTO mytable VALUES(2);
+SELECT 1/0;
+</programlisting>
+     then the first <command>INSERT</> is committed by the
+     explicit <command>COMMIT</> command.  The second <command>INSERT</>
+     and the <command>SELECT</> are still treated as a single transaction,
+     so that the divide-by-zero failure will roll back the
+     second <command>INSERT</>, but not the first one.
+    </para>
+
+    <para>
+     This behavior is implemented by running the statements in a
+     multi-statement Query message in an <firstterm>implicit transaction
+     block</> unless there is some explicit transaction block for them to
+     run in.  The main difference between an implicit transaction block and
+     a regular one is that an implicit block is closed automatically at the
+     end of the Query message, either by an implicit commit if there was no
+     error, or an implicit rollback if there was an error.  This is similar
+     to the implicit commit or rollback that happens for a statement
+     executed by itself (when not in a transaction block).
+    </para>
+
+    <para>
+     If the session is already in a transaction block, as a result of
+     a <command>BEGIN</> in some previous message, then the Query message
+     simply continues that transaction block, whether the message contains
+     one statement or several.  However, if the Query message contains
+     a <command>COMMIT</> or <command>ROLLBACK</> closing the existing
+     transaction block, then any following statements are executed in an
+     implicit transaction block.
+     Conversely, if a <command>BEGIN</> appears in a multi-statement Query
+     message, then it starts a regular transaction block that will only be
+     terminated by an explicit <command>COMMIT</> or <command>ROLLBACK</>,
+     whether that appears in this Query message or a later one.
+     If the <command>BEGIN</> follows some statements that were executed as
+     an implicit transaction block, those statements are not immediately
+     committed; in effect, they are retroactively included into the new
+     regular transaction block.
+    </para>
+
+    <para>
+     A <command>COMMIT</> or <command>ROLLBACK</> appearing in an implicit
+     transaction block is executed as normal, closing the implicit block;
+     however, a warning will be issued since a <command>COMMIT</>
+     or <command>ROLLBACK</> without a previous <command>BEGIN</> might
+     represent a mistake.  If more statements follow, a new implicit
+     transaction block will be started for them.
+    </para>
+
+    <para>
+     Savepoints are not allowed in an implicit transaction block, since
+     they would conflict with the behavior of automatically closing the
+     block upon any error.
+    </para>
+
+    <para>
+     Remember that, regardless of any transaction control commands that may
+     be present, execution of the Query message stops at the first error.
+     Thus for example given
+<programlisting>
+BEGIN;
+SELECT 1/0;
+ROLLBACK;
+</programlisting>
+     in a single Query message, the session will be left inside a failed
+     regular transaction block, since the <command>ROLLBACK</> is not
+     reached after the divide-by-zero error.  Another <command>ROLLBACK</>
+     will be needed to restore the session to a usable state.
+    </para>
+
+    <para>
+     Another behavior of note is that initial lexical and syntactic
+     analysis is done on the entire query string before any of it is
+     executed.  Thus simple errors (such as a misspelled keyword) in later
+     statements can prevent execution of any of the statements.  This
+     is normally invisible to users since the statements would all roll
+     back anyway when done as an implicit transaction block.  However,
+     it can be visible when attempting to do multiple transactions within a
+     multi-statement Query.  For instance, if a typo turned our previous
+     example into
+<programlisting>
+BEGIN;
+INSERT INTO mytable VALUES(1);
+COMMIT;
+INSERT INTO mytable VALUES(2);
+SELCT 1/0;
+</programlisting>
+     then none of the statements would get run, resulting in the visible
+     difference that the first <command>INSERT</> is not committed.
+     Errors detected at semantic analysis or later, such as a misspelled
+     table or column name, do not have this effect.
+    </para>
+   </sect3>
  </sect2>

  <sect2 id="protocol-flow-ext-query">
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@ -120,12 +120,14 @@ echo '\x \\ SELECT * FROM foo;' | psql
      </para>
      <para>
       Each <acronym>SQL</acronym> command string passed
-       to <option>-c</option> is sent to the server as a single query.
+       to <option>-c</option> is sent to the server as a single request.
       Because of this, the server executes it as a single transaction even
       if the string contains multiple <acronym>SQL</acronym> commands,
       unless there are explicit <command>BEGIN</>/<command>COMMIT</>
       commands included in the string to divide it into multiple
-       transactions.  Also, <application>psql</application> only prints the
+       transactions.  (See <xref linkend="protocol-flow-multi-statement">
+       for more details about how the server handles multi-query strings.)
+       Also, <application>psql</application> only prints the
       result of the last <acronym>SQL</acronym> command in the string.
       This is different from the behavior when the same string is read from
       a file or fed to <application>psql</application>'s standard input,
@ -133,7 +135,7 @@ echo '\x \\ SELECT * FROM foo;' | psql
       each <acronym>SQL</acronym> command separately.
      </para>
      <para>
-       Because of this behavior, putting more than one command in a
+       Because of this behavior, putting more than one SQL command in a
       single <option>-c</option> string often has unexpected results.
       It's better to use repeated <option>-c</option> commands or feed
       multiple commands to <application>psql</application>'s standard input,
@ -3179,6 +3181,47 @@ testdb=&gt; <userinput>\setenv LESS -imx4F</userinput>
        </listitem>
      </varlistentry>

+
+      <varlistentry>
+        <term><literal>\;</literal></term>
+        <listitem>
+        <para>
+        Backslash-semicolon is not a meta-command in the same way as the
+        preceding commands; rather, it simply causes a semicolon to be
+        added to the query buffer without any further processing.
+        </para>
+
+        <para>
+        Normally, <application>psql</> will dispatch a SQL command to the
+        server as soon as it reaches the command-ending semicolon, even if
+        more input remains on the current line.  Thus for example entering
+<programlisting>
+select 1; select 2; select 3;
+</programlisting>
+        will result in the three SQL commands being individually sent to
+        the server, with each one's results being displayed before
+        continuing to the next command.  However, a semicolon entered
+        as <literal>\;</> will not trigger command processing, so that the
+        command before it and the one after are effectively combined and
+        sent to the server in one request.  So for example
+<programlisting>
+select 1\; select 2\; select 3;
+</programlisting>
+        results in sending the three SQL commands to the server in a single
+        request, when the non-backslashed semicolon is reached.
+        The server executes such a request as a single transaction,
+        unless there are explicit <command>BEGIN</>/<command>COMMIT</>
+        commands included in the string to divide it into multiple
+        transactions.  (See <xref linkend="protocol-flow-multi-statement">
+        for more details about how the server handles multi-query strings.)
+        <application>psql</application> prints only the last query result
+        it receives for each request; in this example, although all
+        three <command>SELECT</>s are indeed executed, <application>psql</>
+        only prints the <literal>3</>.
+        </para>
+        </listitem>
+      </varlistentry>
+
    </variablelist>
  </para>