Clarify that surrogate pairs are not encoded in UTF-8 directly

Peter Eisentraut 2010-09-07 18:54:09 +00:00
parent c5d94a34fb
commit 7cd082f907
1 changed file with 28 additions and 21 deletions

@@ -1,4 +1,4 @@
-<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.154 2010/09/01 18:22:29 tgl Exp $ -->
+<!-- $PostgreSQL: pgsql/doc/src/sgml/syntax.sgml,v 1.155 2010/09/07 18:54:09 petere Exp $ -->
 <chapter id="sql-syntax">
 <title>SQL Syntax</title>
@@ -236,12 +236,15 @@ U&amp;"d!0061t!+000061" UESCAPE '!'
 <para>
 The Unicode escape syntax works only when the server encoding is
-UTF8. When other server encodings are used, only code points in
-the ASCII range (up to <literal>\007F</literal>) can be specified.
-Both the 4-digit and the 6-digit form can be used to specify
-UTF-16 surrogate pairs to compose characters with code points
-larger than U+FFFF (although the availability of
-the 6-digit form technically makes this unnecessary).
+<literal>UTF8</>. When other server encodings are used, only code
+points in the ASCII range (up to <literal>\007F</literal>) can be
+specified. Both the 4-digit and the 6-digit form can be used to
+specify UTF-16 surrogate pairs to compose characters with code
+points larger than U+FFFF, although the availability of the
+6-digit form technically makes this unnecessary. (When surrogate
+pairs are used when the server encoding is <literal>UTF8</>, they
+are first combined into a single code point that is then encoded
+in UTF-8.)
 </para>
 <para>
@@ -431,13 +434,15 @@ SELECT 'foo' 'bar';
 <para>
 The Unicode escape syntax works fully only when the server
-encoding is UTF-8. When other server encodings are used, only
-code points in the ASCII range (up to <literal>\u007F</>) can be
-specified. Both the 4-digit and the 8-digit form can be used to
-specify UTF-16 surrogate pairs to compose characters with code
-points larger than U+FFFF (although the
-availability of the 8-digit form technically makes this
-unnecessary).
+encoding is <literal>UTF8</>. When other server encodings are
+used, only code points in the ASCII range (up
+to <literal>\u007F</>) can be specified. Both the 4-digit and
+the 8-digit form can be used to specify UTF-16 surrogate pairs to
+compose characters with code points larger than U+FFFF, although
+the availability of the 8-digit form technically makes this
+unnecessary. (When surrogate pairs are used when the server
+encoding is <literal>UTF8</>, they are first combined into a
+single code point that is then encoded in UTF-8.)
 </para>
 <caution>
@@ -517,13 +522,15 @@ U&amp;'d!0061t!+000061' UESCAPE '!'
 <para>
 The Unicode escape syntax works only when the server encoding is
-UTF8. When other server encodings are used, only code points in
-the ASCII range (up to <literal>\007F</literal>) can be
-specified.
-Both the 4-digit and the 6-digit form can be used to specify
-UTF-16 surrogate pairs to compose characters with code points
-larger than U+FFFF (although the availability
-of the 6-digit form technically makes this unnecessary).
+<literal>UTF8</>. When other server encodings are used, only
+code points in the ASCII range (up to <literal>\007F</literal>)
+can be specified. Both the 4-digit and the 6-digit form can be
+used to specify UTF-16 surrogate pairs to compose characters with
+code points larger than U+FFFF, although the availability of the
+6-digit form technically makes this unnecessary. (When surrogate
+pairs are used when the server encoding is <literal>UTF8</>, they
+are first combined into a single code point that is then encoded
+in UTF-8.)
 </para>
 <para>
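
To illustrate the parenthetical this commit adds, here is a minimal Python sketch of the arithmetic it describes: a UTF-16 surrogate pair is first combined into a single code point, and only that code point is encoded in UTF-8. This is not PostgreSQL's implementation; the function names combine_surrogates and encode_utf8 are hypothetical, and the example pair D83D/DE00 was chosen only for illustration.

```python
def combine_surrogates(high: int, low: int) -> int:
    """Combine a UTF-16 surrogate pair into one Unicode code point.

    Hypothetical helper for illustration; not taken from PostgreSQL.
    """
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError(f"not a high surrogate: {high:#06x}")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError(f"not a low surrogate: {low:#06x}")
    # Each surrogate carries 10 bits; the result is offset by 0x10000
    # because surrogate pairs only address code points above U+FFFF.
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def encode_utf8(code_point: int) -> bytes:
    """Encode a single code point in UTF-8 using Python's own codec."""
    return chr(code_point).encode("utf-8")


# The pair D83D DE00 denotes the same character as the single
# code point 01F600.
code_point = combine_surrogates(0xD83D, 0xDE00)
print(hex(code_point))                # 0x1f600
print(encode_utf8(code_point).hex())  # f09f9880
```

The result is one 4-byte UTF-8 sequence. Encoding each surrogate separately would instead produce two 3-byte sequences, which is not valid UTF-8; that is the case the commit message rules out.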