From 34a0a81bfb388504deaa51b16a8bb531b827e519 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Sat, 18 Jan 2020 17:51:03 -0500 Subject: [PATCH] Doc: rearrange the documentation of binary-string functions. Rather than intermixing the discussion of text-string and binary-string functions, make a clean break, moving all discussion of binary-string operations into section 9.5. This involves some duplication of function descriptions between 9.4 and 9.5, but it seems cleaner on the whole since the individual descriptions are clearer (and on the other side of the coin, it gets rid of some duplicated descriptions, too). Move the convert*/encode/decode functions to a separate table, because they don't quite seem to fit under the heading of "binary string functions". Also provide full documentation of the textual formats supported by encode() and decode() (which was the original goal of this patch series, many moons ago). Also move the table of built-in encoding conversions out of section 9.4, where it no longer had any relevance whatsoever, and put it into section 23.3 about character sets. I chose to put both that and table 23.2 (multibyte-translation-table) into a new so as not to break up the flow of discussion in 23.3.3. Also do a bunch of minor copy-editing on the function descriptions in 9.4 and 9.5. Karl Pinc, reviewed by Fabien Coelho, further hacking by me Discussion: https://postgr.es/m/20190304163347.7bca4897@slate.meme.com --- doc/src/sgml/charset.sgml | 1273 ++++++++++++++++++++++++-------- doc/src/sgml/func.sgml | 1460 +++++++++++-------------------------- 2 files changed, 1378 insertions(+), 1355 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 55669b5cad..057a6bb81a 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -1495,313 +1495,12 @@ $ psql -l Automatic Character Set Conversion Between Server and Client - PostgreSQL supports automatic - character set conversion between server and client for certain - character set combinations. The conversion information is stored in the - pg_conversion system catalog. PostgreSQL - comes with some predefined conversions, as shown in . You can create a new - conversion using the SQL command CREATE CONVERSION. + PostgreSQL supports automatic character + set conversion between server and client for many combinations of + character sets ( + shows which ones). - - Client/Server Character Set Conversions - - - - Server Character Set - Available Client Character Sets - - - - - BIG5 - not supported as a server encoding - - - - EUC_CN - EUC_CN, - MULE_INTERNAL, - UTF8 - - - - EUC_JP - EUC_JP, - MULE_INTERNAL, - SJIS, - UTF8 - - - - EUC_JIS_2004 - EUC_JIS_2004, - SHIFT_JIS_2004, - UTF8 - - - - EUC_KR - EUC_KR, - MULE_INTERNAL, - UTF8 - - - - EUC_TW - EUC_TW, - BIG5, - MULE_INTERNAL, - UTF8 - - - - GB18030 - not supported as a server encoding - - - - GBK - not supported as a server encoding - - - - ISO_8859_5 - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN866, - WIN1251 - - - - ISO_8859_6 - ISO_8859_6, - UTF8 - - - - ISO_8859_7 - ISO_8859_7, - UTF8 - - - - ISO_8859_8 - ISO_8859_8, - UTF8 - - - - JOHAB - not supported as a server encoding - - - - KOI8R - KOI8R, - ISO_8859_5, - MULE_INTERNAL, - UTF8, - WIN866, - WIN1251 - - - - KOI8U - KOI8U, - UTF8 - - - - LATIN1 - LATIN1, - MULE_INTERNAL, - UTF8 - - - - LATIN2 - LATIN2, - MULE_INTERNAL, - UTF8, - WIN1250 - - - - LATIN3 - LATIN3, - MULE_INTERNAL, - UTF8 - - - - LATIN4 - LATIN4, - MULE_INTERNAL, - UTF8 - - - - LATIN5 - LATIN5, - UTF8 - - - - LATIN6 - LATIN6, - UTF8 - - - - LATIN7 - LATIN7, - UTF8 - - - - LATIN8 - LATIN8, - UTF8 - - - - LATIN9 - LATIN9, - UTF8 - - - - LATIN10 - LATIN10, - UTF8 - - - - MULE_INTERNAL - MULE_INTERNAL, - BIG5, - EUC_CN, - EUC_JP, - EUC_KR, - EUC_TW, - ISO_8859_5, - KOI8R, - LATIN1 to LATIN4, - SJIS, - WIN866, - WIN1250, - WIN1251 - - - - SJIS - not supported as a server encoding - - - - SHIFT_JIS_2004 - not supported as a server encoding - - - - SQL_ASCII - any (no conversion will be performed) - - - - UHC - not supported as a server encoding - - - - UTF8 - all supported encodings - - - - WIN866 - WIN866, - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN1251 - - - - WIN874 - WIN874, - UTF8 - - - - WIN1250 - WIN1250, - LATIN2, - MULE_INTERNAL, - UTF8 - - - - WIN1251 - WIN1251, - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN866 - - - - WIN1252 - WIN1252, - UTF8 - - - - WIN1253 - WIN1253, - UTF8 - - - - WIN1254 - WIN1254, - UTF8 - - - - WIN1255 - WIN1255, - UTF8 - - - - WIN1256 - WIN1256, - UTF8 - - - - WIN1257 - WIN1257, - UTF8 - - - - WIN1258 - WIN1258, - UTF8 - - - - -
- To enable automatic character set conversion, you have to tell PostgreSQL the character set @@ -1905,6 +1604,970 @@ RESET client_encoding;
+ + Available Character Set Conversions + + + PostgreSQL allows conversion between any + two character sets for which a conversion function is listed in the + pg_conversion + system catalog. PostgreSQL comes with + some predefined conversions, as summarized in + and shown in more + detail in . You can + create a new conversion using the SQL command + . (To be used for automatic + client/server conversions, a conversion must be marked + as default for its character set pair.) + + + + Built-in Client/Server Character Set Conversions + + + + Server Character Set + Available Client Character Sets + + + + + BIG5 + not supported as a server encoding + + + + EUC_CN + EUC_CN, + MULE_INTERNAL, + UTF8 + + + + EUC_JP + EUC_JP, + MULE_INTERNAL, + SJIS, + UTF8 + + + + EUC_JIS_2004 + EUC_JIS_2004, + SHIFT_JIS_2004, + UTF8 + + + + EUC_KR + EUC_KR, + MULE_INTERNAL, + UTF8 + + + + EUC_TW + EUC_TW, + BIG5, + MULE_INTERNAL, + UTF8 + + + + GB18030 + not supported as a server encoding + + + + GBK + not supported as a server encoding + + + + ISO_8859_5 + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN866, + WIN1251 + + + + ISO_8859_6 + ISO_8859_6, + UTF8 + + + + ISO_8859_7 + ISO_8859_7, + UTF8 + + + + ISO_8859_8 + ISO_8859_8, + UTF8 + + + + JOHAB + not supported as a server encoding + + + + KOI8R + KOI8R, + ISO_8859_5, + MULE_INTERNAL, + UTF8, + WIN866, + WIN1251 + + + + KOI8U + KOI8U, + UTF8 + + + + LATIN1 + LATIN1, + MULE_INTERNAL, + UTF8 + + + + LATIN2 + LATIN2, + MULE_INTERNAL, + UTF8, + WIN1250 + + + + LATIN3 + LATIN3, + MULE_INTERNAL, + UTF8 + + + + LATIN4 + LATIN4, + MULE_INTERNAL, + UTF8 + + + + LATIN5 + LATIN5, + UTF8 + + + + LATIN6 + LATIN6, + UTF8 + + + + LATIN7 + LATIN7, + UTF8 + + + + LATIN8 + LATIN8, + UTF8 + + + + LATIN9 + LATIN9, + UTF8 + + + + LATIN10 + LATIN10, + UTF8 + + + + MULE_INTERNAL + MULE_INTERNAL, + BIG5, + EUC_CN, + EUC_JP, + EUC_KR, + EUC_TW, + ISO_8859_5, + KOI8R, + LATIN1 to LATIN4, + SJIS, + WIN866, + WIN1250, + WIN1251 + + + + SJIS + not supported as a server encoding + + + + SHIFT_JIS_2004 + not supported as a server encoding + + + + SQL_ASCII + any (no conversion will be performed) + + + + UHC + not supported as a server encoding + + + + UTF8 + all supported encodings + + + + WIN866 + WIN866, + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN1251 + + + + WIN874 + WIN874, + UTF8 + + + + WIN1250 + WIN1250, + LATIN2, + MULE_INTERNAL, + UTF8 + + + + WIN1251 + WIN1251, + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN866 + + + + WIN1252 + WIN1252, + UTF8 + + + + WIN1253 + WIN1253, + UTF8 + + + + WIN1254 + WIN1254, + UTF8 + + + + WIN1255 + WIN1255, + UTF8 + + + + WIN1256 + WIN1256, + UTF8 + + + + WIN1257 + WIN1257, + UTF8 + + + + WIN1258 + WIN1258, + UTF8 + + + + +
+ + + All Built-in Character Set Conversions + + + + Conversion Name + + + The conversion names follow a standard naming scheme: The + official name of the source encoding with all + non-alphanumeric characters replaced by underscores, followed + by _to_, followed by the similarly processed + destination encoding name. Therefore, these names sometimes + deviate from the customary encoding names shown in + . + + + + Source Encoding + Destination Encoding + + + + + + big5_to_euc_tw + BIG5 + EUC_TW + + + big5_to_mic + BIG5 + MULE_INTERNAL + + + big5_to_utf8 + BIG5 + UTF8 + + + euc_cn_to_mic + EUC_CN + MULE_INTERNAL + + + euc_cn_to_utf8 + EUC_CN + UTF8 + + + euc_jp_to_mic + EUC_JP + MULE_INTERNAL + + + euc_jp_to_sjis + EUC_JP + SJIS + + + euc_jp_to_utf8 + EUC_JP + UTF8 + + + euc_kr_to_mic + EUC_KR + MULE_INTERNAL + + + euc_kr_to_utf8 + EUC_KR + UTF8 + + + euc_tw_to_big5 + EUC_TW + BIG5 + + + euc_tw_to_mic + EUC_TW + MULE_INTERNAL + + + euc_tw_to_utf8 + EUC_TW + UTF8 + + + gb18030_to_utf8 + GB18030 + UTF8 + + + gbk_to_utf8 + GBK + UTF8 + + + iso_8859_10_to_utf8 + LATIN6 + UTF8 + + + iso_8859_13_to_utf8 + LATIN7 + UTF8 + + + iso_8859_14_to_utf8 + LATIN8 + UTF8 + + + iso_8859_15_to_utf8 + LATIN9 + UTF8 + + + iso_8859_16_to_utf8 + LATIN10 + UTF8 + + + iso_8859_1_to_mic + LATIN1 + MULE_INTERNAL + + + iso_8859_1_to_utf8 + LATIN1 + UTF8 + + + iso_8859_2_to_mic + LATIN2 + MULE_INTERNAL + + + iso_8859_2_to_utf8 + LATIN2 + UTF8 + + + iso_8859_2_to_windows_1250 + LATIN2 + WIN1250 + + + iso_8859_3_to_mic + LATIN3 + MULE_INTERNAL + + + iso_8859_3_to_utf8 + LATIN3 + UTF8 + + + iso_8859_4_to_mic + LATIN4 + MULE_INTERNAL + + + iso_8859_4_to_utf8 + LATIN4 + UTF8 + + + iso_8859_5_to_koi8_r + ISO_8859_5 + KOI8R + + + iso_8859_5_to_mic + ISO_8859_5 + MULE_INTERNAL + + + iso_8859_5_to_utf8 + ISO_8859_5 + UTF8 + + + iso_8859_5_to_windows_1251 + ISO_8859_5 + WIN1251 + + + iso_8859_5_to_windows_866 + ISO_8859_5 + WIN866 + + + iso_8859_6_to_utf8 + ISO_8859_6 + UTF8 + + + iso_8859_7_to_utf8 + ISO_8859_7 + UTF8 + + + iso_8859_8_to_utf8 + ISO_8859_8 + UTF8 + + + iso_8859_9_to_utf8 + LATIN5 + UTF8 + + + johab_to_utf8 + JOHAB + UTF8 + + + koi8_r_to_iso_8859_5 + KOI8R + ISO_8859_5 + + + koi8_r_to_mic + KOI8R + MULE_INTERNAL + + + koi8_r_to_utf8 + KOI8R + UTF8 + + + koi8_r_to_windows_1251 + KOI8R + WIN1251 + + + koi8_r_to_windows_866 + KOI8R + WIN866 + + + koi8_u_to_utf8 + KOI8U + UTF8 + + + mic_to_big5 + MULE_INTERNAL + BIG5 + + + mic_to_euc_cn + MULE_INTERNAL + EUC_CN + + + mic_to_euc_jp + MULE_INTERNAL + EUC_JP + + + mic_to_euc_kr + MULE_INTERNAL + EUC_KR + + + mic_to_euc_tw + MULE_INTERNAL + EUC_TW + + + mic_to_iso_8859_1 + MULE_INTERNAL + LATIN1 + + + mic_to_iso_8859_2 + MULE_INTERNAL + LATIN2 + + + mic_to_iso_8859_3 + MULE_INTERNAL + LATIN3 + + + mic_to_iso_8859_4 + MULE_INTERNAL + LATIN4 + + + mic_to_iso_8859_5 + MULE_INTERNAL + ISO_8859_5 + + + mic_to_koi8_r + MULE_INTERNAL + KOI8R + + + mic_to_sjis + MULE_INTERNAL + SJIS + + + mic_to_windows_1250 + MULE_INTERNAL + WIN1250 + + + mic_to_windows_1251 + MULE_INTERNAL + WIN1251 + + + mic_to_windows_866 + MULE_INTERNAL + WIN866 + + + sjis_to_euc_jp + SJIS + EUC_JP + + + sjis_to_mic + SJIS + MULE_INTERNAL + + + sjis_to_utf8 + SJIS + UTF8 + + + windows_1258_to_utf8 + WIN1258 + UTF8 + + + uhc_to_utf8 + UHC + UTF8 + + + utf8_to_big5 + UTF8 + BIG5 + + + utf8_to_euc_cn + UTF8 + EUC_CN + + + utf8_to_euc_jp + UTF8 + EUC_JP + + + utf8_to_euc_kr + UTF8 + EUC_KR + + + utf8_to_euc_tw + UTF8 + EUC_TW + + + utf8_to_gb18030 + UTF8 + GB18030 + + + utf8_to_gbk + UTF8 + GBK + + + utf8_to_iso_8859_1 + UTF8 + LATIN1 + + + utf8_to_iso_8859_10 + UTF8 + LATIN6 + + + utf8_to_iso_8859_13 + UTF8 + LATIN7 + + + utf8_to_iso_8859_14 + UTF8 + LATIN8 + + + utf8_to_iso_8859_15 + UTF8 + LATIN9 + + + utf8_to_iso_8859_16 + UTF8 + LATIN10 + + + utf8_to_iso_8859_2 + UTF8 + LATIN2 + + + utf8_to_iso_8859_3 + UTF8 + LATIN3 + + + utf8_to_iso_8859_4 + UTF8 + LATIN4 + + + utf8_to_iso_8859_5 + UTF8 + ISO_8859_5 + + + utf8_to_iso_8859_6 + UTF8 + ISO_8859_6 + + + utf8_to_iso_8859_7 + UTF8 + ISO_8859_7 + + + utf8_to_iso_8859_8 + UTF8 + ISO_8859_8 + + + utf8_to_iso_8859_9 + UTF8 + LATIN5 + + + utf8_to_johab + UTF8 + JOHAB + + + utf8_to_koi8_r + UTF8 + KOI8R + + + utf8_to_koi8_u + UTF8 + KOI8U + + + utf8_to_sjis + UTF8 + SJIS + + + utf8_to_windows_1258 + UTF8 + WIN1258 + + + utf8_to_uhc + UTF8 + UHC + + + utf8_to_windows_1250 + UTF8 + WIN1250 + + + utf8_to_windows_1251 + UTF8 + WIN1251 + + + utf8_to_windows_1252 + UTF8 + WIN1252 + + + utf8_to_windows_1253 + UTF8 + WIN1253 + + + utf8_to_windows_1254 + UTF8 + WIN1254 + + + utf8_to_windows_1255 + UTF8 + WIN1255 + + + utf8_to_windows_1256 + UTF8 + WIN1256 + + + utf8_to_windows_1257 + UTF8 + WIN1257 + + + utf8_to_windows_866 + UTF8 + WIN866 + + + utf8_to_windows_874 + UTF8 + WIN874 + + + windows_1250_to_iso_8859_2 + WIN1250 + LATIN2 + + + windows_1250_to_mic + WIN1250 + MULE_INTERNAL + + + windows_1250_to_utf8 + WIN1250 + UTF8 + + + windows_1251_to_iso_8859_5 + WIN1251 + ISO_8859_5 + + + windows_1251_to_koi8_r + WIN1251 + KOI8R + + + windows_1251_to_mic + WIN1251 + MULE_INTERNAL + + + windows_1251_to_utf8 + WIN1251 + UTF8 + + + windows_1251_to_windows_866 + WIN1251 + WIN866 + + + windows_1252_to_utf8 + WIN1252 + UTF8 + + + windows_1256_to_utf8 + WIN1256 + UTF8 + + + windows_866_to_iso_8859_5 + WIN866 + ISO_8859_5 + + + windows_866_to_koi8_r + WIN866 + KOI8R + + + windows_866_to_mic + WIN866 + MULE_INTERNAL + + + windows_866_to_utf8 + WIN866 + UTF8 + + + windows_866_to_windows_1251 + WIN866 + WIN + + + windows_874_to_utf8 + WIN874 + UTF8 + + + euc_jis_2004_to_utf8 + EUC_JIS_2004 + UTF8 + + + utf8_to_euc_jis_2004 + UTF8 + EUC_JIS_2004 + + + shift_jis_2004_to_utf8 + SHIFT_JIS_2004 + UTF8 + + + utf8_to_shift_jis_2004 + UTF8 + SHIFT_JIS_2004 + + + euc_jis_2004_to_shift_jis_2004 + EUC_JIS_2004 + SHIFT_JIS_2004 + + + shift_jis_2004_to_euc_jis_2004 + SHIFT_JIS_2004 + EUC_JIS_2004 + + + +
+
+ Further Reading diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 72072e7545..6c4359dc7b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1455,8 +1455,7 @@ and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of potential effects of automatic space-padding when using the - character type. Some functions also exist - natively for the bit-string types. + character type. @@ -1628,7 +1627,8 @@ text - Extract substring + Extract substring (provide at least one of from + and for) substring('Thomas' from 2 for 3) hom @@ -1638,9 +1638,9 @@ substring(string from pattern) text - Extract substring matching POSIX regular expression. See + Extract substring matching POSIX regular expression (see for more information on pattern - matching. + matching) substring('Thomas' from '...$') mas @@ -1650,9 +1650,9 @@ substring(string from pattern for escape) text - Extract substring matching SQL regular expression. - See for more information on - pattern matching. + Extract substring matching SQL regular expression + (see for more information on + pattern matching) substring('Thomas' from '%#"o_a#"_' for '#') oma @@ -1682,9 +1682,8 @@ trim(leading | trailing | both from - string - , characters - ) + string , + characters) text @@ -1822,101 +1821,6 @@ - - convert - - convert(string bytea, - src_encoding name, - dest_encoding name) - - bytea - - Convert string to dest_encoding. The - original encoding is specified by - src_encoding. The - string must be valid in this encoding. - Conversions can be defined by CREATE CONVERSION. - Also there are some predefined conversions. See for available conversions. - - convert('text_in_utf8', 'UTF8', 'LATIN1') - text_in_utf8 represented in Latin-1 - encoding (ISO 8859-1) - - - - - - convert_from - - convert_from(string bytea, - src_encoding name) - - text - - Convert string to the database encoding. The original encoding - is specified by src_encoding. The - string must be valid in this encoding. - - convert_from('text_in_utf8', 'UTF8') - text_in_utf8 represented in the current database encoding - - - - - - convert_to - - convert_to(string text, - dest_encoding name) - - bytea - - Convert string to dest_encoding. - - convert_to('some text', 'UTF8') - some text represented in the UTF8 encoding - - - - - - decode - - decode(string text, - format text) - - bytea - - Decode binary data from textual representation in string. - Options for format are same as in encode. - - decode('MTIzAAE=', 'base64') - \x3132330001 - - - - - - encode - - encode(data bytea, - format text) - - text - - Encode binary data into a textual representation. Supported - formats are: base64, hex, escape. - escape converts zero bytes and high-bit-set bytes to - octal sequences (\nnn) and - doubles backslashes. - - encode('123\000\001', 'base64') - MTIzAAE= - - - - format @@ -1955,13 +1859,14 @@ left - left(str text, + left(string text, n int) text - Return first n characters in the string. When n - is negative, return all but last |n| characters. + Return first n characters in the + string, or when n is negative, return + all but last |n| characters left('abcde', 2) ab @@ -1982,19 +1887,6 @@ 4 - - length(string bytea, - encoding name ) - int - - Number of characters in string in the given - encoding. The string - must be valid in this encoding. - - length('jose', 'UTF8') - 4 - - @@ -2006,7 +1898,7 @@ text - Fill up the string to length + Extend the string to length length by prepending the characters fill (a space by default). If the string is already longer than @@ -2044,8 +1936,8 @@ text - Calculates the MD5 hash of string, - returning the result in hexadecimal + MD5 hash, with + the result written in hexadecimal md5('abc') 900150983cd24fb0 d6963f7d28e17f72 @@ -2182,8 +2074,8 @@ text[] Return captured substring(s) resulting from the first match of a POSIX - regular expression to the string. See - for more information. + regular expression to the string (see + for more information) regexp_match('foobarbequebaz', '(bar)(beque)') {bar,beque} @@ -2199,8 +2091,8 @@ setof text[] Return captured substring(s) resulting from matching a POSIX regular - expression to the string. See - for more information. + expression to the string (see + for more information) regexp_matches('foobarbequebaz', 'ba.', 'g') {bar}{baz} (2 rows) @@ -2215,8 +2107,8 @@ text - Replace substring(s) matching a POSIX regular expression. See - for more information. + Replace substring(s) matching a POSIX regular expression (see + for more information) regexp_replace('Thomas', '.[mN]a.', 'M') ThM @@ -2232,8 +2124,8 @@ text[] Split string using a POSIX regular expression as - the delimiter. See for more - information. + the delimiter (see for more + information) regexp_split_to_array('hello world', '\s+') {hello,world} @@ -2249,8 +2141,8 @@ setof text Split string using a POSIX regular expression as - the delimiter. See for more - information. + the delimiter (see for more + information) regexp_split_to_table('hello world', '\s+') helloworld (2 rows) @@ -2296,7 +2188,7 @@ text - Return reversed string. + Reverse the order of the characters in string reverse('abcde') edcba @@ -2307,13 +2199,14 @@ right - right(str text, + right(string text, n int) text - Return last n characters in the string. When n - is negative, return all but first |n| characters. + Return last n characters in the string, + or when n is negative, return all but + first |n| characters right('abcde', 2) de @@ -2330,7 +2223,7 @@ text - Fill up the string to length + Extend the string to length length by appending the characters fill (a space by default). If the string is already longer than @@ -2418,7 +2311,8 @@ bool - Returns true if string starts with prefix. + Return true if string starts + with prefix starts_with('alphabet', 'alph') t @@ -2497,781 +2391,12 @@ - See also the aggregate function string_agg in - . + See also the aggregate function string_agg in + , and the functions for + converting between strings and the bytea type in + . - - Built-in Conversions - - - - Conversion Name - - - The conversion names follow a standard naming scheme: The - official name of the source encoding with all - non-alphanumeric characters replaced by underscores, followed - by _to_, followed by the similarly processed - destination encoding name. Therefore, the names might deviate - from the customary encoding names. - - - - Source Encoding - Destination Encoding - - - - - - big5_to_euc_tw - BIG5 - EUC_TW - - - - big5_to_mic - BIG5 - MULE_INTERNAL - - - - big5_to_utf8 - BIG5 - UTF8 - - - - euc_cn_to_mic - EUC_CN - MULE_INTERNAL - - - - euc_cn_to_utf8 - EUC_CN - UTF8 - - - - euc_jp_to_mic - EUC_JP - MULE_INTERNAL - - - - euc_jp_to_sjis - EUC_JP - SJIS - - - - euc_jp_to_utf8 - EUC_JP - UTF8 - - - - euc_kr_to_mic - EUC_KR - MULE_INTERNAL - - - - euc_kr_to_utf8 - EUC_KR - UTF8 - - - - euc_tw_to_big5 - EUC_TW - BIG5 - - - - euc_tw_to_mic - EUC_TW - MULE_INTERNAL - - - - euc_tw_to_utf8 - EUC_TW - UTF8 - - - - gb18030_to_utf8 - GB18030 - UTF8 - - - - gbk_to_utf8 - GBK - UTF8 - - - - iso_8859_10_to_utf8 - LATIN6 - UTF8 - - - - iso_8859_13_to_utf8 - LATIN7 - UTF8 - - - - iso_8859_14_to_utf8 - LATIN8 - UTF8 - - - - iso_8859_15_to_utf8 - LATIN9 - UTF8 - - - - iso_8859_16_to_utf8 - LATIN10 - UTF8 - - - - iso_8859_1_to_mic - LATIN1 - MULE_INTERNAL - - - - iso_8859_1_to_utf8 - LATIN1 - UTF8 - - - - iso_8859_2_to_mic - LATIN2 - MULE_INTERNAL - - - - iso_8859_2_to_utf8 - LATIN2 - UTF8 - - - - iso_8859_2_to_windows_1250 - LATIN2 - WIN1250 - - - - iso_8859_3_to_mic - LATIN3 - MULE_INTERNAL - - - - iso_8859_3_to_utf8 - LATIN3 - UTF8 - - - - iso_8859_4_to_mic - LATIN4 - MULE_INTERNAL - - - - iso_8859_4_to_utf8 - LATIN4 - UTF8 - - - - iso_8859_5_to_koi8_r - ISO_8859_5 - KOI8R - - - - iso_8859_5_to_mic - ISO_8859_5 - MULE_INTERNAL - - - - iso_8859_5_to_utf8 - ISO_8859_5 - UTF8 - - - - iso_8859_5_to_windows_1251 - ISO_8859_5 - WIN1251 - - - - iso_8859_5_to_windows_866 - ISO_8859_5 - WIN866 - - - - iso_8859_6_to_utf8 - ISO_8859_6 - UTF8 - - - - iso_8859_7_to_utf8 - ISO_8859_7 - UTF8 - - - - iso_8859_8_to_utf8 - ISO_8859_8 - UTF8 - - - - iso_8859_9_to_utf8 - LATIN5 - UTF8 - - - - johab_to_utf8 - JOHAB - UTF8 - - - - koi8_r_to_iso_8859_5 - KOI8R - ISO_8859_5 - - - - koi8_r_to_mic - KOI8R - MULE_INTERNAL - - - - koi8_r_to_utf8 - KOI8R - UTF8 - - - - koi8_r_to_windows_1251 - KOI8R - WIN1251 - - - - koi8_r_to_windows_866 - KOI8R - WIN866 - - - - koi8_u_to_utf8 - KOI8U - UTF8 - - - - mic_to_big5 - MULE_INTERNAL - BIG5 - - - - mic_to_euc_cn - MULE_INTERNAL - EUC_CN - - - - mic_to_euc_jp - MULE_INTERNAL - EUC_JP - - - - mic_to_euc_kr - MULE_INTERNAL - EUC_KR - - - - mic_to_euc_tw - MULE_INTERNAL - EUC_TW - - - - mic_to_iso_8859_1 - MULE_INTERNAL - LATIN1 - - - - mic_to_iso_8859_2 - MULE_INTERNAL - LATIN2 - - - - mic_to_iso_8859_3 - MULE_INTERNAL - LATIN3 - - - - mic_to_iso_8859_4 - MULE_INTERNAL - LATIN4 - - - - mic_to_iso_8859_5 - MULE_INTERNAL - ISO_8859_5 - - - - mic_to_koi8_r - MULE_INTERNAL - KOI8R - - - - mic_to_sjis - MULE_INTERNAL - SJIS - - - - mic_to_windows_1250 - MULE_INTERNAL - WIN1250 - - - - mic_to_windows_1251 - MULE_INTERNAL - WIN1251 - - - - mic_to_windows_866 - MULE_INTERNAL - WIN866 - - - - sjis_to_euc_jp - SJIS - EUC_JP - - - - sjis_to_mic - SJIS - MULE_INTERNAL - - - - sjis_to_utf8 - SJIS - UTF8 - - - - windows_1258_to_utf8 - WIN1258 - UTF8 - - - - uhc_to_utf8 - UHC - UTF8 - - - - utf8_to_big5 - UTF8 - BIG5 - - - - utf8_to_euc_cn - UTF8 - EUC_CN - - - - utf8_to_euc_jp - UTF8 - EUC_JP - - - - utf8_to_euc_kr - UTF8 - EUC_KR - - - - utf8_to_euc_tw - UTF8 - EUC_TW - - - - utf8_to_gb18030 - UTF8 - GB18030 - - - - utf8_to_gbk - UTF8 - GBK - - - - utf8_to_iso_8859_1 - UTF8 - LATIN1 - - - - utf8_to_iso_8859_10 - UTF8 - LATIN6 - - - - utf8_to_iso_8859_13 - UTF8 - LATIN7 - - - - utf8_to_iso_8859_14 - UTF8 - LATIN8 - - - - utf8_to_iso_8859_15 - UTF8 - LATIN9 - - - - utf8_to_iso_8859_16 - UTF8 - LATIN10 - - - - utf8_to_iso_8859_2 - UTF8 - LATIN2 - - - - utf8_to_iso_8859_3 - UTF8 - LATIN3 - - - - utf8_to_iso_8859_4 - UTF8 - LATIN4 - - - - utf8_to_iso_8859_5 - UTF8 - ISO_8859_5 - - - - utf8_to_iso_8859_6 - UTF8 - ISO_8859_6 - - - - utf8_to_iso_8859_7 - UTF8 - ISO_8859_7 - - - - utf8_to_iso_8859_8 - UTF8 - ISO_8859_8 - - - - utf8_to_iso_8859_9 - UTF8 - LATIN5 - - - - utf8_to_johab - UTF8 - JOHAB - - - - utf8_to_koi8_r - UTF8 - KOI8R - - - - utf8_to_koi8_u - UTF8 - KOI8U - - - - utf8_to_sjis - UTF8 - SJIS - - - - utf8_to_windows_1258 - UTF8 - WIN1258 - - - - utf8_to_uhc - UTF8 - UHC - - - - utf8_to_windows_1250 - UTF8 - WIN1250 - - - - utf8_to_windows_1251 - UTF8 - WIN1251 - - - - utf8_to_windows_1252 - UTF8 - WIN1252 - - - - utf8_to_windows_1253 - UTF8 - WIN1253 - - - - utf8_to_windows_1254 - UTF8 - WIN1254 - - - - utf8_to_windows_1255 - UTF8 - WIN1255 - - - - utf8_to_windows_1256 - UTF8 - WIN1256 - - - - utf8_to_windows_1257 - UTF8 - WIN1257 - - - - utf8_to_windows_866 - UTF8 - WIN866 - - - - utf8_to_windows_874 - UTF8 - WIN874 - - - - windows_1250_to_iso_8859_2 - WIN1250 - LATIN2 - - - - windows_1250_to_mic - WIN1250 - MULE_INTERNAL - - - - windows_1250_to_utf8 - WIN1250 - UTF8 - - - - windows_1251_to_iso_8859_5 - WIN1251 - ISO_8859_5 - - - - windows_1251_to_koi8_r - WIN1251 - KOI8R - - - - windows_1251_to_mic - WIN1251 - MULE_INTERNAL - - - - windows_1251_to_utf8 - WIN1251 - UTF8 - - - - windows_1251_to_windows_866 - WIN1251 - WIN866 - - - - windows_1252_to_utf8 - WIN1252 - UTF8 - - - - windows_1256_to_utf8 - WIN1256 - UTF8 - - - - windows_866_to_iso_8859_5 - WIN866 - ISO_8859_5 - - - - windows_866_to_koi8_r - WIN866 - KOI8R - - - - windows_866_to_mic - WIN866 - MULE_INTERNAL - - - - windows_866_to_utf8 - WIN866 - UTF8 - - - - windows_866_to_windows_1251 - WIN866 - WIN - - - - windows_874_to_utf8 - WIN874 - UTF8 - - - - euc_jis_2004_to_utf8 - EUC_JIS_2004 - UTF8 - - - - utf8_to_euc_jis_2004 - UTF8 - EUC_JIS_2004 - - - - shift_jis_2004_to_utf8 - SHIFT_JIS_2004 - UTF8 - - - - utf8_to_shift_jis_2004 - UTF8 - SHIFT_JIS_2004 - - - - euc_jis_2004_to_shift_jis_2004 - EUC_JIS_2004 - SHIFT_JIS_2004 - - - - shift_jis_2004_to_euc_jis_2004 - SHIFT_JIS_2004 - EUC_JIS_2004 - - - - -
- <function>format</function> @@ -3494,7 +2619,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); This section describes functions and operators for examining and - manipulating values of type bytea. + manipulating binary strings, that is values of type bytea. + Many of these are equivalent, in purpose and syntax, to the + text-string functions described in the previous section. @@ -3512,6 +2639,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); The sample results shown on this page assume that the server parameter bytea_output is set to escape (the traditional PostgreSQL format). + The output will look different when using the default setting + (hex). @@ -3530,11 +2659,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); - string || - string + bytes || + bytes bytea - String concatenation + Binary string concatenation binary string concatenation @@ -3544,12 +2673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); \\Post'gres\000 + + + + bit_length + + bit_length(bytes) + + int + Number of bits in binary string + bit_length('jo\000se':bytea) + 40 + + octet_length - octet_length(string) + octet_length(bytes) int Number of bytes in binary string @@ -3562,7 +2704,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); overlay - overlay(string placing string from int for int) + overlay(bytes placing bytes from int for int) bytea @@ -3577,7 +2719,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); position - position(substring in string) + position(bytesubstring in bytes) int Location of specified substring @@ -3590,11 +2732,12 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); substring - substring(string from int for int) + substring(bytes from int for int) bytea - Extract substring + Extract substring (provide at least one of from + and for) substring('Th\000omas'::bytea from 2 for 3) h\000o @@ -3606,18 +2749,32 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); trim trim(both - bytes from - string) + bytesremoved from + bytes) bytea Remove the longest string containing only bytes appearing in - bytes from the start - and end of string + bytesremoved from the start + and end of bytes trim('\000\001'::bytea from '\000Tom\001'::bytea) Tom + + + + trim(both from + bytes, + bytesremoved) + + bytea + + Non-standard syntax for trim() + + trim(both from 'yxTomxx'::bytea, 'xyz'::bytea) + Tom + @@ -3649,66 +2806,30 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); btrim - btrim(string - bytea, bytes bytea) + btrim(bytes + bytea, bytesremoved bytea) bytea Remove the longest string containing only bytes appearing in - bytes from the start and end of - string - + bytesremoved from the start and end of + bytes + btrim('\000trim\001'::bytea, '\000\001'::bytea) trim - - - - decode - - decode(string text, - format text) - - bytea - - Decode binary data from textual representation in string. - Options for format are same as in encode. - - decode('123\000456', 'escape') - 123\000456 - - - - - - encode - - encode(data bytea, - format text) - - text - - Encode binary data into a textual representation. Supported - formats are: base64, hex, escape. - escape converts zero bytes and high-bit-set bytes to - octal sequences (\nnn) and - doubles backslashes. - - encode('123\000456'::bytea, 'escape') - 123\000456 - - get_bit - get_bit(string, offset) + get_bit(bytes bytea, offset int) int - Extract bit from string + Extract n'th bit + from binary string get_bit('Th\000omas'::bytea, 45) 1 @@ -3719,67 +2840,82 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); get_byte - get_byte(string, offset) + get_byte(bytes bytea, offset int) int - Extract byte from string + Extract n'th byte + from binary string get_byte('Th\000omas'::bytea, 4) 109 - - - - length - - length(string) - - int - - Length of binary string - - binary string - length - - - length - of a binary string - binary strings, length - - - length('jo\000se'::bytea) - 5 - + + + + length + + length(bytes bytea) + + int + + Number of bytes in binary string + + binary string + length + + + length + of a binary string + binary strings, length + + + length('jo\000se'::bytea) + 5 + - - - - md5 - - md5(string) - - text - - Calculates the MD5 hash of string, - returning the result in hexadecimal - - md5('Th\000omas'::bytea) - 8ab2d3c9689aaf18​b4958c334c82d8b1 - + + length(bytes bytea, + encoding name) + int + + Number of characters in bytes, assuming + that it is text in the given encoding + + length('jose'::bytea, 'UTF8') + 4 + + + + + + md5 + + md5(bytes bytea) + + text + + MD5 hash, with + the result written in hexadecimal + + md5('Th\000omas'::bytea) + 8ab2d3c9689aaf18​b4958c334c82d8b1 + set_bit - set_bit(string, - offset, newvalue) + set_bit(bytes bytea, + offset int, + newvalue int) bytea - Set bit in string + Set n'th bit in + binary string set_bit('Th\000omas'::bytea, 45, 0) Th\000omAs @@ -3790,12 +2926,14 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); set_byte - set_byte(string, - offset, newvalue) + set_byte(bytes bytea, + offset int, + newvalue int) bytea - Set byte in string + Set n'th byte in + binary string set_byte('Th\000omas'::bytea, 4, 64) Th\000o@as @@ -3806,13 +2944,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha224 - sha224(bytea) + sha224(bytes bytea) bytea - SHA-224 hash + SHA-224 hash - sha224('abc') + sha224('abc'::bytea) \x23097d223405d8228642a477bda2​55b32aadbce4bda0b3f7e36c9da7 @@ -3821,13 +2959,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha256 - sha256(bytea) + sha256(bytes bytea) bytea - SHA-256 hash + SHA-256 hash - sha256('abc') + sha256('abc'::bytea) \xba7816bf8f01cfea414140de5dae2223​b00361a396177a9cb410ff61f20015ad @@ -3836,13 +2974,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha384 - sha384(bytea) + sha384(bytes bytea) bytea - SHA-384 hash + SHA-384 hash - sha384('abc') + sha384('abc'::bytea) \xcb00753f45a35e8bb5a03d699ac65007​272c32ab0eded1631a8b605a43ff5bed​8086072ba1e7cc2358baeca134c825a7 @@ -3851,34 +2989,258 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha512 - sha512(bytea) + sha512(bytes bytea) bytea - SHA-512 hash + SHA-512 hash - sha512('abc') + sha512('abc'::bytea) \xddaf35a193617abacc417349ae204131​12e6fa4e89a97ea20a9eeee64b55d39a​2192992a274fc1a836ba3c23a3feebbd​454d4423643ce80e2a9ac94fa54ca49f + + + + + substr + + substr(bytes bytea, from int , count int) + + bytea + + Extract substring (same as + substring(bytea from from for count)) + + substr('alphabet', 3, 2) + ph + - - get_byte and set_byte number the first byte - of a binary string as byte 0. - get_bit and set_bit number bits from the - right within each byte; for example bit 0 is the least significant bit of - the first byte, and bit 15 is the most significant bit of the second byte. + + Functions get_byte and set_byte + number the first byte of a binary string as byte 0. + Functions get_bit and set_bit + number bits from the right within each byte; for example bit 0 is the least + significant bit of the first byte, and bit 15 is the most significant bit + of the second byte. + + + + For historical reasons, the function md5 + returns a hex-encoded value of type text whereas the SHA-2 + functions return type bytea. Use the functions + encode + and decode to + convert between the two. For example write encode(sha256('abc'), + 'hex') to get a hex-encoded text representation, + or decode(md5('abc'), 'hex') to get + a bytea value. - Note that for historic reasons, the function md5 - returns a hex-encoded value of type text whereas the SHA-2 - functions return type bytea. Use the functions - encode and decode to convert - between the two, for example encode(sha256('abc'), - 'hex') to get a hex-encoded text representation. + + character string + converting to binary string + + + binary string + converting to character string + + Functions for converting strings between different character sets + (encodings), and for representing arbitrary binary data in textual + form, are shown in + . For these + functions, an argument or result of type text is expressed + in the database's default encoding, while arguments or results of + type bytea are in an encoding named by another argument. + + + + Text/Binary String Conversion Functions + + + + Function + Return Type + Description + Example + Result + + + + + + + + convert + + convert(bytes bytea, + src_encoding name, + dest_encoding name) + + bytea + + Convert binary string representing text in + encoding src_encoding + to a binary string in encoding dest_encoding + (see for + available conversions) + + convert('text_in_utf8', 'UTF8', 'LATIN1') + text_in_utf8 represented in Latin-1 encoding + + + + + + convert_from + + convert_from(bytes bytea, + src_encoding name) + + text + + Convert binary string representing text in + encoding src_encoding + to text in the database encoding + (see for + available conversions) + + convert_from('text_in_utf8', 'UTF8') + text_in_utf8 represented in the + database encoding + + + + + + convert_to + + convert_to(string text, + dest_encoding name) + + bytea + + Convert text (in the database encoding) to a binary + string encoded in encoding dest_encoding + (see for + available conversions) + + convert_to('some text', 'UTF8') + some text represented in UTF8 encoding + + + + + + encode + + encode(bytes bytea, + format text) + + text + + Encode binary data into a textual representation; supported + format values are: + base64, + escape, + hex + + encode('123\000\001', 'base64') + MTIzAAE= + + + + + + decode + + decode(string text, + format text) + + bytea + + Decode binary data from a textual representation given + in string; allowed + format values are the same as + for encode + + decode('MTIzAAE=', 'base64') + \x3132330001 + + + +
+ + + The encode and decode + functions support the following textual formats: + + + + base64 + + base64 format + + + + The base64 format is that + of RFC + 2045 Section 6.8. As per the RFC, encoded lines are + broken at 76 characters. However instead of the MIME CRLF + end-of-line marker, only a newline is used for end-of-line. + The decode function ignores carriage-return, + newline, space, and tab characters. Otherwise, an error is + raised when decode is supplied invalid + base64 data — including when trailing padding is incorrect. + + + + + + escape + + escape format + + + + The escape format converts zero bytes and + bytes with the high bit set into octal escape sequences + (\nnn), and it doubles + backslashes. Other byte values are represented literally. + The decode function will raise an error if a + backslash is not followed by either a second backslash or three + octal digits; it accepts other byte values unchanged. + + + + + + hex + + hex format + + + + The hex format represents each 4 bits of + data as one hexadecimal digit, 0 + through f, writing the higher-order digit of + each byte first. The encode function outputs + the a-f hex digits in lower + case. Because the smallest unit of data is 8 bits, there are + always an even number of characters returned + by encode. + The decode function + accepts the a-f characters in + either upper or lower case. An error is raised + when decode is given invalid hex data + — including when given an odd number of characters. + + + + @@ -3976,22 +3338,20 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); - The following SQL-standard functions work on bit - strings as well as character strings: + Some of the binary-string functions shown in + and + are also available + for bit strings, specifically: length, bit_length, octet_length, position, substring, - overlay. - - - - The following functions work on bit strings as well as binary - strings: + overlay, get_bit, set_bit. - When working with a bit string, these functions number the first + When working with a bit string, get_bit + and set_bit number the first (leftmost) bit of the string as bit 0.