diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 55669b5cad..057a6bb81a 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -1495,313 +1495,12 @@ $ psql -l Automatic Character Set Conversion Between Server and Client - PostgreSQL supports automatic - character set conversion between server and client for certain - character set combinations. The conversion information is stored in the - pg_conversion system catalog. PostgreSQL - comes with some predefined conversions, as shown in . You can create a new - conversion using the SQL command CREATE CONVERSION. + PostgreSQL supports automatic character + set conversion between server and client for many combinations of + character sets ( + shows which ones). - - Client/Server Character Set Conversions - - - - Server Character Set - Available Client Character Sets - - - - - BIG5 - not supported as a server encoding - - - - EUC_CN - EUC_CN, - MULE_INTERNAL, - UTF8 - - - - EUC_JP - EUC_JP, - MULE_INTERNAL, - SJIS, - UTF8 - - - - EUC_JIS_2004 - EUC_JIS_2004, - SHIFT_JIS_2004, - UTF8 - - - - EUC_KR - EUC_KR, - MULE_INTERNAL, - UTF8 - - - - EUC_TW - EUC_TW, - BIG5, - MULE_INTERNAL, - UTF8 - - - - GB18030 - not supported as a server encoding - - - - GBK - not supported as a server encoding - - - - ISO_8859_5 - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN866, - WIN1251 - - - - ISO_8859_6 - ISO_8859_6, - UTF8 - - - - ISO_8859_7 - ISO_8859_7, - UTF8 - - - - ISO_8859_8 - ISO_8859_8, - UTF8 - - - - JOHAB - not supported as a server encoding - - - - KOI8R - KOI8R, - ISO_8859_5, - MULE_INTERNAL, - UTF8, - WIN866, - WIN1251 - - - - KOI8U - KOI8U, - UTF8 - - - - LATIN1 - LATIN1, - MULE_INTERNAL, - UTF8 - - - - LATIN2 - LATIN2, - MULE_INTERNAL, - UTF8, - WIN1250 - - - - LATIN3 - LATIN3, - MULE_INTERNAL, - UTF8 - - - - LATIN4 - LATIN4, - MULE_INTERNAL, - UTF8 - - - - LATIN5 - LATIN5, - UTF8 - - - - LATIN6 - LATIN6, - UTF8 - - - - LATIN7 - LATIN7, - UTF8 - - - - LATIN8 - LATIN8, - UTF8 - - - - LATIN9 - LATIN9, - UTF8 - - - - LATIN10 - LATIN10, - UTF8 - - - - MULE_INTERNAL - MULE_INTERNAL, - BIG5, - EUC_CN, - EUC_JP, - EUC_KR, - EUC_TW, - ISO_8859_5, - KOI8R, - LATIN1 to LATIN4, - SJIS, - WIN866, - WIN1250, - WIN1251 - - - - SJIS - not supported as a server encoding - - - - SHIFT_JIS_2004 - not supported as a server encoding - - - - SQL_ASCII - any (no conversion will be performed) - - - - UHC - not supported as a server encoding - - - - UTF8 - all supported encodings - - - - WIN866 - WIN866, - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN1251 - - - - WIN874 - WIN874, - UTF8 - - - - WIN1250 - WIN1250, - LATIN2, - MULE_INTERNAL, - UTF8 - - - - WIN1251 - WIN1251, - ISO_8859_5, - KOI8R, - MULE_INTERNAL, - UTF8, - WIN866 - - - - WIN1252 - WIN1252, - UTF8 - - - - WIN1253 - WIN1253, - UTF8 - - - - WIN1254 - WIN1254, - UTF8 - - - - WIN1255 - WIN1255, - UTF8 - - - - WIN1256 - WIN1256, - UTF8 - - - - WIN1257 - WIN1257, - UTF8 - - - - WIN1258 - WIN1258, - UTF8 - - - - -
- To enable automatic character set conversion, you have to tell PostgreSQL the character set @@ -1905,6 +1604,970 @@ RESET client_encoding; + + Available Character Set Conversions + + + PostgreSQL allows conversion between any + two character sets for which a conversion function is listed in the + pg_conversion + system catalog. PostgreSQL comes with + some predefined conversions, as summarized in + and shown in more + detail in . You can + create a new conversion using the SQL command + . (To be used for automatic + client/server conversions, a conversion must be marked + as default for its character set pair.) + + + + Built-in Client/Server Character Set Conversions + + + + Server Character Set + Available Client Character Sets + + + + + BIG5 + not supported as a server encoding + + + + EUC_CN + EUC_CN, + MULE_INTERNAL, + UTF8 + + + + EUC_JP + EUC_JP, + MULE_INTERNAL, + SJIS, + UTF8 + + + + EUC_JIS_2004 + EUC_JIS_2004, + SHIFT_JIS_2004, + UTF8 + + + + EUC_KR + EUC_KR, + MULE_INTERNAL, + UTF8 + + + + EUC_TW + EUC_TW, + BIG5, + MULE_INTERNAL, + UTF8 + + + + GB18030 + not supported as a server encoding + + + + GBK + not supported as a server encoding + + + + ISO_8859_5 + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN866, + WIN1251 + + + + ISO_8859_6 + ISO_8859_6, + UTF8 + + + + ISO_8859_7 + ISO_8859_7, + UTF8 + + + + ISO_8859_8 + ISO_8859_8, + UTF8 + + + + JOHAB + not supported as a server encoding + + + + KOI8R + KOI8R, + ISO_8859_5, + MULE_INTERNAL, + UTF8, + WIN866, + WIN1251 + + + + KOI8U + KOI8U, + UTF8 + + + + LATIN1 + LATIN1, + MULE_INTERNAL, + UTF8 + + + + LATIN2 + LATIN2, + MULE_INTERNAL, + UTF8, + WIN1250 + + + + LATIN3 + LATIN3, + MULE_INTERNAL, + UTF8 + + + + LATIN4 + LATIN4, + MULE_INTERNAL, + UTF8 + + + + LATIN5 + LATIN5, + UTF8 + + + + LATIN6 + LATIN6, + UTF8 + + + + LATIN7 + LATIN7, + UTF8 + + + + LATIN8 + LATIN8, + UTF8 + + + + LATIN9 + LATIN9, + UTF8 + + + + LATIN10 + LATIN10, + UTF8 + + + + MULE_INTERNAL + MULE_INTERNAL, + BIG5, + EUC_CN, + EUC_JP, + EUC_KR, + EUC_TW, + ISO_8859_5, + KOI8R, + LATIN1 to LATIN4, + SJIS, + WIN866, + WIN1250, + WIN1251 + + + + SJIS + not supported as a server encoding + + + + SHIFT_JIS_2004 + not supported as a server encoding + + + + SQL_ASCII + any (no conversion will be performed) + + + + UHC + not supported as a server encoding + + + + UTF8 + all supported encodings + + + + WIN866 + WIN866, + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN1251 + + + + WIN874 + WIN874, + UTF8 + + + + WIN1250 + WIN1250, + LATIN2, + MULE_INTERNAL, + UTF8 + + + + WIN1251 + WIN1251, + ISO_8859_5, + KOI8R, + MULE_INTERNAL, + UTF8, + WIN866 + + + + WIN1252 + WIN1252, + UTF8 + + + + WIN1253 + WIN1253, + UTF8 + + + + WIN1254 + WIN1254, + UTF8 + + + + WIN1255 + WIN1255, + UTF8 + + + + WIN1256 + WIN1256, + UTF8 + + + + WIN1257 + WIN1257, + UTF8 + + + + WIN1258 + WIN1258, + UTF8 + + + + +
+ + + All Built-in Character Set Conversions + + + + Conversion Name + + + The conversion names follow a standard naming scheme: The + official name of the source encoding with all + non-alphanumeric characters replaced by underscores, followed + by _to_, followed by the similarly processed + destination encoding name. Therefore, these names sometimes + deviate from the customary encoding names shown in + . + + + + Source Encoding + Destination Encoding + + + + + + big5_to_euc_tw + BIG5 + EUC_TW + + + big5_to_mic + BIG5 + MULE_INTERNAL + + + big5_to_utf8 + BIG5 + UTF8 + + + euc_cn_to_mic + EUC_CN + MULE_INTERNAL + + + euc_cn_to_utf8 + EUC_CN + UTF8 + + + euc_jp_to_mic + EUC_JP + MULE_INTERNAL + + + euc_jp_to_sjis + EUC_JP + SJIS + + + euc_jp_to_utf8 + EUC_JP + UTF8 + + + euc_kr_to_mic + EUC_KR + MULE_INTERNAL + + + euc_kr_to_utf8 + EUC_KR + UTF8 + + + euc_tw_to_big5 + EUC_TW + BIG5 + + + euc_tw_to_mic + EUC_TW + MULE_INTERNAL + + + euc_tw_to_utf8 + EUC_TW + UTF8 + + + gb18030_to_utf8 + GB18030 + UTF8 + + + gbk_to_utf8 + GBK + UTF8 + + + iso_8859_10_to_utf8 + LATIN6 + UTF8 + + + iso_8859_13_to_utf8 + LATIN7 + UTF8 + + + iso_8859_14_to_utf8 + LATIN8 + UTF8 + + + iso_8859_15_to_utf8 + LATIN9 + UTF8 + + + iso_8859_16_to_utf8 + LATIN10 + UTF8 + + + iso_8859_1_to_mic + LATIN1 + MULE_INTERNAL + + + iso_8859_1_to_utf8 + LATIN1 + UTF8 + + + iso_8859_2_to_mic + LATIN2 + MULE_INTERNAL + + + iso_8859_2_to_utf8 + LATIN2 + UTF8 + + + iso_8859_2_to_windows_1250 + LATIN2 + WIN1250 + + + iso_8859_3_to_mic + LATIN3 + MULE_INTERNAL + + + iso_8859_3_to_utf8 + LATIN3 + UTF8 + + + iso_8859_4_to_mic + LATIN4 + MULE_INTERNAL + + + iso_8859_4_to_utf8 + LATIN4 + UTF8 + + + iso_8859_5_to_koi8_r + ISO_8859_5 + KOI8R + + + iso_8859_5_to_mic + ISO_8859_5 + MULE_INTERNAL + + + iso_8859_5_to_utf8 + ISO_8859_5 + UTF8 + + + iso_8859_5_to_windows_1251 + ISO_8859_5 + WIN1251 + + + iso_8859_5_to_windows_866 + ISO_8859_5 + WIN866 + + + iso_8859_6_to_utf8 + ISO_8859_6 + UTF8 + + + iso_8859_7_to_utf8 + ISO_8859_7 + UTF8 + + + iso_8859_8_to_utf8 + ISO_8859_8 + UTF8 + + + iso_8859_9_to_utf8 + LATIN5 + UTF8 + + + johab_to_utf8 + JOHAB + UTF8 + + + koi8_r_to_iso_8859_5 + KOI8R + ISO_8859_5 + + + koi8_r_to_mic + KOI8R + MULE_INTERNAL + + + koi8_r_to_utf8 + KOI8R + UTF8 + + + koi8_r_to_windows_1251 + KOI8R + WIN1251 + + + koi8_r_to_windows_866 + KOI8R + WIN866 + + + koi8_u_to_utf8 + KOI8U + UTF8 + + + mic_to_big5 + MULE_INTERNAL + BIG5 + + + mic_to_euc_cn + MULE_INTERNAL + EUC_CN + + + mic_to_euc_jp + MULE_INTERNAL + EUC_JP + + + mic_to_euc_kr + MULE_INTERNAL + EUC_KR + + + mic_to_euc_tw + MULE_INTERNAL + EUC_TW + + + mic_to_iso_8859_1 + MULE_INTERNAL + LATIN1 + + + mic_to_iso_8859_2 + MULE_INTERNAL + LATIN2 + + + mic_to_iso_8859_3 + MULE_INTERNAL + LATIN3 + + + mic_to_iso_8859_4 + MULE_INTERNAL + LATIN4 + + + mic_to_iso_8859_5 + MULE_INTERNAL + ISO_8859_5 + + + mic_to_koi8_r + MULE_INTERNAL + KOI8R + + + mic_to_sjis + MULE_INTERNAL + SJIS + + + mic_to_windows_1250 + MULE_INTERNAL + WIN1250 + + + mic_to_windows_1251 + MULE_INTERNAL + WIN1251 + + + mic_to_windows_866 + MULE_INTERNAL + WIN866 + + + sjis_to_euc_jp + SJIS + EUC_JP + + + sjis_to_mic + SJIS + MULE_INTERNAL + + + sjis_to_utf8 + SJIS + UTF8 + + + windows_1258_to_utf8 + WIN1258 + UTF8 + + + uhc_to_utf8 + UHC + UTF8 + + + utf8_to_big5 + UTF8 + BIG5 + + + utf8_to_euc_cn + UTF8 + EUC_CN + + + utf8_to_euc_jp + UTF8 + EUC_JP + + + utf8_to_euc_kr + UTF8 + EUC_KR + + + utf8_to_euc_tw + UTF8 + EUC_TW + + + utf8_to_gb18030 + UTF8 + GB18030 + + + utf8_to_gbk + UTF8 + GBK + + + utf8_to_iso_8859_1 + UTF8 + LATIN1 + + + utf8_to_iso_8859_10 + UTF8 + LATIN6 + + + utf8_to_iso_8859_13 + UTF8 + LATIN7 + + + utf8_to_iso_8859_14 + UTF8 + LATIN8 + + + utf8_to_iso_8859_15 + UTF8 + LATIN9 + + + utf8_to_iso_8859_16 + UTF8 + LATIN10 + + + utf8_to_iso_8859_2 + UTF8 + LATIN2 + + + utf8_to_iso_8859_3 + UTF8 + LATIN3 + + + utf8_to_iso_8859_4 + UTF8 + LATIN4 + + + utf8_to_iso_8859_5 + UTF8 + ISO_8859_5 + + + utf8_to_iso_8859_6 + UTF8 + ISO_8859_6 + + + utf8_to_iso_8859_7 + UTF8 + ISO_8859_7 + + + utf8_to_iso_8859_8 + UTF8 + ISO_8859_8 + + + utf8_to_iso_8859_9 + UTF8 + LATIN5 + + + utf8_to_johab + UTF8 + JOHAB + + + utf8_to_koi8_r + UTF8 + KOI8R + + + utf8_to_koi8_u + UTF8 + KOI8U + + + utf8_to_sjis + UTF8 + SJIS + + + utf8_to_windows_1258 + UTF8 + WIN1258 + + + utf8_to_uhc + UTF8 + UHC + + + utf8_to_windows_1250 + UTF8 + WIN1250 + + + utf8_to_windows_1251 + UTF8 + WIN1251 + + + utf8_to_windows_1252 + UTF8 + WIN1252 + + + utf8_to_windows_1253 + UTF8 + WIN1253 + + + utf8_to_windows_1254 + UTF8 + WIN1254 + + + utf8_to_windows_1255 + UTF8 + WIN1255 + + + utf8_to_windows_1256 + UTF8 + WIN1256 + + + utf8_to_windows_1257 + UTF8 + WIN1257 + + + utf8_to_windows_866 + UTF8 + WIN866 + + + utf8_to_windows_874 + UTF8 + WIN874 + + + windows_1250_to_iso_8859_2 + WIN1250 + LATIN2 + + + windows_1250_to_mic + WIN1250 + MULE_INTERNAL + + + windows_1250_to_utf8 + WIN1250 + UTF8 + + + windows_1251_to_iso_8859_5 + WIN1251 + ISO_8859_5 + + + windows_1251_to_koi8_r + WIN1251 + KOI8R + + + windows_1251_to_mic + WIN1251 + MULE_INTERNAL + + + windows_1251_to_utf8 + WIN1251 + UTF8 + + + windows_1251_to_windows_866 + WIN1251 + WIN866 + + + windows_1252_to_utf8 + WIN1252 + UTF8 + + + windows_1256_to_utf8 + WIN1256 + UTF8 + + + windows_866_to_iso_8859_5 + WIN866 + ISO_8859_5 + + + windows_866_to_koi8_r + WIN866 + KOI8R + + + windows_866_to_mic + WIN866 + MULE_INTERNAL + + + windows_866_to_utf8 + WIN866 + UTF8 + + + windows_866_to_windows_1251 + WIN866 + WIN + + + windows_874_to_utf8 + WIN874 + UTF8 + + + euc_jis_2004_to_utf8 + EUC_JIS_2004 + UTF8 + + + utf8_to_euc_jis_2004 + UTF8 + EUC_JIS_2004 + + + shift_jis_2004_to_utf8 + SHIFT_JIS_2004 + UTF8 + + + utf8_to_shift_jis_2004 + UTF8 + SHIFT_JIS_2004 + + + euc_jis_2004_to_shift_jis_2004 + EUC_JIS_2004 + SHIFT_JIS_2004 + + + shift_jis_2004_to_euc_jis_2004 + SHIFT_JIS_2004 + EUC_JIS_2004 + + + +
+
+ Further Reading diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 72072e7545..6c4359dc7b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1455,8 +1455,7 @@ and text. Unless otherwise noted, all of the functions listed below work on all of these types, but be wary of potential effects of automatic space-padding when using the - character type. Some functions also exist - natively for the bit-string types. + character type. @@ -1628,7 +1627,8 @@ text - Extract substring + Extract substring (provide at least one of from + and for) substring('Thomas' from 2 for 3) hom @@ -1638,9 +1638,9 @@ substring(string from pattern) text - Extract substring matching POSIX regular expression. See + Extract substring matching POSIX regular expression (see for more information on pattern - matching. + matching) substring('Thomas' from '...$') mas @@ -1650,9 +1650,9 @@ substring(string from pattern for escape) text - Extract substring matching SQL regular expression. - See for more information on - pattern matching. + Extract substring matching SQL regular expression + (see for more information on + pattern matching) substring('Thomas' from '%#"o_a#"_' for '#') oma @@ -1682,9 +1682,8 @@ trim(leading | trailing | both from - string - , characters - ) + string , + characters) text @@ -1822,101 +1821,6 @@ - - convert - - convert(string bytea, - src_encoding name, - dest_encoding name) - - bytea - - Convert string to dest_encoding. The - original encoding is specified by - src_encoding. The - string must be valid in this encoding. - Conversions can be defined by CREATE CONVERSION. - Also there are some predefined conversions. See for available conversions. - - convert('text_in_utf8', 'UTF8', 'LATIN1') - text_in_utf8 represented in Latin-1 - encoding (ISO 8859-1) - - - - - - convert_from - - convert_from(string bytea, - src_encoding name) - - text - - Convert string to the database encoding. The original encoding - is specified by src_encoding. The - string must be valid in this encoding. - - convert_from('text_in_utf8', 'UTF8') - text_in_utf8 represented in the current database encoding - - - - - - convert_to - - convert_to(string text, - dest_encoding name) - - bytea - - Convert string to dest_encoding. - - convert_to('some text', 'UTF8') - some text represented in the UTF8 encoding - - - - - - decode - - decode(string text, - format text) - - bytea - - Decode binary data from textual representation in string. - Options for format are same as in encode. - - decode('MTIzAAE=', 'base64') - \x3132330001 - - - - - - encode - - encode(data bytea, - format text) - - text - - Encode binary data into a textual representation. Supported - formats are: base64, hex, escape. - escape converts zero bytes and high-bit-set bytes to - octal sequences (\nnn) and - doubles backslashes. - - encode('123\000\001', 'base64') - MTIzAAE= - - - - format @@ -1955,13 +1859,14 @@ left - left(str text, + left(string text, n int) text - Return first n characters in the string. When n - is negative, return all but last |n| characters. + Return first n characters in the + string, or when n is negative, return + all but last |n| characters left('abcde', 2) ab @@ -1982,19 +1887,6 @@ 4 - - length(string bytea, - encoding name ) - int - - Number of characters in string in the given - encoding. The string - must be valid in this encoding. - - length('jose', 'UTF8') - 4 - - @@ -2006,7 +1898,7 @@ text - Fill up the string to length + Extend the string to length length by prepending the characters fill (a space by default). If the string is already longer than @@ -2044,8 +1936,8 @@ text - Calculates the MD5 hash of string, - returning the result in hexadecimal + MD5 hash, with + the result written in hexadecimal md5('abc') 900150983cd24fb0 d6963f7d28e17f72 @@ -2182,8 +2074,8 @@ text[] Return captured substring(s) resulting from the first match of a POSIX - regular expression to the string. See - for more information. + regular expression to the string (see + for more information) regexp_match('foobarbequebaz', '(bar)(beque)') {bar,beque} @@ -2199,8 +2091,8 @@ setof text[] Return captured substring(s) resulting from matching a POSIX regular - expression to the string. See - for more information. + expression to the string (see + for more information) regexp_matches('foobarbequebaz', 'ba.', 'g') {bar}{baz} (2 rows) @@ -2215,8 +2107,8 @@ text - Replace substring(s) matching a POSIX regular expression. See - for more information. + Replace substring(s) matching a POSIX regular expression (see + for more information) regexp_replace('Thomas', '.[mN]a.', 'M') ThM @@ -2232,8 +2124,8 @@ text[] Split string using a POSIX regular expression as - the delimiter. See for more - information. + the delimiter (see for more + information) regexp_split_to_array('hello world', '\s+') {hello,world} @@ -2249,8 +2141,8 @@ setof text Split string using a POSIX regular expression as - the delimiter. See for more - information. + the delimiter (see for more + information) regexp_split_to_table('hello world', '\s+') helloworld (2 rows) @@ -2296,7 +2188,7 @@ text - Return reversed string. + Reverse the order of the characters in string reverse('abcde') edcba @@ -2307,13 +2199,14 @@ right - right(str text, + right(string text, n int) text - Return last n characters in the string. When n - is negative, return all but first |n| characters. + Return last n characters in the string, + or when n is negative, return all but + first |n| characters right('abcde', 2) de @@ -2330,7 +2223,7 @@ text - Fill up the string to length + Extend the string to length length by appending the characters fill (a space by default). If the string is already longer than @@ -2418,7 +2311,8 @@ bool - Returns true if string starts with prefix. + Return true if string starts + with prefix starts_with('alphabet', 'alph') t @@ -2497,781 +2391,12 @@ - See also the aggregate function string_agg in - . + See also the aggregate function string_agg in + , and the functions for + converting between strings and the bytea type in + . - - Built-in Conversions - - - - Conversion Name - - - The conversion names follow a standard naming scheme: The - official name of the source encoding with all - non-alphanumeric characters replaced by underscores, followed - by _to_, followed by the similarly processed - destination encoding name. Therefore, the names might deviate - from the customary encoding names. - - - - Source Encoding - Destination Encoding - - - - - - big5_to_euc_tw - BIG5 - EUC_TW - - - - big5_to_mic - BIG5 - MULE_INTERNAL - - - - big5_to_utf8 - BIG5 - UTF8 - - - - euc_cn_to_mic - EUC_CN - MULE_INTERNAL - - - - euc_cn_to_utf8 - EUC_CN - UTF8 - - - - euc_jp_to_mic - EUC_JP - MULE_INTERNAL - - - - euc_jp_to_sjis - EUC_JP - SJIS - - - - euc_jp_to_utf8 - EUC_JP - UTF8 - - - - euc_kr_to_mic - EUC_KR - MULE_INTERNAL - - - - euc_kr_to_utf8 - EUC_KR - UTF8 - - - - euc_tw_to_big5 - EUC_TW - BIG5 - - - - euc_tw_to_mic - EUC_TW - MULE_INTERNAL - - - - euc_tw_to_utf8 - EUC_TW - UTF8 - - - - gb18030_to_utf8 - GB18030 - UTF8 - - - - gbk_to_utf8 - GBK - UTF8 - - - - iso_8859_10_to_utf8 - LATIN6 - UTF8 - - - - iso_8859_13_to_utf8 - LATIN7 - UTF8 - - - - iso_8859_14_to_utf8 - LATIN8 - UTF8 - - - - iso_8859_15_to_utf8 - LATIN9 - UTF8 - - - - iso_8859_16_to_utf8 - LATIN10 - UTF8 - - - - iso_8859_1_to_mic - LATIN1 - MULE_INTERNAL - - - - iso_8859_1_to_utf8 - LATIN1 - UTF8 - - - - iso_8859_2_to_mic - LATIN2 - MULE_INTERNAL - - - - iso_8859_2_to_utf8 - LATIN2 - UTF8 - - - - iso_8859_2_to_windows_1250 - LATIN2 - WIN1250 - - - - iso_8859_3_to_mic - LATIN3 - MULE_INTERNAL - - - - iso_8859_3_to_utf8 - LATIN3 - UTF8 - - - - iso_8859_4_to_mic - LATIN4 - MULE_INTERNAL - - - - iso_8859_4_to_utf8 - LATIN4 - UTF8 - - - - iso_8859_5_to_koi8_r - ISO_8859_5 - KOI8R - - - - iso_8859_5_to_mic - ISO_8859_5 - MULE_INTERNAL - - - - iso_8859_5_to_utf8 - ISO_8859_5 - UTF8 - - - - iso_8859_5_to_windows_1251 - ISO_8859_5 - WIN1251 - - - - iso_8859_5_to_windows_866 - ISO_8859_5 - WIN866 - - - - iso_8859_6_to_utf8 - ISO_8859_6 - UTF8 - - - - iso_8859_7_to_utf8 - ISO_8859_7 - UTF8 - - - - iso_8859_8_to_utf8 - ISO_8859_8 - UTF8 - - - - iso_8859_9_to_utf8 - LATIN5 - UTF8 - - - - johab_to_utf8 - JOHAB - UTF8 - - - - koi8_r_to_iso_8859_5 - KOI8R - ISO_8859_5 - - - - koi8_r_to_mic - KOI8R - MULE_INTERNAL - - - - koi8_r_to_utf8 - KOI8R - UTF8 - - - - koi8_r_to_windows_1251 - KOI8R - WIN1251 - - - - koi8_r_to_windows_866 - KOI8R - WIN866 - - - - koi8_u_to_utf8 - KOI8U - UTF8 - - - - mic_to_big5 - MULE_INTERNAL - BIG5 - - - - mic_to_euc_cn - MULE_INTERNAL - EUC_CN - - - - mic_to_euc_jp - MULE_INTERNAL - EUC_JP - - - - mic_to_euc_kr - MULE_INTERNAL - EUC_KR - - - - mic_to_euc_tw - MULE_INTERNAL - EUC_TW - - - - mic_to_iso_8859_1 - MULE_INTERNAL - LATIN1 - - - - mic_to_iso_8859_2 - MULE_INTERNAL - LATIN2 - - - - mic_to_iso_8859_3 - MULE_INTERNAL - LATIN3 - - - - mic_to_iso_8859_4 - MULE_INTERNAL - LATIN4 - - - - mic_to_iso_8859_5 - MULE_INTERNAL - ISO_8859_5 - - - - mic_to_koi8_r - MULE_INTERNAL - KOI8R - - - - mic_to_sjis - MULE_INTERNAL - SJIS - - - - mic_to_windows_1250 - MULE_INTERNAL - WIN1250 - - - - mic_to_windows_1251 - MULE_INTERNAL - WIN1251 - - - - mic_to_windows_866 - MULE_INTERNAL - WIN866 - - - - sjis_to_euc_jp - SJIS - EUC_JP - - - - sjis_to_mic - SJIS - MULE_INTERNAL - - - - sjis_to_utf8 - SJIS - UTF8 - - - - windows_1258_to_utf8 - WIN1258 - UTF8 - - - - uhc_to_utf8 - UHC - UTF8 - - - - utf8_to_big5 - UTF8 - BIG5 - - - - utf8_to_euc_cn - UTF8 - EUC_CN - - - - utf8_to_euc_jp - UTF8 - EUC_JP - - - - utf8_to_euc_kr - UTF8 - EUC_KR - - - - utf8_to_euc_tw - UTF8 - EUC_TW - - - - utf8_to_gb18030 - UTF8 - GB18030 - - - - utf8_to_gbk - UTF8 - GBK - - - - utf8_to_iso_8859_1 - UTF8 - LATIN1 - - - - utf8_to_iso_8859_10 - UTF8 - LATIN6 - - - - utf8_to_iso_8859_13 - UTF8 - LATIN7 - - - - utf8_to_iso_8859_14 - UTF8 - LATIN8 - - - - utf8_to_iso_8859_15 - UTF8 - LATIN9 - - - - utf8_to_iso_8859_16 - UTF8 - LATIN10 - - - - utf8_to_iso_8859_2 - UTF8 - LATIN2 - - - - utf8_to_iso_8859_3 - UTF8 - LATIN3 - - - - utf8_to_iso_8859_4 - UTF8 - LATIN4 - - - - utf8_to_iso_8859_5 - UTF8 - ISO_8859_5 - - - - utf8_to_iso_8859_6 - UTF8 - ISO_8859_6 - - - - utf8_to_iso_8859_7 - UTF8 - ISO_8859_7 - - - - utf8_to_iso_8859_8 - UTF8 - ISO_8859_8 - - - - utf8_to_iso_8859_9 - UTF8 - LATIN5 - - - - utf8_to_johab - UTF8 - JOHAB - - - - utf8_to_koi8_r - UTF8 - KOI8R - - - - utf8_to_koi8_u - UTF8 - KOI8U - - - - utf8_to_sjis - UTF8 - SJIS - - - - utf8_to_windows_1258 - UTF8 - WIN1258 - - - - utf8_to_uhc - UTF8 - UHC - - - - utf8_to_windows_1250 - UTF8 - WIN1250 - - - - utf8_to_windows_1251 - UTF8 - WIN1251 - - - - utf8_to_windows_1252 - UTF8 - WIN1252 - - - - utf8_to_windows_1253 - UTF8 - WIN1253 - - - - utf8_to_windows_1254 - UTF8 - WIN1254 - - - - utf8_to_windows_1255 - UTF8 - WIN1255 - - - - utf8_to_windows_1256 - UTF8 - WIN1256 - - - - utf8_to_windows_1257 - UTF8 - WIN1257 - - - - utf8_to_windows_866 - UTF8 - WIN866 - - - - utf8_to_windows_874 - UTF8 - WIN874 - - - - windows_1250_to_iso_8859_2 - WIN1250 - LATIN2 - - - - windows_1250_to_mic - WIN1250 - MULE_INTERNAL - - - - windows_1250_to_utf8 - WIN1250 - UTF8 - - - - windows_1251_to_iso_8859_5 - WIN1251 - ISO_8859_5 - - - - windows_1251_to_koi8_r - WIN1251 - KOI8R - - - - windows_1251_to_mic - WIN1251 - MULE_INTERNAL - - - - windows_1251_to_utf8 - WIN1251 - UTF8 - - - - windows_1251_to_windows_866 - WIN1251 - WIN866 - - - - windows_1252_to_utf8 - WIN1252 - UTF8 - - - - windows_1256_to_utf8 - WIN1256 - UTF8 - - - - windows_866_to_iso_8859_5 - WIN866 - ISO_8859_5 - - - - windows_866_to_koi8_r - WIN866 - KOI8R - - - - windows_866_to_mic - WIN866 - MULE_INTERNAL - - - - windows_866_to_utf8 - WIN866 - UTF8 - - - - windows_866_to_windows_1251 - WIN866 - WIN - - - - windows_874_to_utf8 - WIN874 - UTF8 - - - - euc_jis_2004_to_utf8 - EUC_JIS_2004 - UTF8 - - - - utf8_to_euc_jis_2004 - UTF8 - EUC_JIS_2004 - - - - shift_jis_2004_to_utf8 - SHIFT_JIS_2004 - UTF8 - - - - utf8_to_shift_jis_2004 - UTF8 - SHIFT_JIS_2004 - - - - euc_jis_2004_to_shift_jis_2004 - EUC_JIS_2004 - SHIFT_JIS_2004 - - - - shift_jis_2004_to_euc_jis_2004 - SHIFT_JIS_2004 - EUC_JIS_2004 - - - - -
- <function>format</function> @@ -3494,7 +2619,9 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); This section describes functions and operators for examining and - manipulating values of type bytea. + manipulating binary strings, that is values of type bytea. + Many of these are equivalent, in purpose and syntax, to the + text-string functions described in the previous section. @@ -3512,6 +2639,8 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); The sample results shown on this page assume that the server parameter bytea_output is set to escape (the traditional PostgreSQL format). + The output will look different when using the default setting + (hex). @@ -3530,11 +2659,11 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); - string || - string + bytes || + bytes bytea - String concatenation + Binary string concatenation binary string concatenation @@ -3544,12 +2673,25 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); \\Post'gres\000 + + + + bit_length + + bit_length(bytes) + + int + Number of bits in binary string + bit_length('jo\000se':bytea) + 40 + + octet_length - octet_length(string) + octet_length(bytes) int Number of bytes in binary string @@ -3562,7 +2704,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); overlay - overlay(string placing string from int for int) + overlay(bytes placing bytes from int for int) bytea @@ -3577,7 +2719,7 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); position - position(substring in string) + position(bytesubstring in bytes) int Location of specified substring @@ -3590,11 +2732,12 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); substring - substring(string from int for int) + substring(bytes from int for int) bytea - Extract substring + Extract substring (provide at least one of from + and for) substring('Th\000omas'::bytea from 2 for 3) h\000o @@ -3606,18 +2749,32 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); trim trim(both - bytes from - string) + bytesremoved from + bytes) bytea Remove the longest string containing only bytes appearing in - bytes from the start - and end of string + bytesremoved from the start + and end of bytes trim('\000\001'::bytea from '\000Tom\001'::bytea) Tom + + + + trim(both from + bytes, + bytesremoved) + + bytea + + Non-standard syntax for trim() + + trim(both from 'yxTomxx'::bytea, 'xyz'::bytea) + Tom + @@ -3649,66 +2806,30 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); btrim - btrim(string - bytea, bytes bytea) + btrim(bytes + bytea, bytesremoved bytea) bytea Remove the longest string containing only bytes appearing in - bytes from the start and end of - string - + bytesremoved from the start and end of + bytes + btrim('\000trim\001'::bytea, '\000\001'::bytea) trim - - - - decode - - decode(string text, - format text) - - bytea - - Decode binary data from textual representation in string. - Options for format are same as in encode. - - decode('123\000456', 'escape') - 123\000456 - - - - - - encode - - encode(data bytea, - format text) - - text - - Encode binary data into a textual representation. Supported - formats are: base64, hex, escape. - escape converts zero bytes and high-bit-set bytes to - octal sequences (\nnn) and - doubles backslashes. - - encode('123\000456'::bytea, 'escape') - 123\000456 - - get_bit - get_bit(string, offset) + get_bit(bytes bytea, offset int) int - Extract bit from string + Extract n'th bit + from binary string get_bit('Th\000omas'::bytea, 45) 1 @@ -3719,67 +2840,82 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); get_byte - get_byte(string, offset) + get_byte(bytes bytea, offset int) int - Extract byte from string + Extract n'th byte + from binary string get_byte('Th\000omas'::bytea, 4) 109 - - - - length - - length(string) - - int - - Length of binary string - - binary string - length - - - length - of a binary string - binary strings, length - - - length('jo\000se'::bytea) - 5 - + + + + length + + length(bytes bytea) + + int + + Number of bytes in binary string + + binary string + length + + + length + of a binary string + binary strings, length + + + length('jo\000se'::bytea) + 5 + - - - - md5 - - md5(string) - - text - - Calculates the MD5 hash of string, - returning the result in hexadecimal - - md5('Th\000omas'::bytea) - 8ab2d3c9689aaf18​b4958c334c82d8b1 - + + length(bytes bytea, + encoding name) + int + + Number of characters in bytes, assuming + that it is text in the given encoding + + length('jose'::bytea, 'UTF8') + 4 + + + + + + md5 + + md5(bytes bytea) + + text + + MD5 hash, with + the result written in hexadecimal + + md5('Th\000omas'::bytea) + 8ab2d3c9689aaf18​b4958c334c82d8b1 + set_bit - set_bit(string, - offset, newvalue) + set_bit(bytes bytea, + offset int, + newvalue int) bytea - Set bit in string + Set n'th bit in + binary string set_bit('Th\000omas'::bytea, 45, 0) Th\000omAs @@ -3790,12 +2926,14 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); set_byte - set_byte(string, - offset, newvalue) + set_byte(bytes bytea, + offset int, + newvalue int) bytea - Set byte in string + Set n'th byte in + binary string set_byte('Th\000omas'::bytea, 4, 64) Th\000o@as @@ -3806,13 +2944,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha224 - sha224(bytea) + sha224(bytes bytea) bytea - SHA-224 hash + SHA-224 hash - sha224('abc') + sha224('abc'::bytea) \x23097d223405d8228642a477bda2​55b32aadbce4bda0b3f7e36c9da7 @@ -3821,13 +2959,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha256 - sha256(bytea) + sha256(bytes bytea) bytea - SHA-256 hash + SHA-256 hash - sha256('abc') + sha256('abc'::bytea) \xba7816bf8f01cfea414140de5dae2223​b00361a396177a9cb410ff61f20015ad @@ -3836,13 +2974,13 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha384 - sha384(bytea) + sha384(bytes bytea) bytea - SHA-384 hash + SHA-384 hash - sha384('abc') + sha384('abc'::bytea) \xcb00753f45a35e8bb5a03d699ac65007​272c32ab0eded1631a8b605a43ff5bed​8086072ba1e7cc2358baeca134c825a7 @@ -3851,34 +2989,258 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); sha512 - sha512(bytea) + sha512(bytes bytea) bytea - SHA-512 hash + SHA-512 hash - sha512('abc') + sha512('abc'::bytea) \xddaf35a193617abacc417349ae204131​12e6fa4e89a97ea20a9eeee64b55d39a​2192992a274fc1a836ba3c23a3feebbd​454d4423643ce80e2a9ac94fa54ca49f + + + + + substr + + substr(bytes bytea, from int , count int) + + bytea + + Extract substring (same as + substring(bytea from from for count)) + + substr('alphabet', 3, 2) + ph + - - get_byte and set_byte number the first byte - of a binary string as byte 0. - get_bit and set_bit number bits from the - right within each byte; for example bit 0 is the least significant bit of - the first byte, and bit 15 is the most significant bit of the second byte. + + Functions get_byte and set_byte + number the first byte of a binary string as byte 0. + Functions get_bit and set_bit + number bits from the right within each byte; for example bit 0 is the least + significant bit of the first byte, and bit 15 is the most significant bit + of the second byte. + + + + For historical reasons, the function md5 + returns a hex-encoded value of type text whereas the SHA-2 + functions return type bytea. Use the functions + encode + and decode to + convert between the two. For example write encode(sha256('abc'), + 'hex') to get a hex-encoded text representation, + or decode(md5('abc'), 'hex') to get + a bytea value. - Note that for historic reasons, the function md5 - returns a hex-encoded value of type text whereas the SHA-2 - functions return type bytea. Use the functions - encode and decode to convert - between the two, for example encode(sha256('abc'), - 'hex') to get a hex-encoded text representation. + + character string + converting to binary string + + + binary string + converting to character string + + Functions for converting strings between different character sets + (encodings), and for representing arbitrary binary data in textual + form, are shown in + . For these + functions, an argument or result of type text is expressed + in the database's default encoding, while arguments or results of + type bytea are in an encoding named by another argument. + + + + Text/Binary String Conversion Functions + + + + Function + Return Type + Description + Example + Result + + + + + + + + convert + + convert(bytes bytea, + src_encoding name, + dest_encoding name) + + bytea + + Convert binary string representing text in + encoding src_encoding + to a binary string in encoding dest_encoding + (see for + available conversions) + + convert('text_in_utf8', 'UTF8', 'LATIN1') + text_in_utf8 represented in Latin-1 encoding + + + + + + convert_from + + convert_from(bytes bytea, + src_encoding name) + + text + + Convert binary string representing text in + encoding src_encoding + to text in the database encoding + (see for + available conversions) + + convert_from('text_in_utf8', 'UTF8') + text_in_utf8 represented in the + database encoding + + + + + + convert_to + + convert_to(string text, + dest_encoding name) + + bytea + + Convert text (in the database encoding) to a binary + string encoded in encoding dest_encoding + (see for + available conversions) + + convert_to('some text', 'UTF8') + some text represented in UTF8 encoding + + + + + + encode + + encode(bytes bytea, + format text) + + text + + Encode binary data into a textual representation; supported + format values are: + base64, + escape, + hex + + encode('123\000\001', 'base64') + MTIzAAE= + + + + + + decode + + decode(string text, + format text) + + bytea + + Decode binary data from a textual representation given + in string; allowed + format values are the same as + for encode + + decode('MTIzAAE=', 'base64') + \x3132330001 + + + +
+ + + The encode and decode + functions support the following textual formats: + + + + base64 + + base64 format + + + + The base64 format is that + of RFC + 2045 Section 6.8. As per the RFC, encoded lines are + broken at 76 characters. However instead of the MIME CRLF + end-of-line marker, only a newline is used for end-of-line. + The decode function ignores carriage-return, + newline, space, and tab characters. Otherwise, an error is + raised when decode is supplied invalid + base64 data — including when trailing padding is incorrect. + + + + + + escape + + escape format + + + + The escape format converts zero bytes and + bytes with the high bit set into octal escape sequences + (\nnn), and it doubles + backslashes. Other byte values are represented literally. + The decode function will raise an error if a + backslash is not followed by either a second backslash or three + octal digits; it accepts other byte values unchanged. + + + + + + hex + + hex format + + + + The hex format represents each 4 bits of + data as one hexadecimal digit, 0 + through f, writing the higher-order digit of + each byte first. The encode function outputs + the a-f hex digits in lower + case. Because the smallest unit of data is 8 bits, there are + always an even number of characters returned + by encode. + The decode function + accepts the a-f characters in + either upper or lower case. An error is raised + when decode is given invalid hex data + — including when given an odd number of characters. + + + + @@ -3976,22 +3338,20 @@ SELECT format('Testing %3$s, %2$s, %s', 'one', 'two', 'three'); - The following SQL-standard functions work on bit - strings as well as character strings: + Some of the binary-string functions shown in + and + are also available + for bit strings, specifically: length, bit_length, octet_length, position, substring, - overlay. - - - - The following functions work on bit strings as well as binary - strings: + overlay, get_bit, set_bit. - When working with a bit string, these functions number the first + When working with a bit string, get_bit + and set_bit number the first (leftmost) bit of the string as bit 0.