postgresql

Commit Graph

Author	SHA1	Message	Date
Peter Eisentraut	bdb839cbde	Update unicode.org URLs Use https, consistent host name, remove references to ftp. Also update the URLs for CLDR, which has moved from Trac to GitHub.	2019-10-13 22:10:38 +02:00
Bruce Momjian	97c39498e5	Update copyright for 2019 Backpatch-through: certain files through 9.4	2019-01-02 12:44:25 -05:00
Peter Eisentraut	9ee0573ef1	Add current directory to Perl include path Recent Perl versions don't have the current directory in the module include path anymore, so we need to add it here explicitly to make these scripts continue to work.	2018-02-24 14:54:16 -05:00
Bruce Momjian	9d4649ca49	Update copyright for 2018 Backpatch-through: certain files through 9.3	2018-01-02 23:30:12 -05:00
Heikki Linnakangas	aeed17d000	Use radix tree for character encoding conversions. Replace the mapping tables used to convert between UTF-8 and other character encodings with new radix tree-based maps. Looking up an entry in a radix tree is much faster than a binary search in the old maps. As a bonus, the radix tree representation is also more compact, making the binaries slightly smaller. The "combined" maps work the same as before, with binary search. They are much smaller than the main tables, so it doesn't matter so much. However, the "combined" maps are now stored in the same .map files as the main tables. This seems more clear, since they're always used together, and generated from the same source files. Patch by Kyotaro Horiguchi, with lot of hacking by me at various stages. Reviewed by Michael Paquier and Daniel Gustafsson. Discussion: https://www.postgresql.org/message-id/20170306.171609.204324917.horiguchi.kyotaro%40lab.ntt.co.jp	2017-03-13 20:46:39 +02:00
Heikki Linnakangas	84892692fd	Remove obsolete references to JIS0201.TXT JIS0208.TXT. We don't use those files anymore, since commit `1de9cc0dcc`.	2017-03-13 19:06:56 +02:00
Heikki Linnakangas	53dd2da257	Add KOI8-U map files to Makefile. These were left out by mistake back when support for KOI8-U encoding was added. Extracted from Kyotaro Horiguchi's larger patch.	2017-02-02 14:12:35 +02:00
Bruce Momjian	1d25779284	Update copyright via script for 2017	2017-01-03 13:48:53 -05:00
Heikki Linnakangas	1de9cc0dcc	Rewrite the perl scripts to produce our Unicode conversion tables. Generate EUC_CN mappings from gb-18030-2000.xml, because GB2312.TXT is no longer available. Get UHC from windows-949-2000.xml, it's more up-to-date. Plus tons more small changes. With these changes, the perl scripts faithfully produce the *.map files we have in the repository, from the external source files. In the passing, fix the Makefile to also download CP932.TXT and CP950.TXT. Based on patches by Kyotaro Horiguchi, reviewed by Daniel Gustafsson. Discussion: https://postgr.es/m/08e7892a-d55c-eefe-76e6-7910bc8dd1f3@iki.fi	2016-11-30 14:54:52 +02:00
Peter Eisentraut	3a47c704fb	Add make rules to download raw Unicode mapping files This serves as implicit documentation and is handy if someone wants to tweak things. The rules are not part of a normal build, like this entire directory.	2016-11-01 11:54:58 -04:00
Peter Eisentraut	1fa2a6b1d4	Add prerequisite for KOI8-U.TXT This was missed when the encoding was added.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	b497abc602	Make some adjustments in variable assignments These variables aren't really used for anything interesting, but it seems the existing grouping was somewhat nonsensical.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	7a4a813c99	Add missing rules related to EUC_JIS_2004 and SHIFT_JIS_2004 encodings This was apparently forgotten in commit `75c6519ff6`.	2016-03-03 20:44:47 -05:00
Peter Eisentraut	bd6cf3f237	Add Unicode map generation scripts as rule prerequisites That way, the rules will trigger when the scripts change.	2016-02-29 21:19:28 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Tom Lane	8d3e0906df	Extend GB18030 encoding conversion to cover full Unicode range. Our previous code for GB18030 <-> UTF8 conversion only covered Unicode code points up to U+FFFF, but the actual spec defines conversions for all code points up to U+10FFFF. That would be rather impractical as a lookup table, but fortunately there is a simple algorithmic conversion between the additional code points and the equivalent GB18030 byte patterns. Make use of the just-added callback facility in LocalToUtf/UtfToLocal to perform the additional conversions. Having created the infrastructure to do that, we can use the same code to map certain linearly-related subranges of the Unicode space below U+FFFF, allowing removal of the corresponding lookup table entries. This more than halves the lookup table size, which is a substantial savings; utf8_and_gb18030.so drops from nearly a megabyte to about half that. In support of doing that, replace ISO10646-GB18030.TXT with the data file gb-18030-2000.xml (retrieved from http://source.icu-project.org/repos/icu/data/trunk/charset/data/xml/ ) in which these subranges have been deleted from the simple lookup entries. Per bug #12845 from Arjen Nienhuis. The conversion code added here is based on his proposed patch, though I whacked it around rather heavily.	2015-05-15 15:02:13 -04:00
Bruce Momjian	4baaf863ec	Update copyright for 2015 Backpatch certain files through 9.0	2015-01-06 11:43:47 -05:00
Bruce Momjian	7e04792a1c	Update copyright for 2014 Update all files in head, and files COPYRIGHT and legal.sgml in all back branches.	2014-01-07 16:05:30 -05:00
Bruce Momjian	bd61a623ac	Update copyrights for 2013 Fully update git head, and update back branches in ./COPYRIGHT and legal.sgml files.	2013-01-01 17:15:01 -05:00
Bruce Momjian	e126958c2e	Update copyright notices for year 2012.	2012-01-01 18:01:58 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Magnus Hagander	9f2e211386	Remove cvs keywords from all files.	2010-09-20 22:08:53 +02:00
Bruce Momjian	0239800893	Update copyright for the year 2010.	2010-01-02 16:58:17 +00:00
Heikki Linnakangas	afcde99b1b	Fix case of the just resurrected UCS_to_BIG5.pl script, and update Makefile to use it.	2009-03-18 16:26:18 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Bruce Momjian	9098ab9e32	Update copyrights in source tree to 2008.	2008-01-01 19:46:01 +00:00
Bruce Momjian	29dccf5fe0	Update CVS HEAD for 2007 copyright. Back branches are typically not back-stamped for this.	2007-01-05 22:20:05 +00:00
Bruce Momjian	f2f5b05655	Update copyright for 2006. Update scripts.	2006-03-05 15:59:11 +00:00
Peter Eisentraut	268c1b6077	The Makefile was invoking perl scripts as ./script.pl. This fails when the script is not executable as UCS_to_most.pl is in CVS. It also won't pick up any custom setting of the perl version/location to use. This patch calls perl scripts like $(PERL) $(srcdir)/script.pl. Kris Jurka	2006-02-24 13:25:44 +00:00
Peter Eisentraut	1b658473ea	Add support for Windows codepages 1253, 1254, 1255, and 1257 and clean up a bunch of the support utilities. In src/backend/utils/mb/Unicode remove nearly duplicate copies of the UCS_to_XXX perl script and replace with one version to handle all generic files. Update the Makefile so that it knows about all the map files. This produces a slight difference in some of the map files, using a uniform naming convention and not mapping the null character. In src/backend/utils/mb/conversion_procs create a master utf8<->win codepage function like the ISO 8859 versions instead of having a separate handler for each conversion. There is an externally visible change in the name of the win1258 to utf8 conversion. According to the documentation notes, it was named incorrectly and this changes it to a standard name. Running the Unicode mapping perl scripts has shown some additional mapping changes in koi8r and iso8859-7.	2006-02-18 16:15:23 +00:00
Bruce Momjian	e3d7de6b99	Rename canonical encodings, per Peter: UNICODE => UTF8 ALT => WIN866 WIN => WIN1251 TCVN => WIN1258 The old codes continue to work.	2005-03-07 04:30:55 +00:00
Tom Lane	7e1c8ef4fc	Some more missed copyright notices. Many of these look like they should have been caught by the src/tools/copyright script ... why weren't they?	2005-01-01 20:44:34 +00:00
Bruce Momjian	da9a8649d8	Update copyright to 2004.	2004-08-29 04:13:13 +00:00
PostgreSQL Daemon	969685ad44	$Header: -> $PostgreSQL Changes ...	2003-11-29 19:52:15 +00:00
Tom Lane	2f9c859ea1	Fix some copyright notices that weren't updated. Improve copyright tool so it won't miss 'em again.	2003-08-04 23:59:41 +00:00
Tatsuo Ishii	15378a53f8	Add support for GB18030	2002-06-14 03:30:56 +00:00
Tatsuo Ishii	227767112c	Commit Karel's patch. ------------------------------------------------------------------- Subject: Re: [PATCHES] encoding names From: Karel Zak <zakkr@zf.jcu.cz> To: Peter Eisentraut <peter_e@gmx.net> Cc: pgsql-patches <pgsql-patches@postgresql.org> Date: Fri, 31 Aug 2001 17:24:38 +0200 On Thu, Aug 30, 2001 at 01:30:40AM +0200, Peter Eisentraut wrote: > > - convert encoding 'name' to 'id' > > I thought we decided not to add functions returning "new" names until we > know exactly what the new names should be, and pending schema Ok, the patch not to add functions. > better > > ...(): encoding name too long Fixed. I found new bug in command/variable.c in parse_client_encoding(), nobody probably never see this error: if (pg_set_client_encoding(encoding)) { elog(ERROR, "Conversion between %s and %s is not supported", value, GetDatabaseEncodingName()); } because pg_set_client_encoding() returns -1 for error and 0 as true. It's fixed too. IMHO it can be apply. Karel PS: * following files are renamed: src/utils/mb/Unicode/KOI8_to_utf8.map --> src/utils/mb/Unicode/koi8r_to_utf8.map src/utils/mb/Unicode/WIN_to_utf8.map --> src/utils/mb/Unicode/win1251_to_utf8.map src/utils/mb/Unicode/utf8_to_KOI8.map --> src/utils/mb/Unicode/utf8_to_koi8r.map src/utils/mb/Unicode/utf8_to_WIN.map --> src/utils/mb/Unicode/utf8_to_win1251.map * new file: src/utils/mb/encname.c * removed file: src/utils/mb/common.c -- Karel Zak <zakkr@zf.jcu.cz> http://home.zf.jcu.cz/~zakkr/ C, PostgreSQL, PHP, WWW, http://docs.linux.cz, http://mape.jcu.cz	2001-09-06 04:57:30 +00:00
Tatsuo Ishii	c527366b60	Add missing Unicode support for Cyrillic encodings. Patches contributed by Victor Wagner.	2001-04-29 07:27:38 +00:00
Tatsuo Ishii	1acf6f9c8e	Add support for code conversion between Unicode and other encodings. Supported encodings are: EUC_JP, EUC_CN, EUC_KR, EUC_TW, Shift JIS, Big5, ISO8859-[1-5]. TODO: testings! and documentations...	2000-10-30 10:41:05 +00:00

39 Commits