7f380c59f8
Previously, the core scanner's yy_transition[] array had 37045 elements. Since that number is larger than INT16_MAX, Flex generated the array to contain 32-bit integers. By reimplementing some of the bulkier scanner rules, this patch reduces the array to 20495 elements. The much smaller total length, combined with the consequent use of 16-bit integers for the array elements, reduces the binary size by over 200kB.

This was accomplished in two ways:

1. Consolidate handling of quote continuations into a new start condition, rather than duplicating that logic for five different string types.

2. Treat Unicode strings and identifiers followed by a UESCAPE sequence as three separate tokens, rather than one. The logic to de-escape Unicode strings is moved to the filter code in parser.c, which already had the ability to provide special processing for token sequences. While we could have implemented the conversion in the grammar, that approach was rejected for performance and maintainability reasons.

Performance in microbenchmarks of raw parsing seems equal or slightly faster in most cases, and it's reasonable to expect that in real-world usage (with more competition for the CPU cache) there will be a larger win. The exception is UESCAPE sequences; lexing those is about 10% slower, primarily because the scanner now has to be called three times rather than once. This seems acceptable, since that feature is very rarely used.

The psql and ecpg lexers are likewise modified, primarily because we want to keep them all in sync. Since those lexers don't use the space-hogging -CF option, the space savings are much less, but still good for perhaps 10kB apiece.

While at it, merge the ecpg lexer's handling of C-style comments used in SQL and in C. Those have different rules regarding nested comments, but since we already have the ability to keep track of the previous start condition, we can use that to handle both cases within a single start condition. This matches the core scanner more closely.

John Naylor

Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
po
.gitignore
Makefile
README.parser
c_keywords.c
c_kwlist.h
check_rules.pl
descriptor.c
ecpg.addons
ecpg.c
ecpg.header
ecpg.tokens
ecpg.trailer
ecpg.type
ecpg_keywords.c
ecpg_kwlist.h
keywords.c
nls.mk
output.c
parse.pl
parser.c
pgc.l
preproc_extern.h
type.c
type.h
variable.c
README.parser
ECPG modifies and extends the core grammar in such a way that:

1) every token in ECPG is of <str> type. New tokens are defined in ecpg.tokens; their types are defined in ecpg.type.

2) most tokens from the core grammar are simply converted to literals and concatenated together to form the SQL string that is passed to the server; this is done by parse.pl.

3) some rules need side effects, so actions are either added to, or completely override, the basic token concatenation; these are defined in ecpg.addons. The rules for ecpg.addons are explained below.

4) new grammar rules are needed for ECPG metacommands. These are in ecpg.trailer.

5) ecpg.header contains common functions, etc. used by the actions for grammar rules.

In "ecpg.addons", every modified rule follows this pattern:

    ECPG: dumpedtokens postfix

where "dumpedtokens" is simply the tokens of the core gram.y rule concatenated together. E.g., if gram.y has this:

    ruleA: tokenA tokenB tokenC {...}

then "dumpedtokens" is "ruleAtokenAtokenBtokenC". "postfix" above can be:

a) "block" - the automatic rule created by parse.pl is completely overridden; the code block has to be written in full, as if it were in a plain bison grammar.

b) "rule" - the automatic rule is extended, so new syntax forms are accepted for "ruleA". E.g.:

    ECPG: ruleAtokenAtokenBtokenC rule
        | tokenD tokenE { action_code; }
        ...

   It will be substituted with:

    ruleA: <original syntax forms and actions up to and including "tokenA tokenB tokenC">
        | tokenD tokenE { action_code; }
        ...

c) "addon" - the automatic action for the rule (constructing the SQL syntax from the tokens concatenated together) is prepended with a new action code part. This code part is written as if it were already inside the { ... }.

Multiple "addon" or "block" lines may appear together with the new code block if the code block is common to those rules.
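As a concrete illustration of the "addon" form, consider the following sketch. Note that the rule, its tokens, and the injected action code are all invented for illustration; they are not taken from the real gram.y or ecpg.addons:

```
/* Hypothetical core gram.y rule: */
opt_verbose: VERBOSE
           | /* EMPTY */
           ;

/* Hypothetical ecpg.addons entry: prepend action code to the
 * auto-generated token-concatenation action for "opt_verbose: VERBOSE".
 * The code is written as if it were already inside the { ... } braces. */
ECPG: opt_verboseVERBOSE addon
        fprintf(stderr, "note: VERBOSE was specified\n");
```

parse.pl still emits its usual action (concatenating the token literals into the SQL string); the "addon" code merely runs before it.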