postgresql/src/backend/parser
Tom Lane 7f380c59f8 Reduce size of backend scanner's tables.
Previously, the core scanner's yy_transition[] array had 37045 elements.
Since that number is larger than INT16_MAX, Flex generated the array to
contain 32-bit integers.  By reimplementing some of the bulkier scanner
rules, this patch reduces the array to 20495 elements.  The much smaller
total length, combined with the consequent use of 16-bit integers for
the array elements, reduces the binary size by over 200kB.  This was
accomplished in two ways:

1. Consolidate handling of quote continuations into a new start condition,
rather than duplicating that logic for five different string types.

2. Treat Unicode strings and identifiers followed by a UESCAPE sequence
as three separate tokens, rather than one.  The logic to de-escape
Unicode strings is moved to the filter code in parser.c, which already
had the ability to provide special processing for token sequences.
While we could have implemented the conversion in the grammar, that
approach was rejected for performance and maintainability reasons.
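
As a rough check on the size figures quoted at the top of this message,
the following sketch estimates the table footprint before and after the
patch.  It assumes that each yy_transition[] entry is a pair of integers
(as in Flex's full-table layout) and that 16-bit entries are used whenever
the entry count fits in INT16_MAX; the exact boundary Flex applies is not
stated here.  The struct and function names are illustrative and do not
come from the generated scanner.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Assumed entry layouts: a "verify" value plus a "next" value per entry.
 * These names are hypothetical, chosen only for this estimate.
 */
struct trans16 { int16_t verify; int16_t next; };
struct trans32 { int32_t verify; int32_t next; };

/* Bytes needed for a table of n entries, using the narrowest entry width. */
static size_t
table_bytes(size_t n)
{
    if (n <= INT16_MAX)
        return n * sizeof(struct trans16);
    return n * sizeof(struct trans32);
}

/* Bytes saved by shrinking a table from old_entries to new_entries. */
static size_t
bytes_saved(size_t old_entries, size_t new_entries)
{
    assert(new_entries <= old_entries);
    return table_bytes(old_entries) - table_bytes(new_entries);
}
```

Under those assumptions, bytes_saved(37045, 20495) comes to 214380 bytes,
which is in line with the "over 200kB" figure above.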

Performance in microbenchmarks of raw parsing seems equal or slightly
faster in most cases, and it's reasonable to expect that in real-world
usage (with more competition for the CPU cache) there will be a larger
win.  The exception is UESCAPE sequences; lexing those is about 10%
slower, primarily because the scanner now has to be called three times
rather than one.  This seems acceptable since that feature is very
rarely used.
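
To illustrate why a UESCAPE sequence now costs three scanner calls, here
is a minimal sketch of a token filter that looks ahead after a Unicode
string token.  It is not the code in parser.c: the token codes, the
Scanner stand-in, and the function names are all hypothetical, the
look-ahead is undone by rewinding an array index rather than by buffering
a token, and the actual de-escaping of the string is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token codes; the real token set is much larger. */
typedef enum { T_EOF, T_IDENT, T_SCONST, T_USCONST, T_UESCAPE } TokType;

typedef struct { TokType type; const char *text; } Token;

/* Stand-in scanner: hands out tokens from a fixed, T_EOF-terminated array. */
typedef struct { const Token *toks; size_t pos; size_t calls; } Scanner;

typedef struct { TokType type; const char *text; char escape; } FilteredToken;

static Token
scan_next(Scanner *s)
{
    s->calls++;                         /* count every scanner invocation */
    if (s->toks[s->pos].type == T_EOF)
        return s->toks[s->pos];         /* stay parked on EOF */
    return s->toks[s->pos++];
}

/* Return the next token, folding "USCONST UESCAPE SCONST" into one SCONST. */
static FilteredToken
filter_next(Scanner *s)
{
    Token         t = scan_next(s);
    FilteredToken out = { t.type, t.text, '\\' };   /* default escape char */

    if (t.type == T_USCONST)
    {
        size_t save_pos = s->pos;
        Token  la = scan_next(s);       /* second call: look ahead */

        if (la.type == T_UESCAPE)
        {
            Token esc = scan_next(s);   /* third call: the escape character */

            /* A real parser would report a syntax error here instead. */
            assert(esc.type == T_SCONST && esc.text[0] != '\0');
            out.escape = esc.text[0];
        }
        else
            s->pos = save_pos;          /* push the look-ahead token back */

        out.type = T_SCONST;            /* the grammar sees a plain string */
    }
    return out;
}
```

In this sketch the grammar only ever sees a single string token, while
the scanner is invoked once each for the string, the UESCAPE keyword, and
the escape character.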

The psql and ecpg lexers are likewise modified, primarily because we
want to keep them all in sync.  Since those lexers don't use the
space-hogging -CF option, the space savings is much less, but it's
still good for perhaps 10kB apiece.

While at it, merge the ecpg lexer's handling of C-style comments used
in SQL and in C.  Those have different rules regarding nested comments,
but since we already have the ability to keep track of the previous
start condition, we can use that to handle both cases within a single
start condition.  This matches the core scanner more closely.
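
The idea of one comment rule serving both contexts can be sketched in
plain C, outside of Flex.  The function below is an illustration only,
with hypothetical names: it takes the state the scanner was in before the
comment started and allows nesting only when that state was SQL, on the
assumption that SQL comments nest and C comments do not.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the remembered previous start condition. */
typedef enum { STATE_C, STATE_SQL } PrevState;

/*
 * Return the length of the comment that starts at s (s must begin with
 * slash-star), or 0 if the comment is unterminated.
 */
static size_t
comment_length(const char *s, PrevState prev)
{
    size_t i = 2;
    int    depth = 1;

    assert(s[0] == '/' && s[1] == '*');

    while (s[i] != '\0')
    {
        if (s[i] == '*' && s[i + 1] == '/')
        {
            i += 2;
            if (--depth == 0)
                return i;               /* outermost comment closed */
        }
        else if (prev == STATE_SQL && s[i] == '/' && s[i + 1] == '*')
        {
            depth++;                    /* nested opener counts only in SQL */
            i += 2;
        }
        else
            i++;
    }
    return 0;                           /* ran off the end of the input */
}
```

With the same input text, the C case stops at the first comment
terminator, while the SQL case continues until every nested opener has
been closed.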

John Naylor

Discussion: https://postgr.es/m/CACPNZCvaoa3EgVWm5yZhcSTX6RAtaLgniCPcBVOCwm8h3xpWkw@mail.gmail.com
2020-01-13 15:04:31 -05:00
.gitignore Convert cvsignore to gitignore, and add .gitignore for build targets. 2010-09-22 12:57:04 +02:00
Makefile Split all OBJS style lines in makefiles into one-line-per-entry style. 2019-11-05 14:41:07 -08:00
README Move keywords.c/kwlookup.c into src/common/. 2016-03-23 20:22:08 -04:00
analyze.c Reconsider the representation of join alias Vars. 2020-01-09 11:56:59 -05:00
check_keywords.pl Update copyrights for 2020 2020-01-01 12:21:45 -05:00
gram.y Reduce size of backend scanner's tables. 2020-01-13 15:04:31 -05:00
parse_agg.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_clause.c Reconsider the representation of join alias Vars. 2020-01-09 11:56:59 -05:00
parse_coerce.c Make parser rely more heavily on the ParseNamespaceItem data structure. 2020-01-02 11:29:01 -05:00
parse_collate.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_cte.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_enr.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_expr.c Make parser rely more heavily on the ParseNamespaceItem data structure. 2020-01-02 11:29:01 -05:00
parse_func.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_node.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_oper.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_param.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_relation.c Reconsider the representation of join alias Vars. 2020-01-09 11:56:59 -05:00
parse_target.c Reconsider the representation of join alias Vars. 2020-01-09 11:56:59 -05:00
parse_type.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00
parse_utilcmd.c Make parser rely more heavily on the ParseNamespaceItem data structure. 2020-01-02 11:29:01 -05:00
parser.c Reduce size of backend scanner's tables. 2020-01-13 15:04:31 -05:00
scan.l Reduce size of backend scanner's tables. 2020-01-13 15:04:31 -05:00
scansup.c Update copyrights for 2020 2020-01-01 12:21:45 -05:00

README

src/backend/parser/README

Parser
======

This directory does more than tokenize and parse SQL queries.  It also
creates Query structures for the various complex queries that are passed
to the optimizer and then the executor.

parser.c	things start here
scan.l		break query into tokens
scansup.c	handle escapes in input strings
gram.y		parse the tokens and produce a "raw" parse tree
analyze.c	top level of parse analysis for optimizable queries
parse_agg.c	handle aggregates, like SUM(col1), AVG(col2), ...
parse_clause.c	handle clauses like WHERE, ORDER BY, GROUP BY, ...
parse_coerce.c	handle coercing expressions to different data types
parse_collate.c	assign collation information in completed expressions
parse_cte.c	handle Common Table Expressions (WITH clauses)
parse_expr.c	handle expressions like col, col + 3, x = 3 or x = 4
parse_func.c	handle functions, table.column and column identifiers
parse_node.c	create nodes for various structures
parse_oper.c	handle operators in expressions
parse_param.c	handle Params (for the cases used in the core backend)
parse_relation.c support routines for table and column handling
parse_target.c	handle the result list of the query
parse_type.c	support routines for data type handling
parse_utilcmd.c	parse analysis for utility commands (done at execution time)

See also src/common/keywords.c, which contains the table of standard
keywords and the keyword lookup function.  We separated that out because
various frontend code wants to use it too.
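
As a rough picture of what a keyword table with a lookup function
involves, here is a minimal sketch that is not taken from src/common: it
uses a tiny, hand-written sorted table and a case-insensitive binary
search.  The table contents, the buffer size, and the function name are
assumptions made for this example, and the real lookup may use a
different search strategy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative table; it must stay sorted for the binary search to work. */
static const char *const keywords[] = {
    "from", "group", "order", "select", "where"
};

#define NUM_KEYWORDS (sizeof(keywords) / sizeof(keywords[0]))

/* Return the index of word in keywords[], or -1 if it is not a keyword. */
static int
keyword_lookup(const char *word)
{
    char   lower[16];
    size_t len;
    size_t lo = 0;
    size_t hi = NUM_KEYWORDS;

    assert(word != NULL);
    len = strlen(word);
    if (len >= sizeof(lower))
        return -1;                      /* longer than any keyword here */

    /* Downcase ASCII letters only; the terminating NUL is copied too. */
    for (size_t i = 0; i <= len; i++)
    {
        char c = word[i];

        lower[i] = (c >= 'A' && c <= 'Z') ? (char) (c - 'A' + 'a') : c;
    }

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;
        int    cmp = strcmp(lower, keywords[mid]);

        if (cmp == 0)
            return (int) mid;
        if (cmp < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}
```

Because the function depends only on the C library, code of this shape
can be compiled into both backend and frontend programs, which is the
motivation the paragraph above gives for keeping it in a shared location.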