postgresql/doc/src/sgml/jit.sgml

286 lines
11 KiB
Plaintext

<!-- doc/src/sgml/jit.sgml -->
<chapter id="jit">
<title>Just-in-Time Compilation (<acronym>JIT</acronym>)</title>
<indexterm zone="jit">
<primary><acronym>JIT</acronym></primary>
</indexterm>
<indexterm>
<primary>Just-In-Time compilation</primary>
<see><acronym>JIT</acronym></see>
</indexterm>
<para>
This chapter explains what just-in-time compilation is, and how it can be
configured in <productname>PostgreSQL</productname>.
</para>
<sect1 id="jit-reason">
<title>What Is <acronym>JIT</acronym> compilation?</title>
<para>
Just-in-Time (<acronym>JIT</acronym>) compilation is the process of turning
some form of interpreted program evaluation into a native program, and
doing so at run time.
For example, instead of using general-purpose code that can evaluate
arbitrary SQL expressions to evaluate a particular SQL predicate
like <literal>WHERE a.col = 3</literal>, it is possible to generate a
function that is specific to that expression and can be natively executed
by the CPU, yielding a speedup.
</para>
<para>
<productname>PostgreSQL</productname> has builtin support to perform
<acronym>JIT</acronym> compilation using <ulink
url="https://llvm.org/"><productname>LLVM</productname></ulink> when
<productname>PostgreSQL</productname> is built with
<link linkend="configure-with-llvm"><literal>--with-llvm</literal></link>.
</para>
<para>
See <filename>src/backend/jit/README</filename> for further details.
</para>
<sect2 id="jit-accelerated-operations">
<title><acronym>JIT</acronym> Accelerated Operations</title>
<para>
Currently <productname>PostgreSQL</productname>'s <acronym>JIT</acronym>
implementation has support for accelerating expression evaluation and
tuple deforming. Several other operations could be accelerated in the
future.
</para>
<para>
Expression evaluation is used to evaluate <literal>WHERE</literal>
clauses, target lists, aggregates and projections. It can be accelerated
by generating code specific to each case.
</para>
<para>
Tuple deforming is the process of transforming an on-disk tuple (see <xref
linkend="storage-tuple-layout"/>) into its in-memory representation.
It can be accelerated by creating a function specific to the table layout
and the number of columns to be extracted.
</para>
</sect2>
<sect2 id="jit-inlining">
<title>Inlining</title>
<para>
<productname>PostgreSQL</productname> is very extensible and allows new
data types, functions, operators and other database objects to be defined;
see <xref linkend="extend"/>. In fact the built-in objects are implemented
using nearly the same mechanisms. This extensibility implies some
overhead, for example due to function calls (see <xref linkend="xfunc"/>).
To reduce that overhead, <acronym>JIT</acronym> compilation can inline the
bodies of small functions into the expressions using them. That allows a
significant percentage of the overhead to be optimized away.
</para>
</sect2>
<sect2 id="jit-optimization">
<title>Optimization</title>
<para>
<productname>LLVM</productname> has support for optimizing generated
code. Some of the optimizations are cheap enough to be performed whenever
<acronym>JIT</acronym> is used, while others are only beneficial for
longer-running queries.
See <ulink url="https://llvm.org/docs/Passes.html#transform-passes"/> for
more details about optimizations.
</para>
</sect2>
</sect1>
<sect1 id="jit-decision">
<title>When to <acronym>JIT</acronym>?</title>
<para>
<acronym>JIT</acronym> compilation is beneficial primarily for long-running
CPU-bound queries. Frequently these will be analytical queries. For short
queries the added overhead of performing <acronym>JIT</acronym> compilation
will often be higher than the time it can save.
</para>
<para>
To determine whether <acronym>JIT</acronym> compilation should be used,
the total estimated cost of a query (see
<xref linkend="planner-stats-details"/> and
<xref linkend="runtime-config-query-constants"/>) is used.
The estimated cost of the query will be compared with the setting of <xref
linkend="guc-jit-above-cost"/>. If the cost is higher,
<acronym>JIT</acronym> compilation will be performed.
Two further decisions are then needed.
Firstly, if the estimated cost is more
than the setting of <xref linkend="guc-jit-inline-above-cost"/>, short
functions and operators used in the query will be inlined.
Secondly, if the estimated cost is more than the setting of <xref
linkend="guc-jit-optimize-above-cost"/>, expensive optimizations are
applied to improve the generated code.
Each of these options increases the <acronym>JIT</acronym> compilation
overhead, but can reduce query execution time considerably.
</para>
<para>
These cost-based decisions will be made at plan time, not execution
time. This means that when prepared statements are in use, and a generic
plan is used (see <xref linkend="sql-prepare"/>), the values of the
configuration parameters in effect at prepare time control the decisions,
not the settings at execution time.
</para>
<note>
<para>
If <xref linkend="guc-jit"/> is set to <literal>off</literal>, or if no
<acronym>JIT</acronym> implementation is available (for example because
the server was compiled without <literal>--with-llvm</literal>),
<acronym>JIT</acronym> will not be performed, even if it would be
beneficial based on the above criteria. Setting <xref linkend="guc-jit"/>
to <literal>off</literal> has effects at both plan and execution time.
</para>
</note>
<para>
<xref linkend="sql-explain"/> can be used to see whether
<acronym>JIT</acronym> is used or not. As an example, here is a query that
is not using <acronym>JIT</acronym>:
<screen>
=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=0.303..0.303 rows=1 loops=1)
-> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.017..0.111 rows=356 loops=1)
Planning Time: 0.116 ms
Execution Time: 0.365 ms
(4 rows)
</screen>
Given the cost of the plan, it is entirely reasonable that no
<acronym>JIT</acronym> was used; the cost of <acronym>JIT</acronym> would
have been bigger than the potential savings. Adjusting the cost limits
will lead to <acronym>JIT</acronym> use:
<screen>
=# SET jit_above_cost = 10;
SET
=# EXPLAIN ANALYZE SELECT SUM(relpages) FROM pg_class;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Aggregate (cost=16.27..16.29 rows=1 width=8) (actual time=6.049..6.049 rows=1 loops=1)
-> Seq Scan on pg_class (cost=0.00..15.42 rows=342 width=4) (actual time=0.019..0.052 rows=356 loops=1)
Planning Time: 0.133 ms
JIT:
Functions: 3
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 1.259 ms, Inlining 0.000 ms, Optimization 0.797 ms, Emission 5.048 ms, Total 7.104 ms
Execution Time: 7.416 ms
</screen>
As visible here, <acronym>JIT</acronym> was used, but inlining and
expensive optimization were not. If <xref
linkend="guc-jit-inline-above-cost"/> or <xref
linkend="guc-jit-optimize-above-cost"/> were also lowered,
that would change.
</para>
</sect1>
<sect1 id="jit-configuration" xreflabel="JIT Configuration">
<title>Configuration</title>
<para>
The configuration variable
<xref linkend="guc-jit"/> determines whether <acronym>JIT</acronym>
compilation is enabled or disabled.
If it is enabled, the configuration variables
<xref linkend="guc-jit-above-cost"/>, <xref
linkend="guc-jit-inline-above-cost"/>, and <xref
linkend="guc-jit-optimize-above-cost"/> determine
whether <acronym>JIT</acronym> compilation is performed for a query,
and how much effort is spent doing so.
</para>
<para>
<xref linkend="guc-jit-provider"/> determines which <acronym>JIT</acronym>
implementation is used. It is rarely required to be changed. See <xref
linkend="jit-pluggable"/>.
</para>
<para>
For development and debugging purposes a few additional configuration
parameters exist, as described in
<xref linkend="runtime-config-developer"/>.
</para>
</sect1>
<sect1 id="jit-extensibility">
<title>Extensibility</title>
<sect2 id="jit-extensibility-bitcode">
<title>Inlining Support for Extensions</title>
<para>
<productname>PostgreSQL</productname>'s <acronym>JIT</acronym>
implementation can inline the bodies of functions
of types <literal>C</literal> and <literal>internal</literal>, as well as
operators based on such functions. To do so for functions in extensions,
the definitions of those functions need to be made available.
When using <link linkend="extend-pgxs">PGXS</link> to build an extension
against a server that has been compiled with LLVM JIT support, the
relevant files will be built and installed automatically.
</para>
<para>
The relevant files have to be installed into
<filename>$pkglibdir/bitcode/$extension/</filename> and a summary of them
into <filename>$pkglibdir/bitcode/$extension.index.bc</filename>, where
<literal>$pkglibdir</literal> is the directory returned by
<literal>pg_config --pkglibdir</literal> and <literal>$extension</literal>
is the base name of the extension's shared library.
<note>
<para>
For functions built into <productname>PostgreSQL</productname> itself,
the bitcode is installed into
<literal>$pkglibdir/bitcode/postgres</literal>.
</para>
</note>
</para>
</sect2>
<sect2 id="jit-pluggable">
<title>Pluggable <acronym>JIT</acronym> Providers</title>
<para>
<productname>PostgreSQL</productname> provides a <acronym>JIT</acronym>
implementation based on <productname>LLVM</productname>. The interface to
the <acronym>JIT</acronym> provider is pluggable and the provider can be
changed without recompiling (although currently, the build process only
provides inlining support data for <productname>LLVM</productname>).
The active provider is chosen via the setting
<xref linkend="guc-jit-provider"/>.
</para>
<sect3>
<title><acronym>JIT</acronym> Provider Interface</title>
<para>
A <acronym>JIT</acronym> provider is loaded by dynamically loading the
named shared library. The normal library search path is used to locate
the library. To provide the required <acronym>JIT</acronym> provider
callbacks and to indicate that the library is actually a
<acronym>JIT</acronym> provider, it needs to provide a C function named
<function>_PG_jit_provider_init</function>. This function is passed a
struct that needs to be filled with the callback function pointers for
individual actions:
<programlisting>
struct JitProviderCallbacks
{
JitProviderResetAfterErrorCB reset_after_error;
JitProviderReleaseContextCB release_context;
JitProviderCompileExprCB compile_expr;
};
extern void _PG_jit_provider_init(JitProviderCallbacks *cb);
</programlisting>
</para>
</sect3>
</sect2>
</sect1>
</chapter>