/*-------------------------------------------------------------------------
*
* miscadmin.h
* This file contains general postgres administration and initialization
* stuff that used to be spread out between the following files:
* globals.h global variables
* pdir.h directory path crud
* pinit.h postgres initialization
* pmod.h processing modes
* Over time, this has also become the preferred place for widely known
* resource-limitation stuff, such as work_mem and check_stack_depth().
*
* Portions Copyright (c) 1996-2024, PostgreSQL Global Development Group
* Portions Copyright (c) 1994, Regents of the University of California
*
* src/include/miscadmin.h
*
* NOTES
* some of the information in this file should be moved to other files.
*
*-------------------------------------------------------------------------
*/
#ifndef MISCADMIN_H
#define MISCADMIN_H

#include <signal.h>

#include "datatype/timestamp.h" /* for TimestampTz */
#include "pgtime.h" /* for pg_time_t */

#define InvalidPid (-1)
/*****************************************************************************
* System interrupt and critical section handling
*
* There are two types of interrupts that a running backend needs to accept
* without messing up its state: QueryCancel (SIGINT) and ProcDie (SIGTERM).
* In both cases, we need to be able to clean up the current transaction
* gracefully, so we can't respond to the interrupt instantaneously ---
* there's no guarantee that internal data structures would be self-consistent
* if the code is interrupted at an arbitrary instant. Instead, the signal
* handlers set flags that are checked periodically during execution.
*
* The CHECK_FOR_INTERRUPTS() macro is called at strategically located spots
* where it is normally safe to accept a cancel or die interrupt. In some
* cases, we invoke CHECK_FOR_INTERRUPTS() inside low-level subroutines that
* might sometimes be called in contexts that do *not* want to allow a cancel
* or die interrupt. The HOLD_INTERRUPTS() and RESUME_INTERRUPTS() macros
* allow code to ensure that no cancel or die interrupt will be accepted,
* even if CHECK_FOR_INTERRUPTS() gets called in a subroutine. The interrupt
* will be held off until CHECK_FOR_INTERRUPTS() is done outside any
* HOLD_INTERRUPTS() ... RESUME_INTERRUPTS() section.
*
* There is also a mechanism to prevent query cancel interrupts, while still
* allowing die interrupts: HOLD_CANCEL_INTERRUPTS() and
* RESUME_CANCEL_INTERRUPTS().
*
* Note that ProcessInterrupts() has also acquired a number of tasks that
* do not necessarily cause a query-cancel-or-die response. Hence, it's
* possible that it will just clear InterruptPending and return.
*
* INTERRUPTS_PENDING_CONDITION() can be checked to see whether an
* interrupt needs to be serviced, without trying to do so immediately.
* Some callers are also interested in INTERRUPTS_CAN_BE_PROCESSED(),
* which tells whether ProcessInterrupts is sure to clear the interrupt.
*
* Special mechanisms are used to let an interrupt be accepted when we are
* waiting for a lock or when we are waiting for command input (but, of
* course, only if the interrupt holdoff counter is zero). See the
* related code for details.
*
* A lost connection is handled similarly, although the loss of connection
* does not raise a signal, but is detected when we fail to write to the
* socket. If there were a signal for a broken connection, we could make use
* of it by setting ClientConnectionLost in the signal handler.
*
* A related, but conceptually distinct, mechanism is the "critical section"
* mechanism. A critical section not only holds off cancel/die interrupts,
* but causes any ereport(ERROR) or ereport(FATAL) to become ereport(PANIC)
* --- that is, a system-wide reset is forced. Needless to say, only really
* *critical* code should be marked as a critical section! Currently, this
* mechanism is only used for XLOG-related code.
*
*****************************************************************************/
/* in globals.c */
/* these are marked volatile because they are set by signal handlers: */
extern PGDLLIMPORT volatile sig_atomic_t InterruptPending;
extern PGDLLIMPORT volatile sig_atomic_t QueryCancelPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcDiePending;
extern PGDLLIMPORT volatile sig_atomic_t IdleInTransactionSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t TransactionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleSessionTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t ProcSignalBarrierPending;
extern PGDLLIMPORT volatile sig_atomic_t LogMemoryContextPending;
extern PGDLLIMPORT volatile sig_atomic_t IdleStatsUpdateTimeoutPending;
extern PGDLLIMPORT volatile sig_atomic_t CheckClientConnectionPending;
extern PGDLLIMPORT volatile sig_atomic_t ClientConnectionLost;
/* these are marked volatile because they are examined by signal handlers: */
extern PGDLLIMPORT volatile uint32 InterruptHoldoffCount;
extern PGDLLIMPORT volatile uint32 QueryCancelHoldoffCount;
extern PGDLLIMPORT volatile uint32 CritSectionCount;
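
/*
 * Illustrative sketch (not part of the original header): these flags are
 * normally raised from signal handlers, which then wake the process through
 * its latch so the flag is noticed promptly.  This is a simplified,
 * hypothetical stand-in for the real handlers in tcop/postgres.c; it assumes
 * the usual backend includes ("postgres.h", "storage/latch.h") and that
 * MyLatch, declared further down in this header, is in scope.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_query_cancel_handler(SIGNAL_ARGS)
{
    int         save_errno = errno;

    QueryCancelPending = true;
    InterruptPending = true;

    SetLatch(MyLatch);          /* wake the process if it is sleeping */

    errno = save_errno;
}
#endif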
/* in tcop/postgres.c */
extern void ProcessInterrupts(void);
/* Test whether an interrupt is pending */
#ifndef WIN32

#define INTERRUPTS_PENDING_CONDITION() \
    (unlikely(InterruptPending))

#else

#define INTERRUPTS_PENDING_CONDITION() \
    (unlikely(UNBLOCKED_SIGNAL_QUEUE()) ? pgwin32_dispatch_queued_signals() : 0, \
     unlikely(InterruptPending))

#endif

/* Service interrupt, if one is pending and it's safe to service it now */
#define CHECK_FOR_INTERRUPTS() \
do { \
    if (INTERRUPTS_PENDING_CONDITION()) \
        ProcessInterrupts(); \
} while(0)

/* Is ProcessInterrupts() guaranteed to clear InterruptPending? */
#define INTERRUPTS_CAN_BE_PROCESSED() \
    (InterruptHoldoffCount == 0 && CritSectionCount == 0 && \
     QueryCancelHoldoffCount == 0)
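
/*
 * Illustrative sketch (added for this write-up, not original code): the
 * canonical use of CHECK_FOR_INTERRUPTS() is to call it once per iteration
 * of any potentially long-running loop, so that query cancel and die
 * requests are honored promptly.  process_one_item() is a hypothetical
 * helper.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_long_running_loop(int nitems)
{
    for (int i = 0; i < nitems; i++)
    {
        CHECK_FOR_INTERRUPTS();     /* safe point: no partial state to clean up */
        process_one_item(i);        /* hypothetical per-item work */
    }
}
#endif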
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
do { \
    Assert(InterruptHoldoffCount > 0); \
    InterruptHoldoffCount--; \
} while(0)
#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
    Assert(QueryCancelHoldoffCount > 0); \
    QueryCancelHoldoffCount--; \
} while(0)

#define START_CRIT_SECTION()  (CritSectionCount++)

#define END_CRIT_SECTION() \
do { \
    Assert(CritSectionCount > 0); \
    CritSectionCount--; \
} while(0)
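
/*
 * Illustrative sketch (not original code): HOLD_INTERRUPTS() keeps a
 * CHECK_FOR_INTERRUPTS() in any called subroutine from throwing control away
 * while local state is inconsistent; START_CRIT_SECTION() additionally turns
 * any ERROR into a PANIC and is reserved for changes that must not fail
 * half-way.  The helper functions are hypothetical.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_holdoff_and_critical_section(void)
{
    /* Hold off cancel/die while our in-memory state is inconsistent */
    HOLD_INTERRUPTS();
    update_local_state();       /* hypothetical; may call CHECK_FOR_INTERRUPTS() */
    RESUME_INTERRUPTS();

    /* Now it is safe to service anything that arrived in the meantime */
    CHECK_FOR_INTERRUPTS();

    /* A change that must either complete or take the system down with it */
    START_CRIT_SECTION();
    apply_wal_logged_change();  /* hypothetical; an ERROR here becomes PANIC */
    END_CRIT_SECTION();
}
#endif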
/*****************************************************************************
* globals.h -- *
*****************************************************************************/
/*
* from utils/init/globals.c
*/
extern PGDLLIMPORT pid_t PostmasterPid;
extern PGDLLIMPORT bool IsPostmasterEnvironment;
extern PGDLLIMPORT bool IsUnderPostmaster;
extern PGDLLIMPORT bool IsBinaryUpgrade;
extern PGDLLIMPORT bool ExitOnAnyError;
extern PGDLLIMPORT char *DataDir;
extern PGDLLIMPORT int data_directory_mode;
extern PGDLLIMPORT int NBuffers;
extern PGDLLIMPORT int MaxBackends;
extern PGDLLIMPORT int MaxConnections;
extern PGDLLIMPORT int max_worker_processes;
extern PGDLLIMPORT int max_parallel_workers;
extern PGDLLIMPORT int commit_timestamp_buffers;
extern PGDLLIMPORT int multixact_member_buffers;
extern PGDLLIMPORT int multixact_offset_buffers;
extern PGDLLIMPORT int notify_buffers;
extern PGDLLIMPORT int serializable_buffers;
extern PGDLLIMPORT int subtransaction_buffers;
extern PGDLLIMPORT int transaction_buffers;
extern PGDLLIMPORT int MyProcPid;
extern PGDLLIMPORT pg_time_t MyStartTime;
extern PGDLLIMPORT TimestampTz MyStartTimestamp;
extern PGDLLIMPORT struct Port *MyProcPort;
extern PGDLLIMPORT struct Latch *MyLatch;
extern PGDLLIMPORT int32 MyCancelKey;
extern PGDLLIMPORT int MyPMChildSlot;
extern PGDLLIMPORT char OutputFileName[];
extern PGDLLIMPORT char my_exec_path[];
extern PGDLLIMPORT char pkglib_path[];
#ifdef EXEC_BACKEND
extern PGDLLIMPORT char postgres_exec_path[];
#endif
extern PGDLLIMPORT Oid MyDatabaseId;
extern PGDLLIMPORT Oid MyDatabaseTableSpace;
extern PGDLLIMPORT bool MyDatabaseHasLoginEventTriggers;
/*
* Date/Time Configuration
*
* DateStyle defines the output formatting choice for date/time types:
* USE_POSTGRES_DATES specifies traditional Postgres format
* USE_ISO_DATES specifies ISO-compliant format
* USE_SQL_DATES specifies Oracle/Ingres-compliant format
* USE_GERMAN_DATES specifies German-style dd.mm.yyyy
*
* DateOrder defines the field order to be assumed when reading an
* ambiguous date (anything not in YYYY-MM-DD format, with a four-digit
* year field first, is taken to be ambiguous):
* DATEORDER_YMD specifies field order yy-mm-dd
* DATEORDER_DMY specifies field order dd-mm-yy ("European" convention)
* DATEORDER_MDY specifies field order mm-dd-yy ("US" convention)
*
* In the Postgres and SQL DateStyles, DateOrder also selects output field
* order: day comes before month in DMY style, else month comes before day.
*
* The user-visible "DateStyle" run-time parameter subsumes both of these.
*/
/* valid DateStyle values */
#define USE_POSTGRES_DATES 0
#define USE_ISO_DATES 1
#define USE_SQL_DATES 2
#define USE_GERMAN_DATES 3
#define USE_XSD_DATES 4
/* valid DateOrder values */
#define DATEORDER_YMD 0
#define DATEORDER_DMY 1
#define DATEORDER_MDY 2
extern PGDLLIMPORT int DateStyle;
extern PGDLLIMPORT int DateOrder;
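
/*
 * Illustrative sketch (not original code): DateOrder only matters for
 * ambiguous input.  A much-simplified resolver for a two-field numeric date
 * might look like this; the real parser in utils/adt/datetime.c handles far
 * more cases.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_resolve_ambiguous_date(int field1, int field2, int *month, int *day)
{
    if (DateOrder == DATEORDER_DMY)
    {
        *day = field1;          /* e.g. "2.1" is read as January 2nd */
        *month = field2;
    }
    else
    {
        *month = field1;        /* MDY, or YMD once the year is consumed */
        *day = field2;
    }
}
#endif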
/*
* IntervalStyles
* INTSTYLE_POSTGRES Like Postgres < 8.4 when DateStyle = 'iso'
* INTSTYLE_POSTGRES_VERBOSE Like Postgres < 8.4 when DateStyle != 'iso'
* INTSTYLE_SQL_STANDARD SQL standard interval literals
* INTSTYLE_ISO_8601 ISO-8601-basic formatted intervals
*/
#define INTSTYLE_POSTGRES 0
#define INTSTYLE_POSTGRES_VERBOSE 1
#define INTSTYLE_SQL_STANDARD 2
#define INTSTYLE_ISO_8601 3
extern PGDLLIMPORT int IntervalStyle;
#define MAXTZLEN 10 /* max TZ name len, not counting trailing null */
extern PGDLLIMPORT bool enableFsync;
extern PGDLLIMPORT bool allowSystemTableMods;
extern PGDLLIMPORT int work_mem;
extern PGDLLIMPORT double hash_mem_multiplier;
extern PGDLLIMPORT int maintenance_work_mem;
extern PGDLLIMPORT int max_parallel_maintenance_workers;
/*
* Upper and lower hard limits for the buffer access strategy ring size
* specified by the VacuumBufferUsageLimit GUC and BUFFER_USAGE_LIMIT option
* to VACUUM and ANALYZE.
*/
#define MIN_BAS_VAC_RING_SIZE_KB 128
#define MAX_BAS_VAC_RING_SIZE_KB (16 * 1024 * 1024)
extern PGDLLIMPORT int VacuumBufferUsageLimit;
extern PGDLLIMPORT int VacuumCostPageHit;
extern PGDLLIMPORT int VacuumCostPageMiss;
extern PGDLLIMPORT int VacuumCostPageDirty;
extern PGDLLIMPORT int VacuumCostLimit;
extern PGDLLIMPORT double VacuumCostDelay;
extern PGDLLIMPORT int64 VacuumPageHit;
extern PGDLLIMPORT int64 VacuumPageMiss;
extern PGDLLIMPORT int64 VacuumPageDirty;
extern PGDLLIMPORT int VacuumCostBalance;
extern PGDLLIMPORT bool VacuumCostActive;
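
/*
 * Illustrative sketch (not original code): the cost-based vacuum delay
 * machinery charges each page access against VacuumCostBalance and naps once
 * the balance reaches VacuumCostLimit.  This is a rough rendering of what
 * the buffer manager and vacuum_delay_point() do between them, not the
 * actual implementation.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_vacuum_cost_accounting(bool page_was_cached)
{
    if (!VacuumCostActive)
        return;

    /* charge this page access */
    VacuumCostBalance += page_was_cached ? VacuumCostPageHit : VacuumCostPageMiss;

    /* once enough cost has accumulated, sleep and start a new round */
    if (VacuumCostBalance >= VacuumCostLimit)
    {
        pg_usleep((long) (VacuumCostDelay * 1000));     /* delay is in ms */
        VacuumCostBalance = 0;
    }
}
#endif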
/* in tcop/postgres.c */
typedef char *pg_stack_base_t;
extern pg_stack_base_t set_stack_base(void);
extern void restore_stack_base(pg_stack_base_t base);
extern void check_stack_depth(void);
extern bool stack_is_too_deep(void);
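
/*
 * Illustrative sketch (not original code): any function that can recurse to
 * arbitrary depth on user-controlled input should call check_stack_depth()
 * on entry, so that runaway recursion raises a clean ERROR rather than
 * overflowing the stack.  The node type and walker below are hypothetical.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
typedef struct ExampleNode
{
    struct ExampleNode *left;
    struct ExampleNode *right;
} ExampleNode;

static void
example_walk(ExampleNode *node)
{
    check_stack_depth();        /* ereports ERROR if we are nested too deeply */

    if (node == NULL)
        return;
    example_walk(node->left);
    example_walk(node->right);
}
#endif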
/* in tcop/utility.c */
extern void PreventCommandIfReadOnly(const char *cmdname);
extern void PreventCommandIfParallelMode(const char *cmdname);
extern void PreventCommandDuringRecovery(const char *cmdname);
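
/*
 * Illustrative sketch (not original code): utility commands that would write
 * data typically guard themselves with these checks before doing any work.
 * "EXAMPLE COMMAND" is a placeholder; real callers pass the exact command
 * tag so the error message names the right statement.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_utility_guards(void)
{
    PreventCommandIfReadOnly("EXAMPLE COMMAND");
    PreventCommandIfParallelMode("EXAMPLE COMMAND");
    PreventCommandDuringRecovery("EXAMPLE COMMAND");

    /* ... now actually execute the command ... */
}
#endif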
/*****************************************************************************
* pdir.h -- *
* POSTGRES directory path definitions. *
*****************************************************************************/
/* flags to be OR'd to form sec_context */
#define SECURITY_LOCAL_USERID_CHANGE 0x0001
#define SECURITY_RESTRICTED_OPERATION 0x0002
#define SECURITY_NOFORCE_RLS 0x0004
extern PGDLLIMPORT char *DatabasePath;
/* now in utils/init/miscinit.c */
extern void InitPostmasterChild(void);
extern void InitStandaloneProcess(const char *argv0);
extern void InitProcessLocalLatch(void);
extern void SwitchToSharedLatch(void);
extern void SwitchBackToLocalLatch(void);
/*
* MyBackendType indicates what kind of a backend this is.
*
* If you add entries, please also update the child_process_kinds array in
* launch_backend.c.
*/
typedef enum BackendType
{
    B_INVALID = 0,

    /* Backends and other backend-like processes */
    B_BACKEND,
    B_AUTOVAC_LAUNCHER,
    B_AUTOVAC_WORKER,
    B_BG_WORKER,
    B_WAL_SENDER,
    B_SLOTSYNC_WORKER,
    B_STANDALONE_BACKEND,

    /*
     * Auxiliary processes. These have PGPROC entries, but they are not
     * attached to any particular database. There can be only one of each of
     * these running at a time.
     *
     * If you modify these, make sure to update NUM_AUXILIARY_PROCS and the
     * glossary in the docs.
     */
    B_ARCHIVER,
    B_BG_WRITER,
    B_CHECKPOINTER,
    B_STARTUP,
    B_WAL_RECEIVER,
    B_WAL_SUMMARIZER,
    B_WAL_WRITER,

    /*
     * Logger is not connected to shared memory and does not have a PGPROC
     * entry.
     */
    B_LOGGER,
} BackendType;
#define BACKEND_NUM_TYPES (B_LOGGER + 1)
extern PGDLLIMPORT BackendType MyBackendType;
#define AmAutoVacuumLauncherProcess() (MyBackendType == B_AUTOVAC_LAUNCHER)
#define AmAutoVacuumWorkerProcess() (MyBackendType == B_AUTOVAC_WORKER)
#define AmBackgroundWorkerProcess() (MyBackendType == B_BG_WORKER)
#define AmWalSenderProcess() (MyBackendType == B_WAL_SENDER)
#define AmLogicalSlotSyncWorkerProcess() (MyBackendType == B_SLOTSYNC_WORKER)
#define AmArchiverProcess() (MyBackendType == B_ARCHIVER)
#define AmBackgroundWriterProcess() (MyBackendType == B_BG_WRITER)
#define AmCheckpointerProcess() (MyBackendType == B_CHECKPOINTER)
#define AmStartupProcess() (MyBackendType == B_STARTUP)
#define AmWalReceiverProcess() (MyBackendType == B_WAL_RECEIVER)
#define AmWalSummarizerProcess() (MyBackendType == B_WAL_SUMMARIZER)
#define AmWalWriterProcess() (MyBackendType == B_WAL_WRITER)
extern const char *GetBackendTypeDesc(BackendType backendType);
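
/*
 * Illustrative sketch (not original code): shared code paths occasionally
 * branch on the process type, or use GetBackendTypeDesc() to label log
 * output.  Assumes the usual backend includes for elog().
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_report_process_type(void)
{
    if (AmCheckpointerProcess())
        elog(DEBUG1, "running inside the checkpointer");

    elog(LOG, "process type: %s", GetBackendTypeDesc(MyBackendType));
}
#endif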
extern void SetDatabasePath(const char *path);
extern void checkDataDir(void);
extern void SetDataDir(const char *dir);
extern void ChangeToDataDir(void);
extern char *GetUserNameFromId(Oid roleid, bool noerr);
extern Oid GetUserId(void);
extern Oid GetOuterUserId(void);
extern Oid GetSessionUserId(void);
extern Oid GetAuthenticatedUserId(void);
extern void GetUserIdAndSecContext(Oid *userid, int *sec_context);
extern void SetUserIdAndSecContext(Oid userid, int sec_context);
extern bool InLocalUserIdChange(void);
extern bool InSecurityRestrictedOperation(void);
extern bool InNoForceRLSOperation(void);
extern void GetUserIdAndContext(Oid *userid, bool *sec_def_context);
extern void SetUserIdAndContext(Oid userid, bool sec_def_context);
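
/*
 * Illustrative sketch (not original code): maintenance operations commonly
 * switch to the object owner's user id with SECURITY_RESTRICTED_OPERATION
 * set, do their work, and then restore the saved state.  owner_userid and
 * do_maintenance_work() are hypothetical.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_run_as_owner(Oid owner_userid)
{
    Oid         save_userid;
    int         save_sec_context;

    GetUserIdAndSecContext(&save_userid, &save_sec_context);
    SetUserIdAndSecContext(owner_userid,
                           save_sec_context | SECURITY_RESTRICTED_OPERATION);

    do_maintenance_work();      /* hypothetical; runs as the owner */

    SetUserIdAndSecContext(save_userid, save_sec_context);
}
#endif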
extern void InitializeSessionUserId(const char *rolename, Oid roleid,
bool bypass_login_check);
extern void InitializeSessionUserIdStandalone(void);
extern void SetSessionAuthorization(Oid userid, bool is_superuser);
extern Oid GetCurrentRoleId(void);
extern void SetCurrentRoleId(Oid roleid, bool is_superuser);
extern void InitializeSystemUser(const char *authn_id,
const char *auth_method);
extern const char *GetSystemUser(void);
/* in utils/misc/superuser.c */
extern bool superuser(void); /* current user is superuser */
extern bool superuser_arg(Oid roleid); /* given user is superuser */
/*****************************************************************************
* pmod.h -- *
* POSTGRES processing mode definitions. *
*****************************************************************************/
/*
* Description:
* There are three processing modes in POSTGRES. They are
* BootstrapProcessing or "bootstrap," InitProcessing or
* "initialization," and NormalProcessing or "normal."
*
* The first two processing modes are used during special times. When the
* system state indicates bootstrap processing, transactions are all given
* transaction id "one" and are consequently guaranteed to commit. This mode
* is used during the initial generation of template databases.
*
* Initialization mode: used while starting a backend, until all normal
* initialization is complete. Some code behaves differently when executed
* in this mode to enable system bootstrapping.
*
* If a POSTGRES backend process is in normal mode, then all code may be
* executed normally.
*/
typedef enum ProcessingMode
{
    BootstrapProcessing,        /* bootstrap creation of template database */
    InitProcessing,             /* initializing system */
    NormalProcessing,           /* normal processing */
} ProcessingMode;
extern PGDLLIMPORT ProcessingMode Mode;
#define IsBootstrapProcessingMode() (Mode == BootstrapProcessing)
#define IsInitProcessingMode() (Mode == InitProcessing)
#define IsNormalProcessingMode() (Mode == NormalProcessing)
#define GetProcessingMode() Mode
#define SetProcessingMode(mode) \
    do { \
        Assert((mode) == BootstrapProcessing || \
               (mode) == InitProcessing || \
               (mode) == NormalProcessing); \
        Mode = (mode); \
    } while(0)
/*****************************************************************************
* pinit.h -- *
* POSTGRES initialization and cleanup definitions. *
*****************************************************************************/
/* in utils/init/postinit.c */
/* flags for InitPostgres() */
#define INIT_PG_LOAD_SESSION_LIBS 0x0001
#define INIT_PG_OVERRIDE_ALLOW_CONNS 0x0002
#define INIT_PG_OVERRIDE_ROLE_LOGIN 0x0004
extern void pg_split_opts(char **argv, int *argcp, const char *optstr);
extern void InitializeMaxBackends(void);
extern void InitPostgres(const char *in_dbname, Oid dboid,
const char *username, Oid useroid,
bits32 flags,
char *out_dbname);
extern void BaseInit(void);
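
/*
 * Illustrative sketch (not original code): the INIT_PG_* flags are OR'd into
 * InitPostgres()'s flags argument after BaseInit().  The database and role
 * names below are placeholders, not what any particular caller passes.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static void
example_init_backend(void)
{
    BaseInit();
    InitPostgres("postgres", InvalidOid,    /* database to connect to */
                 "postgres", InvalidOid,    /* role to connect as */
                 INIT_PG_LOAD_SESSION_LIBS,
                 NULL);                     /* resolved database name not needed */
}
#endif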
/* in utils/init/miscinit.c */
extern PGDLLIMPORT bool IgnoreSystemIndexes;
extern PGDLLIMPORT bool process_shared_preload_libraries_in_progress;
extern PGDLLIMPORT bool process_shared_preload_libraries_done;
extern PGDLLIMPORT bool process_shmem_requests_in_progress;
extern PGDLLIMPORT char *session_preload_libraries_string;
extern PGDLLIMPORT char *shared_preload_libraries_string;
extern PGDLLIMPORT char *local_preload_libraries_string;
extern void CreateDataDirLockFile(bool amPostmaster);
extern void CreateSocketLockFile(const char *socketfile, bool amPostmaster,
const char *socketDir);
extern void TouchSocketLockFiles(void);
extern void AddToDataDirLockFile(int target_line, const char *str);
extern bool RecheckDataDirLockFile(void);
extern void ValidatePgVersion(const char *path);
extern void process_shared_preload_libraries(void);
extern void process_session_preload_libraries(void);
extern void process_shmem_requests(void);
extern void pg_bindtextdomain(const char *domain);
extern bool has_rolreplication(Oid roleid);
typedef void (*shmem_request_hook_type) (void);
extern PGDLLIMPORT shmem_request_hook_type shmem_request_hook;
extern Size EstimateClientConnectionInfoSpace(void);
extern void SerializeClientConnectionInfo(Size maxsize, char *start_address);
extern void RestoreClientConnectionInfo(char *conninfo);
/* in executor/nodeHash.c */
extern size_t get_hash_memory_limit(void);
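
/*
 * Illustrative sketch (not the actual implementation in nodeHash.c): the
 * hash memory budget is, roughly, work_mem (in kilobytes) scaled by
 * hash_mem_multiplier, converted to bytes and clamped to fit in size_t.
 */
#ifdef MISCADMIN_USAGE_EXAMPLES     /* hypothetical guard; examples only */
static size_t
example_hash_memory_limit(void)
{
    double      limit = (double) work_mem * hash_mem_multiplier * 1024.0;

    if (limit > (double) SIZE_MAX)
        limit = (double) SIZE_MAX;
    return (size_t) limit;
}
#endif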
#endif /* MISCADMIN_H */