postgresql

Commit Graph

Author	SHA1	Message	Date
Peter Eisentraut	794f10f6b9	Add some UUID support functions Add uuid_extract_timestamp() and uuid_extract_version(). Author: Andrey Borodin Reviewed-by: Sergey Prokhorenko, Kirk Wolak, Przemysław Sztoch Reviewed-by: Nikolay Samokhvalov, Jelte Fennema-Nio, Aleksander Alekseev Reviewed-by: Peter Eisentraut, Chris Travers, Lukas Fittl Discussion: https://postgr.es/m/CAAhFRxitJv%3DyoGnXUgeLB_O%2BM7J2BJAmb5jqAT9gZ3bij3uLDA%40mail.gmail.com	2024-03-19 09:32:04 +01:00
Jeff Davis	846311051e	Address more review comments on commit `2d819a08a1`. Based on comments from Peter Eisentraut. * Document CREATE DATABASE ... BUILTIN_LOCALE. * Determine required encoding based on locale name for CREATE COLLATION. Use -1 for "C" (requires catversion bump). * initdb output fixups. * Make ctype_is_c a constant true for now. * Fixups to ICU 010_create_database.pl test. Discussion: https://postgr.es/m/4135cf11-206d-40ed-96c0-9363c1232379@eisentraut.org	2024-03-18 11:58:13 -07:00
Peter Eisentraut	48018f1d8c	Add some const decorations Discussion: https://www.postgresql.org/message-id/flat/5ac50071-f2ed-4ace-a8fd-b892cffd33eb@www.fastmail.com	2024-03-18 12:07:09 +01:00
Heikki Linnakangas	05c3980e7f	Move code for backend startup to separate file This is code that runs in the backend process after forking, rather than postmaster. Move it out of postmaster.c for clarity. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:38:10 +02:00
Heikki Linnakangas	aafc05de1b	Refactor postmaster child process launching Introduce new postmaster_child_launch() function that deals with the differences in EXEC_BACKEND mode. Refactor the mechanism of passing information from the parent to child process. Instead of using different command-line arguments when launching the child process in EXEC_BACKEND mode, pass a variable-length blob of startup data along with all the global variables. The contents of that blob depend on the kind of child process being launched. In !EXEC_BACKEND mode, we use the same blob, but it's simply inherited from the parent to child process. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:30 +02:00
Heikki Linnakangas	f1baed18bc	Move some functions from postmaster.c to a new source file This just moves the functions, with no other changes, to make the next commits smaller and easier to review. The moved functions are related to launching postmaster child processes in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-18 11:35:05 +02:00
Daniel Gustafsson	d6607016c7	Support json_errdetail in FRONTEND code Allocate memory for the error message inside memory owned by the JsonLexContext and move responsibility away from the caller for freeing it. This means that we can partially revert `b44669b2ca` as this is now safe to use in FRONTEND code. The motivation for this comes from the OAuth and incremental JSON patchsets but it also adds value on its own. Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com	2024-03-17 23:56:15 +01:00
Dean Rasheed	c649fa24a4	Add RETURNING support to MERGE. This allows a RETURNING clause to be appended to a MERGE query, to return values based on each row inserted, updated, or deleted. As with plain INSERT, UPDATE, and DELETE commands, the returned values are based on the new contents of the target table for INSERT and UPDATE actions, and on its old contents for DELETE actions. Values from the source relation may also be returned. As with INSERT/UPDATE/DELETE, the output of MERGE ... RETURNING may be used as the source relation for other operations such as WITH queries and COPY commands. Additionally, a special function merge_action() is provided, which returns 'INSERT', 'UPDATE', or 'DELETE', depending on the action executed for each row. The merge_action() function can be used anywhere in the RETURNING list, including in arbitrary expressions and subqueries, but it is an error to use it anywhere outside of a MERGE query's RETURNING list. Dean Rasheed, reviewed by Isaac Morland, Vik Fearing, Alvaro Herrera, Gurjeet Singh, Jian He, Jeff Davis, Merlin Moncure, Peter Eisentraut, and Wolfgang Walther. Discussion: http://postgr.es/m/CAEZATCWePEGQR5LBn-vD6SfeLZafzEm2Qy_L_Oky2=qw2w3Pzg@mail.gmail.com	2024-03-17 13:58:59 +00:00
Peter Eisentraut	6a004f1be8	Add attstattarget to FormExtraData_pg_attribute This allows setting attstattarget when a relation is created. We make use of this by having index_concurrently_create_copy() copy over the attstattarget values when the new index is created, instead of having index_concurrently_swap() fix it up later. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:38:27 +01:00
Peter Eisentraut	d939cb2fd6	Generalize handling of nullable pg_attribute columns in DDL DDL code uses tuple descriptors to pass around pg_attribute values during table and index creation. But tuple descriptors don't include the variable-length/nullable columns of pg_attribute, so they have to be handled separately. Right now, the attoptions field is handled in a one-off way with a separate argument passed to InsertPgAttributeTuples(). The other affected fields of pg_attribute are right now not needed at relation creation time. The goal of this patch is to generalize this to allow handling additional variable-length/nullable columns of pg_attribute in a similar manner. For that, create a new struct FormExtraData_pg_attribute, which is to be passed around in parallel to the tuple descriptor and optionally supplies the additional columns. Right now, this struct only contains one field for attoptions, so no functionality is actually changed by this. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:30:51 +01:00
Peter Eisentraut	012460ee93	Make stxstattarget nullable To match attstattarget change (commit `4f622503d6`). The logic inside CreateStatistics() is clarified a bit compared to that previous patch, and so here we also update ATExecSetStatistics() to match. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-03-17 12:26:26 +01:00
Peter Eisentraut	20e58105ba	Separate equalRowTypes() from equalTupleDescs() This introduces a new function equalRowTypes() that is effectively a subset of equalTupleDescs() but only compares the number of attributes and attribute name, type, typmod, and collation. This is enough for most existing uses of equalTupleDescs(), which are changed to use the new function. The only remaining callers of equalTupleDescs() are those that really want to check the full tuple descriptor as such, without concern about record or row or record type semantics. The existing function hashTupleDesc() is renamed to hashRowType(), because it now corresponds more to equalRowTypes(). The purpose of this change is to be clearer about the semantics of the equality asked for by each caller. (At least one caller had a comment that questioned whether equalTupleDescs() was too restrictive.) For example, `4f622503d6` removed attstattarget from the tuple descriptor structure. It was not fully clear at the time how this should affect equalTupleDescs(). Now the answer is clear: By their own definitions, equalRowTypes() does not care, and equalTupleDescs() just compares whatever is in the tuple descriptor but does not care why it is in there. Reviewed-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/f656d6d9-6660-4518-a006-2f65cafbebd1%40eisentraut.org	2024-03-17 05:58:04 +01:00
Daniel Gustafsson	b783186515	Add destroyStringInfo function for cleaning up StringInfos destroyStringInfo() is a counterpart to makeStringInfo(), freeing a palloc'd StringInfo and its data. This is a convenience function to align the StringInfo API with the PQExpBuffer API. Originally added in the OAuth patchset, it was extracted and committed separately in order to aid upcoming JSON work. Author: Daniel Gustafsson <daniel@yesql.se> Author: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi+mWdTd6ujtyF7MsvXvk7ToLRVG_tYAcaGbQLvf=N4KrQw@mail.gmail.com	2024-03-16 23:18:28 +01:00
Nathan Bossart	d1162cfda8	Add pg_column_toast_chunk_id(). This function returns the chunk_id of an on-disk TOASTed value. If the value is un-TOASTed or not on-disk, it returns NULL. This is useful for identifying which values are actually TOASTed and for investigating "unexpected chunk number" errors. Bumps catversion. Author: Yugo Nagata Reviewed-by: Jian He Discussion: https://postgr.es/m/20230329105507.d764497456eeac1ca491b5bd%40sraoss.co.jp	2024-03-14 10:58:00 -05:00
Heikki Linnakangas	84c18acaf6	Remove redundant snapshot copying from parallel leader to workers The parallel query infrastructure copies the leader backend's active snapshot to the worker processes. But BitmapHeapScan node also had bespoken code to pass the snapshot from leader to the worker. That was redundant, so remove it. The removed code was analogous to the snapshot serialization in table_parallelscan_initialize(), but that was the wrong role model. A parallel bitmap heap scan is more like an independent non-parallel bitmap heap scan in each parallel worker as far as the table AM is concerned, because the coordination is done in nodeBitmapHeapscan.c, and the table AM doesn't need to know anything about it. This relies on the assumption that es_snapshot == GetActiveSnapshot(). That's not a new assumption, things would get weird if you used the QueryDesc's snapshot for visibility checks in the scans, but the active snapshot for evaluating quals, for example. This could use some refactoring and cleanup, but for now, just add some assertions. Reviewed-by: Dilip Kumar, Robert Haas Discussion: https://www.postgresql.org/message-id/5f3b9d59-0f43-419d-80ca-6d04c07cf61a@iki.fi	2024-03-14 15:18:10 +02:00
Robert Haas	2346df6fc3	Allow a no-wait lock acquisition to succeed in more cases. We don't determine the position at which a process waiting for a lock should insert itself into the wait queue until we reach ProcSleep(), and we may at that point discover that we must insert ourselves ahead of everyone who wants a conflicting lock, in which case we obtain the lock immediately. Up until now, a no-wait lock acquisition would fail in such cases, erroneously claiming that the lock couldn't be obtained immediately. Fix that by trying ProcSleep even in the no-wait case. No back-patch for now, because I'm treating this as an improvement to the existing no-wait feature. It could instead be argued that it's a bug fix, on the theory that there should never be any case whatsoever where no-wait fails to obtain a lock that would have been obtained immediately without no-wait, but I'm reluctant to interpret the semantics of no-wait that strictly. Robert Haas and Jingxian Li Discussion: http://postgr.es/m/CA+TgmobCH-kMXGVpb0BB-iNMdtcNkTvcZ4JBxDJows3kYM+GDg@mail.gmail.com	2024-03-14 08:56:06 -04:00
Alexander Korotkov	e85662df44	Fix false reports in pg_visibility Currently, pg_visibility computes its xid horizon using the GetOldestNonRemovableTransactionId(). The problem is that this horizon can sometimes go backward. That can lead to reporting false errors. In order to fix that, this commit implements a new function GetStrictOldestNonRemovableTransactionId(). This function computes the xid horizon, which would be guaranteed to be newer or equal to any xid horizon computed before. We have to do the following to achieve this. 1. Ignore processes xmin's, because they consider connection to other databases that were ignored before. 2. Ignore KnownAssignedXids, because they are not database-aware. At the same time, the primary could compute its horizons database-aware. 3. Ignore walsender xmin, because it could go backward if some replication connections don't use replication slots. As a result, we're using only currently running xids to compute the horizon. Surely these would significantly sacrifice accuracy. But we have to do so to avoid reporting false errors. Inspired by earlier patch by Daniel Shelepanov and the following discussion with Robert Haas and Tom Lane. Discussion: https://postgr.es/m/1649062270.289865713%40f403.i.mail.ru Reviewed-by: Alexander Lakhin, Dmitry Koval	2024-03-14 13:12:05 +02:00
Jeff Davis	2d819a08a1	Introduce "builtin" collation provider. New provider for collations, like "libc" or "icu", but without any external dependency. Initially, the only locale supported by the builtin provider is "C", which is identical to the libc provider's "C" locale. The libc provider's "C" locale has always been treated as a special case that uses an internal implementation, without using libc at all -- so the new builtin provider uses the same implementation. The builtin provider's locale is independent of the server environment variables LC_COLLATE and LC_CTYPE. Using the builtin provider, the database collation locale can be "C" while LC_COLLATE and LC_CTYPE are set to "en_US", which is impossible with the libc provider. By offering a new builtin provider, it clarifies that the semantics of a collation using this provider will never depend on libc, and makes it easier to document the behavior. Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798659@joeconway.com Discussion: https://postgr.es/m/dd9261f4-7a98-4565-93ec-336c1c110d90@manitou-mail.org Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Vérité, Peter Eisentraut, Jeremy Schneider	2024-03-13 23:33:44 -07:00
Peter Eisentraut	6ab2e8385d	Put genbki.pl output into src/include/catalog/ directly With the makefile rules, the output of genbki.pl was written to src/backend/catalog/, and then the header files were linked to src/include/catalog/. This changes it so that the output files are written directly to src/include/catalog/. This makes the logic simpler, and it also makes the behavior consistent with the meson build system. Also, the list of catalog files is now kept in parallel in src/include/catalog/{meson.build,Makefile}, while before the makefiles had it in src/backend/catalog/Makefile. Reviewed-by: Andreas Karlsson <andreas@proxel.se> Discussion: https://www.postgresql.org/message-id/flat/21b74bdc-183d-4dd5-9c27-9378d178f459@eisentraut.org	2024-03-14 07:11:21 +01:00
Alexander Korotkov	e820db5b56	Improve documentation for pg_stat_checkpointer fields pg_stat_checkpointer contains statistics for checkpoints and restartpoints. Before `12915a58ee` documentation said only about checkpoints implying that restartpoint is the variation of checkpoint. `12915a58ee` introduced new separate statistics fields for restartpoints. This commit explicitly documents fields that are relevant for both checkpoints and restartpoints. Reported-by: Magnus Hagander Discussion: https://postgr.es/m/CABUevExav5-SR0x%2BG9kBUMV0G8XsvSUfuyyqmYBBJi6VHns6sw%40mail.gmail.com Reviewed-by: Anton A. Melnikov	2024-03-14 02:17:59 +02:00
Nathan Bossart	ecb0fd3372	Reintroduce MAINTAIN privilege and pg_maintain predefined role. Roles with MAINTAIN on a relation may run VACUUM, ANALYZE, REINDEX, REFRESH MATERIALIZE VIEW, CLUSTER, and LOCK TABLE on the relation. Roles with privileges of pg_maintain may run those same commands on all relations. This was previously committed for v16, but it was reverted in commit `151c22deee` due to concerns about search_path tricks that could be used to escalate privileges to the table owner. Commits `2af07e2f74`, `59825d1639`, and `c7ea3f4229` resolved these concerns by restricting search_path when running maintenance commands. Bumps catversion. Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/20240305161235.GA3478007%40nathanxps13	2024-03-13 14:49:26 -05:00
Robert Haas	2041bc4276	Add the system identifier to backup manifests. Before this patch, if you took a full backup on server A and then tried to use the backup manifest to take an incremental backup on server B, it wouldn't know that the manifest was from a different server and so the incremental backup operation could potentially complete without error. When you later tried to run pg_combinebackup, you'd find out that your incremental backup was and always had been invalid. That's poor timing, because nobody likes finding out about backup problems only at restore time. With this patch, you'll get an error when trying to take the (invalid) incremental backup, which seems a lot nicer. Amul Sul, revised by me. Review by Michael Paquier. Discussion: http://postgr.es/m/CA+TgmoYLZzbSAMM3cAjV4Y+iCRZn-bR9H2+Mdz7NdaJFU1Zb5w@mail.gmail.com	2024-03-13 15:12:33 -04:00
Robert Haas	dbfc447165	Expose new function get_controlfile_by_exact_path(). This works just like get_controlfile(), but expects the path to the control file rather than the path to the data directory that contains the control file. This makes more sense in cases where the caller has already constructed the path to the control file itself. Amul Sul and Robert Haas, reviewed by Michael Paquier	2024-03-13 12:06:44 -04:00
Tom Lane	6ee3261e9b	Fix confusion about the return rowtype of SQL-language procedures. There is a very ancient hack in check_sql_fn_retval that allows a single SELECT targetlist entry of composite type to be taken as supplying all the output columns of a function returning composite. (This is grotty and fundamentally ambiguous, but it's really hard to do nested composite-returning functions without it.) As far as I know, that doesn't cause any problems in ordinary functions. It's disastrous for procedures however. All procedures that have any output parameters are labeled with prorettype RECORD, and the CALL code expects it will get back a record with one column per output parameter, regardless of whether any of those parameters is composite. Doing something else leads to an assertion failure or core dump. This is simple enough to fix: we just need to not apply that rule when considering procedures. However, that requires adding another argument to check_sql_fn_retval, which at least in principle might be getting called by external callers. Therefore, in the back branches convert check_sql_fn_retval into an ABI-preserving wrapper around a new function check_sql_fn_retval_ext. Per report from Yahor Yuzefovich. This has been broken since we implemented procedures, so back-patch to all supported branches. Discussion: https://postgr.es/m/CABz5gWHSjj2df6uG0NRiDhZ_Uz=Y8t0FJP-_SVSsRsnrQT76Gg@mail.gmail.com	2024-03-12 18:16:25 -04:00
David Rowley	fe4750effd	Fix incorrect filename reference in comment Author: Cary Huang Discussion: https://postgr.es/m/18e34071af0.dbfc9663424635.8571906799773344646@highgo.ca	2024-03-13 09:34:11 +13:00
Heikki Linnakangas	4945e4ed4a	Move initialization of the Port struct to the child process In postmaster, use a more lightweight ClientSocket struct that encapsulates just the socket itself and the remote endpoint's address that you get from accept() call. ClientSocket is passed to the child process, which initializes the bigger Port struct. This makes it more clear what information postmaster initializes, and what is left to the child process. Rename the StreamServerPort and StreamConnection functions to make it more clear what they do. Remove StreamClose, replacing it with plain closesocket() calls. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-12 13:42:38 +02:00
Heikki Linnakangas	d162c3a73b	Pass CAC as an argument to the backend process We used to smuggle it to the child process in the Port struct, but it seems better to pass it down as a separate argument. This paves the way for the next commit, which moves the initialization of the Port struct to the backend process, after forking. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2024-03-12 13:42:36 +02:00
Michael Paquier	a04ddd077e	Improve support for ExplainOneQuery() hook There is a hook called ExplainOneQuery_hook that gives modules the possibility to plug into this code path, but, like utility.c for utility statement execution, there is no corresponding "standard" routine in the case of EXPLAIN executed for one Query. This commit adds a new standard_ExplainOneQuery() in explain.c, which is able to run explain on a non-utility Query without calling its hook. Per the feedback received from a couple of hackers, this change gives the possibility to cut a few hundred lines of code in some of the popular out-of-core modules as these maintained a copy of ExplainOneQuery(), adding custom extra information at the beginning or the end of the EXPLAIN output. Author: Mats Kindahl Reviewed-by: Aleksander Alekseev, Jelte Fennema-Nio, Andrei Lepikhov Discussion: https://postgr.es/m/CA+14427V_B4EAoC_o-iYYucRdMSOTfpuH9k-QbexffY1HYJBiA@mail.gmail.com	2024-03-11 08:40:40 +09:00
Jeff Davis	f696c0cd5f	Catalog changes preparing for builtin collation provider. Rename pg_collation.colliculocale to colllocale, and pg_database.daticulocale to datlocale. These names reflects that the fields will be useful for the upcoming builtin provider as well, not just for ICU. This is purely a rename; no changes to the meaning of the fields. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Peter Eisentraut	2024-03-09 14:48:18 -08:00
Jeff Davis	33ee2550d3	Fix type signedness error in commit `5c40364dd6`. Use ssize_t instead of size_t. Discussion: https://postgr.es/m/b20d6d97-7338-48ea-ba33-837a1c8ef98e@iki.fi Reported-by: Heikki Linnakangas	2024-03-08 16:00:46 -08:00
Alvaro Herrera	270af6f0df	Admit deferrable PKs into rd_pkindex, but flag them as such ... and in particular don't return them as replica identity. The motivation for this change is letting the primary keys be seen by code that derives NOT NULL constraints from them, when creating inheritance children; before this change, if you had a deferrable PK, pg_dump would not recreate the attnotnull marking properly, because the column would not be considered as having anything to back said marking after dropping the throwaway NOT NULL constraint. The reason we don't want these PKs as replica identities is that replication can corrupt data, if the uniqueness constraint is transiently broken. Reported-by: Amul Sul <sulamul@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/CAAJ_b94QonkgsbDXofakHDnORQNgafd1y3Oa5QXfpQNJyXyQ7A@mail.gmail.com	2024-03-08 16:32:29 +01:00
Alexander Korotkov	4c1973fcae	Avoid recursion in MemoryContext functions You might run out of stack space with recursion, which is not nice in functions that might be used e.g. at cleanup after transaction abort. MemoryContext contains pointer to parent and siblings, so we can traverse a tree of contexts iteratively, without using stack. Refactor the functions to do that. MemoryContextStats() still recurses, but it now has a limit to how deep it recurses. Once the limit is reached, it prints just a summary of the rest of the hierarchy, similar to how it summarizes contexts with lots of children. That seems good anyway, because a context dump with hundreds of nested contexts isn't very readable. Report by Egor Chindyaskin and Alexander Lakhin. Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru Author: Heikki Linnakangas Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane	2024-03-08 13:18:30 +02:00
John Naylor	ab6ae62603	Fix link error for test_radixtree module on Windows Add PGDLLIMPORT to pg_popcount32/64. In passing, fix a typo. Diagnosis by Masahiko Sawada, patch by David Rowley Per buildfarm members drongo and fairywren Discussion: https://postgr.es/m/CAD21AoAMm1mQd%3Dw4PrfrKK%3DOMP8j8%3D7ntJRPF8%2B%3D10iUuvwiCA%40mail.gmail.com Discussion: https://postgr.es/m/CAApHDvov7724UrD1Ug0D1eV%2B9Pd_x5VEQmw-6HVG9w1WdCxXPA%40mail.gmail.com	2024-03-08 10:57:40 +07:00
Amit Kapila	bf279ddd1c	Introduce a new GUC 'standby_slot_names'. This patch provides a way to ensure that physical standbys that are potential failover candidates have received and flushed changes before the primary server making them visible to subscribers. Doing so guarantees that the promoted standby server is not lagging behind the subscribers when a failover is necessary. The logical walsender now guarantees that all local changes are sent and flushed to the standby servers corresponding to the replication slots specified in 'standby_slot_names' before sending those changes to the subscriber. Additionally, the SQL functions pg_logical_slot_get_changes, pg_logical_slot_peek_changes and pg_replication_slot_advance are modified to ensure that they process changes for failover slots only after physical slots specified in 'standby_slot_names' have confirmed WAL receipt for those. Author: Hou Zhijie and Shveta Malik Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-03-08 08:10:45 +05:30
Michael Paquier	4f8c1e7aaf	Update comment of AlterTableCmd->name in parsenodes.h Since `b0483263dd`, this field can be used to store an access method name for ALTER TABLE, but access methods were not mentioned in the field's description. Issue noticed while working on the area. Discussion: https://postgr.es/m/ZeWKgCtk6xiAsDsc@paquier.xyz	2024-03-08 08:44:13 +09:00
Jeff Davis	5c40364dd6	Unicode case mapping tables and functions. Implements Unicode simple case mapping, in which all code points map to exactly one other code point unconditionally. These tables are generated from UnicodeData.txt, which is already being used by other infrastructure in src/common/unicode. The tables are checked into the source tree, so they only need to be regenerated when we update the Unicode version. In preparation for the builtin collation provider, and possibly useful for other callers. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Peter Eisentraut, Daniel Verite, Jeremy Schneider	2024-03-07 11:15:06 -08:00
Peter Eisentraut	6d470211e5	Fix description and grouping of RangeTblEntry.inh The inh field of RangeTblEntry was doubly confusingly documented. Some parts of the code insisted that it was only valid for RTE_RELATION entries, other parts said the field was valid for all entries. Neither was quite correct. More correctly, the field is valid for RTE_RELATION entries but is also used in the planner for RTE_SUBQUERY entries. So it makes more sense to group it with other fields that are primarily for RTE_RELATION but borrowed by RTE_SUBQUERY. (The exact position was chosen so that it is next to relkind for better struct packing, and next to relid, since relid and inh are sort of the input fields and the others are filled in later.) Also add documentation for the planner's use at the struct definition. Discussion: https://www.postgresql.org/message-id/6c1fbccc-85c8-40d3-b08b-4f47f2093711@eisentraut.org	2024-03-07 12:13:09 +01:00
John Naylor	e444ebcb85	Fix incorrect format specifier for int64 Follow-up to `ee1b30f12`, per buildfarm member mamba. Discussion: https://postgr.es/m/CANWCAZYwyRMU%2BOTVOjK%3Dno1hm-W3ZQ5vrSFM1MFAaLtLydvwzA%40mail.gmail.com	2024-03-07 14:26:52 +07:00
John Naylor	ac234e6377	Fix redefinition of typedefs Per buildfarm members sifaka and longfin, clang with -Wtypedef-redefinition warns of duplicate typedefs unless building with C11. Follow-up to `ee1b30f12`. Masahiko Sawada Discussion: https://postgr.es/m/CANWCAZauSg%3DLUbBbXhpeQtBuPifmzQNTYS6O8NsoAPz1zL-Txg%40mail.gmail.com	2024-03-07 14:11:49 +07:00
John Naylor	ee1b30f128	Add template for adaptive radix tree This implements a radix tree data structure based on the design in "The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases" by Viktor Leis, Alfons Kemper, and ThomasNeumann, 2013. The main technique that makes it adaptive is using several different node types, each with a different capacity of elements, and a different algorithm for accessing them. The nodes start small and grow/shrink as needed. The main advantage over hash tables is efficient sorted iteration and better memory locality when successive keys are lexicographically close together. The implementation currently assumes 64-bit integer keys, and traversing the tree is in general slower than a linear probing hash table, so this is not a general-purpose associative array. The paper describes two other techniques not implemented here, namely "path compression" and "lazy expansion". These can further reduce memory usage and speed up traversal, but the former would add significant complexity and the latter requires storing the full key with the value. We do trivially compress the path when leading bytes of the key are zeros, however. For value storage, we use "combined pointer/value slots", as recommended in the paper. Values of size equal or smaller than the the platform's pointer type are stored in the array of child pointers in the last level node, while larger values are each stored in a separate allocation. This is for now fixed at compile time, but it would be fairly trivial to allow determining at runtime how variable-length values are stored. One innovation in our implementation compared to the ART paper is decoupling the notion of node "size class" from "kind". The size classes within a given node kind have the same underlying type, but a variable capacity for children, so we can introduce additional node sizes with little additional code. To enable different use cases to specialize for different value types and for shared/local memory, we use macro-templatized code generation in the same manner as simplehash.h and sort_template.h. Future commits will use this infrastructure for storing TIDs. Patch by Masahiko Sawada and John Naylor, but a substantial amount of credit is due to Andres Freund, whose proof-of-concept was a valuable source of coding idioms and awareness of performance pitfalls, and who reviewed earlier versions. Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-03-07 12:40:11 +07:00
Michael Paquier	099ca50bd4	Revert "Fix parallel-safety check of expressions and predicate for index builds" This reverts commit `eae7be600b`, following a discussion with Tom Lane, due to concerns that this impacts the decisions made by the planner for the number of workers spawned based on the inlining and const-folding of index expressions and predicate for cases that would have worked until this commit. Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us Backpatch-through: 12	2024-03-07 08:30:35 +09:00
Jeff Davis	ad49994538	Add Unicode property tables. Provide functions to test for Unicode properties, such as Alphabetic or Cased. These functions use tables derived from Unicode data files, similar to the tables for Unicode normalization or general category, and those tables can be updated with the 'update-unicode' build target. Use Unicode properties to provide functions to test for regex character classes, like 'punct' or 'alnum'. Infrastructure in preparation for a builtin collation provider, and may also be useful for other callers. Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com Reviewed-by: Daniel Verite, Peter Eisentraut, Jeremy Schneider	2024-03-06 12:50:01 -08:00
John Naylor	de7c6fe834	Fix signedness error in `9f225e992` for gcc The first argument of vshrq_n_s8 needs to be a signed vector type, but it was passed unsigned. Clang is more lax with conversion, but gcc needs a cast. Fix by me, tested by Masahiko Sawada Per buildfarm members splitfin, batta, widowbird, snakefly, parula, massasauga Discussion: https://postgr.es/m/20240306074106.mg6w4koohdlworbs%40alap3.anarazel.de	2024-03-06 15:55:55 +07:00
Michael Paquier	eae7be600b	Fix parallel-safety check of expressions and predicate for index builds As coded, the planner logic that calculates the number of parallel workers to use for a parallel index build uses expressions and predicates from the relcache, which are flattened for the planner by eval_const_expressions(). As reported in the bug, an immutable parallel-unsafe function flattened in the relcache would become a Const, which would be considered as parallel-safe, even if the predicate or the expressions including the function are not safe in parallel workers. Depending on the expressions or predicate used, this could cause the parallel build to fail. Tests are included that check parallel index builds with parallel-unsafe predicate and expressions. Two routines are added to lsyscache.h to be able to retrieve expressions and predicate of an index from its pg_index data. Reported-by: Alexander Lakhin Author: Tender Wang Reviewed-by: Jian He, Michael Paquier Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com Backpatch-through: 12	2024-03-06 17:23:56 +09:00
John Naylor	3e76a806cb	Move some bitmap logic out of bitmapset.c Move the logic for selecting appropriate pg_bitutils.h functions based on word size to bitmapset.h for wider visibility. Reviewed (in a previous version) by Tom Lane Discussion: https://postgr.es/m/CAFBsxsFW2JjTo58jtDB%2B3sZhxMx3t-3evew8%3DAcr%2BGGhC%2BkFaA%40mail.gmail.com	2024-03-06 14:30:16 +07:00
John Naylor	9f225e992b	Introduce helper SIMD functions for small byte arrays vector8_min - helper for emulating ">=" semantics vector8_highbit_mask - used to turn the result of a vector comparison into a bitmask Masahiko Sawada Reviewed by Nathan Bossart, with additional adjustments by me Discussion: https://postgr.es/m/CAFBsxsHbBm_M22gLBO%2BAZT4mfMq3L_oX3wdKZxjeNnT7fHsYMQ%40mail.gmail.com	2024-03-06 14:25:20 +07:00
Thomas Munro	d93627bcbe	Add --copy-file-range option to pg_upgrade. The copy_file_range() system call is available on at least Linux and FreeBSD, and asks the kernel to use efficient ways to copy ranges of a file. Options available to the kernel include sharing block ranges (similar to --clone mode), and pushing down block copies to the storage layer. For automated testing, see PG_TEST_PG_UPGRADE_MODE. (Perhaps in a later commit we could consider setting this mode for one of the CI targets.) Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGKe7Hb0-UNih8VD5UNZy5-ojxFb3Pr3xSBBL8qj2M2%3DdQ%40mail.gmail.com	2024-03-06 12:01:01 +13:00
Peter Eisentraut	e03349144b	Improve field order in RangeTblEntry When perminfoindex was added, it was just added at the end of the block. It would make sense to keep it closer to more related fields. In passing, also add an inline comment, like the other fields have. (Other field reorderings and documentation improvements in RangeTblEntry are being discussed, but it's better not to mix them together.) Discussion: https://www.postgresql.org/message-id/flat/6c1fbccc-85c8-40d3-b08b-4f47f2093711%40eisentraut.org	2024-03-05 13:34:43 +01:00
Peter Eisentraut	030e10ff1a	Rename pg_constraint.conwithoutoverlaps to conperiod pg_constraint.conwithoutoverlaps was recently added to support primary keys and unique constraints with the WITHOUT OVERLAPS clause. An upcoming patch provides the foreign-key side of this functionality, but the syntax there is different and uses the keyword PERIOD. It would make sense to use the same pg_constraint field for both of these, but then we should pick a more general name that conveys "this constraint has a temporal/period-related feature". conperiod works for that and is nicely compact. Changing this now avoids possibly having to introduce versioning into clients. Note there are still some "without overlaps" variables left, which deal specifically with the parsing of the primary key/unique constraint feature. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-03-05 11:24:17 +01:00
Jeff Davis	59825d1639	Fix buildfarm failures from `2af07e2f74`. Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working with parallel query. Now that the call requires more arguments, wrap the call in a new function to avoid code duplication and offer a place for a comment. Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org	2024-03-04 19:42:16 -08:00
Jeff Davis	2af07e2f74	Fix search_path to a safe value during maintenance operations. While executing maintenance operations (ANALYZE, CLUSTER, REFRESH MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to 'pg_catalog, pg_temp' to prevent inconsistent behavior. Functions that are used for functional indexes, in index expressions, or in materialized views and depend on a different search path must be declared with CREATE FUNCTION ... SET search_path='...'. This change was previously committed as `05e1737351`, then reverted in commit `2fcc7ee7af` because it was too late in the cycle. Preparation for the MAINTAIN privilege, which was previously reverted due to search_path manipulation hazards. Discussion: https://postgr.es/m/d4ccaf3658cb3c281ec88c851a09733cd9482f22.camel@j-davis.com Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com Reviewed-by: Greg Stark, Nathan Bossart, Noah Misch	2024-03-04 17:31:38 -08:00
Nathan Bossart	2c29e7fc95	Add macro for customizing an archiving WARNING message. Presently, if an archive module's check_configured_cb callback returns false, a generic WARNING message is emitted, which unfortunately provides no actionable details about the reason why the module is not configured. This commit introduces a macro that archive module authors can use to add a DETAIL line to this WARNING message. Co-authored-by: Tung Nguyen Reviewed-by: Daniel Gustafsson, Álvaro Herrera Discussion: https://postgr.es/m/4109578306242a7cd5661171647e11b2%40oss.nttdata.com	2024-03-04 15:41:42 -06:00
Tom Lane	e5bc9454e5	Explicitly list dependent types as extension members in pg_depend. Auto-generated array types, multirange types, and relation rowtypes are treated as dependent objects: they can't be dropped separately from the base object, nor can they have their own ownership or permissions. We previously felt that, for objects that are in an extension, only the base object needs to be listed as an extension member in pg_depend. While that's sufficient to prevent inappropriate drops, it results in undesirable answers if someone asks whether a dependent type belongs to the extension. It looks like the dependent type is just some random separately-created object that happens to depend on the base object. Notably, this results in postgres_fdw concluding that expressions involving an array type are not shippable to the remote server, even when the defining extension has been whitelisted. To fix, cause GenerateTypeDependencies to make extension dependencies for dependent types as well as their base objects, and adjust ExecAlterExtensionContentsStmt so that object addition and removal operations recurse to dependent types. The latter change means that pg_upgrade of a type-defining extension will end with the dependent type(s) now also listed as extension members, even if they were not that way in the source database. Normally we want pg_upgrade to precisely reproduce the source extension's state, but it seems desirable to make an exception here. This is arguably a bug fix, but we can't back-patch it since it causes changes in the expected contents of pg_depend. (Because it does, I've bumped catversion, even though there's no change in the immediate post-initdb catalog contents.) Tom Lane and David Geier Discussion: https://postgr.es/m/4a847c55-489f-4e8d-a664-fc6b1cbe306f@gmail.com	2024-03-04 14:49:36 -05:00
Daniel Gustafsson	cc09e6549f	Remove the adminpack contrib extension The adminpack extension was only used to support pgAdmin III, which in turn was declared EOL many years ago. Removing the extension also allows us to remove functions from core as well which were only used to support old version of adminpack. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> Discussion: https://postgr.es/m/CALj2ACUmL5TraYBUBqDZBi1C+Re8_=SekqGYqYprj_W8wygQ8w@mail.gmail.com	2024-03-04 12:39:22 +01:00
Heikki Linnakangas	24eebc65c2	Remove unused 'countincludesself' argument to pq_sendcountedtext() It has been unused since we removed support for protocol version 2.	2024-03-04 12:56:05 +02:00
Heikki Linnakangas	0dd094c4a0	Remove unused ParallelWorkerInfo.pid field The pid was originally used in error context of messages propagated from parallel workers, but commit `292794f82b` removed that. If the need arises in the future, you can also get the pid with "shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".	2024-03-04 12:56:02 +02:00
Heikki Linnakangas	393b5599e5	Use MyBackendType in more places to check what process this is Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(), IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of new AmProcess() macros that use MyBackendType. For consistency with the existing AmProcess() macros. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi	2024-03-04 10:25:12 +02:00
Heikki Linnakangas	067701f577	Remove MyAuxProcType, use MyBackendType instead MyAuxProcType was redundant with MyBackendType. Reviewed-by: Reid Thompson, Andres Freund Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi	2024-03-04 10:25:09 +02:00
Heikki Linnakangas	024c521117	Replace BackendIds with 0-based ProcNumbers Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:38:22 +02:00
Heikki Linnakangas	ab355e3a88	Redefine backend ID to be an index into the proc array Previously, backend ID was an index into the ProcState array, in the shared cache invalidation manager (sinvaladt.c). The entry in the ProcState array was reserved at backend startup by scanning the array for a free entry, and that was also when the backend got its backend ID. Things become slightly simpler if we redefine backend ID to be the index into the PGPROC array, and directly use it also as an index to the ProcState array. This uses a little more memory, as we reserve a few extra slots in the ProcState array for aux processes that don't need them, but the simplicity is worth it. Aux processes now also have a backend ID. This simplifies the reservation of BackendStatusArray and ProcSignal slots. You can now convert a backend ID into an index into the PGPROC array simply by subtracting 1. We still use 0-based "pgprocnos" in various places, for indexes into the PGPROC array, but the only difference now is that backend IDs start at 1 while pgprocnos start at 0. (The next commmit will get rid of the term "backend ID" altogether and make everything 0-based.) There is still a 'backendId' field in PGPROC, now part of 'vxid' which encapsulates the backend ID and local transaction ID together. It's needed for prepared xacts. For regular backends, the backendId is always equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID of the original backend that processed the transaction. Reviewed-by: Andres Freund, Reid Thompson Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:37:28 +02:00
Thomas Munro	653b55b570	Return ssize_t in fd.c I/O functions. In the past, FileRead() and FileWrite() used types based on the Unix read() and write() functions from before C and POSIX standardization, though not exactly (we had int for amount instead of unsigned). In commit `2d4f1ba6` we changed to the appropriate standard C types, just like the modern POSIX functions they wrap, but again not exactly: the return type stayed as int. In theory, a ssize_t value could be returned by the underlying call that is too large for an int. That wasn't really a live bug, because we don't expect PostgreSQL code to perform reads or writes of gigabytes, and OSes probably apply internal caps smaller than that anyway. This change is done on the principle that the return might as well follow the standard interfaces consistently. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us	2024-03-02 12:09:28 +13:00
Michael Paquier	655dc31046	Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntax This commit switches pg_enc2gettext_tbl[] in encnames.c to use a C99-designated initializer syntax. pg_bind_textdomain_codeset() is simplified so as it is possible to do a direct lookup at the gettext() array with a value of the enum pg_enc rather than doing a loop through all its elements, as long as the encoding value provided by GetDatabaseEncoding() is in the correct range of supported encoding values. Note that PG_MULE_INTERNAL gains a value in the array, pointing to NULL. Author: Jelte Fennema-Nio Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com	2024-03-01 18:03:48 +09:00
Nathan Bossart	bd5132db55	Introduce atomic read/write functions with full barrier semantics. Writing correct code using atomic variables is often difficult due to the memory barrier semantics (or lack thereof) of the underlying operations. This commit introduces atomic read/write functions with full barrier semantics to ease this cognitive load. For example, some spinlocks protect a single value, and these new functions make it easy to convert the value to an atomic variable (thus eliminating the need for the spinlock) without modifying the barrier semantics previously provided by the spinlock. Since these functions may be less performant than the other atomic reads and writes, they are not suitable for every use-case. However, using a single atomic operation with full barrier semantics may be more performant in cases where a separate explicit barrier would otherwise be required. The base implementations for these new functions are atomic exchanges (for writes) and atomic fetch/adds with 0 (for reads). These implementations can be overwritten with better architecture- specific versions as they are discovered. This commit leaves converting existing code to use these new functions as a future exercise. Reviewed-by: Andres Freund, Yong Li, Jeff Davis Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13	2024-02-29 10:00:44 -06:00
Dean Rasheed	5f2e179bd3	Support MERGE into updatable views. This allows the target relation of MERGE to be an auto-updatable or trigger-updatable view, and includes support for WITH CHECK OPTION, security barrier views, and security invoker views. A trigger-updatable view must have INSTEAD OF triggers for every type of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command. An auto-updatable view must not have any INSTEAD OF triggers. Mixing auto-update and trigger-update actions (i.e., having a partial set of INSTEAD OF triggers) is not supported. Rule-updatable views are also not supported, since there is no rewriter support for non-SELECT rules with MERGE operations. Dean Rasheed, reviewed by Jian He and Alvaro Herrera. Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com	2024-02-29 15:56:59 +00:00
Michael Paquier	ada87a4d95	Use C99-designated initializer syntax for arrays related to encodings This updates the following lookup arrays to use C99-designated initializer syntax, indexed based on the enum pg_enc: pg_enc2icu_tbl[] pg_enc2name_tbl[] pg_wchar_table[] This is more readable, and removes problems with ordering mistakes as this removes dependencies between the arrays and their lookup index in the enum pg_enc. So, adding new encodings becomes easier, even if this does not happen often. Author: Jelte Fennema-Nio Reviewed-by: Jian He, Japin Li Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com	2024-02-29 09:54:25 +09:00
Alvaro Herrera	53c2a97a92	Improve performance of subsystems on top of SLRU More precisely, what we do here is make the SLRU cache sizes configurable with new GUCs, so that sites with high concurrency and big ranges of transactions in flight (resp. multixacts/subtransactions) can benefit from bigger caches. In order for this to work with good performance, two additional changes are made: 1. the cache is divided in "banks" (to borrow terminology from CPU caches), and algorithms such as eviction buffer search only affect one specific bank. This forestalls the problem that linear searching for a specific buffer across the whole cache takes too long: we only have to search the specific bank, whose size is small. This work is authored by Andrey Borodin. 2. Change the locking regime for the SLRU banks, so that each bank uses a separate LWLock. This allows for increased scalability. This work is authored by Dilip Kumar. (A part of this was previously committed as d172b717c6f4.) Special care is taken so that the algorithms that can potentially traverse more than one bank release one bank's lock before acquiring the next. This should happen rarely, but particularly clog.c's group commit feature needed code adjustment to cope with this. I (Álvaro) also added lots of comments to make sure the design is sound. The new GUCs match the names introduced by `bcdfa5f2e2` in the pg_stat_slru view. The default values for these parameters are similar to the previous sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c function SimpleLruAutotuneBuffers() was added to support this.) The cap was previously 1MB for clog, so for sites with more than 512MB of shared memory the total memory used increases, which is likely a good tradeoff. However, other SLRUs (notably multixact ones) retain smaller sizes and don't support a configured value of 0. These values based on shared_buffers may need to be revisited, but that's an easy change. There was some resistance to adding these new GUCs: it would be better to adjust to memory pressure automatically somehow, for example by stealing memory from shared_buffers (where the caches can grow and shrink naturally). However, doing that seems to be a much larger project and one which has made virtually no progress in several years, and because this is such a pain point for so many users, here we take the pragmatic approach. Author: Andrey Borodin <x4mmm@yandex-team.ru> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev). Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-02-28 17:05:31 +01:00
Heikki Linnakangas	0b16bb8776	Remove AIX support There isn't a lot of user demand for AIX support, we have a bunch of hacks to work around AIX-specific compiler bugs and idiosyncrasies, and no one has stepped up to the plate to properly maintain it. Remove support for AIX to get rid of that maintenance overhead. It's still supported for stable versions. The acute issue that triggered this decision was that after commit `8af2565248`, the AIX buildfarm members have been hitting this assertion: TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728 Apperently the "pg_attribute_aligned(a)" attribute doesn't work on AIX for values larger than PG_IO_ALIGN_SIZE, for a static const variable. That could be worked around, but we decided to just drop the AIX support instead. Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com Reviewed-by: Andres Freund, Noah Misch, Thomas Munro	2024-02-28 15:17:23 +04:00
Alvaro Herrera	bcdfa5f2e2	Rename SLRU elements in view pg_stat_slru The new names are intended to match those in an upcoming patch that adds a few GUCs to configure the SLRU buffer sizes. Backwards compatibility concern: this changes the accepted names for function pg_stat_slru_rest(). Since this function recognizes "any other string" as a request to reset the entry for "other", this means that calling it with the old names would silently reset "other" instead of doing nothing or throwing an error. Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql	2024-02-28 09:39:52 +01:00
Nathan Bossart	e1724af42c	Fix comments for the dshash_parameters struct. A recent commit added a copy_function member to the dshash_parameters struct, but it missed updating a couple of comments that refer to the function pointer members of this struct. One of those comments also refers to a tranche_name member and non- arg variants of the function pointer members, all of which were either removed during development or removed shortly after dshash table support was committed. Oversights in commits `8c0d7bafad`, `d7694fc148`, and `42a1de3013`. Discussion: https://postgr.es/m/20240227045213.GA2329190%40nathanxps13	2024-02-27 09:44:59 -06:00
Michael Paquier	ef5e2e9085	Remove unnecessary array object_classes[] in dependency.c object_classes[] provided unnecessary indirection between catalog OIDs and the enum ObjectClass when calling add_object_address(). This array has been originally introduced in `30ec31604d` and was useful because not all relation OIDs were compile-time constants back then, which has not been the case for a long time now for all the elements of ObjectClass. This commit removes object_classes[], switching to the catalog OIDs when calling add_object_address(). This shaves some code while saving in maintenance because it was necessary to maintain the enum ObjectClass and the array in sync when adding new object types. Reported-by: Jeff Davis Author: Jelte Fennema-Nio Reviewed-by: Jian He, Michael Paquier Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com	2024-02-27 15:18:17 +09:00
David Rowley	743112a2e9	Adjust memory allocation functions to allow sibling calls Many modern compilers are able to optimize function calls to functions where the parameters of the called function match a leading subset of the calling function's parameters. If there are no instructions in the calling function after the function is called, then the compiler is free to avoid any stack frame setup and implement the function call as a "jmp" rather than a "call". This is called sibling call optimization. Here we adjust the memory allocation functions in mcxt.c to allow this optimization. This requires moving some responsibility into the memory context implementations themselves. It's now the responsibility of the MemoryContext to check for malloc failures. This is good as it both allows the sibling call optimization, but also because most small and medium allocations won't call malloc and just allocate memory to an existing block. That can't fail, so checking for NULLs in that case isn't required. Also, traditionally it's been the responsibility of palloc and the other allocation functions in mcxt.c to check for invalid allocation size requests. Here we also move the responsibility of checking that into the MemoryContext. This isn't to allow the sibling call optimization, but more because most of our allocators handle large allocations separately and we can just add the size check when doing large allocations. We no longer check this for non-large allocations at all. To make checking the allocation request sizes and ERROR handling easier, add some helper functions to mcxt.c for the allocators to use. Author: Andres Freund Reviewed-by: David Rowley Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de	2024-02-27 16:39:42 +13:00
Nathan Bossart	42a1de3013	Add helper functions for dshash tables with string keys. Presently, string keys are not well-supported for dshash tables. The dshash code always copies key_size bytes into new entries' keys, and dshash.h only provides compare and hash functions that forward to memcmp() and tag_hash(), both of which do not stop at the first NUL. This means that callers must pad string keys so that the data beyond the first NUL does not adversely affect the results of copying, comparing, and hashing the keys. To better support string keys in dshash tables, this commit does a couple things: * A new copy_function field is added to the dshash_parameters struct. This function pointer specifies how the key should be copied into new table entries. For example, we only want to copy up to the first NUL byte for string keys. A dshash_memcpy() helper function is provided and used for all existing in-tree dshash tables without string keys. * A set of helper functions for string keys are provided. These helper functions forward to strcmp(), strcpy(), and string_hash(), all of which ignore data beyond the first NUL. This commit also adjusts the DSM registry's dshash table to use the new helper functions for string keys. Reviewed-by: Andy Fan Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13	2024-02-26 15:47:13 -06:00
Michael Paquier	449e798c77	Introduce sequence_*() access functions Similarly to tables and indexes, these functions are able to open relations with a sequence relkind, which is useful to make a distinction with the other relation kinds. Previously, commands/sequence.c used a mix of table_{close,open}() and relation_{close,open}() routines when manipulating sequence relations, so this clarifies the code. A direct effect of this change is to align the error messages produced when attempting DDLs for sequences on relations with an unexpected relkind, like a table or an index with ALTER SEQUENCE, providing an extra error detail about the relkind of the relation used in the DDL query. Author: Michael Paquier Reviewed-by: Tomas Vondra Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz	2024-02-26 16:04:59 +09:00
Heikki Linnakangas	d360e3cc60	Fix compiler warning on typedef redeclaration bulk_write.c:78:3: error: redefinition of typedef 'BulkWriteState' is a C11 feature [-Werror,-Wtypedef-redefinition] } BulkWriteState; ^ ../../../../src/include/storage/bulk_write.h:20:31: note: previous definition is here typedef struct BulkWriteState BulkWriteState; ^ 1 error generated. Per buildfarm animals 'sifaka' and 'longfin'. Discussion: https://www.postgresql.org/message-id/9e1f63c3-ef16-404c-b3cb-859a96eaba39@iki.fi	2024-02-23 17:39:27 +02:00
Heikki Linnakangas	8af2565248	Introduce a new smgr bulk loading facility. The new facility makes it easier to optimize bulk loading, as the logic for buffering, WAL-logging, and syncing the relation only needs to be implemented once. It's also less error-prone: We have had a number of bugs in how a relation is fsync'd - or not - at the end of a bulk loading operation. By centralizing that logic to one place, we only need to write it correctly once. The new facility is faster for small relations: Instead of of calling smgrimmedsync(), we register the fsync to happen at next checkpoint, which avoids the fsync latency. That can make a big difference if you are e.g. restoring a schema-only dump with lots of relations. It is also slightly more efficient with large relations, as the WAL logging is performed multiple pages at a time. That avoids some WAL header overhead. The sorted GiST index build did that already, this moves the buffering to the new facility. The changes to pageinspect GiST test needs an explanation: Before this patch, the sorted GiST index build set the LSN on every page to the special GistBuildLSN value, not the LSN of the WAL record, even though they were WAL-logged. There was no particular need for it, it just happened naturally when we wrote out the pages before WAL-logging them. Now we WAL-log the pages first, like in B-tree build, so the pages are stamped with the record's real LSN. When the build is not WAL-logged, we still use GistBuildLSN. To make the test output predictable, use an unlogged index. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi	2024-02-23 16:10:51 +02:00
Amit Kapila	93db6cbda0	Add a new slot sync worker to synchronize logical slots. By enabling slot synchronization, all the failover logical replication slots on the primary (assuming configurations are appropriate) are automatically created on the physical standbys and are synced periodically. The slot sync worker on the standby server pings the primary server at regular intervals to get the necessary failover logical slots information and create/update the slots locally. The slots that no longer require synchronization are automatically dropped by the worker. The nap time of the worker is tuned according to the activity on the primary. The slot sync worker waits for some time before the next synchronization, with the duration varying based on whether any slots were updated during the last cycle. A new parameter sync_replication_slots enables or disables this new process. On promotion, the slot sync worker is shut down by the startup process to drop any temporary slots acquired by the slot sync worker and to prevent the worker from trying to fetch the failover slots. A functionality to allow logical walsenders to wait for the physical will be done in a subsequent commit. Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-22 15:25:15 +05:30
Peter Eisentraut	fbc93b8b5f	Remove custom Constraint node read/write implementations This is part of an effort to reduce the number of special cases in the automatically generated node support functions. Allegedly, only certain fields of the Constraint node are valid based on contype. But this has historically not been kept up to date in the read/write functions. The Constraint node is only used for debugging DDL statements, so there are no strong requirements for its output, and there is no enforcement for its correctness. (There was no read support before a6bc3301925.) Commits `e7a552f303` and `abf46ad9c7` are examples of where omissions were fixed. This patch just removes the custom read/write implementations for the Constraint node type. Now we just output all the fields, which is a bit more than before, but at least we don't have to maintain these functions anymore. Also, we lose the string representation of the contype field, but for this marginal use case that seems tolerable. This patch also changes the documentation of the Constraint struct to put less emphasis on grouping fields by constraint type but rather document for each field how it's used. Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com> Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org	2024-02-22 07:07:12 +01:00
Michael Paquier	943f7ae1c8	Add lookup table for replication slot conflict reasons This commit switches the handling of the conflict cause strings for replication slots to use a table rather than being explicitly listed, using a C99-designated initializer syntax for the array elements. This makes the whole more readable while easing future maintenance with less areas to update when adding a new conflict reason. This is similar to `74a7306310`, but the scale of the change is smaller as there are less conflict causes than LWLock builtin tranche names. Author: Bharath Rupireddy Reviewed-by: Jelte Fennema-Nio Discussion: https://postgr.es/m/CALj2ACUxSLA91QGFrJsWNKs58KXb1C03mbuwKmzqqmoAKLwJaw@mail.gmail.com	2024-02-22 08:40:40 +09:00
Heikki Linnakangas	28f3915b73	Remove superfluous 'pgprocno' field from PGPROC It was always just the index of the PGPROC entry from the beginning of the proc array. Introduce a macro to compute it from the pointer instead. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-02-22 01:21:34 +02:00
Peter Eisentraut	74563f6b90	Revert "Improve compression and storage support with inheritance" This reverts commit `0413a55699`. pg_dump cannot currently dump all the structures that are allowed by this patch. This needs more work in pg_dump and more test coverage. Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org	2024-02-20 11:10:59 +01:00
Nathan Bossart	3b42bdb471	Use new overflow-safe integer comparison functions. Commit `6b80394781` introduced integer comparison functions designed to be as efficient as possible while avoiding overflow. This commit makes use of these functions in many of the in-tree qsort() comparators to help ensure transitivity. Many of these comparator functions should also see a small performance boost. Author: Mats Kindahl Reviewed-by: Andres Freund, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com	2024-02-16 14:05:36 -06:00
Nathan Bossart	6b80394781	Introduce overflow-safe integer comparison functions. This commit adds integer comparison functions that are designed to be as efficient as possible while avoiding overflow. A follow-up commit will make use of these functions in many of the in-tree qsort() comparators. The new functions are not better in all cases (e.g., when the comparator function is inlined), so it is important to consider the context before using them. Author: Mats Kindahl Reviewed-by: Tom Lane, Heikki Linnakangas, Andres Freund, Thomas Munro, Andrey Borodin, Fabrízio de Royes Mello Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com	2024-02-16 13:37:02 -06:00
Nathan Bossart	5497daf3aa	Replace calls to pg_qsort() with the qsort() macro. Calls to this function might give the impression that pg_qsort() is somehow different than qsort(), when in fact there is a qsort() macro in port.h that expands all in-tree uses to pg_qsort(). Reviewed-by: Mats Kindahl Discussion: https://postgr.es/m/CA%2B14426g2Wa9QuUpmakwPxXFWG_1FaY0AsApkvcTBy-YfS6uaw%40mail.gmail.com	2024-02-16 11:37:50 -06:00
Peter Eisentraut	0413a55699	Improve compression and storage support with inheritance A child table can specify a compression or storage method different from its parents. This was previously an error. (But this was inconsistently enforced because for example the settings could be changed later using ALTER TABLE.) This now also allows an explicit override if multiple parents have different compression or storage settings, which was previously an error that could not be overridden. The compression and storage properties remains unchanged in a child inheriting from parent(s) after its creation, i.e., when using ALTER TABLE ... INHERIT. (This is not changed.) Before this change, the error detail would mention the first pair of conflicting parent compression or storage methods. But with this change it waits till the child specification is considered by which time we may have encountered many such conflicting pairs. Hence the error detail after this change does not include the conflicting compression/storage methods. Those can be obtained from parent definitions if necessary. The code to maintain list of all conflicting methods or even the first conflicting pair does not seem worth the convenience it offers. This change is inline with what we do with conflicting default values. Before this commit, the specified storage method could be stored in ColumnDef::storage (CREATE TABLE ... LIKE) or ColumnDef::storage_name (CREATE TABLE ...). This caused the MergeChildAttribute() and MergeInheritedAttribute() to ignore a storage method specified in the child definition since it looked only at ColumnDef::storage. This commit removes ColumnDef::storage and instead uses ColumnDef::storage_name to save any storage method specification. This is similar to how compression method specification is handled. Author: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7@eisentraut.org	2024-02-16 13:27:46 +01:00
Alexander Korotkov	51efe38cb9	Introduce transaction_timeout This commit adds timeout that is expected to be used as a prevention of long-running queries. Any session within the transaction will be terminated after spanning longer than this timeout. However, this timeout is not applied to prepared transactions. Only transactions with user connections are affected. Discussion: https://postgr.es/m/CAAhFRxiQsRs2Eq5kCo9nXE3HTugsAAJdSQSmxncivebAxdmBjQ%40mail.gmail.com Author: Andrey Borodin <amborodin@acm.org> Author: Japin Li <japinli@hotmail.com> Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Nikolay Samokhvalov <samokhvalov@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: bt23nguyent <bt23nguyent@oss.nttdata.com> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com>	2024-02-15 23:56:12 +02:00
Nathan Bossart	8fd0498de2	Remove obsolete check in SIGTERM handler for the startup process. Thanks to commit `3b00fdba9f`, this check in the SIGTERM handler for the startup process is now obsolete and can be removed. Instead of leaving around the dead function write_stderr_signal_safe(), I've opted to just remove it for now. This partially reverts commit `97550c0711`. Reviewed-by: Andres Freund, Noah Misch Discussion: https://postgr.es/m/20231121212008.GA3742740%40nathanxps13	2024-02-14 17:09:31 -06:00
Nathan Bossart	8d8afd48d3	Allow pg_monitor to execute pg_current_logfile(). We allow roles with privileges of pg_monitor to execute functions like pg_ls_logdir(), so it seems natural that such roles would also be able to execute this function. Bumps catversion. Co-authored-by: Pavlo Golub Reviewed-by: Daniel Gustafsson Discussion: https://postgr.es/m/CAK7ymcLmEYWyQkiCZ64WC-HCzXAB0omM%3DYpj9B3rXe8vUAFMqw%40mail.gmail.com	2024-02-14 11:48:29 -06:00
Amit Kapila	ddd5f4f54a	Add a slot synchronization function. This commit introduces a new SQL function pg_sync_replication_slots() which is used to synchronize the logical replication slots from the primary server to the physical standby so that logical replication can be resumed after a failover or planned switchover. A new 'synced' flag is introduced in pg_replication_slots view, indicating whether the slot has been synchronized from the primary server. On a standby, synced slots cannot be dropped or consumed, and any attempt to perform logical decoding on them will result in an error. The logical replication slots on the primary can be synchronized to the hot standby by using the 'failover' parameter of pg-create-logical-replication-slot(), or by using the 'failover' option of CREATE SUBSCRIPTION during slot creation, and then calling pg_sync_replication_slots() on standby. For the synchronization to work, it is mandatory to have a physical replication slot between the primary and the standby aka 'primary_slot_name' should be configured on the standby, and 'hot_standby_feedback' must be enabled on the standby. It is also necessary to specify a valid 'dbname' in the 'primary_conninfo'. If a logical slot is invalidated on the primary, then that slot on the standby is also invalidated. If a logical slot on the primary is valid but is invalidated on the standby, then that slot is dropped but will be recreated on the standby in the next pg_sync_replication_slots() call provided the slot still exists on the primary server. It is okay to recreate such slots as long as these are not consumable on standby (which is the case currently). This situation may occur due to the following reasons: - The 'max_slot_wal_keep_size' on the standby is insufficient to retain WAL records from the restart_lsn of the slot. - 'primary_slot_name' is temporarily reset to null and the physical slot is removed. The slot synchronization status on the standby can be monitored using the 'synced' column of pg_replication_slots view. A functionality to automatically synchronize slots by a background worker and allow logical walsenders to wait for the physical will be done in subsequent commits. Author: Hou Zhijie, Shveta Malik, Ajin Cherian based on an earlier version by Peter Eisentraut Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-14 09:45:36 +05:30
Michael Paquier	06bd311bce	Revert "Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM" This reverts commit `95fb5b4902`, for reasons similar to what led to `1aa8324b81`. In this case, the callback was called once per row, which is less worse than the previous callback introduced for COPY TO called once per argument for each row, still the patch set discussed to plug in custom routines to the COPY paths would be able to know which subroutine to use depending on its CopyFromState, so this led to a suboptimal approach at the end. For now, this part is reverted to consider better which approach to use. Discussion: https://postgr.es/m/20240206014125.qofww7ew3dx3v3uk@awork3.anarazel.de	2024-02-14 10:07:22 +09:00
Jeff Davis	91f2cae7a4	Read WAL directly from WAL buffers. If available, read directly from WAL buffers, avoiding the need to go through the filesystem. Only for physical replication for now, but can be expanded to other callers. In preparation for replicating unflushed WAL data. Author: Bharath Rupireddy Discussion: https://postgr.es/m/CALj2ACXKKK%3DwbiG5_t6dGao5GoecMwRkhr7GjVBM_jg54%2BNa%3DQ%40mail.gmail.com Reviewed-by: Andres Freund, Alvaro Herrera, Nathan Bossart, Dilip Kumar, Nitin Jadhav, Melih Mutlu, Kyotaro Horiguchi	2024-02-12 11:11:22 -08:00
Thomas Munro	65f438471b	Fix gai_strerror() thread-safety on Windows. Commit `5579388d` removed code that supplied a fallback implementation of getaddrinfo(), which was dead code on modern systems. One tiny piece of the removed code was still doing something useful on Windows, though: that OS's own gai_strerror()/gai_strerrorA() function returns a pointer to a static buffer that it overwrites each time, so it's not thread-safe. In rare circumstances, a multi-threaded client program could get an incorrect or corrupted error message. Restore the replacement gai_strerror() function, though now that it's only for Windows we can put it into a win32-specific file and cut it down to the errors that Windows documents. The error messages here are taken from FreeBSD, because Windows' own messages seemed too verbose. Back-patch to 16. Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Discussion: https://postgr.es/m/CA%2BhUKGKz%2BF9d2PTiXwfYV7qJw%2BWg2jzACgSDgPizUw7UG%3Di58A%40mail.gmail.com	2024-02-12 11:14:21 +13:00
Daniel Gustafsson	5c7038d70b	Refactor pipe_read_line to return the full line Commit `5b2f4afffe` refactored find_other_exec() and in the process created pipe_read_line() into a static routine for reading a single line of output, aimed at reading version numbers. Commit `a7e8ece41` later exposed it externally in order to read a postgresql.conf GUC using "postgres -C ..". Further, `f06b1c598` also made use of it for reading a version string much like find_other_exec(). The internal variable remained "pgver", even when used for other purposes. Since the function requires passing a buffer and its size, and at most size - 1 bytes will be read via fgets(), there is a truncation risk when using this for reading GUCs (like how pg_rewind does, though the risk in this case is marginal). To keep this as generic functionality for reading a line from a pipe, this refactors pipe_read_line() into returning an allocated buffer containing all of the line to remove the risk of silent truncation. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/DEDF73CE-D528-49A3-9089-B3592FD671A9@yesql.se	2024-02-09 15:03:16 +01:00
John Naylor	2579985086	Fix warnings in cpluspluscheck Various int variables were compared to macros that are of type size_t, which caused -Wsign-compare warnings in cpluspluscheck. Change those to size_t, which also better describes their purpose. Per report from Peter Eisentraut Discussion: https://postgr.es/m/486847dc-6de5-464a-938e-bac98ec2438b%40eisentraut.org	2024-02-08 10:07:26 +07:00
Alvaro Herrera	d172b717c6	Use atomic access for SlruShared->latest_page_number The new concurrency model proposed for slru.c to improve performance does not include any single lock that would coordinate processes doing concurrent reads/writes on SlruShared->latest_page_number. We can instead use atomic reads and writes for that variable. Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-02-06 10:54:10 +01:00
John Naylor	b83033c3cf	Further cosmetic review of hashfn_unstable.h In follow-up to `e97b672c8`, * Flesh out comments explaining the incremental interface * Clarify detection of zero bytes when hashing aligned C strings The latter was suggested and reviewed by Jeff Davis Discussion: https://postgr.es/m/48e8f8bbe0be9c789f98776c7438244ab7a7cc63.camel%40j-davis.com	2024-02-06 14:49:06 +07:00
John Naylor	9ed3ee5001	Simplify initialization of incremental hash state The standalone functions fasthash{32,64} use length for two purposes: how many bytes to hash, and how to perturb the internal seed. Developers using the incremental interface may not know the length ahead of time (e.g. for C strings). In this case, it's advised to pass length to the finalizer, but initialization still needed some length up front, in the form of a placeholder macro. Separate the concerns by having the standalone functions perturb the internal seed themselves from their own length parameter, allowing to remove "len" from fasthash_init(), as well as the placeholder macro. Discussion: https://postgr.es/m/CANWCAZbTUk2LOyhsFo33gjLyLAHZ7ucXCi5K9u%3D%2BPtnTShDKtw%40mail.gmail.com	2024-02-06 14:39:36 +07:00
Peter Eisentraut	1ae5ace755	Fix meson installation of new generated files Fix for 9b1a6f50b9: We want to install catalog/syscache_ids.h but not catalog/syscache_info.h. The meson code has this backwards. The makefiles are ok. Reported-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/CAJ7c6TMDGmAiozDjJQ3%3DP3cd-1BidC_GpitcAuU0aqq-r1eSoQ%40mail.gmail.com	2024-02-05 15:45:29 +01:00
Amit Kapila	dafbfed9ef	Enhance libpqrcv APIs to support slot synchronization. This patch provides support for regular (non-replication) connections in libpqrcv_connect(). This can be used to execute SQL statements on the primary server without starting a walsender. A new API libpqrcv_get_dbname_from_conninfo() is also added to extract the database name from the given connection-info. Note that this patch doesn't change any existing functionality but later patches implementing the slot synchronization will use this functionality to connect to the primary server to fetch required slot information. Author: Shveta Malik, Hou Zhijie, Ajin Cherian Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-02-05 10:54:06 +05:30
Michael Paquier	95fb5b4902	Refactor CopyReadAttributes{CSV,Text}() to use a callback in COPY FROM CopyReadAttributes{CSV,Text}() are used to parse lines for text and CSV format. This reduces the number of "if" branches that need to be checked when parsing fields in CSV and text mode when dealing with a COPY FROM, something that can become more noticeable with more attributes and more lines to process. Extracted from a larger patch by the same author. Author: Sutou Kouhei Discussion: https://postgr.es/m/20231204.153548.2126325458835528809.kou@clear-code.com	2024-02-05 09:46:02 +09:00
Heikki Linnakangas	21d9c3ee4e	Give SMgrRelation pointers a well-defined lifetime. After calling smgropen(), it was not clear how long you could continue to use the result, because various code paths including cache invalidation could call smgrclose(), which freed the memory. Guarantee that the object won't be destroyed until the end of the current transaction, or in recovery, the commit/abort record that destroys the underlying storage. smgrclose() is now just an alias for smgrrelease(). It closes files and forgets all state except the rlocator, but keeps the SMgrRelation object valid. A new smgrdestroy() function is used by rare places that know there should be no other references to the SMgrRelation. The short version: * smgrclose() is now just an alias for smgrrelease(). It releases resources, but doesn't destroy until EOX * smgrdestroy() now frees memory, and should rarely be used. Existing code should be unaffected, but it is now possible for code that has an SMgrRelation object to use it repeatedly during a transaction as long as the storage hasn't been physically dropped. Such code would normally hold a lock on the relation. This also replaces the "ownership" mechanism of SMgrRelations with a pin counter. An SMgrRelation can now be "pinned", which prevents it from being destroyed at end of transaction. There can be multiple pins on the same SMgrRelation. In practice, the pin mechanism is only used by the relcache, so there cannot be more than one pin on the same SMgrRelation. Except with swap_relation_files XXX Author: Thomas Munro, Heikki Linnakangas Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/CA%2BhUKGJ8NTvqLHz6dqbQnt2c8XCki4r2QvXjBQcXpVwxTY_pvA@mail.gmail.com	2024-01-31 12:31:02 +02:00
Amit Kapila	776621a5e4	Add a failover option to subscriptions. This commit introduces a new subscription option named 'failover', which provides users with the ability to set the failover property of the replication slot on the publisher when creating or altering a subscription. This uses the replication commands introduced by commit `7329240437` to enable the failover option for a logical replication slot. If the failover option is set to true, the associated replication slots (i.e. the main slot and the table sync slots) in the upstream database are enabled to be synchronized to the standbys. Note that the capability to sync the replication slots will be added in subsequent commits. Thanks to Masahiko Sawada for the design inputs. Author: Shveta Malik, Hou Zhijie, Ajin Cherian Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-01-30 16:49:28 +05:30
Nathan Bossart	97287bdfae	Move is_valid_ascii() to ascii.h. This function requires simd.h, which is a rather large dependency for a widely-used header file like pg_wchar.h. Furthermore, there is a report of a third-party tool that is struggling to use pg_wchar.h due to its dependence on simd.h (presumably because simd.h uses several intrinsics). Moving the function to the much less popular ascii.h resolves these issues for now. This commit is back-patched for the benefit of the aforementioned third-party tool. The simd.h dependency was only added in v16, but we've opted to back-patch to v15 so that is_valid_ascii() lives in the same file for all versions where it exists. This could break existing third-party code that uses the function, but we couldn't find any examples of such code. It should be possible to fix any code that this commit breaks by including ascii.h in the file that uses is_valid_ascii(). Author: Jubilee Young Reviewed-by: Tom Lane, John Naylor, Andres Freund, Eric Ridge Discussion: https://postgr.es/m/CAPNHn3oKJJxMsYq%2BqLYzVJOFrUcOr4OF1EC-KtFT-qh8nOOOtQ%40mail.gmail.com Backpatch-through: 15	2024-01-29 12:08:57 -06:00
Alvaro Herrera	5de890e361	Add EXPLAIN (MEMORY) to report planner memory consumption This adds a new "Memory:" line under the "Planning:" group (which currently only has "Buffers:") when the MEMORY option is specified. In order to make the reporting reasonably accurate, we create a separate memory context for planner activities, to be used only when this option is given. The total amount of memory allocated by that context is reported as "allocated"; we subtract memory in the context's freelists from that and report that result as "used". We use MemoryContextStatsInternal() to obtain the quantities. The code structure to show buffer usage during planning was not in amazing shape, so I (Álvaro) modified the patch a bit to clean that up in passing. Author: Ashutosh Bapat Reviewed-by: David Rowley, Andrey Lepikhov, Jian He, Andy Fan Discussion: https://www.postgresql.org/message-id/CAExHW5sZA=5LJ_ZPpRO-w09ck8z9p7eaYAqq3Ks9GDfhrxeWBw@mail.gmail.com	2024-01-29 17:53:03 +01:00
Amit Kapila	7329240437	Allow setting failover property in the replication command. This commit implements a new replication command called ALTER_REPLICATION_SLOT and a corresponding walreceiver API function named walrcv_alter_slot. Additionally, the CREATE_REPLICATION_SLOT command has been extended to support the failover option. These new additions allow the modification of the failover property of a replication slot on the publisher. A subsequent commit will make use of these commands in subscription commands and will add the tests as well to cover the functionality added/changed by this commit. Author: Hou Zhijie, Shveta Malik Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-01-29 09:37:23 +05:30
Masahiko Sawada	08e6344fd6	Remove ReorderBufferTupleBuf structure. Since commit `a4ccc1cef`, the 'node' and 'alloc_tuple_size' fields of the ReorderBufferTupleBuf structure are no longer used. This leaves only the 'tuple' field in the structure. Since keeping a single-field structure makes little sense, the ReorderBufferTupleBuf is removed entirely. The code is refactored accordingly. No back-patching since these are ABI changes in an exposed structure and functions, and there would be some risk of breaking extensions. Author: Aleksander Alekseev Reviewed-by: Amit Kapila, Masahiko Sawada, Reid Thompson Discussion: https://postgr.es/m/CAD21AoCvnuxiXXfRecp7g9+CeC35POQfhuQeJFr7_9u_Q5jc_Q@mail.gmail.com	2024-01-29 10:37:16 +09:00
Tom Lane	8ba6fdf905	Support TZ and OF format codes in to_timestamp(). Formerly, these were only supported in to_char(), but there seems little reason for that restriction. We should at least have enough support to permit round-tripping the output of to_char(). In that spirit, TZ accepts either zone abbreviations or numeric (HH or HH:MM) offsets, which are the cases that to_char() can output. In an ideal world we'd make it take full zone names too, but that seems like it'd introduce an unreasonable amount of ambiguity, since the rules for POSIX-spec zone names are so lax. OF is a subset of this, accepting only HH or HH:MM. One small benefit of this improvement is that we can simplify jsonpath's executeDateTimeMethod function, which no longer needs to consider the HH and HH:MM cases separately. Moreover, letting it accept zone abbreviations means it will accept "Z" to mean UTC, which is emitted by JSON.stringify() for example. Patch by me, reviewed by Aleksander Alekseev and Daniel Gustafsson Discussion: https://postgr.es/m/1681086.1686673242@sss.pgh.pa.us	2024-01-25 17:47:08 -05:00
Andrew Dunstan	66ea94e8e6	Implement various jsonpath methods This commit implements ithe jsonpath .bigint(), .boolean(), .date(), .decimal([precision [, scale]]), .integer(), .number(), .string(), .time(), .time_tz(), .timestamp(), and .timestamp_tz() methods. .bigint() converts the given JSON string or a numeric value to the bigint type representation. .boolean() converts the given JSON string, numeric, or boolean value to the boolean type representation. In the numeric case, only integers are allowed. We use the parse_bool() backend function to convert a string to a bool. .decimal([precision [, scale]]) converts the given JSON string or a numeric value to the numeric type representation. If precision and scale are provided for .decimal(), then it is converted to the equivalent numeric typmod and applied to the numeric number. .integer() and .number() convert the given JSON string or a numeric value to the int4 and numeric type representation. .string() uses the datatype's output function to convert numeric and various date/time types to the string representation. The JSON string representing a valid date/time is converted to the specific date or time type representation using jsonpath .date(), .time(), .time_tz(), .timestamp(), .timestamp_tz() methods. The changes use the infrastructure of the .datetime() method and perform the datatype conversion as appropriate. Unlike the .datetime() method, none of these methods accept a format template and use ISO DateTime format instead. However, except for .date(), the date/time related methods take an optional precision to adjust the fractional seconds. Jeevan Chalke, reviewed by Peter Eisentraut and Andrew Dunstan.	2024-01-25 10:15:43 -05:00
Peter Eisentraut	924d046dcf	Add a const decoration Useful for a subsequent patch. Discussion: https://www.postgresql.org/message-id/flat/52a125e4-ff9a-95f5-9f61-b87cf447e4da@eisentraut.org	2024-01-25 13:34:49 +01:00
Alvaro Herrera	55627ba2d3	Remove dummy_spinlock It's been unused since `1b468a131b` (2015).	2024-01-25 11:43:47 +01:00
Amit Kapila	c393308b69	Allow to enable failover property for replication slots via SQL API. This commit adds the failover property to the replication slot. The failover property indicates whether the slot will be synced to the standby servers, enabling the resumption of corresponding logical replication after failover. But note that this commit does not yet include the capability to sync the replication slot; the subsequent commits will add that capability. A new optional parameter 'failover' is added to the pg_create_logical_replication_slot() function. We will also enable to set 'failover' option for slots via the subscription commands in the subsequent commits. The value of the 'failover' flag is displayed as part of pg_replication_slots view. Author: Hou Zhijie, Shveta Malik, Ajin Cherian Reviewed-by: Peter Smith, Bertrand Drouvot, Dilip Kumar, Masahiko Sawada, Nisha Moond, Kuroda, Hayato, Amit Kapila Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com	2024-01-25 12:15:46 +05:30
Thomas Munro	820b5af73d	jit: Require at least LLVM 10. Remove support for older LLVM versions. The default on common software distributions will be at least LLVM 10 when PostgreSQL 17 ships. Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com	2024-01-25 15:42:34 +13:00
Masahiko Sawada	729439607a	Add progress reporting of skipped tuples during COPY FROM. `9e2d870119` enabled the COPY command to skip malformed data, however there was no visibility into how many tuples were actually skipped during the COPY FROM. This commit adds a new "tuples_skipped" column to pg_stat_progress_copy view to report the number of tuples that were skipped because they contain malformed data. Bump catalog version. Author: Atsushi Torikoshi Reviewed-by: Masahiko Sawada Discussion: https://postgr.es/m/d12fd8c99adcae2744212cb23feff6ed%40oss.nttdata.com	2024-01-25 10:57:41 +09:00
Peter Eisentraut	46a0cd4cef	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-01-24 16:34:37 +01:00
Amit Langote	1edb3b491b	Adjust populate_record_field() to handle errors softly This adds a Node *escontext parameter to it and a bunch of functions downstream to it, replacing any ereport()s in that path by either errsave() or ereturn() as appropriate. This also adds code to those functions where necessary to return early upon encountering a soft error. The changes here are mainly intended to suppress errors in the functions of jsonfuncs.c. Functions in any external modules, such as arrayfuncs.c, that those functions may in turn call are not changed here based on the assumption that the various checks in jsonfuncs.c functions should ensure that only values that are structurally valid get passed to the functions in those external modules. An exception is made for domain_check() to allow handling domain constraint violation errors softly. For testing, this adds a function jsonb_populate_record_valid(), which returns true if jsonb_populate_record() would finish without causing an error for the provided JSON object, false otherwise. Note that jsonb_populate_record() internally calls populate_record(), which in turn uses populate_record_field(). Extracted from a much larger patch to add SQL/JSON query functions. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He, Peter Eisentraut Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-01-24 15:04:33 +09:00
Amit Langote	aaaf9449ec	Add soft error handling to some expression nodes This adjusts the code for CoerceViaIO and CoerceToDomain expression nodes to handle errors softly. For CoerceViaIo, this adds a new ExprEvalStep opcode EEOP_IOCOERCE_SAFE, which is implemented in the new accompanying function ExecEvalCoerceViaIOSafe(). The only difference from EEOP_IOCOERCE's inline implementation is that the input function receives an ErrorSaveContext via the function's FunctionCallInfo.context, which it can use to handle errors softly. For CoerceToDomain, this simply entails replacing the ereport() in ExecEvalConstraintNotNull() and ExecEvalConstraintCheck() by errsave() passing it the ErrorSaveContext passed in the expression's ExprEvalStep. In both cases, the ErrorSaveContext to be used is passed by setting ExprState.escontext to point to it before calling ExecInitExprRec() on the expression tree whose errors are to be handled softly. Note that there's no functional change as of this commit as no call site of ExecInitExprRec() has been changed. This is intended for implementing new SQL/JSON expression nodes in future commits. Extracted from a much larger patch to add SQL/JSON query functions. Author: Nikita Glukhov <n.gluhov@postgrespro.ru> Author: Teodor Sigaev <teodor@sigaev.ru> Author: Oleg Bartunov <obartunov@gmail.com> Author: Alexander Korotkov <aekorotkov@gmail.com> Author: Andrew Dunstan <andrew@dunslane.net> Author: Amit Langote <amitlangote09@gmail.com> Reviewers have included (in no particular order) Andres Freund, Alexander Korotkov, Pavel Stehule, Andrew Alsup, Erik Rijkers, Zihong Yu, Himanshu Upadhyaya, Daniel Gustafsson, Justin Pryzby, Álvaro Herrera, Jian He, Peter Eisentraut Discussion: https://postgr.es/m/cd0bb935-0158-78a7-08b5-904886deac4b@postgrespro.ru Discussion: https://postgr.es/m/20220616233130.rparivafipt6doj3@alap3.anarazel.de Discussion: https://postgr.es/m/abd9b83b-aa66-f230-3d6d-734817f0995d%40postgresql.org Discussion: https://postgr.es/m/CA+HiwqHROpf9e644D8BRqYvaAPmgBZVup-xKMDPk-nd4EpgzHw@mail.gmail.com Discussion: https://postgr.es/m/CA+HiwqE4XTdfb1nW=Ojoy_tQSRhYt-q_kb6i5d4xcKyrLC1Nbg@mail.gmail.com	2024-01-24 15:04:33 +09:00
Peter Eisentraut	6eb6086faa	Fix makefiles for newly added files The new files added in `9b1a6f50b9` need to be mentioned in a few more places in the makefiles. Discussion: https://www.postgresql.org/message-id/flat/E1rSAY2-002hIk-4y%40gemulon.postgresql.org	2024-01-23 16:33:26 +01:00
Peter Eisentraut	9b1a6f50b9	Generate syscache info from catalog files Add a new genbki macros MAKE_SYSCACHE that specifies the syscache ID macro, the underlying index, and the number of buckets. From that, we can generate the existing tables in syscache.h and syscache.c via genbki.pl. Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec-73247ed41cde@eisentraut.org	2024-01-23 07:31:06 +01:00
David Rowley	b262ad440e	Add better handling of redundant IS [NOT] NULL quals Until now PostgreSQL has not been very smart about optimizing away IS NOT NULL base quals on columns defined as NOT NULL. The evaluation of these needless quals adds overhead. Ordinarily, anyone who came complaining about that would likely just have been told to not include the qual in their query if it's not required. However, a recent bug report indicates this might not always be possible. Bug 17540 highlighted that when we optimize Min/Max aggregates the IS NOT NULL qual that the planner adds to make the rewritten plan ignore NULLs can cause issues with poor index choice. That particular case demonstrated that other quals, especially ones where no statistics are available to allow the planner a chance at estimating an approximate selectivity for can result in poor index choice due to cheap startup paths being prefered with LIMIT 1. Here we take generic approach to fixing this by having the planner check for NOT NULL columns and just have the planner remove these quals (when they're not needed) for all queries, not just when optimizing Min/Max aggregates. Additionally, here we also detect IS NULL quals on a NOT NULL column and transform that into a gating qual so that we don't have to perform the scan at all. This also works for join relations when the Var is not nullable by any outer join. This also helps with the self-join removal work as it must replace strict join quals with IS NOT NULL quals to ensure equivalence with the original query. Author: David Rowley, Richard Guo, Andy Fan Reviewed-by: Richard Guo, David Rowley Discussion: https://postgr.es/m/CAApHDvqg6XZDhYRPz0zgOcevSMo0d3vxA9DvHrZtKfqO30WTnw@mail.gmail.com Discussion: https://postgr.es/m/17540-7aa1855ad5ec18b4%40postgresql.org	2024-01-23 18:09:18 +13:00
Michael Paquier	d86d20f0ba	Add backend support for injection points Injection points are a new facility that makes possible for developers to run custom code in pre-defined code paths. Its goal is to provide ways to design and run advanced tests, for cases like: - Race conditions, where processes need to do actions in a controlled ordered manner. - Forcing a state, like an ERROR, FATAL or even PANIC for OOM, to force recovery, etc. - Arbitrary sleeps. This implements some basics, and there are plans to extend it more in the future depending on what's required. Hence, this commit adds a set of routines in the backend that allows developers to attach, detach and run injection points: - A code path calling an injection point can be declared with the macro INJECTION_POINT(name). - InjectionPointAttach() and InjectionPointDetach() to respectively attach and detach a callback to/from an injection point. An injection point name is registered in a shmem hash table with a library name and a function name, which will be used to load the callback attached to an injection point when its code path is run. Injection point names are just strings, so as an injection point can be declared and run by out-of-core extensions and modules, with callbacks defined in external libraries. This facility is hidden behind a dedicated switch for ./configure and meson, disabled by default. Note that backends use a local cache to store callbacks already loaded, cleaning up their cache if a callback has found to be removed on a best-effort basis. This could be refined further but any tests but what we have here was fine with the tests I've written while implementing these backend APIs. Author: Michael Paquier, with doc suggestions from Ashutosh Bapat. Reviewed-by: Ashutosh Bapat, Nathan Bossart, Álvaro Herrera, Dilip Kumar, Amul Sul, Nazir Bilal Yavuz Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz	2024-01-22 10:15:50 +09:00
Alexander Korotkov	0452b461bc	Explore alternative orderings of group-by pathkeys during optimization. When evaluating a query with a multi-column GROUP BY clause, we can minimize sort operations or avoid them if we synchronize the order of GROUP BY clauses with the ORDER BY sort clause or sort order, which comes from the underlying query tree. Grouping does not imply any ordering, so we can compare the keys in arbitrary order, and a Hash Agg leverages this. But for Group Agg, we simply compared keys in the order specified in the query. This commit explores alternative ordering of the keys, trying to find a cheaper one. The ordering of group keys may interact with other parts of the query, some of which may not be known while planning the grouping. For example, there may be an explicit ORDER BY clause or some other ordering-dependent operation higher up in the query, and using the same ordering may allow using either incremental sort or even eliminating the sort entirely. The patch always keeps the ordering specified in the query, assuming the user might have additional insights. This introduces a new GUC enable_group_by_reordering so that the optimization may be disabled if needed. Discussion: https://postgr.es/m/7c79e6a5-8597-74e8-0671-1c39d124c9d6%40sigaev.ru Author: Andrei Lepikhov, Teodor Sigaev Reviewed-by: Tomas Vondra, Claudio Freire, Gavin Flower, Dmitry Dolgov Reviewed-by: Robert Haas, Pavel Borisov, David Rowley, Zhihong Yu Reviewed-by: Tom Lane, Alexander Korotkov, Richard Guo, Alena Rybakina	2024-01-21 22:21:36 +02:00
Tom Lane	075df6b208	Add planner support functions for range operators <@ and @>. These support functions will transform expressions with constant range values into direct comparisons on the range bound values, which are frequently better-optimizable. The transformation is skipped however if it would require double evaluation of a volatile or expensive element expression. Along the way, add the range opfamily OID to range typcache entries, since load_rangetype_info has to compute that anyway and it seems silly to duplicate the work later. Kim Johan Andersson and Jian He, reviewed by Laurenz Albe Discussion: https://postgr.es/m/94f64d1f-b8c0-b0c5-98bc-0793a34e0851@kimmet.dk	2024-01-20 13:57:54 -05:00
Nathan Bossart	8b2bcf3f28	Introduce the dynamic shared memory registry. Presently, the most straightforward way for a shared library to use shared memory is to request it at server startup via a shmem_request_hook, which requires specifying the library in shared_preload_libraries. Alternatively, the library can create a dynamic shared memory (DSM) segment, but absent a shared location to store the segment's handle, other backends cannot use it. This commit introduces a registry for DSM segments so that these other backends can look up existing segments with a library-specified string. This allows libraries to easily use shared memory without needing to request it at server startup. The registry is accessed via the new GetNamedDSMSegment() function. This function handles allocating the segment and initializing it via a provided callback. If another backend already created and initialized the segment, it simply attaches the segment. GetNamedDSMSegment() locks the registry appropriately to ensure that only one backend initializes the segment and that all other backends just attach it. The registry itself is comprised of a dshash table that stores the DSM segment handles keyed by a library-specified string. Reviewed-by: Michael Paquier, Andrei Lepikhov, Nikita Malakhov, Robert Haas, Bharath Rupireddy, Zhang Mingli, Amul Sul Discussion: https://postgr.es/m/20231205034647.GA2705267%40nathanxps13	2024-01-19 14:24:36 -06:00
Peter Eisentraut	6db4598fcb	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-01-19 15:42:13 +01:00
Alexander Korotkov	b725b7eec4	Rename COPY option from SAVE_ERROR_TO to ON_ERROR The option names now are "stop" (default) and "ignore". The future options could be "file 'filename.log'" and "table 'tablename'". Discussion: https://postgr.es/m/20240117.164859.2242646601795501168.horikyota.ntt%40gmail.com Author: Jian He Reviewed-by: Atsushi Torikoshi	2024-01-19 15:15:51 +02:00
John Naylor	dd0a0cfc81	Fixed misspelled byteswap function for big endian machines Per members lora and mamba	2024-01-19 13:26:18 +07:00
John Naylor	0aba255440	Add optimized C string hashing Given an already-initialized hash state and a NUL-terminated string, accumulate the hash of the string into the hash state and return the length for the caller to (optionally) save for the finalizer. This avoids a strlen call. If the string pointer is aligned, we can use a word-at-a-time algorithm for NUL lookahead. The aligned case is only used on 64-bit platforms, since it's not worth the extra complexity for 32-bit. Handling the tail of the string after finishing the word-wise loop was inspired by NetBSD's strlen(), but no code was taken since that is written in assembly language. As demonstration, use this in the search path cache. This brings the general case performance closer to the special case optimization done in commit `a86c61c9ee`. There are other places that could benefit, but that is left for future work. Jeff Davis and John Naylor Reviewed by Heikki Linnakangas, Jian He, Junwang Zhao Discussion: https://postgr.es/m/3820f030fd008ff14134b3e9ce5cc6dd623ed479.camel%40j-davis.com Discussion: https://postgr.es/m/b40292c99e623defe5eadedab1d438cf51a4107c.camel%40j-davis.com	2024-01-19 12:56:15 +07:00
John Naylor	e97b672c88	Add inline incremental hash functions for in-memory use It can be useful for a hash function to expose separate initialization, accumulation, and finalization steps. In particular, this is useful for building inline hash functions for simplehash. Instead of trying to whack around hash_bytes while maintaining its current behavior on all platforms, we base this work on fasthash (MIT licensed) which is simple, faster than hash_bytes for inputs over 12 bytes long, and also passes the hash function testing suite SMHasher. The fasthash functions have been reimplemented using our added-on incremental interface to validate that this method will still give the same answer, provided we have the input length ahead of time. This functionality lives in a new header hashfn_unstable.h. The name implies we have the freedom to change things across versions that would be unacceptable for our other hash functions that are used for e.g. hash indexes and hash partitioning. As such, these should only be used for in-memory data structures like hash tables. There is also no guarantee of being independent of endianness or pointer size. As demonstration, use fasthash for pgstat_hash_hash_key. Previously this called the 32-bit murmur finalizer on the three elements, then joined them with hash_combine(). The new function is simpler, faster and takes up less binary space. While the collision and bias behavior were almost certainly fine with the previous coding, now we have objective confidence of that. There are other places that could benefit from this, but that is left for future work. Reviewed by Jeff Davis, Heikki Linnakangas, Jian He, Junwang Zhao Credit to Andres Freund for the idea Discussion: https://postgr.es/m/20231122223432.lywt4yz2bn7tlp27%40awork3.anarazel.de	2024-01-19 12:44:09 +07:00
David Rowley	4b31063643	Fix broken Bitmapset optimization in DiscreteKnapsack() Some code in DiscreteKnapsack() attempted to zero all words in a Bitmapset by performing bms_del_members() to delete all the members from itself before replacing those members with members from another set. When that code was written, this was a valid way to manipulate the set in such a way to save from palloc having to be called to allocate a new Bitmapset. However, `00b41463c` modified Bitmapsets so that an empty set is always represented as NULL and this breaks the optimization as the Bitmapset code will always pfree the memory when the set becomes empty. Since DiscreteKnapsack() has been coded to avoid as many unneeded allocations as possible, it seems risky to not fix this. Here we add bms_replace_members() to effectively perform an in-place copy of another set, reusing the memory of the existing set, when possible. This got broken in v16, but no backpatch for now as there've been no complaints. Reviewed-by: Richard Guo Discussion: https://postgr.es/m/CAApHDvoTCBkBU2PJghNOFUiO0q=QP4WAWHi5sJP6_4=b2WodrA@mail.gmail.com	2024-01-19 10:44:36 +13:00
Robert Haas	c120550edb	Optimize vacuuming of relations with no indexes. If there are no indexes on a relation, items can be marked LP_UNUSED instead of LP_DEAD when pruning. This significantly reduces WAL volume, since we no longer need to emit one WAL record for pruning and a second to change the LP_DEAD line pointers thus created to LP_UNUSED. Melanie Plageman, reviewed by Andres Freund, Peter Geoghegan, and me Discussion: https://postgr.es/m/CAAKRu_bgvb_k0gKOXWzNKWHt560R0smrGe3E8zewKPs8fiMKkw%40mail.gmail.com	2024-01-18 10:03:42 -05:00
Michael Paquier	8013850c85	Add try_index_open(), conditional variant of index_open() try_index_open() is able to open an index if its relkind fits, except that it would return NULL instead of generated an error if the relation does not exist. This new routine will be used by an upcoming patch to make REINDEX on partitioned relations more robust when an index in a partition tree is dropped. Extracted from a larger patch by the same author. Author: Fei Changhong Discussion: https://postgr.es/m/tencent_6A52106095ACDE55333E3AD33F304C0C3909@qq.com Backpatch-through: 14	2024-01-18 15:04:24 +09:00
Alexander Korotkov	9e2d870119	Add new COPY option SAVE_ERROR_TO Currently, when source data contains unexpected data regarding data type or range, the entire COPY fails. However, in some cases, such data can be ignored and just copying normal data is preferable. This commit adds a new option SAVE_ERROR_TO, which specifies where to save the error information. When this option is specified, COPY skips soft errors and continues copying. Currently, SAVE_ERROR_TO only supports "none". This indicates error information is not saved and COPY just skips the unexpected data and continues running. Later works are expected to add more choices, such as 'log' and 'table'. Author: Damir Belyalov, Atsushi Torikoshi, Alex Shulgin, Jian He Discussion: https://postgr.es/m/87k31ftoe0.fsf_-_%40commandprompt.com Reviewed-by: Pavel Stehule, Andres Freund, Tom Lane, Daniel Gustafsson, Reviewed-by: Alena Rybakina, Andy Fan, Andrei Lepikhov, Masahiko Sawada Reviewed-by: Vignesh C, Atsushi Torikoshi	2024-01-16 23:08:53 +02:00
Heikki Linnakangas	c6b86eaa55	Add missing PGDLLIMPORT markings Since commit `8ec569479f`, we have a policy of marking all backend variables with PGDLLIMPORT. Reported-by: Anton A. Melnikov Discussion: https://www.postgresql.org/message-id/0b78546c-ffef-4cd9-9ba1-d1e6aab88cea@postgrespro.ru	2024-01-16 13:53:28 +02:00
Peter Eisentraut	4f622503d6	Make attstattarget nullable This changes the pg_attribute field attstattarget into a nullable field in the variable-length part of the row. If no value is set by the user for attstattarget, it is now null instead of previously -1. This saves space in pg_attribute and tuple descriptors for most practical scenarios. (ATTRIBUTE_FIXED_PART_SIZE is reduced from 108 to 104.) Also, null is the semantically more correct value. The ANALYZE code internally continues to represent the default statistics target by -1, so that that code can avoid having to deal with null values. But that is now contained to the ANALYZE code. Only the DDL code deals with attstattarget possibly null. For system columns, the field is now always null. The ANALYZE code skips system columns anyway. To set a column's statistics target to the default value, the new command form ALTER TABLE ... SET STATISTICS DEFAULT can be used. (SET STATISTICS -1 still works.) Reviewed-by: Alvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://www.postgresql.org/message-id/flat/4da8d211-d54d-44b9-9847-f2a9f1184c76@eisentraut.org	2024-01-13 18:14:53 +01:00
Michael Paquier	e72a37528d	Refactor code checking for file existence jit.c and dfgr.c had a copy of the same code to check if a file exists or not, with a twist: jit.c did not check for EACCES when failing the stat() call for the path whose existence is tested. This refactored routine will be used by an upcoming patch. Reviewed-by: Ashutosh Bapat Discussion: https://postgr.es/m/ZTiV8tn_MIb_H2rE@paquier.xyz	2024-01-12 12:04:51 +09:00
Robert Haas	d9ef650fca	Add new function pg_get_wal_summarizer_state(). This makes it possible to access information about the progress of WAL summarization from SQL. The previously-added functions pg_available_wal_summaries() and pg_wal_summary_contents() only examine on-disk state, but this function exposes information from the server's shared memory. Discussion: http://postgr.es/m/CA+Tgmobvqqj-DW9F7uUzT-cQqs6wcVb-Xhs=w=hzJnXSE-kRGw@mail.gmail.com	2024-01-11 12:41:18 -05:00
Nathan Bossart	5b1b9bce84	Cross-check lists of predefined LWLocks. Both lwlocknames.txt and wait_event_names.txt contain a list of all the predefined LWLocks, i.e., those with predefined positions within MainLWLockArray. It is easy to miss one or the other, especially since the list in wait_event_names.txt omits the "Lock" suffix from all the LWLock wait events. This commit adds a cross- check of these lists to the script that generates lwlocknames.h. If the lists do not match exactly, building will fail. Suggested-by: Robert Haas Reviewed-by: Robert Haas, Michael Paquier, Bertrand Drouvot Discussion: https://postgr.es/m/20240102173120.GA1061678%40nathanxps13	2024-01-09 11:05:19 -06:00
Alexander Korotkov	30b4955a46	Fix misuse of RelOptInfo.unique_for_rels cache by SJE When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to innerrel_is_unique_ext(). That might lead to an invalid match to cache entries made by previous non self-join checking calls. Add UniqueRelInfo.self_join flag to prevent such cases. Also, fix that SJE should require a strict match of outerrelids to make sure UniqueRelInfo.extra_clauses are valid. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com	2024-01-09 00:09:06 +02:00
Noah Misch	d3c5f37dd5	Make dblink interruptible, via new libpqsrv APIs. This replaces dblink's blocking libpq calls, allowing cancellation and allowing DROP DATABASE (of a database not involved in the query). Apart from explicit dblink_cancel_query() calls, dblink still doesn't cancel the remote side. The replacement for the blocking calls consists of new, general-purpose query execution wrappers in the libpqsrv facility. Out-of-tree extensions should adopt these. Use them in postgres_fdw, replacing a local implementation from which the libpqsrv implementation derives. This is a bug fix for dblink. Code inspection identified the bug at least thirteen years ago, but user complaints have not appeared. Hence, no back-patch for now. Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com	2024-01-08 11:39:56 -08:00
Noah Misch	0efc831847	Remove excess #include "utils/wait_event.h". This simplifies copying libpq/libpq-be-fe-helpers.h into extensions, because some supported PostgreSQL versions lack the other header. Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com	2024-01-08 11:39:56 -08:00
Noah Misch	196799d676	Fix missing word in comment.	2024-01-08 11:39:56 -08:00
Tom Lane	89b69db82a	Allow examine_simple_variable() to work on INSERT RETURNING Vars. Since commit `599b33b94`, this function assumed that every RTE_RELATION RangeTblEntry would have an associated RelOptInfo. But that's not so: we only build RelOptInfos for relations that are scanned by the query. In particular the target of an INSERT won't have one, so that Vars appearing in an INSERT ... RETURNING list will not have an associated RelOptInfo. This apparently wasn't a problem before commit `f7816aec2` taught examine_simple_variable() to drill down into CTEs containing INSERT RETURNING, but it is now. To fix, add a fallback code path that gets the userid to use directly from the RTEPermissionInfo associated with the RTE. (Sadly, we must have two code paths, because not every RTE has a RTEPermissionInfo either.) Per report from Alexander Lakhin. No back-patch, since the case is apparently unreachable before `f7816aec2`. Discussion: https://postgr.es/m/608a4886-6c60-0f9e-97d5-591256bd4150@gmail.com	2024-01-08 11:48:44 -05:00
Tom Lane	9391f71523	Teach estimate_array_length() to use statistics where available. If we have DECHIST statistics about the argument expression, use the average number of distinct elements as the array length estimate. (It'd be better to use the average total number of elements, but that is not currently calculated by compute_array_stats(), and it's unclear that it'd be worth extra effort to get.) To do this, we have to change the signature of estimate_array_length to pass the "root" pointer. While at it, also change its result type to "double". That's probably not really necessary, but it avoids any risk of overflow of the value extracted from DECHIST. All existing callers are going to use the result in a "double" calculation anyway. Paul Jungwirth, reviewed by Jian He and myself Discussion: https://postgr.es/m/CA+renyUnM2d+SmrxKpDuAdpiq6FOM=FByvi6aS6yi__qyf6j9A@mail.gmail.com	2024-01-04 18:36:19 -05:00
Nathan Bossart	14dd0f27d7	Add macros for looping through a List without a ListCell. Many foreach loops only use the ListCell pointer to retrieve the content of the cell, like so: ListCell *lc; foreach(lc, mylist) { int myint = lfirst_int(lc); ... } This commit adds a few convenience macros that automatically declare the loop variable and retrieve the current cell's contents. This allows us to rewrite the previous loop like this: foreach_int(myint, mylist) { ... } This commit also adjusts a few existing loops in order to add coverage for the new/adjusted macros. There is presently no plan to bulk update all foreach loops, as that could introduce a significant amount of back-patching pain. Instead, these macros are primarily intended for use in new code. Author: Jelte Fennema-Nio Reviewed-by: David Rowley, Alvaro Herrera, Vignesh C, Tom Lane Discussion: https://postgr.es/m/CAGECzQSwXKnxGwW1_Q5JE%2B8Ja20kyAbhBHO04vVrQsLcDciwXA%40mail.gmail.com	2024-01-04 16:09:34 -06:00
Peter Eisentraut	5d06e99a3c	ALTER TABLE command to change generation expression This adds a new ALTER TABLE subcommand ALTER COLUMN ... SET EXPRESSION that changes the generation expression of a generated column. The syntax is not standard but was adapted from other SQL implementations. This command causes a table rewrite, using the usual ALTER TABLE mechanisms. The implementation is similar to and makes use of some of the infrastructure of the SET DATA TYPE subcommand (for example, rebuilding constraints and indexes afterwards). The new command requires a new pass in AlterTablePass, and the ADD COLUMN pass had to be moved earlier so that combinations of ADD COLUMN and SET EXPRESSION can work. Author: Amul Sul <sulamul@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAAJ_b94yyJeGA-5M951_Lr+KfZokOp-2kXicpmEhi5FXhBeTog@mail.gmail.com	2024-01-04 16:28:54 +01:00
Amit Kapila	007693f2a3	Track conflict_reason in pg_replication_slots. This patch changes the existing 'conflicting' field to 'conflict_reason' in pg_replication_slots. This new field indicates the reason for the logical slot's conflict with recovery. It is always NULL for physical slots, as well as for logical slots which are not invalidated. The non-NULL values indicate that the slot is marked as invalidated. Possible values are: wal_removed = required WAL has been removed. rows_removed = required rows have been removed. wal_level_insufficient = the primary doesn't have a wal_level sufficient to perform logical decoding. The existing users of 'conflicting' column can get the same answer by using 'conflict_reason' IS NOT NULL. Author: Shveta Malik Reviewed-by: Amit Kapila, Bertrand Drouvot, Michael Paquier Discussion: https://postgr.es/m/ZYOE8IguqTbp-seF@paquier.xyz	2024-01-04 08:26:25 +05:30
Bruce Momjian	29275b1d17	Update copyright for 2024 Reported-by: Michael Paquier Discussion: https://postgr.es/m/ZZKTDPxBBMt3C0J9@paquier.xyz Backpatch-through: 12	2024-01-03 20:49:05 -05:00
Peter Eisentraut	59fd390d5e	Second attempt at organizing jsonpath operators and methods Second attempt at `283a95da92`. Since we can't reorder the enum values of JsonPathItemType, instead reorder the switch cases where they are used to generally follow the order of the enum values, for better maintainability.	2024-01-03 21:56:41 +01:00
Peter Eisentraut	0958f8f6bf	Revert "Reorganise jsonpath operators and methods" This reverts commit `283a95da92`. The reordering of JsonPathItemType affects the binary on-disk compatibility of the jsonpath type, so we must not change it. Revert for now and consider.	2024-01-03 21:02:49 +01:00
Peter Eisentraut	283a95da92	Reorganise jsonpath operators and methods Various jsonpath operators and methods add various keywords, switch cases, and documentation entries in some order. However, they are not consistent; reorder them for better maintainability or readability. Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com	2024-01-03 11:25:33 +01:00
Peter Eisentraut	c1b9e1e56d	Add numeric_int8_opt_error() to optionally suppress errors This matches the existing numeric_int4_opt_error() (see commit `16d489b0fe`). It will be used by a future JSON-related patch, which wants to report errors in its own way and thus does not want the internal functions to throw any error. Author: Jeevan Chalke <jeevan.chalke@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/CAM2+6=XjTyqrrqHAOj80r0wVQxJSxc0iyib9bPC55uFO9VKatg@mail.gmail.com	2024-01-03 10:05:35 +01:00
Robert Haas	e62e73f3a2	gist: fix typo "split(t)ed" -> "split" Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org	2024-01-02 12:24:28 -05:00
Robert Haas	0d9937d118	Fix typos in comments and in one isolation test. Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions by me. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org	2024-01-02 12:05:41 -05:00
Peter Eisentraut	141752bbb0	Fix typos in simplehash.h Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/18252-d46d27900a277d87@postgresql.org	2024-01-02 10:18:47 +01:00
Amit Kapila	9a17be1e24	Allow upgrades to preserve the full subscription's state. This feature will allow us to replicate the changes on subscriber nodes after the upgrade. Previously, only the subscription metadata information was preserved. Without the list of relations and their state, it's not possible to re-enable the subscriptions without missing some records as the list of relations can only be refreshed after enabling the subscription (and therefore starting the apply worker). Even if we added a way to refresh the subscription while enabling a publication, we still wouldn't know which relations are new on the publication side, and therefore should be fully synced, and which shouldn't. To preserve the subscription relations, this patch teaches pg_dump to restore the content of pg_subscription_rel from the old cluster by using binary_upgrade_add_sub_rel_state SQL function. This is supported only in binary upgrade mode. The subscription's replication origin is needed to ensure that we don't replicate anything twice. To preserve the replication origins, this patch teaches pg_dump to update the replication origin along with creating a subscription by using binary_upgrade_replorigin_advance SQL function to restore the underlying replication origin remote LSN. This is supported only in binary upgrade mode. pg_upgrade will check that all the subscription relations are in 'i' (init) or in 'r' (ready) state and will error out if that's not the case, logging the reason for the failure. This helps to avoid the risk of any dangling slot or origin after the upgrade. Author: Vignesh C, Julien Rouhaud, Shlok Kyal Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud	2024-01-02 08:08:46 +05:30
Tomas Vondra	6c63bcbf3c	Minor cleanup of the BRIN parallel build code Commit `b437571714` added support for parallel builds for BRIN indexes, using code similar to BTREE parallel builds, and also a new tuplesort variant. This commit simplifies the new code in two ways: * The "spool" grouping tuplesort and the heap/index is not necessary. The heap/index are available as separate arguments, causing confusion. So remove the spool, and use the tuplesort directly. * The new tuplesort variant does not need the heap/index, as it sorts simply by the range block number, without accessing the tuple data. So simplify that too. Initial report and patch by Ranier Vilela, further cleanup by me. Author: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com	2023-12-30 23:15:04 +01:00
Peter Eisentraut	a740b213d4	Add GUC backtrace_on_internal_error When enabled (default off), this logs a backtrace anytime elog() or an equivalent ereport() for internal errors is called. This is not well covered by the existing backtrace_functions, because there are many equally-worded low-level errors in many functions. And if you find out where the error is, then you need to manually rewrite the elog() to ereport() to attach the errbacktrace(), which is annoying. Having a backtrace automatically on every elog() call could be very helpful during development for various kinds of common errors from palloc, syscache, node support, etc. Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org	2023-12-30 11:43:57 +01:00
Peter Eisentraut	c538592959	Make all Perl warnings fatal There are a lot of Perl scripts in the tree, mostly code generation and TAP tests. Occasionally, these scripts produce warnings. These are probably always mistakes on the developer side (true positives). Typical examples are warnings from genbki.pl or related when you make a mess in the catalog files during development, or warnings from tests when they massage a config file that looks different on different hosts, or mistakes during merges (e.g., duplicate subroutine definitions), or just mistakes that weren't noticed because there is a lot of output in a verbose build. This changes all warnings into fatal errors, by replacing use warnings; by use warnings FATAL => 'all'; in all Perl files. Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org	2023-12-29 18:20:00 +01:00
Tom Lane	58054de2d0	Improve the implementation of information_schema._pg_expandarray(). This function was originally coded with a handmade expansion of the array subscripts. We can do it a little faster and far more legibly today, by using unnest() WITH ORDINALITY. While at it, let's apply the rowcount estimation support that exists for the underlying unnest() function: reduce the default ROWS estimate to 100 and attach array_unnest_support. I'm not sure that array_unnest_support can do anything useful today with the call sites that exist in information_schema, but it can't hurt, and the existing default rowcount of 1000 is surely much too high for any of these cases. The psql.sql regression script is using _pg_expandarray() as a test case for \sf+. While we could keep doing so, the new one-line function body makes a poor test case for \sf+ row-numbering, so switch it to print another information_schema function. Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us	2023-12-27 15:55:46 -05:00
Alexander Korotkov	7e6fb5da41	Improvements and fixes for `e0b1ee17dc` `e0b1ee17dc` introduced optimization for matching B-tree scan keys required for the directional scan. However, it incorrectly assumed that all keys required for opposite direction scan are satisfied by _bt_first(). It has been illustrated that with multiple scan keys over the same column, a lesser one (according to the scan direction) could win leaving the other one unsatisfied. Instead of relying on _bt_first() this commit introduces code that memorizes whether there was at least one match on the page. If that's true we know that keys required for opposite-direction scan are satisfied as soon as corresponding values are not NULLs. Also, this commit simplifies the description for the optimization of keys required for the current direction scan. Now the flag used for this is named continuescanPrechecked and means exactly that *continuescan flag is known to be true for the last item on the page. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com Reviewed-by: Pavel Borisov	2023-12-27 14:35:08 +02:00
Alexander Korotkov	06b10f80ba	Remove BTScanOpaqueData.firstPage It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData. This commit makes it an argument of the _bt_readpage() function. We can easily distinguish first-time and repeated calls (within the scan) of this function. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com Reviewed-by: Pavel Borisov	2023-12-27 14:21:49 +02:00
Alexander Korotkov	7d58f2342b	REALLOCATE_BITMAPSETS manual compile-time option This option forces each bitmapset modification to reallocate bitmapset. This is useful for debugging hangling pointers to bitmapset's. Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov	2023-12-27 03:57:57 +02:00
Alexander Korotkov	12915a58ee	Enhance checkpointer restartpoint statistics Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas. Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints. The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail. System catalog is changed. Catversion is bumped. Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru Author: Anton A. Melnikov Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov	2023-12-25 01:12:36 +02:00
Robert Haas	49f2194ed5	Fix numerous typos in incremental backup commits. Apparently, spell check would have been a really good idea. Alexander Lakhin, with a few additions as per an off-list report from Andres Freund. Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com	2023-12-21 15:36:17 -05:00
Robert Haas	dc21234005	Add support for incremental backup. To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com	2023-12-20 09:49:12 -05:00
Robert Haas	174c480508	Add a new WAL summarizer process. When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com	2023-12-20 08:42:28 -05:00
Robert Haas	aafc07c7a1	Move src/bin/pg_verifybackup/parse_manifest.c into src/common. This makes it possible for the code to be easily reused by other client-side tools, and/or by the server. Patch by me. Review of this patch in particular by at least Peter Eisentraut; reviewers for the patch series in general include Dilip Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com	2023-12-19 15:24:31 -05:00
Tom Lane	7e1ce2b3de	Prevent integer overflow when forming tuple width estimates. It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org	2023-12-19 11:12:16 -05:00
Peter Eisentraut	2a607fb822	Update comment for Cardinality typedef Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-Zd7Yy80RL1NdskLLo-wz6QoqsbC5TKs%3D3yZxG3BT_aA%40mail.gmail.com	2023-12-19 14:58:47 +01:00
Heikki Linnakangas	3c080fb4fa	Simplify newNode() by removing special cases - Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi	2023-12-19 12:11:47 +02:00
Tom Lane	8b965c549d	compute_bitmap_pages' loop_count parameter should be double not int. The value was double in the original implementation of this logic. Commit `da08a6598` pulled it out into a subroutine, but carelessly declared the parameter as int when it should have been double. On most platforms, the only ill effect would be to clamp the value to be not more than INT_MAX, which would seldom be exceeded and probably wouldn't change the estimates too much anyway. Nonetheless, it's wrong and can cause complaints from ubsan. While here, improve the comments and parameter names. This is an ABI change in a globally exposed subroutine, so back-patching would create some risk of breaking extensions. The value of the fix doesn't seem high enough to warrant taking that risk, so fix in HEAD only. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com	2023-12-18 12:46:10 -05:00
Nathan Bossart	64b1fb5f03	Optimize pg_atomic_exchange_u32 and pg_atomic_exchange_u64. Presently, all platforms implement atomic exchanges by performing an atomic compare-and-swap in a loop until it succeeds. This can be especially expensive when there is contention on the atomic variable. This commit optimizes atomic exchanges on many platforms by using compiler intrinsics, which should compile into something much less expensive than a compare-and-swap loop. Since these intrinsics have been available for some time, the inline assembly implementations are omitted. Suggested-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13	2023-12-18 10:53:32 -06:00
Thomas Munro	4908c58720	Provide vectored variants of smgrread() and smgrwrite(). smgrreadv() and smgrwritev() and their md.c implementations call FileReadV() and FileWriteV(). A range of disk blocks beginning at 'blocknum' and extending for 'nblocks' can be scattered to or gathered from multiple buffers with a single system call. The traditional smgrread() and smgrwrite() functions are implemented in terms of the new functions. Later commits will introduce calls with nblocks > 1, but the following behavioral changes can be seen already: * After a short transfer we'll now retry until we eventually read 0 bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we would infer the reason. Retrying is consistent with xlog.c's treatment of large WAL writes, and arguably also xlog.c and fd.c's treatment of EINTR. Arbitrary short returns for larger transfers have been observed on several OSes, and might in theory also happen for transient reasons with our own pg_pv() fallback code. After unexpected EOF or -1, the error thrown now talks about a range even for the single block case, eg "blocks 42..42". Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-18 15:01:50 +13:00
Michael Paquier	3c9d9acae0	Refactor pgstat_prepare_io_time() with an input argument instead of a GUC Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com	2023-12-16 20:16:20 +01:00
Alvaro Herrera	a6be0600ac	Remove useless LIMIT_OPTION_DEFAULT value from LimitOption During the development that led to commit `357889eb17`, for a time we had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely removed later on, before commit. Complete the removal now. Author: Zhang Mingli <avamingli@gmail.com> Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark	2023-12-16 18:20:03 +01:00
Thomas Munro	b485ad7f07	Provide multi-block smgrprefetch(). Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a single block at a time. Add an nblocks argument so that we can do the same for a range of blocks. This usually produces a single system call, but might need to loop if it crosses a segment boundary. Initially it is only called with nblocks == 1, but proposed patches will make wider calls. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version) Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-16 17:51:21 +13:00
Tom Lane	59bd34c2fa	Fix bugs in manipulation of large objects. In v16 and up (since commit `afbfc0298`), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser. Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong. These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it. A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo. Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us	2023-12-15 13:55:05 -05:00
Thomas Munro	871fe4917e	Provide vectored variants of FileRead() and FileWrite(). FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for fd.c's virtual file descriptors. The simple FileRead() and FileWrite() functions are now implemented in terms of the vectored functions, to avoid code duplication, and they are converted back to the corresponding simple system calls further down (commit `15c9ac36`). Later work will make more interesting multi-iovec calls. The traditional behavior of reporting a "fake" ENOSPC error is simplified. It's now always set for non-failing writes, for the benefit of callers that expect to log a meaningful "%m" if they determine that the write was short. (Perhaps we should consider getting rid of that expectation one day.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-12 13:12:43 +13:00
Thomas Munro	0c6be59f5e	Provide helper for retrying partial vectored I/O. compute_remaining_iovec() is a re-usable routine for retrying after pg_readv() or pg_writev() reports a short transfer. This will gain new users in a later commit, but can already replace the open-coded equivalent code in the existing pg_pwritev_with_retry() function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-12 10:57:18 +13:00
Thomas Munro	baf7c93ed5	Define unconstify() and unvolatize() for C++. These two macros wouldn't work if used in an inline function definition in a header seen by g++, because __builtin_types_compatible_p is only available in C. Redirect to standard C++ const_cast (which also adds/removes volatile despite its name). Per cpluspluscheck failure in a development branch. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com	2023-12-12 09:46:46 +13:00
Alvaro Herrera	d3fe6e90ba	Simplify productions for FORMAT JSON [ ENCODING name ] This removes the production json_encoding_clause_opt, instead merging it into json_format_clause. Also remove the auxiliary makeJsonEncoding() function. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql	2023-12-11 11:55:34 +01:00
Michael Paquier	c7a3e6b46d	Remove trace_recovery_messages This GUC was intended as a debugging help in the 9.0 area when hot standby and streaming replication were being developped, able to offer more information at LOG level rather than DEBUGn. There are more tools available these days that are able to offer rather equivalent information, like pg_waldump introduced in 9.3. It is not obvious how this facility is useful these days, so let's remove it. Author: Bharath Rupireddy Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz	2023-12-11 11:49:02 +01:00
Peter Eisentraut	90834ceccd	Remove some unnecessary includes of "access/xlog_internal.h" There were a few places where access/xlog_internal.h was apparently included unnecessarily. In some of those places, a more specific header file (that somehow came in via access/xlog_internal.h) can be used instead. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/a56a6eec-eb14-471b-9570-3cac23603964%40eisentraut.org	2023-12-10 07:46:06 +01:00
Jeff Davis	867dd2dc87	Cache opaque handle for GUC option to avoid repeasted lookups. When setting GUCs from proconfig, performance is important, and hash lookups in the GUC table are significant. Per suggestion from Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com Reviewed-by: John Naylor	2023-12-08 11:16:01 -08:00
Peter Geoghegan	c9c0589fda	Optimize nbtree backward scan boundary cases. Teach _bt_binsrch (and related helper routines like _bt_search and _bt_compare) about the initial positioning requirements of backward scans. Routines like _bt_binsrch already know all about "nextkey" searches, so it seems natural to teach them about "goback"/backward searches, too. These concepts are closely related, and are much easier to understand when discussed together. Now that certain implementation details are hidden from _bt_first, it's straightforward to add a new optimization: backward scans using the < strategy now avoid extra leaf page accesses in certain "boundary cases". Consider the following example, which uses the tenk1 table (and its tenk1_hundred index) from the standard regression tests: SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1; Before this commit, nbtree would scan two leaf pages, even though it was only really necessary to scan one leaf page. We'll now descend straight to the leaf page containing a (12, -inf) high key instead. The scan will locate matching non-pivot tuples with "hundred" values starting from the value 11. The scan won't waste a page access on the right sibling leaf page, which cannot possibly contain any matching tuples. You can think of the optimization added by this commit as disabling an optimization (the _bt_compare "!pivotsearch" behavior that was added to Postgres 12 in commit `dd299df8`) for a small subset of cases where it was always counterproductive. Equivalently, you can think of the new optimization as extending the "pivotsearch" behavior that page deletion by VACUUM has long required (since the aforementioned Postgres 12 commit went in) to other, similar cases. Obviously, this isn't strictly necessary for these new cases (unlike VACUUM, _bt_first is prepared to move the scan to the left once on the leaf level), but the underlying principle is the same. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com	2023-12-08 11:05:17 -08:00
Tomas Vondra	b437571714	Allow parallel CREATE INDEX for BRIN indexes Allow using multiple worker processes to build BRIN index, which until now was supported only for BTREE indexes. For large tables this often results in significant speedup when the build is CPU-bound. The work is split in a simple way - each worker builds BRIN summaries on a subset of the table, determined by the regular parallel scan used to read the data, and feeds them into a shared tuplesort which sorts them by blkno (start of the range). The leader then reads this sorted stream of ranges, merges duplicates (which may happen if the parallel scan does not align with BRIN pages_per_range), and adds the resulting ranges into the index. The number of duplicate results produced by workers (requiring merging in the leader process) should be fairly small, thanks to how parallel scans assign chunks to workers. The likelihood of duplicate results may increase for higher pages_per_range values, but then there are fewer page ranges in total. In any case, we expect the merging to be much cheaper than summarization, so this should be a win. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for BRIN indexes (e.g. uniqueness checks). This also introduces a new index AM flag amcanbuildparallel, determining whether to attempt to start parallel workers for the index build. Original patch by me, with reviews and substantial reworks by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com	2023-12-08 18:15:26 +01:00
Heikki Linnakangas	b31ba5310b	Rename ShmemVariableCache to TransamVariables The old name was misleading: It's not a cache, the values kept in the struct are the authoritative source. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:47:15 +02:00
Heikki Linnakangas	15916ffb04	Initialize ShmemVariableCache like other shmem areas For sake of consistency. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:46:59 +02:00
Jeff Davis	719b342d36	Shrink Unicode category table. Missing entries can implicitly be considered "unassigned". Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com	2023-12-07 15:44:03 -08:00
David Rowley	d16a0c1e2e	Verify that attribute counts match in ExecCopySlot tts_virtual_copyslot() contained an Assert that checked that the srcslot contained <= attributes than the dstslot. This seems to be backwards as if the srcslot contained fewer attributes then the dstslot could be left with stale Datum values from the previously stored tuple. It might make more sense to allow the source to contain additional attributes and only copy the leading ones that also exist in the destination, however, that's not what we're doing here. Here we just remove the Assert from tts_virtual_copyslot() and add an Assert to ExecCopySlot() to verify the attribute counts match. There does not seem to be any places where the destination contains fewer attributes, so instead of going to the trouble of making the code properly handle this, let's just ensure the attribute counts match. If this Assert fails then that will indicate that we do have cases that require us to handle the srcslot with more attributes than the dstslot. It seems better to only write this code if there's a genuine requirement for it rather than write it now only to leave it untested. Thanks to Andres Freund for helping with the analysis of this. Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com	2023-12-07 21:28:24 +13:00
Amit Kapila	0bf62460bb	Fix issues in binary_upgrade_logical_slot_has_caught_up(). The commit `29d0a77fa6` labelled binary_upgrade_logical_slot_has_caught_up() as a non-strict function to allow providing a better error message to callers in case the passed slot_name is NULL. On further discussion, it seems that it is not helpful to have a different error message for NULL input in this function, so this patch marks the function as strict. This patch also removes the explicit permission check to use replication slots as this function is invoked only by superusers and instead adds an Assert. Reported-by: Masahiko Sawada Author: Hayato Kuroda Reviewed-by: Vignesh C Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com	2023-12-07 08:42:48 +05:30
Heikki Linnakangas	e7c6efe305	Remove now-unnecessary Autovacuum[Launcher\|Worker]IAm functions After commit `fd5e8b440d`, InitProcess() is called later in the EXEC_BACKEND startup sequence, so it's enough to set the am_autovacuum_[launcher\|worker] variables at the same place as in the !EXEC_BACKEND case.	2023-12-04 15:34:37 +02:00
Peter Eisentraut	457428d9e9	Remove unnecessary include of <math.h> This was probably never necessary. (The header used to use random(), but that shouldn't require <math.h> either. In any case, that's gone, too.) Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Peter Eisentraut	da67cb0a44	Remove unnecessary include of <sys/socket.h> This was put here as part of a mechanical replacement of the old "getaddrinfo.h" with <netdb.h> plus <sys/socket.h> (commit `5579388d2d`). But here, we only need netdb.h (for NI_MAXHOST), not sys/socket.h. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Peter Eisentraut	dffb2b478f	Remove unnecessary includes of <signal.h> These were once needed for sig_atomic_t, but that no longer appears in these headers, so the include is not needed. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Michael Paquier	f21848de20	Add support for REINDEX in event triggers This commit adds support for REINDEX in event triggers, making this command react for the events ddl_command_start and ddl_command_end. The indexes rebuilt are collected with the ReindexStmt emitted by the caller, for the concurrent and non-concurrent paths. Thanks to that, it is possible to know a full list of the indexes that a single REINDEX command has worked on. Author: Garrett Thornburg, Jian He Reviewed-by: Jim Jones, Michael Paquier Discussion: https://postgr.es/m/CAEEqfk5bm32G7sbhzHbES9WejD8O8DCEOaLkxoBP7HNWxjPpvg@mail.gmail.com	2023-12-04 09:53:49 +09:00
Heikki Linnakangas	388491f1e5	Pass BackgroundWorker entry in the parameter file in EXEC_BACKEND mode The BackgroundWorker struct is now passed the same way as the Port struct. Seems more consistent. This makes it possible to move InitProcess later in SubPostmasterMain (in next commit), as we no longer need to access shared memory to read background worker entry. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:38:54 +02:00
Heikki Linnakangas	69d903367c	Refactor CreateSharedMemoryAndSemaphores For clarity, have separate functions for creating the shared memory and semaphores at postmaster or single-user backend startup, and for attaching to existing shared memory structures in EXEC_BACKEND case. CreateSharedMemoryAndSemaphores() is now called only at postmaster startup, and a new AttachSharedMemoryStructs() function is called at backend startup in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:09:42 +02:00
Thomas Munro	15c9ac3629	Optimize pg_readv/pg_pwritev single vector case. For the trivial case of iovcnt == 1, kernels are measurably slower at dealing with the more complex arguments of preadv/pwritev than the equivalent plain old pread/pwrite. The overheads are worth it for iovcnt > 1, but for 1 let's just redirect to the cheaper calls. While we could leave it to callers to worry about that, we already have to have our own pg_ wrappers for portability reasons so it seems reasonable to centralize this knowledge there (thanks to Heikki for this suggestion). Try to avoid function call overheads by making them inlinable, which might also allow the compiler to avoid the branch in some cases. For systems that don't have preadv and pwritev (currently: Windows and [closed] Solaris), we might as well pull the replacement functions up into the static inline functions too. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-11-29 17:19:25 +13:00
Alexander Korotkov	2cdf131c46	Use larger segment file names for pg_notify This avoids the wraparound in async.c and removes the corresponding code complexity. The maximum amount of allocated SLRU pages for NOTIFY / LISTEN queue is now determined by the max_notify_queue_pages GUC. The default value is 1048576. It allows to consume up to 8 GB of disk space which is exactly the limit we had previously. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com	2023-11-29 01:41:48 +02:00
Alexander Korotkov	4ed8f0913b	Index SLRUs by 64-bit integers rather than by 32-bit integers We've had repeated bugs in the area of handling SLRU wraparound in the past, some of which have caused data loss. Switching to an indexing system for SLRUs that does not wrap around should allow us to get rid of a whole bunch of problems and improve the overall reliability of the system. This particular patch however only changes the indexing and doesn't address the wraparound per se. This is going to be done in the following patches. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com	2023-11-29 01:40:56 +02:00

... 2 3 4 5 6 ...

11513 Commits