postgresql/src/common/f2s.c

Ignoring revisions in .git-blame-ignore-revs. Click here to bypass and see the normal blame view.

804 lines
21 KiB
C
Raw Permalink Normal View History

Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 16:20:33 +01:00
/*---------------------------------------------------------------------------
*
* Ryu floating-point output for single precision.
*
* Portions Copyright (c) 2018-2024, PostgreSQL Global Development Group
Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 16:20:33 +01:00
*
* IDENTIFICATION
* src/common/f2s.c
*
* This is a modification of code taken from github.com/ulfjack/ryu under the
* terms of the Boost license (not the Apache license). The original copyright
* notice follows:
*
* Copyright 2018 Ulf Adams
*
* The contents of this file may be used under the terms of the Apache
* License, Version 2.0.
*
* (See accompanying file LICENSE-Apache or copy at
* http://www.apache.org/licenses/LICENSE-2.0)
*
* Alternatively, the contents of this file may be used under the terms of the
* Boost Software License, Version 1.0.
*
* (See accompanying file LICENSE-Boost or copy at
* https://www.boost.org/LICENSE_1_0.txt)
*
* Unless required by applicable law or agreed to in writing, this software is
* distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied.
*
*---------------------------------------------------------------------------
*/
#ifndef FRONTEND
#include "postgres.h"
#else
#include "postgres_fe.h"
#endif
#include "common/shortest_dec.h"
#include "digit_table.h"
#include "ryu_common.h"
Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 16:20:33 +01:00
#define FLOAT_MANTISSA_BITS 23
#define FLOAT_EXPONENT_BITS 8
#define FLOAT_BIAS 127
/*
* This table is generated (by the upstream) by PrintFloatLookupTable,
* and modified (by us) to add UINT64CONST.
*/
#define FLOAT_POW5_INV_BITCOUNT 59
static const uint64 FLOAT_POW5_INV_SPLIT[31] = {
UINT64CONST(576460752303423489), UINT64CONST(461168601842738791), UINT64CONST(368934881474191033), UINT64CONST(295147905179352826),
UINT64CONST(472236648286964522), UINT64CONST(377789318629571618), UINT64CONST(302231454903657294), UINT64CONST(483570327845851670),
UINT64CONST(386856262276681336), UINT64CONST(309485009821345069), UINT64CONST(495176015714152110), UINT64CONST(396140812571321688),
UINT64CONST(316912650057057351), UINT64CONST(507060240091291761), UINT64CONST(405648192073033409), UINT64CONST(324518553658426727),
UINT64CONST(519229685853482763), UINT64CONST(415383748682786211), UINT64CONST(332306998946228969), UINT64CONST(531691198313966350),
UINT64CONST(425352958651173080), UINT64CONST(340282366920938464), UINT64CONST(544451787073501542), UINT64CONST(435561429658801234),
UINT64CONST(348449143727040987), UINT64CONST(557518629963265579), UINT64CONST(446014903970612463), UINT64CONST(356811923176489971),
UINT64CONST(570899077082383953), UINT64CONST(456719261665907162), UINT64CONST(365375409332725730)
};
#define FLOAT_POW5_BITCOUNT 61
static const uint64 FLOAT_POW5_SPLIT[47] = {
UINT64CONST(1152921504606846976), UINT64CONST(1441151880758558720), UINT64CONST(1801439850948198400), UINT64CONST(2251799813685248000),
UINT64CONST(1407374883553280000), UINT64CONST(1759218604441600000), UINT64CONST(2199023255552000000), UINT64CONST(1374389534720000000),
UINT64CONST(1717986918400000000), UINT64CONST(2147483648000000000), UINT64CONST(1342177280000000000), UINT64CONST(1677721600000000000),
UINT64CONST(2097152000000000000), UINT64CONST(1310720000000000000), UINT64CONST(1638400000000000000), UINT64CONST(2048000000000000000),
UINT64CONST(1280000000000000000), UINT64CONST(1600000000000000000), UINT64CONST(2000000000000000000), UINT64CONST(1250000000000000000),
UINT64CONST(1562500000000000000), UINT64CONST(1953125000000000000), UINT64CONST(1220703125000000000), UINT64CONST(1525878906250000000),
UINT64CONST(1907348632812500000), UINT64CONST(1192092895507812500), UINT64CONST(1490116119384765625), UINT64CONST(1862645149230957031),
UINT64CONST(1164153218269348144), UINT64CONST(1455191522836685180), UINT64CONST(1818989403545856475), UINT64CONST(2273736754432320594),
UINT64CONST(1421085471520200371), UINT64CONST(1776356839400250464), UINT64CONST(2220446049250313080), UINT64CONST(1387778780781445675),
UINT64CONST(1734723475976807094), UINT64CONST(2168404344971008868), UINT64CONST(1355252715606880542), UINT64CONST(1694065894508600678),
UINT64CONST(2117582368135750847), UINT64CONST(1323488980084844279), UINT64CONST(1654361225106055349), UINT64CONST(2067951531382569187),
UINT64CONST(1292469707114105741), UINT64CONST(1615587133892632177), UINT64CONST(2019483917365790221)
};
static inline uint32
pow5Factor(uint32 value)
{
uint32 count = 0;
for (;;)
{
Assert(value != 0);
const uint32 q = value / 5;
const uint32 r = value % 5;
if (r != 0)
break;
value = q;
++count;
}
return count;
}
/* Returns true if value is divisible by 5^p. */
static inline bool
multipleOfPowerOf5(const uint32 value, const uint32 p)
{
return pow5Factor(value) >= p;
}
/* Returns true if value is divisible by 2^p. */
static inline bool
multipleOfPowerOf2(const uint32 value, const uint32 p)
{
/* return __builtin_ctz(value) >= p; */
return (value & ((1u << p) - 1)) == 0;
}
/*
* It seems to be slightly faster to avoid uint128_t here, although the
* generated code for uint128_t looks slightly nicer.
*/
static inline uint32
mulShift(const uint32 m, const uint64 factor, const int32 shift)
{
/*
* The casts here help MSVC to avoid calls to the __allmul library
* function.
*/
const uint32 factorLo = (uint32) (factor);
const uint32 factorHi = (uint32) (factor >> 32);
const uint64 bits0 = (uint64) m * factorLo;
const uint64 bits1 = (uint64) m * factorHi;
Assert(shift > 32);
#ifdef RYU_32_BIT_PLATFORM
/*
* On 32-bit platforms we can avoid a 64-bit shift-right since we only
* need the upper 32 bits of the result and the shift value is > 32.
*/
const uint32 bits0Hi = (uint32) (bits0 >> 32);
uint32 bits1Lo = (uint32) (bits1);
uint32 bits1Hi = (uint32) (bits1 >> 32);
bits1Lo += bits0Hi;
bits1Hi += (bits1Lo < bits0Hi);
const int32 s = shift - 32;
return (bits1Hi << (32 - s)) | (bits1Lo >> s);
#else /* RYU_32_BIT_PLATFORM */
const uint64 sum = (bits0 >> 32) + bits1;
const uint64 shiftedSum = sum >> (shift - 32);
Assert(shiftedSum <= PG_UINT32_MAX);
Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 16:20:33 +01:00
return (uint32) shiftedSum;
#endif /* RYU_32_BIT_PLATFORM */
}
static inline uint32
mulPow5InvDivPow2(const uint32 m, const uint32 q, const int32 j)
{
return mulShift(m, FLOAT_POW5_INV_SPLIT[q], j);
}
static inline uint32
mulPow5divPow2(const uint32 m, const uint32 i, const int32 j)
{
return mulShift(m, FLOAT_POW5_SPLIT[i], j);
}
static inline uint32
decimalLength(const uint32 v)
{
/* Function precondition: v is not a 10-digit number. */
/* (9 digits are sufficient for round-tripping.) */
Assert(v < 1000000000);
if (v >= 100000000)
{
return 9;
}
if (v >= 10000000)
{
return 8;
}
if (v >= 1000000)
{
return 7;
}
if (v >= 100000)
{
return 6;
}
if (v >= 10000)
{
return 5;
}
if (v >= 1000)
{
return 4;
}
if (v >= 100)
{
return 3;
}
if (v >= 10)
{
return 2;
}
return 1;
}
/* A floating decimal representing m * 10^e. */
typedef struct floating_decimal_32
{
uint32 mantissa;
int32 exponent;
} floating_decimal_32;
static inline floating_decimal_32
f2d(const uint32 ieeeMantissa, const uint32 ieeeExponent)
{
int32 e2;
uint32 m2;
if (ieeeExponent == 0)
{
/* We subtract 2 so that the bounds computation has 2 additional bits. */
e2 = 1 - FLOAT_BIAS - FLOAT_MANTISSA_BITS - 2;
m2 = ieeeMantissa;
}
else
{
e2 = ieeeExponent - FLOAT_BIAS - FLOAT_MANTISSA_BITS - 2;
m2 = (1u << FLOAT_MANTISSA_BITS) | ieeeMantissa;
}
#if STRICTLY_SHORTEST
const bool even = (m2 & 1) == 0;
const bool acceptBounds = even;
#else
const bool acceptBounds = false;
#endif
/* Step 2: Determine the interval of legal decimal representations. */
const uint32 mv = 4 * m2;
const uint32 mp = 4 * m2 + 2;
/* Implicit bool -> int conversion. True is 1, false is 0. */
const uint32 mmShift = ieeeMantissa != 0 || ieeeExponent <= 1;
const uint32 mm = 4 * m2 - 1 - mmShift;
/* Step 3: Convert to a decimal power base using 64-bit arithmetic. */
uint32 vr,
vp,
vm;
int32 e10;
bool vmIsTrailingZeros = false;
bool vrIsTrailingZeros = false;
uint8 lastRemovedDigit = 0;
if (e2 >= 0)
{
const uint32 q = log10Pow2(e2);
e10 = q;
const int32 k = FLOAT_POW5_INV_BITCOUNT + pow5bits(q) - 1;
const int32 i = -e2 + q + k;
vr = mulPow5InvDivPow2(mv, q, i);
vp = mulPow5InvDivPow2(mp, q, i);
vm = mulPow5InvDivPow2(mm, q, i);
if (q != 0 && (vp - 1) / 10 <= vm / 10)
{
/*
* We need to know one removed digit even if we are not going to
* loop below. We could use q = X - 1 above, except that would
* require 33 bits for the result, and we've found that 32-bit
* arithmetic is faster even on 64-bit machines.
*/
const int32 l = FLOAT_POW5_INV_BITCOUNT + pow5bits(q - 1) - 1;
lastRemovedDigit = (uint8) (mulPow5InvDivPow2(mv, q - 1, -e2 + q - 1 + l) % 10);
}
if (q <= 9)
{
/*
* The largest power of 5 that fits in 24 bits is 5^10, but q <= 9
* seems to be safe as well.
*
* Only one of mp, mv, and mm can be a multiple of 5, if any.
*/
if (mv % 5 == 0)
{
vrIsTrailingZeros = multipleOfPowerOf5(mv, q);
}
else if (acceptBounds)
{
vmIsTrailingZeros = multipleOfPowerOf5(mm, q);
}
else
{
vp -= multipleOfPowerOf5(mp, q);
}
}
}
else
{
const uint32 q = log10Pow5(-e2);
e10 = q + e2;
const int32 i = -e2 - q;
const int32 k = pow5bits(i) - FLOAT_POW5_BITCOUNT;
int32 j = q - k;
vr = mulPow5divPow2(mv, i, j);
vp = mulPow5divPow2(mp, i, j);
vm = mulPow5divPow2(mm, i, j);
if (q != 0 && (vp - 1) / 10 <= vm / 10)
{
j = q - 1 - (pow5bits(i + 1) - FLOAT_POW5_BITCOUNT);
lastRemovedDigit = (uint8) (mulPow5divPow2(mv, i + 1, j) % 10);
}
if (q <= 1)
{
/*
* {vr,vp,vm} is trailing zeros if {mv,mp,mm} has at least q
* trailing 0 bits.
*/
/* mv = 4 * m2, so it always has at least two trailing 0 bits. */
vrIsTrailingZeros = true;
if (acceptBounds)
{
/*
* mm = mv - 1 - mmShift, so it has 1 trailing 0 bit iff
* mmShift == 1.
*/
vmIsTrailingZeros = mmShift == 1;
}
else
{
/*
* mp = mv + 2, so it always has at least one trailing 0 bit.
*/
--vp;
}
}
else if (q < 31)
{
/* TODO(ulfjack):Use a tighter bound here. */
vrIsTrailingZeros = multipleOfPowerOf2(mv, q - 1);
}
}
/*
* Step 4: Find the shortest decimal representation in the interval of
* legal representations.
*/
uint32 removed = 0;
uint32 output;
if (vmIsTrailingZeros || vrIsTrailingZeros)
{
/* General case, which happens rarely (~4.0%). */
while (vp / 10 > vm / 10)
{
vmIsTrailingZeros &= vm - (vm / 10) * 10 == 0;
vrIsTrailingZeros &= lastRemovedDigit == 0;
lastRemovedDigit = (uint8) (vr % 10);
vr /= 10;
vp /= 10;
vm /= 10;
++removed;
}
if (vmIsTrailingZeros)
{
while (vm % 10 == 0)
{
vrIsTrailingZeros &= lastRemovedDigit == 0;
lastRemovedDigit = (uint8) (vr % 10);
vr /= 10;
vp /= 10;
vm /= 10;
++removed;
}
}
if (vrIsTrailingZeros && lastRemovedDigit == 5 && vr % 2 == 0)
{
/* Round even if the exact number is .....50..0. */
lastRemovedDigit = 4;
}
/*
* We need to take vr + 1 if vr is outside bounds or we need to round
* up.
*/
output = vr + ((vr == vm && (!acceptBounds || !vmIsTrailingZeros)) || lastRemovedDigit >= 5);
}
else
{
/*
* Specialized for the common case (~96.0%). Percentages below are
* relative to this.
*
* Loop iterations below (approximately): 0: 13.6%, 1: 70.7%, 2:
* 14.1%, 3: 1.39%, 4: 0.14%, 5+: 0.01%
*/
while (vp / 10 > vm / 10)
{
lastRemovedDigit = (uint8) (vr % 10);
vr /= 10;
vp /= 10;
vm /= 10;
++removed;
}
/*
* We need to take vr + 1 if vr is outside bounds or we need to round
* up.
*/
output = vr + (vr == vm || lastRemovedDigit >= 5);
}
const int32 exp = e10 + removed;
floating_decimal_32 fd;
fd.exponent = exp;
fd.mantissa = output;
return fd;
}
static inline int
to_chars_f(const floating_decimal_32 v, const uint32 olength, char *const result)
{
/* Step 5: Print the decimal representation. */
int index = 0;
uint32 output = v.mantissa;
int32 exp = v.exponent;
/*----
* On entry, mantissa * 10^exp is the result to be output.
* Caller has already done the - sign if needed.
*
* We want to insert the point somewhere depending on the output length
* and exponent, which might mean adding zeros:
*
* exp | format
* 1+ | ddddddddd000000
* 0 | ddddddddd
* -1 .. -len+1 | dddddddd.d to d.ddddddddd
* -len ... | 0.ddddddddd to 0.000dddddd
*/
uint32 i = 0;
int32 nexp = exp + olength;
if (nexp <= 0)
{
/* -nexp is number of 0s to add after '.' */
Assert(nexp >= -3);
/* 0.000ddddd */
index = 2 - nexp;
/* copy 8 bytes rather than 5 to let compiler optimize */
memcpy(result, "0.000000", 8);
}
else if (exp < 0)
{
/*
* dddd.dddd; leave space at the start and move the '.' in after
*/
index = 1;
}
else
{
/*
* We can save some code later by pre-filling with zeros. We know that
* there can be no more than 6 output digits in this form, otherwise
* we would not choose fixed-point output. memset 8 rather than 6
* bytes to let the compiler optimize it.
*/
Assert(exp < 6 && exp + olength <= 6);
memset(result, '0', 8);
}
while (output >= 10000)
{
const uint32 c = output - 10000 * (output / 10000);
const uint32 c0 = (c % 100) << 1;
const uint32 c1 = (c / 100) << 1;
output /= 10000;
memcpy(result + index + olength - i - 2, DIGIT_TABLE + c0, 2);
memcpy(result + index + olength - i - 4, DIGIT_TABLE + c1, 2);
i += 4;
}
if (output >= 100)
{
const uint32 c = (output % 100) << 1;
output /= 100;
memcpy(result + index + olength - i - 2, DIGIT_TABLE + c, 2);
i += 2;
}
if (output >= 10)
{
const uint32 c = output << 1;
memcpy(result + index + olength - i - 2, DIGIT_TABLE + c, 2);
}
else
{
result[index] = (char) ('0' + output);
}
if (index == 1)
{
/*
* nexp is 1..6 here, representing the number of digits before the
* point. A value of 7+ is not possible because we switch to
* scientific notation when the display exponent reaches 6.
*/
Assert(nexp < 7);
/* gcc only seems to want to optimize memmove for small 2^n */
if (nexp & 4)
{
memmove(result + index - 1, result + index, 4);
index += 4;
}
if (nexp & 2)
{
memmove(result + index - 1, result + index, 2);
index += 2;
}
if (nexp & 1)
{
result[index - 1] = result[index];
}
result[nexp] = '.';
index = olength + 1;
}
else if (exp >= 0)
{
/* we supplied the trailing zeros earlier, now just set the length. */
index = olength + exp;
}
else
{
index = olength + (2 - nexp);
}
return index;
}
static inline int
to_chars(const floating_decimal_32 v, const bool sign, char *const result)
{
/* Step 5: Print the decimal representation. */
int index = 0;
uint32 output = v.mantissa;
uint32 olength = decimalLength(output);
int32 exp = v.exponent + olength - 1;
if (sign)
result[index++] = '-';
/*
* The thresholds for fixed-point output are chosen to match printf
* defaults. Beware that both the code of to_chars_f and the value of
* FLOAT_SHORTEST_DECIMAL_LEN are sensitive to these thresholds.
*/
if (exp >= -4 && exp < 6)
return to_chars_f(v, olength, result + index) + sign;
/*
* If v.exponent is exactly 0, we might have reached here via the small
* integer fast path, in which case v.mantissa might contain trailing
* (decimal) zeros. For scientific notation we need to move these zeros
* into the exponent. (For fixed point this doesn't matter, which is why
* we do this here rather than above.)
*
* Since we already calculated the display exponent (exp) above based on
* the old decimal length, that value does not change here. Instead, we
* just reduce the display length for each digit removed.
*
* If we didn't get here via the fast path, the raw exponent will not
* usually be 0, and there will be no trailing zeros, so we pay no more
* than one div10/multiply extra cost. We claw back half of that by
* checking for divisibility by 2 before dividing by 10.
*/
if (v.exponent == 0)
{
while ((output & 1) == 0)
{
const uint32 q = output / 10;
const uint32 r = output - 10 * q;
if (r != 0)
break;
output = q;
--olength;
}
}
/*----
* Print the decimal digits.
* The following code is equivalent to:
*
* for (uint32 i = 0; i < olength - 1; ++i) {
* const uint32 c = output % 10; output /= 10;
* result[index + olength - i] = (char) ('0' + c);
* }
* result[index] = '0' + output % 10;
*/
uint32 i = 0;
while (output >= 10000)
{
const uint32 c = output - 10000 * (output / 10000);
const uint32 c0 = (c % 100) << 1;
const uint32 c1 = (c / 100) << 1;
output /= 10000;
memcpy(result + index + olength - i - 1, DIGIT_TABLE + c0, 2);
memcpy(result + index + olength - i - 3, DIGIT_TABLE + c1, 2);
i += 4;
}
if (output >= 100)
{
const uint32 c = (output % 100) << 1;
output /= 100;
memcpy(result + index + olength - i - 1, DIGIT_TABLE + c, 2);
i += 2;
}
if (output >= 10)
{
const uint32 c = output << 1;
/*
* We can't use memcpy here: the decimal dot goes between these two
* digits.
*/
result[index + olength - i] = DIGIT_TABLE[c + 1];
result[index] = DIGIT_TABLE[c];
}
else
{
result[index] = (char) ('0' + output);
}
/* Print decimal point if needed. */
if (olength > 1)
{
result[index + 1] = '.';
index += olength + 1;
}
else
{
++index;
}
/* Print the exponent. */
result[index++] = 'e';
if (exp < 0)
{
result[index++] = '-';
exp = -exp;
}
else
result[index++] = '+';
memcpy(result + index, DIGIT_TABLE + 2 * exp, 2);
index += 2;
return index;
}
static inline bool
f2d_small_int(const uint32 ieeeMantissa,
const uint32 ieeeExponent,
floating_decimal_32 *v)
{
const int32 e2 = (int32) ieeeExponent - FLOAT_BIAS - FLOAT_MANTISSA_BITS;
/*
* Avoid using multiple "return false;" here since it tends to provoke the
* compiler into inlining multiple copies of f2d, which is undesirable.
*/
if (e2 >= -FLOAT_MANTISSA_BITS && e2 <= 0)
{
/*----
* Since 2^23 <= m2 < 2^24 and 0 <= -e2 <= 23:
* 1 <= f = m2 / 2^-e2 < 2^24.
*
* Test if the lower -e2 bits of the significand are 0, i.e. whether
* the fraction is 0. We can use ieeeMantissa here, since the implied
* 1 bit can never be tested by this; the implied 1 can only be part
* of a fraction if e2 < -FLOAT_MANTISSA_BITS which we already
* checked. (e.g. 0.5 gives ieeeMantissa == 0 and e2 == -24)
*/
const uint32 mask = (1U << -e2) - 1;
const uint32 fraction = ieeeMantissa & mask;
if (fraction == 0)
{
/*----
* f is an integer in the range [1, 2^24).
* Note: mantissa might contain trailing (decimal) 0's.
* Note: since 2^24 < 10^9, there is no need to adjust
* decimalLength().
*/
const uint32 m2 = (1U << FLOAT_MANTISSA_BITS) | ieeeMantissa;
v->mantissa = m2 >> -e2;
v->exponent = 0;
return true;
}
}
return false;
}
/*
* Store the shortest decimal representation of the given float as an
* UNTERMINATED string in the caller's supplied buffer (which must be at least
* FLOAT_SHORTEST_DECIMAL_LEN-1 bytes long).
*
* Returns the number of bytes stored.
*/
int
float_to_shortest_decimal_bufn(float f, char *result)
{
/*
* Step 1: Decode the floating-point number, and unify normalized and
* subnormal cases.
*/
const uint32 bits = float_to_bits(f);
/* Decode bits into sign, mantissa, and exponent. */
const bool ieeeSign = ((bits >> (FLOAT_MANTISSA_BITS + FLOAT_EXPONENT_BITS)) & 1) != 0;
const uint32 ieeeMantissa = bits & ((1u << FLOAT_MANTISSA_BITS) - 1);
const uint32 ieeeExponent = (bits >> FLOAT_MANTISSA_BITS) & ((1u << FLOAT_EXPONENT_BITS) - 1);
/* Case distinction; exit early for the easy cases. */
if (ieeeExponent == ((1u << FLOAT_EXPONENT_BITS) - 1u) || (ieeeExponent == 0 && ieeeMantissa == 0))
{
return copy_special_str(result, ieeeSign, (ieeeExponent != 0), (ieeeMantissa != 0));
Change floating-point output format for improved performance. Previously, floating-point output was done by rounding to a specific decimal precision; by default, to 6 or 15 decimal digits (losing information) or as requested using extra_float_digits. Drivers that wanted exact float values, and applications like pg_dump that must preserve values exactly, set extra_float_digits=3 (or sometimes 2 for historical reasons, though this isn't enough for float4). Unfortunately, decimal rounded output is slow enough to become a noticable bottleneck when dealing with large result sets or COPY of large tables when many floating-point values are involved. Floating-point output can be done much faster when the output is not rounded to a specific decimal length, but rather is chosen as the shortest decimal representation that is closer to the original float value than to any other value representable in the same precision. The recently published Ryu algorithm by Ulf Adams is both relatively simple and remarkably fast. Accordingly, change float4out/float8out to output shortest decimal representations if extra_float_digits is greater than 0, and make that the new default. Applications that need rounded output can set extra_float_digits back to 0 or below, and take the resulting performance hit. We make one concession to portability for systems with buggy floating-point input: we do not output decimal values that fall exactly halfway between adjacent representable binary values (which would rely on the reader doing round-to-nearest-even correctly). This is known to be a problem at least for VS2013 on Windows. Our version of the Ryu code originates from https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the following (significant) modifications: - Output format is changed to use fixed-point notation for small exponents, as printf would, and also to use lowercase 'e', a minimum of 2 exponent digits, and a mandatory sign on the exponent, to keep the formatting as close as possible to previous output. - The output of exact midpoint values is disabled as noted above. - The integer fast-path code is changed somewhat (since we have fixed-point output and the upstream did not). - Our project style has been largely applied to the code with the exception of C99 declaration-after-statement, which has been retained as an exception to our present policy. - Most of upstream's debugging and conditionals are removed, and we use our own configure tests to determine things like uint128 availability. Changing the float output format obviously affects a number of regression tests. This patch uses an explicit setting of extra_float_digits=0 for test output that is not expected to be exactly reproducible (e.g. due to numerical instability or differing algorithms for transcendental functions). Conversions from floats to numeric are unchanged by this patch. These may appear in index expressions and it is not yet clear whether any change should be made, so that can be left for another day. This patch assumes that the only supported floating point format is now IEEE format, and the documentation is updated to reflect that. Code by me, adapting the work of Ulf Adams and other contributors. References: https://dl.acm.org/citation.cfm?id=3192369 Reviewed-by: Tom Lane, Andres Freund, Donald Dong Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk
2019-02-13 16:20:33 +01:00
}
floating_decimal_32 v;
const bool isSmallInt = f2d_small_int(ieeeMantissa, ieeeExponent, &v);
if (!isSmallInt)
{
v = f2d(ieeeMantissa, ieeeExponent);
}
return to_chars(v, ieeeSign, result);
}
/*
* Store the shortest decimal representation of the given float as a
* null-terminated string in the caller's supplied buffer (which must be at
* least FLOAT_SHORTEST_DECIMAL_LEN bytes long).
*
* Returns the string length.
*/
int
float_to_shortest_decimal_buf(float f, char *result)
{
const int index = float_to_shortest_decimal_bufn(f, result);
/* Terminate the string. */
Assert(index < FLOAT_SHORTEST_DECIMAL_LEN);
result[index] = '\0';
return index;
}
/*
* Return the shortest decimal representation as a null-terminated palloc'd
* string (outside the backend, uses malloc() instead).
*
* Caller is responsible for freeing the result.
*/
char *
float_to_shortest_decimal(float f)
{
char *const result = (char *) palloc(FLOAT_SHORTEST_DECIMAL_LEN);
float_to_shortest_decimal_buf(f, result);
return result;
}