bpo-37986: Improve performance of PyLong_FromDouble() #15611
mdickinson merged 3 commits into python:master
Conversation
This is already on the slow path, so it seems safest to keep this check in place even though it should have been handled by the int range checks above. Smart compilers would see that (no idea how many are smart enough to unroll frexp and understand).
I do not think it makes sense to keep this code.
Either seems fine to me. Personally, I'd probably keep the check out of defensiveness (someone could, for whatever reason, move the fast path out at some point in the future; it's nice if the slow path remains valid in that case), but I'm happy for this to be merged as is. Do we at least have unit tests that cover this case?
smart compilers would see that (no idea how many are smart enough to unroll frexp and understand).
At least gcc is not smart enough.
A compromise is to turn it into assert(expr <= 0); as protection against future code changes breaking our assumption. Our buildbots run --with-pydebug builds, where assertions are enabled.
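For concreteness, a rough sketch of what that could look like at the start of the slow path (the function name slow_path_exponent is made up for illustration; this is not the actual PyLong_FromDouble() code):

```c
#include <assert.h>
#include <math.h>
#include <stdio.h>

/* Hypothetical sketch, assuming the fast path has already returned for every
 * value with |dval| < 1.0 (and anything else that fits in a long). */
static int slow_path_exponent(double dval)
{
    int expo;
    (void)frexp(dval, &expo);   /* dval == frac * 2**expo */

    /* The `expo <= 0` early return discussed above is then unreachable;
     * an assert keeps that assumption checked on --with-pydebug builds
     * instead of keeping a dead runtime branch. */
    assert(expo > 0);
    return expo;
}

int main(void)
{
    printf("%d\n", slow_path_exponent(1e100));  /* a value well past the fast path */
    return 0;
}
```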
int_max is an imprecise value on platforms where sizeof(long) >= sizeof(double). Most 64-bit systems have longs larger than a double's 53-bit mantissa (and likely all platforms when considering long long per the above comment).
Will it be truncated in the right direction (towards zero) to avoid this triggering on values with undefined conversion behavior?
the previous code used LONG_MIN < v and v < LONG_MAX directly rather than LONG_MAX + 1 stored into a double. (I believe the C promotion will promote those values to a double before comparison, as all floating point types have a higher rank than integer types.)
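For what it's worth, here is a small standalone check of that promotion (assuming an LP64 platform with 64-bit long and IEEE 754 doubles; this is just an illustration, not code from this PR):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* In `dval < LONG_MAX` the usual arithmetic conversions convert LONG_MAX
     * to double before comparing.  On LP64 that conversion is inexact:
     * 2**63 - 1 has no double representation, and with the usual
     * round-to-nearest it becomes 2**63, so the comparison effectively
     * reads `dval < 2**63`. */
    double dval = 9.0e18;   /* close to LONG_MAX, exactly representable */
    printf("(double)LONG_MAX = %a\n", (double)LONG_MAX);
    printf("dval < LONG_MAX  -> %d\n", dval < LONG_MAX);
    return 0;
}
```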
The original comment explains why you should use < LONG_MAX. I would keep the original comment and the code, and just move it into PyLong_FromDouble().
I think I should add a comment about this: I assumed that LONG_MAX == 2 ** (CHAR_BIT * sizeof(long) - 1) - 1 and LONG_MIN == -2 ** (CHAR_BIT * sizeof(long) - 1), i.e. (unsigned long)LONG_MAX + 1 is a power of two and can be exactly represented by a double (assuming that FLT_RADIX == 2). Does that make sense?
(Originally I wrote it like this: const double int_max = pow(2, CHAR_BIT * sizeof(long) - 1), see #15611 (comment))
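As a standalone sanity check of that assumption on a given platform (just an illustration, not part of the patch; the names int_max_cast and int_max_pow are made up here):

```c
#include <float.h>
#include <limits.h>
#include <math.h>
#include <stdio.h>

int main(void)
{
    /* Two spellings of the same bound; under the assumptions above (two's
     * complement, FLT_RADIX == 2) both should be exactly
     * 2**(CHAR_BIT * sizeof(long) - 1). */
    const double int_max_cast = (double)((unsigned long)LONG_MAX + 1);
    const double int_max_pow  = pow(2.0, (double)(CHAR_BIT * sizeof(long) - 1));

    int expo;
    double frac = frexp(int_max_cast, &expo);   /* exact power of two <=> frac == 0.5 */

    printf("FLT_RADIX = %d\n", FLT_RADIX);
    printf("cast: %a, pow: %a\n", int_max_cast, int_max_pow);
    printf("equal: %d, power of two: %d\n",
           int_max_cast == int_max_pow, frac == 0.5);
    return 0;
}
```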
Here I'm trying to demonstrate the correctness of this approach:
In [65]: import math
In [66]: SIZEOF_LONG = 8; CHAR_BITS = 8
In [67]: LONG_MAX = (1 << (SIZEOF_LONG * CHAR_BITS - 1)) - 1; LONG_MIN = -LONG_MAX - 1
In [68]: int_max = float(LONG_MAX + 1)
In [69]: int_max == LONG_MAX + 1
Out[69]: True
In [70]: def cast_to_long(dval):
...: assert isinstance(dval, float)
...: wholepart = math.trunc(dval)
...: if LONG_MIN <= wholepart <= LONG_MAX:
...: return wholepart
...: raise RuntimeError('undefined behavior')
In [71]: def long_from_double(dval):
...: assert isinstance(dval, float)
...: if -int_max <= dval < int_max:
...: return cast_to_long(dval)
...: raise ValueError('float is out of range, use frexp()')
In [72]: long_from_double(int_max)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-72-280887471997> in <module>()
----> 1 long_from_double(int_max)
<ipython-input-71-ccaef6014bf1> in long_from_double(dval)
3 if -int_max <= dval < int_max:
4 return cast_to_long(dval)
----> 5 raise ValueError('float is out of range, use frexp()')
ValueError: float is out of range, use frexp()
In [73]: int_max.hex()
Out[73]: '0x1.0000000000000p+63'
In [74]: long_from_double(float.fromhex('0x1.fffffffffffffp+62'))
Out[74]: 9223372036854774784
In [75]: long_from_double(float.fromhex('-0x1.fffffffffffffp+62'))
Out[75]: -9223372036854774784
In [76]: long_from_double(-int_max)
Out[76]: -9223372036854775808
In [77]: long_from_double(float.fromhex('-0x1.0000000000001p+63'))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-77-de5e9e1eba23> in <module>()
----> 1 long_from_double(float.fromhex('-0x1.0000000000001p+63'))
<ipython-input-71-ccaef6014bf1> in long_from_double(dval)
3 if -int_max <= dval < int_max:
4 return cast_to_long(dval)
----> 5 raise ValueError('float is out of range, use frexp()')
ValueError: float is out of range, use frexp()
I think this is fine, under reasonable assumptions on the platform. LONG_MAX + 1 must be a power of 2 (follows from C99 §6.2.6.2p2), and while it's theoretically possible that double will be unable to represent LONG_MAX + 1 exactly, that seems highly unlikely in practice. So the conversion to double must be exact (C99 §6.3.1.4p2).
It's not safe based purely on the C standard to assume that LONG_MIN = -LONG_MAX - 1: the integer representation could be ones' complement or sign-magnitude, in which case LONG_MIN = -LONG_MAX. But that assumption is safe in practice for any platform that Python's likely to meet, and we make the assumption of two's complement for signed integers elsewhere in the codebase. If we're worried enough about this, we could change the -int_max <= dval comparison to -int_max < dval. On balance, I'd suggest making that change (partly just for the aesthetics of the symmetry).
Believe it or not, it's also not safe based purely on the C standard to assume that (unsigned long)LONG_MAX + 1 is representable as an unsigned long: C99 §6.2.5p9 only guarantees that nonnegative long values are representable as unsigned long. But the chance of that not being true in practice is negligible (at least until someone tries to port CPython to the DS9000). And the failure mode is benign: we'd just end up never taking the fast path.
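Putting those pieces together, the guarded cast would look roughly like this (a sketch under the assumptions discussed above, with strict inequalities on both sides; the helper name double_fits_long is made up, and this is not necessarily the exact code in this PR):

```c
#include <limits.h>
#include <stdio.h>

/* Fast-path sketch: returns 1 and stores the truncated value if `dval` is
 * safely convertible to long, 0 if the caller should take the frexp()-based
 * slow path.  Assumes two's complement and that (unsigned long)LONG_MAX + 1
 * converts to double exactly. */
static int double_fits_long(double dval, long *out)
{
    const double int_max = (double)((unsigned long)LONG_MAX + 1);
    if (-int_max < dval && dval < int_max) {
        /* C99 6.3.1.4p1: conversion truncates toward zero, and the truncated
         * integral part is representable as long here, so no undefined
         * behavior.  NaN and infinities fail both comparisons and fall
         * through; so does LONG_MIN itself, which is still correct, just
         * slower. */
        *out = (long)dval;
        return 1;
    }
    return 0;
}

int main(void)
{
    long n = 0;
    int ok = double_fits_long(123456.75, &n);
    printf("%d %ld\n", ok, n);                    /* 1 123456 */
    printf("%d\n", double_fits_long(1e300, &n));  /* 0: slow path */
    return 0;
}
```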
Re-reading all this, I had one more worry (which is why I dismissed my own review): what happens if the exact value of dval lies strictly between LONG_MAX and LONG_MAX + 1? In that case we could end up converting a double that, strictly speaking, is outside the range of long. But it turns out that we're safe, because C99 is quite explicit here: §6.3.1.4p1 says (emphasis mine):
If the value of the integral part cannot be represented by the integer type, the behavior is undefined.
So any double value that's strictly smaller than LONG_MAX + 1 should be fine.
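With a 64-bit long there is no double strictly between LONG_MAX and LONG_MAX + 1 (the spacing near 2**63 is 1024), but the same situation is easy to reproduce with a 32-bit int, which shows the truncation rule from §6.3.1.4p1 in action (standalone illustration, assuming 32-bit int):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* 2147483647.5 lies strictly between INT_MAX and INT_MAX + 1 and is
     * exactly representable as a double.  Its integral part, INT_MAX, is
     * representable as int, so the conversion is defined and truncates
     * toward zero. */
    double d = 2147483647.5;
    int n = (int)d;
    printf("%d (== INT_MAX: %d)\n", n, n == INT_MAX);
    return 0;
}
```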
it's also not safe based purely on the C standard to assume that
(unsigned long)LONG_MAX + 1 is representable as an unsigned long
Then I think we could use ((double)(LONG_MAX / 2 + 1)) * 2, but is it worth it?
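For reference, a quick standalone check that the two spellings agree (just an illustration, not part of the patch):

```c
#include <limits.h>
#include <stdio.h>

int main(void)
{
    /* LONG_MAX / 2 + 1 is 2**(CHAR_BIT * sizeof(long) - 2) under the usual
     * assumptions, so it stays within (signed) long; converting it to double
     * and doubling avoids the detour through unsigned long. */
    const double a = ((double)(LONG_MAX / 2 + 1)) * 2;
    const double b = (double)((unsigned long)LONG_MAX + 1);
    printf("%a\n%a\nequal: %d\n", a, b, a == b);
    return 0;
}
```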
It's not safe based purely on the C standard to assume that
LONG_MIN = -LONG_MAX - 1: the integer representation could be ones' complement or sign-magnitude, in which case LONG_MIN = -LONG_MAX. But that assumption is safe in practice for any platform that Python's likely to meet, and we make the assumption of two's complement for signed integers elsewhere in the codebase.
Shouldn't we formally state that we support only two's complement representation?
BTW, it was proposed to abandon other representations, and it looks like the committee agrees with that.
https://bugs.python.org/issue37986