Exposing Floating Point – Bartosz Ciechanowski (2019)
An in-depth explanation of the floating point format.
Despite everyday use, floating point numbers are often understood in a hand-wavy manner and their behavior raises many eyebrows. Over the course of this article I’d like to show that things aren’t actually that complicated. This blog post is a companion to my recently launched website, float.exposed. Other than exploiting the absurdity of the present-day list of top-level domains, it’s intended to be a handy tool for inspecting floating point numbers. While I encourage you to play with it, the purpose of many of its elements may be exotic at first. By the time we’ve finished, however, all of them will hopefully become familiar.

On a technical note, by floating point I’m referring to the ubiquitous IEEE 754 binary floating point format. The types half, float, and double are understood to be binary16, binary32, and binary64 respectively. There were other formats back in the day, but whatever device you’re reading this on is pretty much guaranteed to use IEEE 754. With the formalities out of the way, let’s start at the shallow end of the pool.

Writing Numbers

We’ll begin with the very basics of writing numeric values. The initial steps may seem trivial, but starting from first principles will help us build a working model of floating point numbers.

Decimal Numbers

Consider the number 327.849. Digits to the left of the decimal point represent increasing powers of ten, while digits to the right of the decimal point represent decreasing powers of ten:

327.849 = 3×10² + 2×10¹ + 7×10⁰ + 8×10⁻¹ + 4×10⁻² + 9×10⁻³

Even though this notation is very natural, it has a few disadvantages:

- small numbers like 0.000000000653 require skimming over many zeros before they start “showing” actually useful digits
- it’s hard to estimate the magnitude of large numbers like 7298345251 at a glance
- at some point the distant digits of a number become increasingly less significant and could often be dropped, yet for big numbers we don’t save any space by replacing them with zeros, e.g. 7298000000

By “small” and “big” numbers I’m referring to their magnitude, so −4205 is understood to be bigger than 0.03 even though it’s to the left of it on the real number line.

Scientific notation solves all these problems. It shifts the decimal point to right after the first non-zero digit and sets the exponent accordingly:

+3.27849×10²

Scientific notation has three major components: the sign (+), the significand (3.27849), and the exponent (2). For positive values the “+” sign is often omitted, but we’ll keep it around for the sake of verbosity. Note that the “10” simply shows that we’re dealing with a base-10 system.

The aforementioned disadvantages disappear:

- the 0-heavy small number is presented as 6.53×10⁻¹⁰ with all the pesky zeros removed
- just by looking at the first digit and the exponent of 7.298345251×10⁹ we know that the number is roughly 7 billion
- we can drop the unwanted distant digits from the tail to get 7.298×10⁹

Continuing with the protagonist of this section, if we’re only interested in the 4 most significant digits we can round the number using one of the many rounding rules:

+3.278×10²

The number of digits shown describes the precision we’re dealing with. A number with 8 digits of precision could be printed as:

+3.2784900×10²

Binary Numbers

With the familiar base-10 out of the way, let’s look at binary numbers. The rules of the game are exactly the same, it’s just that the base is 2 and not 10. Digits to the left of the binary point represent increasing powers of two, while digits to the…
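The place-value and scientific-notation ideas above can be checked directly. Here is a minimal Python sketch (my own, not from the article) that rebuilds 327.849 from its decimal place values, formats it in scientific notation at different precisions using Python's standard `e` format spec, and applies the same place-value rules in base 2:

```python
# 327.849 as a sum of decimal place values (inexact in binary floating point,
# so we only expect it to be extremely close to 327.849):
value = 3*10**2 + 2*10**1 + 7*10**0 + 8*10**-1 + 4*10**-2 + 9*10**-3
print(value)

# Scientific notation: sign, significand, exponent.
print(f"{327.849:+e}")        # +3.278490e+02
print(f"{0.000000000653:e}")  # 6.530000e-10

# Rounded to 4 significant digits (3 digits after the point):
print(f"{327.849:+.3e}")      # +3.278e+02

# Printed with 8 digits of precision:
print(f"{327.849:+.7e}")      # +3.2784900e+02

# Same place-value rules in base 2: binary 101.011 is
# 1*2^2 + 0*2^1 + 1*2^0 + 0*2^-1 + 1*2^-2 + 1*2^-3.
# All terms are powers of two, so this one is exact.
b = 1*2**2 + 0*2**1 + 1*2**0 + 0*2**-1 + 1*2**-2 + 1*2**-3
print(b)  # 5.375
```

Note that rebuilding 327.849 from powers of ten is only approximately exact, because 10⁻¹, 10⁻², and 10⁻³ have no finite binary representation, while the binary example is exact; this distinction is the heart of what the rest of the article explores.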
This excerpt is published under fair use for community discussion. Read the full article at Ciechanow.