Fixed Point

   Fixed point arithmetic is a simple and often [1]good enough method of
   computer representation of [2]fractional numbers (i.e. numbers with a
   fractional part, allowing finer precision than [3]integers, e.g. 4.03),
   as opposed to [4]floating point, a more complicated way of doing the
   same which in most cases we consider a worse, [5]bloated alternative.
   Probably in 99% of cases when you think you need floating point, fixed
   point will do just fine. Fixed point arithmetic is not to be
   [6]confused with the fixed point of a function in mathematics (a fixed
   point of a function f is an x such that f(x) = x), a completely
   unrelated term.

   Fixed point has at least these advantages over floating point:

     * It doesn't require a special hardware coprocessor for efficient
       execution and so doesn't introduce a [7]dependency. Programs using
       floating point will run extremely slowly on systems without float
       hardware support as they have to emulate the complex hardware in
       software, while fixed point will run just as fast as integer
       arithmetic. For this reason fixed point is very often used in
       [8]embedded computers.
     * It is natural, easier to understand and therefore more predictable,
       less tricky, [9]KISS, [10]suckless. (Float's IEEE 754 standard is 58
       pages long, the paper What Every Computer Scientist Should Know About
       Floating-Point Arithmetic has 48 pages.)
     * It is easier to implement and so supported in many more systems. Any
       language or format supporting integers also supports fixed point.
     * It isn't ugly and in [11]two's complement doesn't waste values (unlike
       IEEE 754 with positive and negative zero, denormalized numbers, many
       [12]NaNs etc.).
     * Some simpler (i.e. better) programming languages such as [13]comun
       don't support float at all, while fixed point can be used in any
       language that supports integers.

How It Works

   Fixed point uses a fixed (hence the name) number of digits (bits in
   binary) for the integer part and the rest for the fractional part (whereas
   floating point's fractional part varies in size). I.e. we split the binary
   representation of the number into two parts (integer and fractional) by
   IMAGINING a radix point at some fixed place in it. That's
   basically it. Fixed point therefore spaces numbers [14]uniformly, as
   opposed to floating point whose spacing of numbers is non-uniform.

   So, we can just use an integer data type as a fixed point data type; there
   is no need for libraries or special hardware support. We can also perform
   operations such as addition the same way as with integers. For example if
   we have a binary integer number represented as 00001001, 9 in decimal, we
   may say we'll be considering a radix point after let's say the sixth
   place, i.e. we get 000010.01 which we interpret as 2.25 (2^1 + 2^(-2)).
   The binary value we store in a variable is the same (as the radix point is
   only imagined), we only INTERPRET it differently.
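
   To see there's really nothing special going on, consider the following
   small C program (just an illustration of the above, not part of the code
   examples further below): the variable holds a plain integer, we merely
   print it in two different ways.

 #include <stdio.h>

 int main(void)
 {
   int x = 9; // binary 00001001

   printf("as integer: %d\n",x);       // prints 9
   printf("as fixed point: %d.%02d\n", // prints 2.25
     x / 4,((x % 4) * 100) / 4);       // 2 fractional bits => divide by 4

   return 0;
 }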

   We may look at it this way: we still use integers but we use them to count
   smaller fractions than 1. For example in a 3D game where our basic spatial
   unit is 1 meter our variables may rather contain the number of centimeters
   (however in practice we should use powers of two, so rather 1/128ths of a
   meter). In the example in the previous paragraph we count 1/4ths (we say our
   scaling factor is 1/4), so actually the number represented as 00000100 is
   what in floating point we'd write as 1.0 (00000100 is 4 and 4 * 1/4 = 1),
   while 00000001 means 0.25.
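
   In C the game example might look like this (a sketch, the variable names
   are made up just for illustration):

 #define UNIT 128 // our 1 meter, i.e. we count 1/128ths of a meter

 int playerX = 5 * UNIT + UNIT / 2; // 5.5 meters (the variable holds 704)
 int playerSpeed = UNIT / 4;        // 0.25 meters per tick

 playerX += playerSpeed;            // addition works just like with integers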

   This has just one consequence: we have to [15]normalize results of
   multiplication and division (addition and subtraction work just as with
   integers, we can normally use the + and - operators). I.e. when
   multiplying, we have to divide the result by the inverse of the fractions
   we're counting, i.e. by 4 in our case (1/(1/4) = 4). Similarly when
   dividing, we need to MULTIPLY the result by this number. This is because
   we are using fractions as our units and when we multiply two numbers in
   those units, the units multiply as well, i.e. in our case multiplying two
   numbers that count 1/4ths gives a result that counts 1/16ths, so we need
   to divide it by 4 to get the number of 1/4ths back again (this works the
   same as e.g. units in physics: multiplying a number of meters by a number
   of meters gives meters squared). For example the following integer
   multiplication:

   00001000 * 00000010 = 00010000 (8 * 2 = 16)

   in our system has to be normalized like this:

   (000010.00 * 000000.10) / 4 = 000001.00 (2.0 * 0.5 = 1.0)

   SIDE NOTE: in practice you may see division replaced by the shift operator
   (instead of /4 you'll see >> 2).
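
   In C the normalized operations may be written e.g. like this (a sketch
   with 2 fractional bits, i.e. counting 1/4ths; the macro names are made
   up):

 #define FP_MUL(x,y) (((x) * (y)) / 4) // or ((x) * (y)) >> 2
 #define FP_DIV(x,y) (((x) * 4) / (y)) // or ((x) << 2) / (y)

 int a = 8;           // 2.0 (8 * 1/4)
 int b = 2;           // 0.5 (2 * 1/4)
 int c = FP_MUL(a,b); // 4, i.e. 1.0, as in the example above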

   With this normalization we also have to think about how to bracket
   expressions to prevent rounding errors and [16]overflows, for example
   instead of (x / y) * 4 we may want to write (x * 4) / y; imagine e.g. x
   being 00000010 (0.5) and y being 00000100 (1.0), the former would result
   in 0 (incorrect, rounding error) while the latter correctly results in
   0.5. The bracketing depends on what values you expect to be in the
   variables, so it can't really be done automatically by a compiler or
   library (well, it could probably be handled somehow at [17]runtime, but of
   course that would be slower). There are also ways to prevent overflows,
   e.g. with clever [18]bit hacks.
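
   One simple way of preventing the multiplication overflow (a sketch,
   assuming 32 bit int and the stdint.h types) is computing the intermediate
   result in a wider integer type:

 #include <stdint.h>

 #define UNIT 1024

 int32_t fpMul(int32_t a, int32_t b)
 {
   // the product is computed in 64 bits so that a * b itself can't overflow
   return ((int64_t) a * b) / UNIT;
 }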

   The normalization and bracketing are basically the only things you have to
   think about, apart from this everything works as with integers. Remember
   that all of this also works with negative numbers in [19]two's complement,
   so you can use a signed integer type without any extra trouble.
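
   A quick sketch of this (again counting 1/4ths); just beware that with
   negative values / and >> are no longer interchangeable: division rounds
   towards zero while an arithmetic right shift rounds towards minus
   infinity.

 int a = -10;               // -2.5 (-10 * 1/4)
 int b = 6;                 //  1.5 (  6 * 1/4)

 int sum = a + b;           // -4,  i.e. -1.0
 int product = (a * b) / 4; // -15, i.e. -3.75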

   Remember to always use a power of two scaling factor -- this is crucial
   for performance. I.e. you want to count 1/2s, 1/4ths, 1/8ths etc., but NOT
   1/10ths, as might be tempting. Why are powers of two good here? Because
   computers work in binary and so the normalization operations with powers
   of two (division and multiplication by the scaling factor) can easily be
   optimized by the compiler to a mere [20]bit shift, an operation much
   faster than multiplication or division.
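
   For example (a sketch; what machine code actually gets emitted is of
   course up to the compiler):

 #define UNIT     1024 // 2^10: x * UNIT and x / UNIT can compile to mere
                       // bit shifts by 10 (plus a small correction for
                       // signed values)
 #define BAD_UNIT 1000 // NOT a power of two: forces real multiplication
                       // and division, typically much slower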

Code Example

   For start let's compare basic arithmetic operations in [21]C written with
   floating point and the same code written with fixed point. Consider the
   floating point code first:

 float
   a = 21,
   b = 3.0 / 4.0,
   c = -10.0 / 3.0;
  
 a = a * b;   // multiplication
 a += c;      // addition
 a /= b;      // division
 a -= 10;     // subtraction
 a /= 3;      // division
  
 printf("%f\n",a);

   Equivalent code with fixed point may look as follows:

 #define UNIT 1024       // our "1.0" value

 int
   a = 21 * UNIT,
   b = (3 * UNIT) / 4,   // note the brackets, (3 / 4) * UNIT would give 0
   c = (-10 * UNIT) / 3;

 a = (a * b) / UNIT;     // multiplication, we have to normalize
 a += c;                 // addition, no normalization needed
 a = (a * UNIT) / b;     // division, normalization needed, note the brackets
 a -= 10 * UNIT;         // subtraction
 a /= 3;                 // division by a number NOT in UNITs, no normalization needed
  
 printf("%d.%d%d%d\n",   // writing a nice printing function is left as an exercise :)
   a / UNIT,
   ((a * 10) / UNIT) % 10,
   ((a * 100) / UNIT) % 10,
   ((a * 1000) / UNIT) % 10);

   These examples output 2.185185 and 2.184, respectively (the small
   difference is due to fixed point's coarser resolution of 1/1024 and the
   truncation happening in the integer divisions).

   Now consider another example: a simple [22]C program using fixed point
   with 10 fractional bits, computing [23]square roots of numbers from 0 to
   10.

 #include <stdio.h>

 typedef int Fixed;

 #define UNIT_FRACTIONS 1024 // 10 fractional bits, 2^10 = 1024

 #define INT_TO_FIXED(x) ((x) * UNIT_FRACTIONS)

 Fixed fixedSqrt(Fixed x)
 {
   // stupid brute force square root

   int previousError = -1;
  
   for (int test = 0; test <= x; ++test)
   {
     int error = x - (test * test) / UNIT_FRACTIONS;

     if (error == 0)
       return test;
     else if (error < 0)
       error *= -1;

     if (previousError > 0 && error > previousError)
       return test - 1; // error started growing, previous value was closest

     previousError = error;
   }

   return 0;
 }

 void fixedPrint(Fixed x)
 {
   printf("%d.%03d",x / UNIT_FRACTIONS,
     ((x % UNIT_FRACTIONS) * 1000) / UNIT_FRACTIONS);
 }

 int main(void)
 {
   for (int i = 0; i <= 10; ++i)
   {
     printf("%d: ",i);
    
     fixedPrint(fixedSqrt(INT_TO_FIXED(i)));
    
     putchar('\n');
   }
  
   return 0;
 }

   The output is:

 0: 0.000
 1: 1.000
 2: 1.414
 3: 1.732
 4: 2.000
 5: 2.236
 6: 2.449
 7: 2.645
 8: 2.828
 9: 3.000
 10: 3.162

Links:
1. good_enough.md
2. rational_number.md
3. integer.md
4. float.md
5. bloat.md
6. often_confused.md
7. dependency.md
8. embedded.md
9. kiss.md
10. suckless.md
11. twos_complement.md
12. nan.md
13. comun.md
14. uniformity.md
15. normalization.md
16. overflow.md
17. runtime.md
18. bit_hack.md
19. twos_complement.md
20. bit_shift.md
21. c.md
22. c.md
23. sqrt.md