FizzBuzz

   FizzBuzz is a relatively simple programming problem that's famous/infamous
   by having a number of different approach solutions and is often used e.g.
   in interviews to test the skills of potential hires. It comes from a child
   game that teaches basic integer division in which kids are supposed to
   shout a specific word if a number is divisible by some other number --
   what's of interest about the problem is not the solution itself (which is
   trivial) but rather how one should structure an [1]algorithm that solves
   the problem. The problem is stated as follows:

   Write a program that writes out numbers from 1 to 100 (including both);
   however if a number is divisible by 3, write "Fizz" instead of the number,
   if the number is divisible by 5, write "Buzz" instead of it and if it is
   divisible by both 3 and 5, write "FizzBuzz" instead of it.

   The statement may of course differ slightly, for example in saying how the
   output should be formatted or by specifying goals such as "make the
   program as short as possible" or "make the program as fast as possible".
   For the sake of this article let's consider the following the correct
   output of the algorithm:

   1, 2, Fizz, 4, Buzz, Fizz, 7, 8, Fizz, Buzz, 11, Fizz, 13, 14, FizzBuzz,
   16, 17, Fizz, 19, Buzz, Fizz, 22, 23, Fizz, Buzz, 26, Fizz, 28, 29,
   FizzBuzz, 31, 32, Fizz, 34, Buzz, Fizz, 37, 38, Fizz, Buzz, 41, Fizz, 43,
   44, FizzBuzz, 46, 47, Fizz, 49, Buzz, Fizz, 52, 53, Fizz, Buzz, 56, Fizz,
   58, 59, FizzBuzz, 61, 62, Fizz, 64, Buzz, Fizz, 67, 68, Fizz, Buzz, 71,
   Fizz, 73, 74, FizzBuzz, 76, 77, Fizz, 79, Buzz, Fizz, 82, 83, Fizz, Buzz,
   86, Fizz, 88, 89, FizzBuzz, 91, 92, Fizz, 94, Buzz, Fizz, 97, 98, Fizz,
   Buzz

   Why the fuss around FizzBuzz? Well, firstly it dodges an obvious single
   elegant solution that many similar problems usually have and it leads a
   beginner to a difficult situation that can reveal a lot about his
   experience and depth of his knowledge. The tricky part lies in having to
   check not only divisibility by 3 and 5, but also by BOTH at once, which
   when following basic programming instincts ("just if-then-else
   everything") leads to inefficiently checking the same divisibility twice
   and creating some extra ugly if branches and also things like reusing
   [2]magic constants in multiple places, conflicting the "[3]DRY" principle
   etc. It can also show if the guy knows things usually unknown to beginners
   such as that the modulo operation with non-power-of-two is usually
   expensive and we want to minimize its use. However it is greatly useful
   even when an experienced programmer faces it because it can serve a good,
   deeper discussion about things like [4]optimization; while FizzBuzz itself
   has no use and optimizing algorithm that processes 100 numbers is
   completely pointless, the problem is similar to some problems in practice
   in which the approach to solution often becomes critical, considering
   [5]scalability. In practice we may very well encounter FizzBuzze's big
   brother, a problem in which we'll need to check not 100 numbers but 100
   million numbers per second and check not only divisibility by 3 and 5, but
   by let's say all [6]prime numbers. Problems like this come up e.g. in
   [7]cryptography all the time, so we really have to come to discussing
   [8]time complexity classes, [9]instruction sets and hardware acceleration,
   [10]parallelism, possibly even [11]quantum computing, different
   [12]paradigms etc. So really FizzBuzz is like a kind of great conversation
   starter, a bag of topics, a good training example and so on.

   TODO: some history etc.

Implementations

   Let's see how we can implement, improve and [13]optimize FizzBuzz in
   [14]C. Keep in mind the question of scalability, i.e. try to imagine how
   the changes we make to the algorithm would manifest if the problem grew,
   i.e. if for example we wanted to check divisibility by many more numbers
   than just 1 and 5 etc. We will only focus on optimizing the core of the
   algorithm, i.e. the divisibility checking, without caring about other
   things like optimizing printing the commas between numbers and whatnot.
   Also we'll be supposing all compiler optimization are turned off so that
   the excuse "compiler will optimize this" can't be used :)

   For starters let us write a kind of vanilla, [15]naive solution that
   everyone will likely come up with as his first attempt. A complete noob
   will fail to produce even this basic version, a slightly advanced
   programmer (we might say a "[16]coder") may submit this as the [17]final
   solution.

 #include <stdio.h>

 int main(void)
 {
   for (int i = 1; i <= 100; ++i)
   {
     if (i != 1)
       printf(", ");

     if (i % 3 == 0 && i % 5 == 0)
       printf("FizzBuzz");
     else if (i % 3 == 0) // checking divisibility by 3 again :/
       printf("Fizz");
     else if (i % 5 == 0) // checking divisibility by 5 again :/
       printf("Buzz");
     else
       printf("%d",i);
   }

   putchar('\n');
   return 0;
 }

   It [18]works, however with a number of issues. Firstly we see that for
   every number we check we potentially test the divisibility by 3 and 5
   twice, which is not [19]good, considering division (and modulo) are one of
   the slowest instructions. We also reuse the magic constants 3 and 5 in
   different places, which would start to create a huge mess if we were
   dealing with many more divisors. There is also a lot of branching, in the
   main divisibility check we may jump up to three times for the checked
   number -- jump instruction are slow and we'd like to avoid them (again,
   consider we were checking e.g. divisibility by 1000 different numbers). A
   first, tiny optimization (that here will likely be performed
   automatically) to notice is that i % 3 == 0 && i % 5 == 0 can be changed
   to just i % 15 == 0.

   When asked to optimize the algorithm a bit more one might come up with
   something like this:

 #include <stdio.h>

 int main(void)
 {
   for (int i = 1; i <= 100; ++i)
   {
     if (i != 1)
       printf(", ");

     int printNum = 1;

     if (i % 3 == 0)
     {
       printf("Fizz");
       printNum = 0;
     }

     if (i % 5 == 0)
     {
       printf("Buzz");
       printNum = 0;
     }

     if (printNum)
       printf("%d",i);
   }

   putchar('\n');
   return 0;
 }

   Now we check the divisibility by 3 and 5 only once for each tested number,
   and also keep only one occurrence of each constant in a single place,
   that's good. But we still keep the slow branching.

   A bit more experienced programmer may now come with something like this:

 #include <stdio.h>

 int main(void)
 {
   for (int i = 1; i <= 100; ++i)
   {
     if (i != 1)
       printf(", ");

     switch ((i % 3 == 0) + ((i % 5 == 0) << 1))
     {
       case 1: printf("Fizz"); break;
       case 2: printf("Buzz"); break;
       case 3: printf("FizzBuzz"); break;
       default: printf("%d",i); break;
     }

   }

   putchar('\n');
   return 0;
 }

   This solution utilizes a [20]switch structure to only perform single
   branching in the divisibility check, based on a 2 bit value that in its
   upper bit records divisibility by 5 and in the lower bit divisibility by
   3. This gives us 4 possible values: 0 (divisible by none), 1 (divisible by
   3), 2 (divisible by 5) and 3 (divisible by both). The switch structure by
   default creates a jump table that branches right into the correct label in
   O(1).

   We can even go as far as avoiding any branching at all with so called
   [21]branchless programming, even though in this specific case saving one
   branch is probably not worth the cost of making it happen. But for the
   sake of completeness we can do e.g. something as follows.

 #include <stdio.h>

 char str[] = "\0\0\0\0\0\0\0\0Fizz\0\0\0\0Buzz\0\0\0\0FizzBuzz";

 int main(void)
 {
   for (int i = 1; i <= 100; ++i)
   {
     if (i != 1)
       printf(", ");

     // look mom, no branches!
     char *s = str;

     *s = '1'; // convert number to string
     s += i >= 100;
     *s = '0' + (i / 10) % 10;
     s += (*s != '0') | (i >= 100);
     *s = '0' + i % 10;

     int offset = ((i % 3 == 0) + ((i % 5 == 0) << 1)) << 3;
     printf(str + offset);
   }

   putchar('\n');
   return 0;
 }

   The idea is to have a kind of [22]look up table of all options we can
   print, then take the thing to actually print out by indexing the table
   with the 2 bit divisibility value we used in the above example. Our lookup
   table here is the global string str, we can see it rather as an array of
   zero terminated strings, each one starting at the multiple of 8 index
   (this alignment to power of two will make the indexing more efficient as
   we'll be able to compute the offset with a mere bit shift as opposed to
   multiplication). The first item in the table is initially empty (all
   zeros) and in each loop cycle will actually be overwritten with the ASCII
   representation of currently checked number, the second item is "Fizz", the
   third item is "Buzz" and last one is "FizzBuzz". In each loop cycle we
   compute the 2 bit divisibility value, which will be a number 0 to 3, bit
   shift it by 3 to the left (multiply it by 8) and use that as an offset,
   i.e. the place where the printing function will start printing (also note
   that printing will stop at encountering a zero value). The conversion of
   number to ASCII is also implemented without any branches (and could be
   actually a bit simpler as we know e.g. the number 100 won't ever be
   printed). However notice that we pay a great price for all this: the code
   is quite ugly and unreadable and also performance-wise we many times waste
   time on converting the number to ASCII even if it then won't be printed,
   i.e. something that a branch can actually prevent, and the conversion
   actually further uses modulo and division instructions which we are trying
   to avoid in the first place... so at this point we probably overengineered
   this.

   If the problem asks for shortest code, even on detriment of
   [23]readability and efficiency, we might try the [24]code golfer approach:

 #include <stdio.h>
 #define P printf(
 int i;int main(){while(i<100){if(i++)P", ");int a=!(i%3)+!(i%5)*2;if(a)P"FizzBuzz\0Fizz"+(4+(a==1)*5)*(a!=3));else P"%d",i);}P"\n");}

   It's almost definitely not minimal but can be a good start.

   TODO: comun

Links:
1. algorithm.md
2. magic_constant.md
3. dry.md
4. optimization.md
5. scalability.md
6. prime.md
7. cryptography.md
8. time_complexity.md
9. isa.md
10. parallelism.md
11. quantum.md
12. paradigm.md
13. optimization.md
14. c.md
15. naive.md
16. coding.md
17. final_solution.md
18. just_werks.md
19. good.md
20. switch.md
21. branchless.md
22. lut.md
23. readability.md
24. code_golf.md