proxy70

                                    Sorting

   Sorting denotes the action of rearranging a sequence, such as a [1]list of
   [2]numbers, so that the elements are put in a specific [3]order (e.g.
   ascending or descending). In [4]computer science sorting enjoys the status
   of a wide and curious topic, there are dozens, maybe hundreds of sorting
   [5]algorithms, each with pros and cons and different attributes are being
   studied, e.g. the algorithm's [6]time complexity, stability etc. Sorting
   algorithms are a favorite subject of programming classes as they provide a
   good exercise for [7]programming and analysis of algorithms and can be
   nicely put on tests :) Sorting algorithms are like [8]Pokemon for computer
   nerds, some are big, some are small and cute and everyone has a favorite.
   { Gotta implement them all? ~drummyfish }

   Some celebrities among sorting algorithms are the [9]bubble sort (a simple
   [10]KISS algorithm), [11]quick sort (a super fast one), [12]merge sort
   (also lightning fast) and [13]stupid sort (just tries different
   [14]permutations until it hits the jackpot).

   In our day-to-day lives we commonly get away with some of the simplest,
   uncomplicated sorting algorithms (such as [15]bubble sort or [16]insertion
   sort) anyway, unless we're programming a database or otherwise treating
   enormous amounts of data. If we need to sort just a few hundred of items
   and/or the sorting doesn't occur very often, a simple algorithm does the
   job well, sometimes even faster due to a potential initial overhead of a
   very complex algorithm. So always consider the [17]KISS approach first.

   Attributes of sorting algorithms we're generally interested in are the
   following:

     * [18]time and [19]space complexity: Time and space complexity hints on
       how fast the algorithm will run and how much memory it will need,
       specifically we're interested in the best, worst and average case
       depending on the length of the input sequence. Indeed we ideally want
       the fastest algorithm, but it has to be known that a better time
       complexity doesn't have to imply a faster run time in practice,
       especially with shorter sequences. An algorithm that's extremely fast
       in best case scenario may be extremely slow in non-ideal cases. With
       memory, we are often interested whether the algorithm works in place;
       such an algorithm only needs a constant amount of memory in addition
       to the memory that the sorted sequence takes, i.e. the sequence is
       sorted in the memory where it resides.
     * implementation complexity: A simple algorithm is better if it's
       [20]good enough. It may lead to e.g. smaller code size which may be a
       factor e.g. in [21]embedded.
     * stability: A stable sorting algorithm preserves the order of the
       elements that are considered equal. With pure numbers this of course
       doesn't matter, but if we're sorting more complex data structures
       (e.g. sorting records about people by their names), this attribute may
       become important.
     * comparative vs non-comparative: A comparative sort only requires a
       single operation that compares any two elements and says which one has
       a higher value -- such an algorithm is general and can be used for
       sorting any data, but its time complexity of the average case can't be
       better than O(n * log(n)). Non-comparison sorts can be faster as they
       may take advantage of other possible integer operations.
     * [22]recursion and [23]parallelism: Some algorithms are recursive in
       nature, some are not. Some algorithms can be parallelised e.g. with a
       [24]GPU which will greatly increase their speed.
     * other: There may be other specific, e.g. some algorithms are are slow
       if sorting an already sorted sequence (which is addressed by adaptive
       sorting), so we may have to also consider the nature of data we'll be
       sorting. Other times we may be interested e.g. in what machine
       instructions the algorithm will compile to etc.

   In practice not only the algorithm but also details of its implementation
   matters. For instance if we have a sequence of very large data structures
   to sort, we may want to avoid physically rearranging these structures in
   memory, this could be slow. In such scenario we may want to use indirect
   sorting: we create an additional list whose elements are indices to the
   main sequence, and we only sort this list of indices.

List Of Sorting Algorithms

   TODO

Example And Code

   For starters let's take a look at one of the simplest sorting algorithms,
   bubble sort. Its basic version looks something like this ([25]pseudocode):

 for j from 0 to N - 2 (inclusive)
   for i from 0 to to N - 2 - j (inclusive)
     is array[i] > array[i + 1]
       swap array[i] and array[i + 1]

   How does this work? Firstly notice there are two loops. The outer loop,
   with counter variable j, runs N - 1 times -- in each iteration of this
   loop we will ensure one value gets to its correct place; specifically the
   values will be getting to their correct places from the top -- highest
   values will be sorted first (you can also implement the algorithm the
   other way around too, to sort the lowest values first, try it as an
   exercise). This makes sense, imagine that we have e.g. a sequence of
   length N = 4 -- then the outer loop will run N - 1 = 3 times (j will have
   values 0, 1 and 2); after fist iteration 1 value will be in its correct
   place, after 2 iterations 3 values will be in place and after 3 iterations
   3 values will be in place which also means the last (forth) value has to
   be in place too, i.e. the array must be sorted. Now for the inner loop
   (with variable i): this one ensures actually getting the value in its
   place. Notice it goes from 0 to the top and always compares two neighbors
   in the array -- if the bottom neighbor is higher than the top neighbor,
   the loop swaps them, ensuring that the highest value will get to the top
   (it kind of "bubbles" up, hence the algorithm name). Also notice this loop
   doesn't always go to the very end of the array! It subtracts the value j
   from its top boundary because there the values that are already in place
   reside, so we don't need to sort them anymore; the inner loop can end
   earlier and earlier as the outer loop progresses. The algorithm would
   still work if we went through the whole array every time (try it), but its
   [26]time complexity would suffer, i.e. by noticing the inner loop can get
   progressively shorter we greatly [27]optimize the algorithm. Anyway, how
   the algorithm actually works is best seen on an example, so let's now try
   to use the algorithm to sort the following sequence:

 3 7 8 3 2

   The length of the sequence is N = 5, so j (the outer loop) will go from 0
   to 3. The following shows how the array changes (/\ shows comparison of
   neighbors, read top to bottom and left to right):

          j = 0        j = 1        j = 2        j = 3
    
                                                 swapped
 i = 0    /\           /\           /\           /\
          37832        37328        33278        23378     <-- SORTED
                                                  """"
                       swapped      swapped
 i = 1     /\           /\           /\
          37832        33728        32378  <--- last 3 items are in their places
                                      """
          swapped      swapped
 i = 2      /\           /\
          37382        33278  <--- last 2 items are in their places
                          ""
          swapped
 i = 3       /\
          37328   <--- last item is in its place
              "

   Hopefully it's at least a bit clear -- if not, try to perform the
   algorithm by hand, that's a practically guaranteed way of gaining
   understanding of the algorithm.

   Now let's see other algorithms and some actual runnable code. The
   following is a [28]C program that shows implementations of some of the
   common sorting algorithms and also measures their speed:

 #include <stdio.h>
 #include <time.h>
 #include <stdlib.h>

 #define N 64

 char array[N + 1]; // extra 1 for string terminating zero

 void swap(int i1, int i2)
 {
   int tmp = array[i1];
   array[i1] = array[i2];
   array[i2] = tmp;
 }

 void setupArray(void) // fills the array with pseudorandom ASCII letters
 {
   array[N] = 0;
   srand(123);

   for (int i = 0; i < N; ++i)
     array[i] = 'A' + rand() % 26;
 }

 void test(void (*sortFunction)(void), const char *name)
 {
   int timeTotal = 0;

   for (int i = 0; i < 64; ++i) // run the sort many times to average it a bit
   {
     setupArray();
     long int t = clock();
     sortFunction();
     timeTotal += clock() - t;
   }

   printf("%-10s: %s, CPU ticks: %d\n",name,array,(int) timeTotal);
 }

 // ============================ sort algorithms ================================

 void sortBubble(void)
 {
   for (int j = 0; j < N - 1; ++j)
     for (int i = 0; i < N - 1 - j; ++i)
       if (array[i] > array[i + 1])
         swap(i,i + 1);
 }

 void sortBubble2(void) // simple bubble s. improvement, faster if already sorted
 {
   for (int j = 0; j < N - 1; ++j)
   {
     int swapped = 0;

     for (int i = 0; i < N - 1 - j; ++i)
       if (array[i] > array[i + 1])
       {
         swap(i,i + 1);
         swapped = 1;
       }

     if (!swapped) // if no swap happened, the array is already sorted
       break;
   }
 }

 void sortInsertion(void)
 {
   for (int j = 1; j < N; ++j)
     for (int i = j; i > 0; --i)
       if (array[i] < array[i - 1]) // keep moving down until we find its place
         swap(i,i - 1);
       else
         break;
 }

 void sortSelection(void)
 {
   for (int j = 0; j < N - 1; ++j)
   {
     int min = j;

     for (int i = j + 1; i < N; ++i) // find the minimum
       if (array[i] < array[min])
         min = i;

     swap(j,min);
   }
 }

 void sortStupid(void)
 {
   while (1)
   {
     for (int i = 0; i < N; ++i) // check if the array is sorted
       if (i == (N - 1))
         return;
       else if (array[i] > array[i + 1])
         break; // we got to the end, the array is sorted

     for (int i = 0; i < N; ++i) // randomly shuffle the array
       swap(i,rand() % N);
   }
 }

 void _sortQuick(int a, int b) // helper recursive function for the main quick s.
 {
   if (b <= a || a < 0)
     return;

   int split = a - 1;

   for (int i = a; i < b; ++i)
     if (array[i] < array[b])
     {
       split++;
       swap(split,i);
     }

   split++;
   swap(split,b);

   _sortQuick(a,split - 1);
   _sortQuick(split + 1,b);
 }

 void sortQuick(void)
 {
   _sortQuick(0,N - 1);
 }

 int main(void)
 {
   setupArray();
   printf("array     : %s\n",array);

 #if 0 // stupid sort takes too long, only turn on while decreasing N to like 10
   test(sortStupid,"stupid");
 #endif
   test(sortBubble,"bubble");
   test(sortBubble2,"bubble2");
   test(sortInsertion,"insertion");
   test(sortBubble2,"selection");
   test(sortQuick,"quick");

   return 0;
 }

 // TODO: let's add more algorithms in the future :-)

   It may output for example:

 array     : RLPALFTOCFWGVJYPLLUNEPDBSOMIBMXSXMVLROZUWXARHAIUNCJTUNVMDHWHTTZT
 bubble    : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1191
 bubble2   : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1164
 insertion : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 665
 selection : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 1217
 quick     : AAABBCCDDEFFGHHHIIJJLLLLLMMMMNNNOOOPPPRRRSSTTTTTUUUUVVVWWWXXXYZZ, CPU ticks: 365

See Also

     * [29]searching
     * [30]pathfinding

Links:
1. list.md
2. number.md
3. order.md
4. compsci.md
5. algorithm.md
6. time_complexity.md
7. programming.md
8. pokemon.md
9. bubble_sort.md
10. kiss.md
11. quick_sort.md
12. merge_sort.md
13. bogosort.md
14. permutation.md
15. bubble_sort.md
16. insertion_sort.md
17. kiss.md
18. time_complexity.md
19. space_complexity.md
20. good_enough.md
21. embedded.md
22. recursion.md
23. parallel.md
24. gpu.md
25. pseudocode.md
26. time_complexity.md
27. optimization.md
28. c.md
29. search.md
30. pathfinding.md