Bootstrap/Boot

   In general bootstrapping (from the idiom "pull yourself up by your
   bootstraps"), sometimes shortened to just booting, refers to a clever
   process of self-establishing a relatively complex system starting from
   something very small, without much external help. Nature itself provides a
   beautiful example: a large plant capable of complex behavior (such as
   reproduction) initially grows ("bootstraps") from just a very tiny seed.
   As another example imagine something like a "civilization bootstrapping
   kit" that contains only a few primitive tools along with instructions on
   how to use those tools to mine ore, turn it into metal out of which one
   makes more tools which will be used to obtain more material and so on up
   until having basically all modern technology and factories set up in
   relatively short time ([1]civboot is a [2]project like this). The term
   bootstrapping is however especially relevant in relation to [3]computer
   technology -- here it possesses two main meanings:

     * The process by which a computer starts and sets up the [4]operating
       system after power on, which often involves several stages of loading
       various modules, running several bootloaders etc. This is
       traditionally called booting (rebooting means restarting the
       computer).
     * Utilizing the principle of bootstrapping for making highly
       independent [5]software, i.e. software that doesn't [6]depend on other
       software as it can set itself up. This is usually what bootstrapping
       (the longer term) means. This is also closely related to [7]self
       hosting, another principle whose idea is to "implement technology
       using itself".

Bootstrapping: Making Dependency-Free Software

   Bootstrapping -- as the general concept of letting a big thing grow out of
   a small seed -- may aid us in building extremely [8]free (as in freedom),
   [9]portable, self-contained (and yes, for those who care also more
   [10]secure) technology by reducing all its [11]dependencies to a bare
   minimum. If we are building a big computing environment (such as an
   operating system), we should make sure that all the big things it contains
   are made only with smaller things, which are in turn built using yet
   smaller things and so on down to some very tiny piece of code, i.e. we
   shall make sure there is always a way to set this whole system up from the
   ground, from a very small amount of initial code/tools. Being able to do this
   means our system is bootstrappable and it will allow us for example to set
   our whole system up on a completely new computing platform (e.g. a new CPU
   architecture) as long as we can set up that tiny initial prerequisite
   code. This furthermore removes the danger of dependencies that might kill
   our system and also allows security freaks to inspect the whole process of
   the system set up so that they can trust it (because even free software
   that sometime in the past touched a proprietary compiler can't generally
   be trusted -- see [12]trusting trust). I.e. bootstrapping means creating a
   very small amount of code that will self-establish our whole computing
   environment by first compiling small compilers that will then compile more
   complex compilers which will compile all the tools and programs etc. This
   topic is discussed for example in designing [13]programming language
   [14]compilers and [15]operating systems. For examples of bootstrapping see
   e.g. [16]DuskOS ([17]collapse-ready operating system that bootstraps
   itself from a tiny amount of code), [18]GNU [19]mes (bootstrapping system
   of the GNU operating system) or [20]comun (LRS programming language, now
   self hosted and bootstrappable e.g. from a few hundred lines of [21]C).
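
   The requirement that every big thing be buildable from smaller things,
   down to a tiny seed, can be checked mechanically. The following is only a
   minimal sketch: the tool names and the dependency table are invented for
   illustration and don't describe any real system. It computes what can be
   built starting from the seed alone and reports what can't be reached (here
   because of a circular dependency).

```python
# Hypothetical dependency table: each tool maps to the tools needed to build it.
# All names here are invented for illustration.
BUILD_DEPS = {
    "seed": set(),                      # hand-written binary seed, needs nothing
    "tiny_compiler": {"seed"},
    "full_compiler": {"tiny_compiler"},
    "libc": {"full_compiler"},
    "shell": {"full_compiler", "libc"},
    "editor": {"full_compiler", "libc", "gui_toolkit"},
    "gui_toolkit": {"editor"},          # circular: neither can be built first
}


def bootstrap_order(deps, seed):
    """Return (build order, unreachable tools) when starting from `seed` only."""
    built = list(seed)
    remaining = set(deps) - set(seed)
    progress = True
    while progress:
        progress = False
        # sorted() copies the set and keeps the result deterministic
        for tool in sorted(remaining):
            if deps[tool] <= set(built):
                built.append(tool)
                remaining.discard(tool)
                progress = True
    return built, remaining


order, stuck = bootstrap_order(BUILD_DEPS, ["seed"])
print("build order:", order)
print("not bootstrappable:", sorted(stuck))
```

   Anything left in the second result is a dependency that would have to be
   broken or rewritten before the system can be called bootstrappable.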

   Why concern ourselves with bootstrapping when we already have our systems
   set up? Besides the obvious elegance of this whole approach there are many
   other practical reasons -- as mentioned, some are concerned about
   "security", some want portability, control and independence -- another
   notable justification is that we may lose our current technology
   due to societal [22]collapse, which is not improbable as it keeps
   happening throughout history over and over, so many people fear
   (rightfully so) that if by whatever disaster we lose our current
   computers, Internet etc., we will also lose with it all modern art, data,
   software we so painfully developed, digitized books and so on; not to
   mention the horrors that will follow if we're unable to quickly reestablish
   our computer networks we are so dependent on. Setting up what we currently
   have completely from scratch would be extremely difficult, a task for
   centuries -- just take a moment to consider all the activity and knowledge
   that's required around the globe to create a single computer with all its
   billions of lines of code worth of software that makes it work. Knowledge
   of old technology gets lost -- to make modern computers we first needed
   older, primitive computers, but now that we only have modern computers no
   one remembers anymore how to make the older computers -- modern computers
   are sustaining themselves but once they're gone, we won't know how to make
   them again, i.e. if we lose computers, we will also lose tools for making
   computers. This applies on many levels (hardware, operating systems,
   programming languages and so on).

   Bootstrapping has to start with some initial prerequisite machine
   dependent binary code that kickstarts the self-establishing process, i.e.
   it's not possible to get rid of absolutely ALL binary code and have a pure
   bootstrappable code that would run on every computer -- that would require
   making a program that can natively run on any computer, which can't be done
   -- but it is possible to get it down to an absolute minimum -- let's say a
   few dozen bytes of machine code that can even be hand-made on paper and can
   be easily inspected for "safety". This initial binary code is called the
   bootstrapping binary seed. This code can be as simple as a mere translator
   of some extremely simple bytecode (that may consist only of a handful of
   instructions) to the platform's assembly language. There even exists the
   extreme case of a single instruction computer, but in practice it's not
   necessary to go that far. The initial binary seed may then typically be used
   to translate a precompiled bytecode of our system's compiler to native
   runnable code and voila, we can now happily start compiling whatever we
   want.
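
   To give an idea of how little such a translator has to do, here is a small
   sketch written in Python for readability (a real seed would be hand-written
   machine code). Everything in it is an assumption made up for illustration:
   the five opcodes, their encoding and the target "assembly" don't correspond
   to any real bytecode or instruction set. The point is only that each opcode
   expands to a fixed template.

```python
# Hypothetical bytecode: one byte per opcode, PUSH is followed by one operand byte.
OP_PUSH, OP_ADD, OP_SUB, OP_PRINT, OP_HALT = range(5)

# Made-up target assembly (not a real instruction set): each opcode expands to
# a fixed template, which is what keeps the translator tiny.
TEMPLATES = {
    OP_ADD:   ["pop r1", "pop r0", "add r0, r1", "push r0"],
    OP_SUB:   ["pop r1", "pop r0", "sub r0, r1", "push r0"],
    OP_PRINT: ["pop r0", "call print_r0"],
    OP_HALT:  ["halt"],
}


def translate(bytecode):
    """Translate a bytes object of bytecode into a list of assembly lines."""
    out = []
    i = 0
    while i < len(bytecode):
        op = bytecode[i]
        i += 1
        if op == OP_PUSH:
            if i >= len(bytecode):
                raise ValueError("PUSH without operand")
            out.append(f"push {bytecode[i]}")
            i += 1
        elif op in TEMPLATES:
            out.extend(TEMPLATES[op])
        else:
            raise ValueError(f"unknown opcode {op}")
    return out


program = bytes([OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PRINT, OP_HALT])
assembly = translate(program)
print("\n".join(assembly))
```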

   [23]Forth is a language that has traditionally been used for making
   bootstrapping environments; [24]Dusk OS is an example of such a project.
   Similarly simple languages such as [25]Lisp and [26]comun can work too (GNU
   Mes uses a combination of [27]Scheme and C).

   How to do this then? To make a computing environment that can bootstrap
   itself you can proceed like this:

    1. Make a [28]simple [29]programming language L. You can choose e.g. the
       mentioned [30]Forth but you can even make your own, just remember to
       keep it extremely simple -- simplicity of the base language is the key
       feature here. If you also need a more complex language, write it in L.
       The language L will serve as a tool for writing software for your
       platform, i.e. it will provide some comfort in programming (so that
       you don't have to write in assembly) but mainly it will be an
       [31]abstraction layer for the programs, it will allow them to run on
       any hardware/platform. The language therefore has to be [32]portable;
       it should probably abstract things like [33]endianness, native
       integer size, control structures etc., so as to work nicely on all
       [34]CPUs, but it also mustn't have too much abstraction (such as
       [35]OOP) otherwise it will quickly get complicated. The language can
       compile e.g. to some kind of very simple [36]bytecode that will be
       easy to translate to any [37]assembly. Make the bytecode very simple
       (and document it well) as its complexity will later on determine the
       complexity of the bootstrap binary seed. At first you'll have to
       temporarily implement L in some already existing language, e.g. [38]C.
       NOTE: in theory you could just make bytecode, without making L, and
       just write your software in that bytecode, but the bytecode has to
       focus on being simple to translate, i.e. it will probably have few
       opcodes for example, which will be in conflict with making it at least
       somewhat comfortable to program on your platform. However one can try
       to make some compromise and it will save the complexity of translating
       language to bytecode, so it can be considered ([39]uxn seems to be
       doing this).
    2. Write L in itself, i.e. [40]self host it. This means you'll use L to
       write a [41]compiler of L that outputs L's bytecode. Once you do this,
       you have a completely independent language and can start using it
       instead of the original compiler of L written in another language. Now
       compile L with itself -- you'll get the bytecode of L compiler. At
       this point you can bootstrap L on any platform as long as you can
       execute the L bytecode on it -- this is why it was crucial to make L
       and its bytecode very simple. In theory it's enough to just interpret
       the bytecode but it's better to translate it to the platform's native
       machine code so that you get maximum efficiency (the nature of
       bytecode should make it so that it isn't really more difficult to
       translate it than to interpret it). If for example you want to
       bootstrap on an [42]x86 CPU, you'll have to write a program (L
       compiler [43]backend) that translates the bytecode to x86 assembly; if
       we suppose that at the time of bootstrapping you will only have this
       x86 computer, you will have to write the translator in x86 assembly
       manually. If your bytecode really is simple and well made, it
       shouldn't be hard though (you will mostly be replacing your bytecode
       opcodes with given platform's machine code opcodes). Once you have the
       x86 backend, you can completely bootstrap L's compiler on any x86
       computer.
    3. Further help make L bootstrappable. This means making it even easier to
       execute the L bytecode on any given platform -- you may for example
       write backends (the bytecode translators) for common targets like
       x86, ARM, RISC-V, C, Lisp and so on. You can also provide tests that
       will help check newly written backends for correctness. At this point
       you have L bootstrappable without any [44]work on the platforms for
       which you provide backends and on others it will just take a tiny bit
       of work to write a translator for them.
    4. Write everything else in L. This means writing the platform itself and
       software such as various tools and libraries. You can potentially even
       use L to write a higher level language (e.g. C) for yet more comfort
       in programming. Since everything here is written in L and L can be
       bootstrapped, everything here can be bootstrapped as well.
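
   The steps above can be sketched in miniature. This is a toy under stated
   assumptions, not a real design: the language L here is an invented
   Forth-like postfix notation with four words, its "bytecode" is just a list
   of (opcode, argument) pairs, and Python source stands in for native code in
   the second backend. It shows one front end, two backends and a few
   conformance tests of the kind mentioned in step 3.

```python
# Hypothetical language L: whitespace-separated tokens, either an integer
# literal or one of the words below (a Forth-like postfix notation).
WORDS = {"+": "ADD", "*": "MUL", "dup": "DUP", ".": "OUT"}


def compile_l(source):
    """Front end: L source text -> bytecode as a list of (opcode, arg) pairs."""
    code = []
    for token in source.split():
        if token in WORDS:
            code.append((WORDS[token], None))
        else:
            code.append(("PUSH", int(token)))  # raises ValueError on bad tokens
    return code


def backend_interpret(code):
    """Backend A: execute the bytecode directly on a stack."""
    stack, output = [], []
    for op, arg in code:
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "DUP":
            stack.append(stack[-1])
        elif op == "OUT":
            output.append(stack.pop())
    return output


def backend_translate(code):
    """Backend B: translate to Python source (standing in for native code)."""
    lines = ["stack, output = [], []"]
    for op, arg in code:
        if op == "PUSH":
            lines.append(f"stack.append({arg})")
        elif op == "ADD":
            lines.append("stack.append(stack.pop() + stack.pop())")
        elif op == "MUL":
            lines.append("stack.append(stack.pop() * stack.pop())")
        elif op == "DUP":
            lines.append("stack.append(stack[-1])")
        elif op == "OUT":
            lines.append("output.append(stack.pop())")
    return "\n".join(lines)


def run_translated(python_source):
    """Run the output of backend B and return what the program printed."""
    env = {}
    exec(python_source, env)  # fine for a sketch; never exec untrusted input
    return env["output"]


# Conformance tests in the spirit of step 3: every backend must agree.
TESTS = {"2 3 + .": [5], "4 dup * .": [16], "1 2 3 * + . 7 .": [7, 7]}
for src, expected in TESTS.items():
    bytecode = compile_l(src)
    assert backend_interpret(bytecode) == expected
    assert run_translated(backend_translate(bytecode)) == expected
print("all backends agree")
```

   A new backend only has to handle the five opcodes and pass the same tests,
   which is why keeping the bytecode small pays off.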

Booting: Computer Starting Up

   Booting as in "starting a computer up" is also a kind of setting up a system
   from the ground up -- we take it for granted but remember it takes some
   [45]work to get a computer from being powered off and having all RAM empty
   to having an operating system loaded, hardware checked and initialized,
   devices mounted etc.

   Starting up a simple computer -- such as some [46]MCU-based [47]embedded
   [48]open console that runs [49]bare metal programs -- isn't as complicated
   as booting up a mainstream [50]PC with an [51]operating system.

   First let's take a look at the simple computer. It may work e.g. like
   this: upon start the [52]CPU initializes its registers and simply starts
   executing instructions from some given memory address, let's suppose 0
   (you will find this in your CPU's data sheet). Here the memory is often
   e.g. [53]flash [54]ROM to which we can externally upload a program from
   another computer before we turn the CPU on -- in game consoles this can
   often be done through [55]USB. So we basically upload the program (e.g. a
   game) we want to run, turn the console on and it starts running it.
   However further steps are often added, for example there may really be
   some small, permanently flashed initial boot program at the initial
   execution address that will handle some things like initializing hardware
   (screen, speaker, ...), setting up [56]interrupts and so on (which
   otherwise would have to always be done by the main program itself) and it
   can also offer some functionality, for example a simple menu through which
   the user can select to actually load a program from SD card to flash
   memory (thanks to which we won't need an external computer to reload
   programs). In this case we won't be uploading our main program to the
   initial execution address but rather somewhere else -- the initial
   bootloader will jump to this address once it's done its work.
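
   The described start-up can be modeled with a toy simulation. This is only a
   sketch under assumptions: the addresses, the three instruction names (DO,
   JUMP, HALT) and the memory size are made up and don't match any real CPU.
   It shows a small boot program at the initial execution address doing its
   initialization and then jumping to the separately uploaded main program.

```python
# Toy model; all addresses and instruction names are made up for illustration.
RESET_ADDR = 0      # where this imaginary CPU starts executing after power on
PROGRAM_ADDR = 16   # where the user's program gets uploaded


def make_memory(size=32):
    """Create memory filled with HALT instructions."""
    return [("HALT",)] * size


def flash(memory, address, instructions):
    """Model uploading code to flash memory from an external computer."""
    memory[address:address + len(instructions)] = instructions


def power_on(memory, max_steps=1000):
    """Run from RESET_ADDR until HALT; return the log of performed actions."""
    pc, log = RESET_ADDR, []
    for _ in range(max_steps):
        instr = memory[pc]
        if instr[0] == "HALT":
            return log
        elif instr[0] == "JUMP":
            pc = instr[1]
        elif instr[0] == "DO":
            log.append(instr[1])  # just record what the instruction stands for
            pc += 1
        else:
            raise ValueError(f"unknown instruction {instr[0]}")
    raise RuntimeError("step limit reached")


memory = make_memory()
# Permanently flashed boot program at the initial execution address.
flash(memory, RESET_ADDR, [
    ("DO", "init screen"),
    ("DO", "set up interrupts"),
    ("JUMP", PROGRAM_ADDR),
])
# The main program (e.g. a game) uploaded elsewhere.
flash(memory, PROGRAM_ADDR, [("DO", "run game"), ("HALT",)])
boot_log = power_on(memory)
print(boot_log)
```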

   Now for the PC (the "IBM compatibles"): here things are more complicated
   due to the complexity of the whole platform, i.e. because we have to load
   an [57]operating system first, of which there can be several, each of
   which may be loadable from different storages ([58]harddisk, USB stick,
   [59]network, ...), also we have a more complex [60]CPU that has to be set
   to a certain operating mode, we have complex peripherals that need complex
   initialization etc. Generally there's a huge [61]bloated boot sequence
   and PCs infamously take longer and longer to start up despite skyrocketing
   hardware improvements -- that says something about the state of technology.
   Anyway, it usually works like this:

   { I'm not terribly experienced with this, verify everything. ~drummyfish }

    1. Computer is turned on, the CPU starts executing at some initial
       address (same as with the simple computer).
    2. From here CPU jumps to an address at which stage one [62]bootloader is
       located (bootloader is just a program that does the booting and as
       this is the first one in a line of potentially multiple bootloaders,
       it's called stage one). This address is in the [63]motherboard [64]ROM
       and in there typically [65]BIOS (or something similar that may be
       called e.g. [66]UEFI, depending on what standard it adheres to) is
       uploaded, i.e. BIOS is stage one bootloader. BIOS is the first
       software (we may also call it [67]firmware) that gets run, it's
       uploaded on the motherboard by the manufacturer and isn't supposed to
       be rewritten by the user, though some based people still rewrite it
       (ignoring the "read only" label :D), often to replace it with
       something more [68]free (e.g. [69]libreboot). BIOS is the most basic
       software that serves to make us able to use the computer at the most
       basic level without having to flash programs externally, i.e. to let
       us use keyboard and monitor, let us install an operating system from a
       CD drive etc. (It also offers a basic environment for programs that
       want to run before the operating system, but that's not important
       now.) BIOS is generally different on each computer model, it normally
       allows us to set up what (which device) the computer will try to load
       next -- for example we may choose to boot from harddisk or USB flash
       drive or from a CD. There is often some countdown during which if we
       don't intervene, the BIOS automatically tries to load what's in its
       current settings. Let's suppose it is set to boot from harddisk.
    3. BIOS performs the power on self test (POST) -- basically it makes sure
       everything is OK, that hardware works etc. (in practice this check
       runs early, before a boot device is tried). If it's so, it continues
       on (otherwise halts).
    4. BIOS loads the [70]master boot record (MBR, the first sector of the
       device) from harddisk (or from another mass storage device, depending
       on its settings) into [71]RAM and executes it, i.e. it passes control
       to it. This will typically lead to loading the second stage
       bootloader.
    5. The code loaded from MBR is limited by size as it has to fit in one
       HDD [72]sector (which used to be only 512 bytes for a long time), so
       this code is here usually just to load the bigger code of the second
       stage bootloader from somewhere else and then again pass control to
       it.
    6. Now the second stage bootloader starts -- this is a bootloader whose
       job it is normally to finally load the actual operating system. Unlike
       BIOS this bootloader may quite easily be reinstalled by the user --
       oftentimes installing an operating system will also install some
       kind of second stage bootloader -- an example may be [73]GRUB which
       is typically installed with [74]GNU/[75]Linux systems. This kind of
       bootloader may offer the user a choice of multiple operating systems,
       and possibly have other settings. In any case here the OS [76]kernel
       code is loaded and run.
    7. Voila, the kernel now starts running and here it's free to do its own
       initializations and manage everything, e.g. Linux will start the
       [77]PID 1 process, it will mount filesystems, run initial scripts
       etc.
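
   As a small illustration of steps 4 and 5, the following sketch inspects a
   512 byte MBR sector the way firmware or a disk tool might. It relies on
   details of the classic MBR layout that are not spelled out above: the
   partition table of four 16 byte entries starts at offset 446 and the sector
   ends with the signature bytes 0x55 0xAA. The function name and the sample
   partition values are made up for the example, and GPT disks are ignored.

```python
import struct

SECTOR_SIZE = 512
PARTITION_TABLE_OFFSET = 446   # the bytes before this hold the boot code
BOOT_SIGNATURE = b"\x55\xaa"   # last two bytes of a valid MBR sector
ENTRY_FORMAT = "<B3xB3xII"     # status, (CHS skipped), type, (CHS skipped),
                               # first sector (LBA), number of sectors


def parse_mbr(sector):
    """Return the non-empty partition entries of a classic MBR sector."""
    if len(sector) != SECTOR_SIZE:
        raise ValueError("an MBR sector must be exactly 512 bytes")
    if sector[510:512] != BOOT_SIGNATURE:
        raise ValueError("missing 0x55 0xAA boot signature")
    partitions = []
    for i in range(4):
        start = PARTITION_TABLE_OFFSET + 16 * i
        status, ptype, lba_start, count = struct.unpack(
            ENTRY_FORMAT, sector[start:start + 16])
        if ptype != 0:  # type 0 marks an unused entry
            partitions.append({
                "active": status == 0x80,
                "type": ptype,
                "lba_start": lba_start,
                "sectors": count,
            })
    return partitions


# Build a fake sector with one active partition (values are arbitrary).
sector = bytearray(SECTOR_SIZE)
sector[446:462] = struct.pack(ENTRY_FORMAT, 0x80, 0x83, 2048, 409600)
sector[510:512] = BOOT_SIGNATURE
partitions = parse_mbr(bytes(sector))
print(partitions)
```

   Note how little room is left for code: the partition table and signature
   take 66 of the 512 bytes, which is why the MBR code usually only loads the
   second stage bootloader.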

Links:
1. civboot.md
2. project.md
3. computer.md
4. operating_system.md
5. software.md
6. depend.md
7. self_hosting.md
8. free_software.md
9. portability.md
10. security.md
11. dependency.md
12. trusting_trust.md
13. programming_language.md
14. compiler.md
15. os.md
16. duskos.md
17. collapse.md
18. gnu.md
19. mes.md
20. comun.md
21. c.md
22. collapse.md
23. forth.md
24. duskos.md
25. lisp.md
26. comun.md
27. scheme.md
28. kiss.md
29. programming_language.md
30. forth.md
31. abstraction.md
32. portability.md
33. byte_sex.md
34. cpu.md
35. oop.md
36. bytecode.md
37. assembly.md
38. c.md
39. uxn.md
40. self_hosting.md
41. compiler.md
42. x86.md
43. backend.md
44. work.md
45. work.md
46. mcu.md
47. embedded.md
48. open_console.md
49. bare_metal.md
50. pc.md
51. operating_system.md
52. cpu.md
53. flash.md
54. rom.md
55. usb.md
56. interrupt.md
57. operating_system.md
58. hdd.md
59. network.md
60. cpu.md
61. bloat.md
62. bootloader.md
63. motherboard.md
64. rom.md
65. bios.md
66. uefi.md
67. firmware.md
68. free_software.md
69. libreboot.md
70. mbr.md
71. ram.md
72. sector.md
73. grub.md
74. gnu.md
75. linux.md
76. kernel.md
77. init.md