
                                    Network

   WORK IN PROGRESS

   See also [1]Internet.

   A computer network is a set of multiple [2]computers that are
   interconnected and can communicate with each other. This allows the
   computers to share [3]information, collaborate on calculations, back up
   and mirror each other's data, let people communicate over large distances
   and so on. The largest and most famous network is the [4]Internet, but it
   is by far not the only one: there exist many local networks ([5]LANs),
   community networks, large networks separate from the Internet (isolated
   army networks, [6]North Korea's intranet, ...), virtual networks and so
   on. These networks may differ greatly in many aspects, be it their basic
   topology (which nodes are connected to which), protocols (languages the
   computers use to communicate), speed (latency and bandwidth), reliability,
   accessibility, usage policies and so on.

   From a mathematical point of view we tend to see a network as a [7]graph,
   so we usually call the computers in the network nodes.

   TODO

Basic Concepts

   Networks are hugely complicated, so we can only give a very quick overview
   here; hopefully it will be a good starting point. (However, bear in mind
   that networking can also be done in a [8]KISS way, especially if you are,
   for example, just letting two devices communicate. Always think about the
   problem at hand.)

   One of the very basic concepts is that of a [9]protocol -- basically the
   language and rules that the computers will use for the communication.
   Computers connected in the network may be quite different: they may run
   different [10]operating systems and programs, have different [11]hardware --
   this is all fine as long as they use the same protocol for the
   communication. A protocol specifies how communication is established, what
   formats the data will be sent in, what happens if someone is not
   responding etc. Examples of protocols are [12]IP, [13]TCP, [14]UDP,
   [15]ICMP, [16]HTTP and many others.

   Oftentimes we will talk about network parameters such as latency (also
   sometimes called ping, although ping tools actually measure the round trip
   there and back -- the time it takes a message to be delivered to its
   destination), throughput (also called bandwidth -- how much data over time
   the network can transfer, measured in [17]bits per second), reliability,
   stability etc. Networks also have different topologies: a topology says
   how the nodes are interconnected. For example a fully connected network
   has every node (computer) connected directly to every other node (faster,
   more reliable, more efficient, but more expensive, complex, ...), a ring
   basically forms a circle of the nodes (each one is connected to its two
   neighbors), a star has one central node to which all other nodes are
   connected etc. Choosing a specific topology depends on the situation.

   For computer networks the concept of packet switching is very important.
   Packet switching is a way of delivering messages by splitting them into
   small [18]packets of data, assigning each packet metadata such as its
   number and destination address, then releasing them all into the network,
   letting them find their way to the destination (potentially through
   different paths) and then, once they all arrive, assembling them back into
   the original message. This is basically the key idea behind the Internet.
   It is contrasted with the originally used way of so called circuit
   switching, in which a circuit was established between any two nodes that
   wanted to communicate, basically allowing them direct communication over a
   constant path (similarly to how phone networks worked: you would first
   call a telephone exchange, say whom you wanted to talk to and the lady
   would directly connect the cables so that you could talk to that guy).
   Packet switching may seem like an overcomplicated way of networking (for
   example packets may arrive in the wrong order, they may get lost, we are
   also sending extra data in the packet headers etc.), but at bigger scales
   it really makes communication more efficient, decentralized and reliable
   (if some path in the network gets destroyed, the network still keeps
   working). Even non-Internet networks now work on this principle:
   practically any computer network nowadays copies this mechanism and often
   even uses the same protocols, so in networking we'll be encountering
   packets everywhere.

   Another important concept is that of network layers. Unless we are
   dealing with a very simple 1-to-1 communication, we inevitably get a lot
   of complexity: a message has to be chopped into packets, each of which
   will potentially travel through the network by a different path, and some
   may even get lost. We have to ensure not only their fast and correct
   delivery between individual neighboring nodes (some of which communicate
   over electrical cables, some through optical cables, some through air,
   ...) but also their correct routing/forwarding (i.e. that they are being
   pushed in the direction of their destination) and that they arrive in the
   correct order and without errors (caused e.g. by noise). So this process
   is split into parts, or layers, each one creating an [19]abstraction over
   a certain part of the delivery; each layer then has its own protocols,
   addressing and so on. Exactly which layers there are and what they are
   called is a matter of design and convention and depends on the standard
   we use, but generally the layers are ordered from the lowest (which ensure
   delivery between neighboring nodes) through the middle ones (which ensure
   correct delivery over the whole network) to the highest (which are
   concerned with how specific programs talk to each other).

   This is often compared to how the post office works, i.e. how paper
   letters are delivered. The highest level layer is just concerned with
   what human language the letter is written in and who is communicating
   with whom, the lower levels are concerned with wrapping the letter in an
   envelope and putting an address and postal code on it, yet lower levels
   then try to deliver this to the local post office reliably, using
   whatever means are deemed best (cars, planes, ships, ...), and finally at
   the lowest level are the mailmen who deliver the letters to the house,
   again choosing the best way of doing so (walking, riding a bike, finding
   the shortest paths, ...). The problem of delivery is simplified by the
   fact that one layer doesn't have to care about the internal details of
   another layer: for example a man writing a letter is only concerned with
   passing the letter to the layer below (putting the correct information on
   the envelope); he doesn't care at all whether it will then be delivered by
   a truck or a plane, through which cities it will travel, or whether it
   will eventually be delivered by a man or a woman.

   Two of the biggest standards for network layers are TCP/IP and OSI. The
   OSI model is more general: it defines 7 layers (application,
   presentation, session, transport, network, data link, physical, also
   shortened to L7 through L1) and can be used for anything we could
   remotely call a network. TCP/IP is a bit simpler and is used for the
   Internet. Let's take a look at the TCP/IP layers (each one maps more or
   less to one or more OSI layers):

   layer             task                     addressing        protocol      
                                                                examples      
                     Communicate data (text   URL, email addr., HTTP, FTP,
   Application layer or bin.) between         ...               DNS, ...      
                     programs.                
                     Break data into packets, IP addr. + port +               
   Transport layer   potentially ensure       proto             TCP, UDP, ...
                     reliability.             
   Internet layer    Deliver packet from node IP address        IPv4, IPv6,   
                     A to node B.                               ...           
                     Deliver bits of data                       Ethernet,     
   Link layer        between two neighboring  MAC address       Wifi, ...
                     nodes.                   

   Now please keep in mind that this separation into layers doesn't always
   have to be 100% respected: for example while on the application layer we
   prefer "nice addresses" such as those used in email, we may sometimes
   resort to specifying raw IP addresses and ports too. Very specialized
   applications (e.g. some games that need to minimize latency) may decide
   to implement their own kind of reliable delivery on the application
   level, ignoring this potential service of the transport layer. There may
   also appear protocols that span several layers or lie somewhere in
   between etc.

   [20]Routing is an important problem to solve in networking. Basically it
   means finding an [21]algorithm for finding delivery paths in the network,
   usually in a distributed way, i.e. we are trying to make it so that if
   some node in the network sends a packet to some other node (identified by
   its address), all other nodes will know what to do and how to efficiently
   get it there: every node should know to whom to hand the packet over just
   from seeing its address. This is not trivial. Nodes usually maintain and
   update routing tables, i.e. they keep records of "which direction"
   various addresses lie in, but the situation is complicated by the fact
   that they practically can't record every single address (there are too
   many of them and they change quickly) and also by the fact that routes on
   the Internet constantly change (some stop working, some slow down due to
   higher traffic, new ones emerge etc.). Forwarding is related to routing:
   it is the process of moving data from a router's input to the correct
   output (while routing generally refers to the whole larger process of
   finding the whole path).

   With network programs/systems we talk about architectures; there are two
   main types: client/server and peer to peer (P2P). Client/server means
   there is one special, central computer (usually with quite powerful
   hardware) called the server that offers services to many clients (other
   computers in the network). Clients connect to the server and ask it to do
   something for them (e.g. send them a website, store some files for them,
   fetch emails and so on); in this model even if clients communicate with
   each other, they communicate through the server, i.e. the server is very
   stressed and it's a weak point of the system, but it can also better
   control and coordinate what's going on (for example it can try to prevent
   [22]cheating in games). Peer to peer architecture means that all
   participants are equal ("peers"): none of them is central, none of them
   has greater authority, they all run the same software and generally any
   two peers can talk to each other directly. Again, the choice of
   architecture depends on many things and we can't say one is inherently
   better than the other, but among freedom proponents P2P is usually
   favored for its anarchist, decentralized and more robust nature -- it is
   harder to censor or take down a P2P network.

   TODO: subnetwork, sockets, reliability, addresses, ports, NAT, ...

Code Examples

   First let's try writing a UDP C program under [23]Unix. Remember that UDP
   is the unreliable protocol, so our messages may get lost, duplicated or
   arrive out of order, but in programs that can handle some losses this is
   the faster and more KISS way. Our program will be peer-to-peer: it will
   create a single socket that it will use both for listening and for
   sending. It will make a few message exchange turns: in each turn it will
   check whether it got any message, then send something to its partner and
   then wait for some time before the next round. Note that we will use a
   non-blocking socket, i.e. checking for messages won't pause our program
   if there is nothing to be received, we'll simply move on (that's how
   realtime games may do it, but other kinds of servers may rather use a
   blocking socket if they intend to do nothing while waiting for a
   message). Also pay attention to the fact that the program will choose its
   port number based on a one letter "name" we give to the program. This is
   so that if we test the programs on the same computer (where both will
   have the same IP address), they will choose different ports (two
   processes on the same computer normally cannot bind the same port).

 #include <stdio.h>
 #include <stdlib.h>            // for exit
 #include <string.h>            // for memset
 #include <unistd.h>            // for sleep, close

 #include <arpa/inet.h>         // for htons, htonl, inet_aton
 #include <netinet/in.h>        // for sockaddr_in, INADDR_ANY, IPPROTO_UDP
 #include <sys/socket.h>

 #define BUFFER_LEN 8
 #define PORT_BASE 1230

 // run as ./program partner_addr partner_letter my_letter

 char buffer[BUFFER_LEN + 1];   // extra space for zero terminator
 char name = '?';               // name of this agent (single char)
 int sock = -1;                 // socket, for both sending and receiving

 void error(const char *msg)
 {
   printf("%c: ERROR, %s\n",name,msg);

   if (sock >= 0)
     close(sock);

   exit(1);
 }

 int main(int argc, char **argv)
 {
   if (argc < 4)
     error("give me correct arguments bitch");

   name = argv[3][0];
   char *addrStrDst = argv[1];
   int portSrc = PORT_BASE + name, // different name => different port
       portDst = PORT_BASE + argv[2][0];

   struct sockaddr_in addrSrc, addrDst;

   memset(&addrSrc,0,sizeof(addrSrc)); // zero the unused padding (sin_zero)
   memset(&addrDst,0,sizeof(addrDst));

   /* SOCK_NONBLOCK makes recv return immediately if there is no message; the
      flag is Linux-specific (also on some BSDs), elsewhere fcntl with
      O_NONBLOCK can be used instead */
   sock = socket(AF_INET,SOCK_DGRAM | SOCK_NONBLOCK,IPPROTO_UDP);

   if (sock < 0)
     error("couldn't create socket");

   addrSrc.sin_family = AF_INET;
   addrSrc.sin_port = htons(portSrc); // convert port to netw. endianness
   addrSrc.sin_addr.s_addr = htonl(INADDR_ANY); // listen on all interfaces

   if (bind(sock,(struct sockaddr *) &addrSrc,sizeof(addrSrc)) < 0)
     error("couldn't bind socket");

   addrDst.sin_family = AF_INET;
   addrDst.sin_port = htons(portDst);

   if (inet_aton(addrStrDst,&addrDst.sin_addr) == 0)
     error("couldn't translate address");

   printf("%c: My name is %c, listening on port %d, "
     "gonna talk to %c (address %s, port %d).\n",
     name,name,portSrc,argv[2][0],addrStrDst,portDst);

   for (int i = 0; i < 4; ++i)
   {
     printf("%c: Checking messages...\n",name);

     // non-blocking: returns -1 (errno EAGAIN/EWOULDBLOCK) if nothing came
     ssize_t len = recv(sock,buffer,BUFFER_LEN,0);

     if (len > 0)
     {
       buffer[len] = 0;
       printf("%c: Got \"%s\"\n",name,buffer);
     }
     else
       printf("%c: Nothing.\n",name);

     for (int j = 0; j < BUFFER_LEN; ++j) // make some gibberish message
       buffer[j] = 'a' + (name + i * 3 + j * 2) % 26;

     buffer[BUFFER_LEN] = 0; // terminate the string so that we can print it

     printf("%c: Sending \"%s\"\n",name,buffer);

     if (sendto(sock,buffer,BUFFER_LEN,0,
       (struct sockaddr *) &addrDst,sizeof(addrDst)) < 0)
       printf("%c: Couldn't send it!\n",name);

     printf("%c: Waiting...\n",name);
     sleep(2); // wait 2 seconds before the next round
   }

   printf("%c: That's enough, bye.\n",name);

   close(sock);

   return 0;
 }

   We can test this for example like this:

 ./program 127.0.0.1 A B & { sleep 1; ./program 127.0.0.1 B A; } &

   Which may print out something like this:

 B: My name is B, listening on port 1296, gonna talk to A (address 127.0.0.1, port 1295).
 B: Checking messages...
 B: Nothing.
 B: Sending "oqsuwyac"
 B: Waiting...
 A: My name is A, listening on port 1295, gonna talk to B (address 127.0.0.1, port 1296).
 A: Checking messages...
 A: Nothing.
 A: Sending "nprtvxzb"
 A: Waiting...
 B: Checking messages...
 B: Got "nprtvxzb"
 B: Sending "rtvxzbdf"
 B: Waiting...
 A: Checking messages...
 A: Got "rtvxzbdf"
 A: Sending "qsuwyace"
 A: Waiting...
 B: Checking messages...
 B: Got "qsuwyace"
 B: Sending "uwyacegi"
 B: Waiting...
 A: Checking messages...
 A: Got "uwyacegi"
 A: Sending "tvxzbdfh"
 A: Waiting...
 B: Checking messages...
 B: Got "tvxzbdfh"
 B: Sending "xzbdfhjl"
 B: Waiting...
 A: Checking messages...
 A: Got "xzbdfhjl"
 A: Sending "wyacegik"
 A: Waiting...
 B: That's enough, bye.
 A: That's enough, bye.

   TODO: TCP

Links:
1. internet.md
2. computer.md
3. information.md
4. internet.md
5. lan.md
6. kwangmyong.md
7. graph.md
8. kiss.md
9. protocol.md
10. operating_system.md
11. hw.md
12. ip.md
13. tcp.md
14. udp.md
15. icmp.md
16. http.md
17. bit.md
18. packet.md
19. abstraction.md
20. routing.md
21. algorithm.md
22. cheating.md
23. unix.md