CGI: Making web applications like it’s the 90s
xwindows


Web programming today is a mess: a gazillion frameworks and libraries
thrown on top of each other, runaway complexity everywhere, and the
whole setup teetering closer to a house of cards than ever. You know it
has become this bad when people have started shipping you their entire
computers (à la Docker) just so you can run their web applications…

Do you ever hate your middleware for pulling in millions of
dependencies? Done with juggling multiple web programming libraries?
Tired of seeing your PHP scripts break with every single PHP update?
Looking for alternatives that can shine even in lowly environments like
routers and single-board computers? Would you like something more retro
and long-lasting for a change?

If so, welcome to the olden world of CGI programming!

Introduction

At the most basic level of web serving, when your browser sends a request
to a web server, the server checks for a file residing at the requested
URI path; if it exists, the server sends that file’s content to your
browser, the end. The point of CGI is to extend that with the following
idea: if the path points not to a plain file but to an executable
program, then instead of serving the program binary to the client, the
server runs that program with the request body piped to its standard
input, and pipes its standard output back to the client as the response.
(Lawyers will say this is a sly oversimplification, but you get the
idea.)
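To make that idea concrete, here is a minimal Python sketch of the server's side of the deal: run the program with request information in its environment and the request body on its standard input, then collect whatever it writes to standard output. The run_cgi name and its parameters are hypothetical, and a real server additionally handles timeouts, streaming and error logging.

```python
import subprocess
import sys


def run_cgi(argv, env, body=b""):
    """Run a CGI program roughly the way a web server would.

    argv: command line of the program, e.g. ["./hello.cgi"]
    env:  dict of CGI environment variables (REQUEST_METHOD, QUERY_STRING, ...)
    body: raw request body bytes, fed to the program's standard input
    Returns everything the program wrote to standard output, as bytes.
    """
    result = subprocess.run(
        argv,
        input=body,              # request body -> program's stdin
        env=env,                 # request information -> environment variables
        stdout=subprocess.PIPE,  # program's stdout -> becomes the HTTP response
    )                            # stderr is inherited; a server would log it
    return result.stdout


if __name__ == "__main__":
    # A throwaway "CGI program": a Python one-liner that echoes QUERY_STRING.
    program = [sys.executable, "-c",
               "import os; print('Content-Type: text/plain'); print(); "
               "print(os.environ['QUERY_STRING'])"]
    output = run_cgi(program, {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=tilde"})
    sys.stdout.write(output.decode())
```

Running it prints a Content-Type line, a blank line and name=tilde: exactly the raw output that the server would turn into an HTTP response.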

  By the way, CGI stands for Common Gateway Interface. The very common
  follow-up question is why it is called a gateway: in the early 1990s,
  the main use for this kind of server-executed program was not web
  applications, but glue logic for accessing an institution’s
  already-existing in-house information systems, which were previously
  only accessible as command-line programs run from an on-premises
  terminal or over a telnet/dial-in shell session.

  Such glue programs would accept the request, invoke the on-server
  information system programs with the correct parameters, and dress up
  their output a bit before sending the result to the web browser,
  making them gateways that let users from the web access those in-house
  information systems. These were, in fact, the main uses that drove the
  effort to standardize a common interface that all web servers would
  use for running such gateway programs; and that’s where the name came
  from.

Anyway, because CGI uses nothing more than standard input/output and some
environment variables, you can use virtually any compiled programming
language, and any interpreted language that works with a shebang line,
for server-side web development. There is no complicated protocol you
need to grok; and if you choose your language wisely, there is no
dependency hell to watch out for, no API/ABI breakage to rewrite around,
and no upgrade treadmill forced on you. Life was definitely simpler back
in the day; and with CGI, your life can be simple today too.

For these reasons, while it has little bling and bang to offer, CGI has
stood the test of time as the lowest-common-denominator,
programming-language-agnostic, platform-independent scheme for running
web applications, from its first standardization at the dawn of the
World Wide Web era nearly three decades ago to today. And… did you know
that PHP itself originally became possible because of CGI too?

Tilde.club has supported CGI programming in user web space since 17 May
2020. Since CGI was originally conceived for shared institutional Unix
server environments, on a tilde we get to experience it in its natural
habitat.

Hello World!

Simple as it is, everybody has to start somewhere; so the following are
example Hello World CGI programs in many of the programming languages
that Tilde.club supports. All of them produce an HTTP response with
status code 200, the text/plain MIME type, and a simple Hello World text
as the response body. Every example script here works with the .cgi file
extension; other language-specific file extensions that work are noted
in each example.

-   Perl (also works with .pl file extension):

          #!/usr/bin/perl
          print "Status: 200\n";
          print "Content-Type: text/plain\n";
          print "\n";
          print "Hello World!\n";

    Note that Perl was the main language of choice back in the heyday of
    CGI programming.

-   Bourne shell script (also works with .sh file extension):

          #!/bin/sh
          echo "Status: 200"
          echo "Content-Type: text/plain"
          echo
          echo "Hello World!"

-   Python (usable under both 3.x and 2.x, also works with .py file
    extension):

          #!/usr/bin/python
          print("Status: 200")
          print("Content-Type: text/plain")
          print("")
          print("Hello World!")

-   AWK:

          #!/usr/bin/awk -E
          BEGIN {
              print "Status: 200"
              print "Content-Type: text/plain"
              print
              print "Hello World!"
          }

-   Lua (also works with .lua extension):

          #!/usr/bin/lua
          print("Status: 200")
          print("Content-Type: text/plain")
          print("")
          print("Hello World!")

-   Tcl:

          #!/usr/bin/tclsh
          puts "Status: 200"
          puts "Content-Type: text/plain"
          puts ""
          puts "Hello World!"

-   Common Lisp:

          #!/usr/bin/sbcl --script
          (progn
              (princ "Status: 200") (terpri)
              (princ "Content-Type: text/plain") (terpri)
              (terpri)
              (princ "Hello World!") (terpri)
          )

Pick the language you like and put the script (or executable) in a file
with the appropriate file extension anywhere inside the public_html
subdirectory of your Tilde.club home directory. Also make sure the file
is world-readable and world-executable (something like
chmod +rx YOURFILE.EXT will do). If you use another language that
compiles to a binary executable, the world-executable permission alone
will suffice.
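If you are unsure whether the permissions came out right, a small check like the following Python sketch can tell you. It only inspects the file's "others" permission bits via the standard os.stat and stat modules; is_world_rx is a hypothetical helper name. On a typical Unix system the directories leading to the file must also be searchable by others, which this sketch does not check.

```python
import os
import stat
import sys


def is_world_rx(path, compiled=False):
    """Return True if everybody may execute path (and read it, unless compiled).

    Interpreted scripts need read + execute for others (chmod +rx);
    compiled binaries only need the execute bit.
    """
    mode = os.stat(path).st_mode
    if not stat.S_ISREG(mode):      # directories and special files don't count
        return False
    if not mode & stat.S_IXOTH:     # world-executable bit missing
        return False
    return compiled or bool(mode & stat.S_IROTH)


if __name__ == "__main__":
    # Usage: python3 check_perms.py FILE... (script name is hypothetical)
    for name in sys.argv[1:]:
        print(name, "OK" if is_world_rx(name) else "needs chmod +rx")
```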

The URL for accessing a CGI program from a web browser is no different
from that of a regular file hosted on your Tilde.club web space.

Note that there are no assembly, C, or C++ examples here, and that is
intentional: you are supposed to know such languages well, including
how to program in them safely and defensively, before even thinking
about using them for this task.

Program Output

The output of your CGI program is expected to have two parts:

1.  Lines you print before the first blank line are treated as HTTP
    response header fields:

    -   The only exception is the Status: pseudo-header, which is not
        output as a real response header; its value is used as the
        HTTP status code of the response instead.
        -   When the Status: pseudo-header is omitted, the HTTP status
            code of your response will be 200.
        -   Your program ought NOT to output this as a real HTTP
            response line (HTTP/1.0 200 OK and suchlike). Doing so is
            off-spec; and while some servers handle this okay,
            Tilde.club doesn’t.
    -   You MUST output a Content-Type: header; otherwise the server
        will reject your program’s output and send an HTTP 502 error to
        the client instead.
    -   A blank line ends the headers section.
    -   You should output headers (including the blank line terminating
        the headers) in the platform’s native line ending, which is LF
        on Tilde.club and other GNU/Linux hosts; in practice, CR/LF is
        accepted as well.

2.  Whatever you output after the first blank line is your response body
    (i.e. the content). For text, this part can use any line ending, or
    it can even be binary, as long as it matches the Content-Type:
    header value you printed. An empty response body is allowed as well:
    just don’t output anything after that first blank line.
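The rules above can be modeled with two small Python helpers, sketched here as a simplified model for checking your own output: cgi_response formats output the way a CGI program should print it, and parse_cgi_output splits output along the lines described above (the first blank line ends the headers, Status: becomes the status code and defaults to 200, and a missing Content-Type: is an error). Both names are hypothetical, and the parser only models the checks listed here, not everything a real server does.

```python
import sys


def cgi_response(body, content_type="text/plain", status="200", headers=()):
    """Build raw CGI output: header lines, one blank line, then the body (bytes)."""
    lines = ["Status: " + status, "Content-Type: " + content_type]
    lines += ["%s: %s" % (name, value) for name, value in headers]
    head = "\n".join(lines) + "\n\n"  # LF line endings; the empty line ends the headers
    return head.encode("ascii") + body


def parse_cgi_output(raw):
    """Split CGI output into (status_code, headers, body), mimicking the rules above."""
    headers = {}
    status = 200                              # default when Status: is omitted
    pos = 0
    while True:
        end = raw.find(b"\n", pos)
        if end == -1:
            raise ValueError("no blank line terminating the headers")
        line = raw[pos:end].rstrip(b"\r")     # tolerate CR/LF as well as LF
        pos = end + 1
        if not line:                          # first blank line: headers are done
            break
        name, _, value = line.decode("latin-1").partition(":")
        name, value = name.strip(), value.strip()
        if name.lower() == "status":
            status = int(value.split()[0])    # "404 Not Found" -> 404
        else:
            headers[name] = value
    if not any(name.lower() == "content-type" for name in headers):
        raise ValueError("missing Content-Type: header (the server would send 502)")
    return status, headers, raw[pos:]


if __name__ == "__main__":
    # Print a 404 response; sys.stdout.buffer keeps any binary body intact.
    sys.stdout.buffer.write(cgi_response(b"Not here!\n", status="404 Not Found"))
```

Writing bytes through sys.stdout.buffer, as the main block does, is also how a Python CGI program can send a binary body without text-mode encoding getting in the way.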

Program Input

Information from the HTTP request arrives at your CGI program through
two different channels:

1.  Request line, request headers, miscellaneous request information,
    and server information: these arrive as environment variables.
2.  Request body: this arrives verbatim as standard input data.
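Reading that body correctly means reading, in binary mode, exactly as many bytes as the CONTENT_LENGTH variable (one of the CGI 1.1 variables listed further down this page) says: RFC 3875 tells scripts not to read more than that, and the server is not obliged to signal end-of-file right after the body. A minimal Python sketch, where read_request_body is a hypothetical helper name:

```python
import os
import sys


def read_request_body(environ, stream):
    """Read exactly CONTENT_LENGTH bytes of request body from a binary stream."""
    try:
        length = int(environ.get("CONTENT_LENGTH") or 0)
    except ValueError:
        length = 0                      # malformed value: treat as no body
    chunks = []
    while length > 0:
        chunk = stream.read(length)     # may come back short on some stream types
        if not chunk:                   # EOF before the promised length: client gone
            break
        chunks.append(chunk)
        length -= len(chunk)
    return b"".join(chunks)


if __name__ == "__main__":
    # Read the body first, then produce output.
    body = read_request_body(os.environ, sys.stdin.buffer)
    print("Content-Type: text/plain")
    print()
    print("Received %d bytes of request body" % len(body))
```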

Unless you are processing HTTP POST or PUT requests (which is fairly
advanced stuff), you don’t really need to look at the request body at
all. So the information of interest is mostly contained in the
environment variables:

-   The HTTP request method is passed to your program as the value of
    the environment variable REQUEST_METHOD.
-   The part of the request URI after ? is passed to your CGI program as
    the value of the environment variable QUERY_STRING.
    -   This variable is always present. If the request URI has no ?, or
        nothing follows the ?, the value is empty.
-   Each request header field’s name is converted to uppercase, has any
    hyphen replaced with an underscore, and is prefixed with HTTP_; the
    result is set as an environment variable whose value is the header
    value received from the client. For example, a Host: tilde.club
    header line becomes an environment variable HTTP_HOST with value
    tilde.club, and a User-Agent: header becomes HTTP_USER_AGENT.
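In Python, these variables are in os.environ, and the standard library's urllib.parse.parse_qs can decode a query string. The following sketch echoes the request method, the decoded query parameters, and the request headers it can rebuild from HTTP_ variables. The helper names are hypothetical, and turning HTTP_USER_AGENT back into User-Agent is only a best-effort reversal, since the original capitalization is lost and hyphens have already become underscores.

```python
import os
from urllib.parse import parse_qs


def query_params(environ):
    """Decode QUERY_STRING into a dict of lists, e.g. "q=ev&q=car" -> {"q": ["ev", "car"]}."""
    # .get() only matters when testing outside the server; there it's always set.
    return parse_qs(environ.get("QUERY_STRING", ""))


def request_headers(environ):
    """Best-effort reconstruction of request headers from HTTP_* variables."""
    headers = {}
    for key, value in environ.items():
        if key.startswith("HTTP_"):
            # HTTP_USER_AGENT -> USER-AGENT -> User-Agent
            name = key[len("HTTP_"):].replace("_", "-").title()
            headers[name] = value
    return headers


def main():
    env = os.environ
    print("Content-Type: text/plain")
    print()
    print("Method:", env.get("REQUEST_METHOD", ""))
    for name, values in sorted(query_params(env).items()):
        print("Query %s = %s" % (name, ", ".join(values)))
    for name, value in sorted(request_headers(env).items()):
        print("Header %s: %s" % (name, value))


if __name__ == "__main__":
    main()
```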

The following are environment variables from the CGI 1.1 specification
which are set for CGI programs in Tilde.club, in alphabetical order:

  CONTENT_LENGTH, CONTENT_TYPE, GATEWAY_INTERFACE, QUERY_STRING,
  REMOTE_ADDR, REQUEST_METHOD, SCRIPT_NAME, SERVER_NAME, SERVER_PORT,
  SERVER_PROTOCOL, SERVER_SOFTWARE

-   You can find out more about what each of these variables means in
    the original CGI 1.1 specification, linked in the Further Reading[1]
    section below.

And the following are other environment variables which are not in the
CGI 1.1 specification, but are set for CGI programs in Tilde.club:

  DOCUMENT_ROOT, DOCUMENT_URI, HTTPS, REDIRECT_STATUS, REMOTE_PORT,
  REQUEST_SCHEME, REQUEST_URI, SCRIPT_FILENAME, SERVER_ADDR
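A quick way to see these variables for yourself is a throwaway CGI program that prints them. The Python sketch below lists every variable named in this section and marks any that are unset; describe and ENV_VARS are hypothetical names. Consider removing such a script afterwards, since it echoes request details back to anyone who visits it.

```python
import os

# Every variable this page lists as set for CGI programs on Tilde.club.
ENV_VARS = [
    "CONTENT_LENGTH", "CONTENT_TYPE", "DOCUMENT_ROOT", "DOCUMENT_URI",
    "GATEWAY_INTERFACE", "HTTPS", "QUERY_STRING", "REDIRECT_STATUS",
    "REMOTE_ADDR", "REMOTE_PORT", "REQUEST_METHOD", "REQUEST_SCHEME",
    "REQUEST_URI", "SCRIPT_FILENAME", "SCRIPT_NAME", "SERVER_ADDR",
    "SERVER_NAME", "SERVER_PORT", "SERVER_PROTOCOL", "SERVER_SOFTWARE",
]


def describe(environ, names):
    """Return 'NAME=value' lines, or 'NAME (unset)' for absent variables."""
    return [
        "%s=%s" % (name, environ[name]) if name in environ else "%s (unset)" % name
        for name in names
    ]


if __name__ == "__main__":
    print("Content-Type: text/plain")
    print()
    print("\n".join(describe(os.environ, ENV_VARS)))
```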

Notes:

-   The REMOTE_HOST variable is always absent, even when the remote host
    does have a valid reverse-DNS name.
-   The REMOTE_IDENT variable is always absent, even when the remote
    client connects from a host running an Identd service.
-   While both the HTTPS and REQUEST_SCHEME variables can be used to tell
    HTTPS apart from plain old HTTP requests, check whether HTTPS has the
    value on if you expect your CGI program to be portable to Apache
    HTTP Server.
    -   This can be used to ensure that the correct version of an Atom
        or RSS feed is served over the right protocol.
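A portable check along those lines might look like the following Python sketch. The is_https and feed_base_url names are hypothetical, and the fallback to REQUEST_SCHEME is an assumption meant for servers that set only that variable.

```python
def is_https(environ):
    """True when the request came over HTTPS.

    HTTPS=on is checked first because that is the portable test;
    REQUEST_SCHEME is a fallback for servers that only set that.
    """
    if environ.get("HTTPS", "").lower() == "on":
        return True
    return environ.get("REQUEST_SCHEME", "").lower() == "https"


def feed_base_url(environ):
    """Base URL for links inside an Atom/RSS feed, matching the request's protocol."""
    scheme = "https" if is_https(environ) else "http"
    # HTTP_HOST comes from the client's Host: header; SERVER_NAME is the fallback.
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "localhost")
    return "%s://%s" % (scheme, host)
```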

Program Execution

These are the conditions your CGI programs run under:

-   One instance of a CGI program is executed to service one request,
    and that instance terminates at the end of the response.
-   Multiple instances of a CGI program can run at the same time.
-   Your CGI program starts after the server has read the entire request
    header from the client (but not the request body, if any), and of
    course only when the request URI matches your CGI program.
-   Your CGI program runs inside its own directory (not in some other
    location like the server binary’s directory).
-   Once your CGI program is running, its standard input is fed with the
    request body (if any).
-   If you are going to process the request body, you ought to do so
    before producing any output. (HTTP is a request-response protocol,
    remember?)
-   Everything your program outputs on the standard error stream goes
    into the server’s error log.
-   A CGI program that runs for too long without finishing will cause
    the server to return an HTTP 504 Gateway Time-out error to the
    client instead of its response.
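The following Python sketch puts two of these conditions to work: because the program runs inside its own directory, it can open a data file sitting next to it with a plain relative path, and its diagnostics go to standard error, so they land in the server's error log rather than in the visitor's browser. The data file quotes.txt, the script name quote.cgi in the log message, and the pick_quote helper are all hypothetical; the script only reads the file, which is the simpler case given the credentials described under Setup-Specific Notes.

```python
import random
import sys


def pick_quote(path="quotes.txt"):
    """Return a random non-empty line from path, or None if there is none.

    The relative default path works because the CGI program runs
    inside its own directory.
    """
    try:
        with open(path, encoding="utf-8") as f:
            lines = [line.strip() for line in f if line.strip()]
    except OSError as exc:
        # stderr goes to the server's error log, not to the visitor.
        print("quote.cgi: cannot read %s: %s" % (path, exc), file=sys.stderr)
        return None
    return random.choice(lines) if lines else None


def main():
    quote = pick_quote()
    if quote is None:
        print("Status: 500")
        print("Content-Type: text/plain")
        print()
        print("No quote available right now.")
    else:
        print("Status: 200")
        print("Content-Type: text/plain; charset=utf-8")
        print()
        print(quote)


if __name__ == "__main__":
    main()
```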

Tips

-   Avoid making your CGI program a time hog; good CGI programs start
    quickly and finish quickly.
-   Avoid making your CGI program a resource hog; just like everything
    else you do on Tilde.club shell.
-   Avoid making your CGI program a security hole. For this reason,
    using C or C++ for a non-trivial CGI program is not recommended
    unless you actually know your craft.
-   Remember: it costs Tilde.club one program execution to service each
    HTTP request to a CGI program; use it responsibly and for things
    that matter.

Setup-Specific Notes

The following are tidbits specific to the CGI setup used in Tilde.club:

-   If you would like to make a CGI program the directory index, name it
    index.cgi (index.sh works too for shell scripts).
-   There is no database daemon of any kind. (If you would like to use
    SQLite, see below for a caveat about credentials and files.)

And some caveats:

-   There is no support for the PATH_INFO environment variable; you can
    blame Nginx for this one.

    This means you cannot simulate files and directory-like URIs (like
    /~SOMEONE/category.cgi/automobile/ev) under your CGI program; the
    server will simply return an HTTP 404 error for such URIs even when
    category.cgi exists and is executable.
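One possible workaround is to carry the would-be path in the query string instead, e.g. /~SOMEONE/category.cgi?path=automobile/ev. In the Python sketch below, the path parameter name and the route helper are hypothetical; it splits the value into segments and rejects . and .. so the parameter cannot be used to climb out of the intended hierarchy if you later map segments to files.

```python
import os
from urllib.parse import parse_qs


def route(query_string):
    """Turn "path=automobile/ev" into ["automobile", "ev"]; None if the path is unsafe."""
    raw = parse_qs(query_string).get("path", [""])[0]
    segments = [s for s in raw.split("/") if s]   # drop empty parts from // or a trailing /
    if any(s in (".", "..") for s in segments):
        return None
    return segments


def main():
    segments = route(os.environ.get("QUERY_STRING", ""))
    if segments is None:
        print("Status: 400")
        print("Content-Type: text/plain")
        print()
        print("Bad path.")
        return
    print("Content-Type: text/plain")
    print()
    print("Category: " + (" > ".join(segments) or "(top level)"))


if __name__ == "__main__":
    main()
```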

-   Avoid leaving files with the following extensions in your web space
    when you don’t intend for them to be run as CGI:

    -   .cgi
    -   .pl
    -   .sh
    -   .py
    -   .lua

    This is because in the current setup, requests for these files are
    forwarded to the CGI handler regardless, even when their executable
    bit is not set; the handler won’t actually run such a script, but
    the client will get an HTTP 502 error. If you would like to
    distribute these verbatim as source files, you can work around this
    by appending .txt to their names.
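To spot such files, a small Python script like the following sketch can walk your web space and list files with those extensions that lack the world-executable bit. The default ~/public_html location and the stray_scripts name are assumptions based on the setup described on this page; symbolic links are skipped.

```python
import os
import stat
import sys

# Extensions that the current setup forwards to the CGI handler.
CGI_EXTENSIONS = (".cgi", ".pl", ".sh", ".py", ".lua")


def stray_scripts(root):
    """Yield paths under root that go to the CGI handler but cannot run."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(CGI_EXTENSIONS):
                continue
            path = os.path.join(dirpath, name)
            mode = os.lstat(path).st_mode
            if stat.S_ISREG(mode) and not mode & stat.S_IXOTH:
                yield path


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else os.path.expanduser("~/public_html")
    for path in stray_scripts(root):
        print(path, "-> chmod +rx it, or rename it to", path + ".txt")
```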

-   CGI programs here run under the web server’s credentials: user nginx
    and group nginx (user ID 994, group ID 990); tread carefully if you
    need to make your program read/write private files.

Further Reading

-   Common Gateway Interface version 1.1 specification (RFC 3875)[2],
    which is also available as plain text[3].
-   Ten Million Users and Ten Years Later[4], a case study of CGI as a
    secret ingredient for developing a web application that could stay
    in service for more than a decade.

[1] #further-reading

[2] https://www.rfc-editor.org/rfc/rfc3875.html

[3] https://www.rfc-editor.org/rfc/rfc3875.txt

[4] https://dl.acm.org/doi/10.1145/3472749.3474819