                               __    __       __
                          ___ / /   / /___ __/ /_
                         (_-</ _ \_/ __/\ \ / __/
                        /___/_//_(_)__//_\_\\__/

     My own best practices, with (portable) shell scripting
                                --- * ---


Shell scripting is similar to a martial art: it is *hard*, it takes
ages to master, you are likely to do it wrong, and in the long run it
gives you incredible powers.

This document is a humble collection of best practices for shell
scripting that I studied over time.  The focus is on the POSIX shell,
and on those extensions that are common among the various
implementations.

Disclaimer:
-----------

This file is a work in progress.

The ambition, in the long term, would be to ensure that all proposed
tricks work on the most popular shell interpreters:

 * GNU/Linux: bash, dash, busybox
 * OpenBSD: ksh
 * FreeBSD: sh

This goal has *NOT* been reached yet.

 -- Thou shalt know your shit! --
This is not meant to be a guide for the novice, nor a list of "things
that you should always do".  It assumes that the reader already knows
and uses shell scripting.  The reader is also advised not to rely on
these tricks without understanding them deeply and testing them.

 -- I am NOT thy God --
I might be wrong.  Feel free to contact me if you think so.

Changelog:
----------

 2022-04-05 - Updates

 * Reviewed disclaimer
 * Added section, INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
 * Modified section, PARAMETERS VIA GETOPTS

 2022-03-28 - Updates

 * Disclaimer
 * Add table of contents
 * Mention "xargs" as parallelism tool

 2021-10-29 - Updates

 * Add "Variable definition assertions"
 * Minor text fixes

 2021-01-19 - First draft.

 * Still gathering the practices as I use them.  No verification of the
   actual portability of the techniques has been achieved, although it is
   considered a goal.  Things are known to work under bash and dash.

Table of Contents
-----------------

 0. INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
 1. VARIABLE DEFINITION ASSERTIONS
 2. PARAMETERS VIA GETOPTS
 3. VARIABLE LOCALITY
 4. WISE USE OF TRAPS
 5. ERROR HANDLING 1
 6. ERROR HANDLING 2
 7. BOOLEANS
 8. REMOTE COMMANDS VIA SSH
 9. XARGS HACKS

0. INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
--------------------------------------------

My educated opinion is that the shell was not meant to be a programming
language, and that it evolved into an inconsistent bunch of dialects
with a "toupet" specification (POSIX) meant to fix the desirable common
behaviours.

I don't believe in work-arounds.  For example, pipefail is a very good idea,
but it is a bashism.  I've seen[1] attempts at providing this feature in a
portable way, but I tend to dislike such clunky solutions.

I've also seen some desperate attempts by a C++ programmer to implement RAII
in a bash script.  That dude eventually copy-pasted some terrible boilerplate
he found on Stack Overflow to achieve the desired result.  I had fun
instilling some doubt by challenging him to spot an alleged defect in the
boilerplate, and he eventually gave up for good.

I think people should embrace the limitations of the tools they are using.
A hammer is the best tool to put nails in a wall.  It makes no sense to
pretend it is a sledgehammer by extending the handle with some chopsticks.
Nor does it make sense to use a sledgehammer to put nails in a wall.

When it comes to the shell, I apply this principle by following a rule
of thumb: if you feel the urge to have hash tables, it is time to jump
to a proper programming language.

[1] https://stackoverflow.com/questions/13084352/how-to-implement-set-o-pipefail-in-a-posix-way-almost-done-expert-help-nee


1. VARIABLE DEFINITION ASSERTIONS
---------------------------------

Code that relies on a particular variable to be defined should refer to
it by the ${variable:?[word]} expansion, effectively implementing an
assertion:

  make_greeting() {
    printf "hello %s %s\n" "${name:?}" "${surname:?}"
  }

See also `set -u`
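
To see the assertion at work, here is a minimal sketch.  The error messages
after `:?` and the sample values are my own additions for illustration.
The failing call is wrapped in a subshell because a failed `${variable:?}`
expansion terminates a non-interactive shell.

```shell
# Hypothetical variant of make_greeting with explicit error messages.
make_greeting() {
  printf 'hello %s %s\n' "${name:?name is not set}" "${surname:?surname is not set}"
}

name=Ada
surname=Lovelace
make_greeting

# The assertion trips before printf runs, so nothing is printed by the
# function itself; the subshell confines the termination.
unset surname
if ( make_greeting ) 2>/dev/null; then
  echo "greeting printed"
else
  echo "assertion tripped"
fi
```

The second call prints only "assertion tripped": the expansion fails before
`printf` is ever executed.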


2. PARAMETERS VIA GETOPTS
-------------------------

The `getopts` builtin can be used to parse single-dash flags, both at
script level and at function level.

The second case is obviously more complex than plain positional
arguments, and probably makes sense only if the function is user-facing,
that is, if the script is sourced and the function is exposed for
interactive use.

  foo() {
    local opt
    local a=
    local b=
    local c=

    OPTIND=1; while getopts 'a:bc' opt; do
      case "$opt" in
      a)  a="$OPTARG" ;;
      b)  b=1 ;;
      c)  c=1 ;;
      esac
    done
    shift $((OPTIND - 1))

    ...
  }

The previous example shows some useful patterns to keep in mind:

 * The `OPTIND` global must be reset to 1 before the first invocation of
   `getopts`.  Failing to do so exposes the flag scanning to side effects
   of previous `getopts` invocations.  This is not needed if the
   function is defined to be run in a subshell, that is if it is
   defined as `foo() ( ... )` instead of `foo() { ...; }`

 * The local variables `a`, `b` and `c` need to be explicitly set to
   empty string to avoid the capture of existing values.
   This is important even if the variables are declared as `local`:
   see VARIABLE LOCALITY about shadowing.

 * The `b` and `c` variables act as booleans: false is represented by
   an empty value, while true is represented by an arbitrary non-empty
   value.  See BOOLEANS.
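
The patterns listed above can be exercised with a small, complete function.
This is only a sketch: `show_flags` and its output format are made up for
the occasion, and `local` is assumed to be available (it is in bash and
dash, but it is not POSIX).

```shell
# Hypothetical function, only meant to show the pattern end to end.
show_flags() {
  local opt
  local a=
  local b=

  OPTIND=1
  while getopts 'a:b' opt; do
    case "$opt" in
    a) a="$OPTARG" ;;
    b) b=1 ;;
    *) return 2 ;;   # unknown flag or missing argument
    esac
  done
  shift $((OPTIND - 1))

  printf 'a=%s b=%s rest=%s\n' "$a" "$b" "$*"
}

# Two calls in the same shell: the second one relies on the OPTIND reset.
show_flags -a 1 -b x y
show_flags -b z
```

The two calls print "a=1 b=1 rest=x y" and "a= b=1 rest=z".  Dropping the
`OPTIND=1` line is a good way to experience the problem first hand.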

ABOUT OPTIONAL PARAMETERS:

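The POSIX standard defines a mechanism that can be used to deal with flags
having optional arguments.  This boils down to prefixing the optstring with a
colon: a missing argument is then reported by setting the variable to `:`
and $OPTARG to the flag character, as in the following example: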

  while getopts :a:b opt; do
   # ...
  done

This feature unfortunately leads to ambiguous results in case the
optional-parameter flag is followed by another flag:

  my_script -b -a    # should work as expected
  my_script -a -b    # the parser might interpret -b as argument of -a

Mitigating this problem might be possible (for instance, by
rewinding $OPTIND when $OPTARG starts with a `-`), but it would result
in boilerplate.  My personal advice is to avoid the problem by not
relying on this feature.
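
For the record, the ambiguity is easy to reproduce.  The following sketch
uses a made-up `probe` function that just reports what `getopts` hands
over; `local` is assumed to be available.

```shell
# Hypothetical function: reports each flag as seen by getopts.
probe() {
  local opt
  OPTIND=1
  while getopts ':a:b' opt; do
    case "$opt" in
    a)  printf 'a with argument "%s"\n' "$OPTARG" ;;
    b)  printf 'b\n' ;;
    :)  printf '%s without argument\n' "$OPTARG" ;;
    \?) printf 'unknown flag %s\n' "$OPTARG" ;;
    esac
  done
}

probe -b -a    # -a is last, so the missing argument is reported via ":"
probe -a -b    # -b is swallowed as the argument of -a
```

The first call reports `b` and then "a without argument"; the second one
reports only `a`, with "-b" as its argument.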


3. VARIABLE LOCALITY
--------------------

Assigning a variable within a function affects the global variable namespace.

Most shells support the `local` keyword, which is unfortunately not defined by
POSIX.  The `local` keyword shows different behaviours in some corner cases
(that should obviously be avoided if portability matters).

Declaring a local variable within a function, shadowing an existing variable
(global, or local to a caller scope) should be safe, but no assumption
should be made on the value of the local variable before assignment.

The declaration and assignment of a local variable should be distinct (see
Shellcheck SC2155).
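
A quick sketch of why the declaration and the assignment are better kept
apart: the exit status of `local` replaces the one of the command
substitution.  The function names are made up, and `local` is assumed to
be available.

```shell
# Declaration and assignment together: `local` succeeds, hiding the failure.
masked() {
  local out="$(false)" && echo "masked: failure hidden"
}

# Declaration and assignment apart: the failure is visible.
checked() {
  local out
  out="$(false)" || echo "checked: failure seen"
}

masked
checked
```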


4. WISE USE OF TRAPS
--------------------

Traps are a great way to clear up residual state when the script exits.
Heads up: they will interfere with the return value of the shell script.

  atexit() {
    # first thing: take a copy of "$?" by declaring a local variable
    # and assigning it in a single statement.  It is generally not
    # recommended to assign a local variable while declaring it, but this
    # is an important exception: local will in fact succeed, effectively
    # setting "$?" to 0.
    local ex="$?"

    # Here goes clean up, which might succeed or fail, independently from
    # the rest of the script!
    # If the script uses `set -e`, make sure that the handler is not
    # terminated before time by a command failure!
    false || :

    # This is the right moment to clean multiple resources.
    rm -rf "$tempfile"

    exit "$ex"
  }

  trap atexit EXIT

  # Useful to have clean ^C interruption
  trap exit INT

Unfortunately there's room for only one exit handler, which is bound to "know
everything".  It is wise to keep C++ programmers away from shell scripts: I've
seen clumsy attempts at implementing RAII, and the complexity rose to
infinity!
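
The handler above can be tried out in isolation with the following sketch.
It makes a few assumptions of its own: `run_job` is a made-up function
whose body is a subshell, so that the trap stays confined to it, and
`mktemp` (not POSIX, but widely available) provides the resource to clean.

```shell
# Hypothetical job: creates a temporary file, then fails with status 3.
run_job() (
  tempfile="$(mktemp)" || exit 1

  atexit() {
    local ex="$?"       # copy "$?" before anything else overwrites it
    rm -f "$tempfile"
    exit "$ex"          # hand the original status back to the caller
  }
  trap atexit EXIT

  echo "$tempfile"
  exit 3
)

status=0
path="$(run_job)" || status="$?"
echo "exit status: $status"
[ -e "$path" ] || echo "temporary file removed"
```

The caller still sees status 3, even though the last command executed by
the handler before `exit` was a successful `rm`.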


5. ERROR HANDLING 1
-------------------

pipefail is a bashism, too bad!

Remember:
 * The commands of a pipeline run in subshells, and the exit status of
   the pipeline is the exit status of its last command (the tail)
 * 'set -e' is your friend, but a shady one!
   E.g. it does not catch failures that happen before the tail of a pipeline
 * Within functions, always behave as if 'set -e' was not in place
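
A minimal sketch of these points, with made-up function names.  The last
part shows why functions should not count on 'set -e': when a function is
called from a tested context (here, the left side of `||`), the option is
ignored for the whole function body.

```shell
# The failure of the first command of the pipeline goes unnoticed.
if false | true; then
  echo "pipeline: reported as successful"
fi

# Relies on `set -e` to stop at the first failure.
fragile() {
  false
  echo "fragile: still running"
}

# Handles the failure explicitly.
robust() {
  false || return 1
  echo "robust: not reached"
}

(
  set -e
  fragile || echo "fragile: failure reported"
  robust || echo "robust: failure reported"
)
```

With the default shell options this prints "pipeline: reported as
successful", "fragile: still running" and "robust: failure reported".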


6. ERROR HANDLING 2
-------------------

Default values of variables.

tl;dr: prefer this form

 [ "$variable" ] || variable="$(gen_variable)"

over the following one, which brings surprises, as a failure of
gen_variable goes undetected (seen with dash):

 : "${variable:="$(gen_variable)"}"
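
The difference can be observed with a generator that always fails.  This is
a sketch: `gen_variable` is a stand-in, and the two wrapper functions exist
only to make the comparison easy to run.

```shell
gen_variable() { return 1; }   # hypothetical generator that always fails

# The status of the assignment is the status of the command substitution.
form_checked() {
  variable=
  [ "$variable" ] || variable="$(gen_variable)" || {
    echo "failure detected"
    return 1
  }
}

# The status is the one of `:`, which always succeeds.
form_unchecked() {
  variable=
  : "${variable:="$(gen_variable)"}" || {
    echo "never printed"
    return 1
  }
  echo "failure went unnoticed"
}

form_checked || :
form_unchecked
```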


7. BOOLEANS
-----------

Perl-style: use empty strings for false, and anything for true.

Advantages: this works

  if [ -n "$variable" ]; then
    ...
  fi

Disadvantages:

 - The variables need to be emptied before use, to avoid accidentally picking
   up values that were previously assigned in the environment.

 - Heads up for `set -u`: if enabled, evaluating an unset variable will make
   the script fail.  But it is always possible to use `"${variable:-}"`.
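
A small sketch of the convention, including the `set -u` case.  The helper
`flag_state` and the variable names are made up for illustration.

```shell
# Hypothetical helper: prints "on" for a true flag and "off" for a false one.
flag_state() {
  if [ -n "${1:-}" ]; then echo on; else echo off; fi
}

verbose=      # false
force=yes     # true: any non-empty value will do

echo "verbose: $(flag_state "$verbose")"
echo "force: $(flag_state "$force")"

# With `set -u`, a flag that was never assigned needs the ${...:-} form.
( set -u; unset debug; echo "debug: $(flag_state "${debug:-}")" )
```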


8. REMOTE COMMANDS VIA SSH
--------------------------

It is generally possible to do

  ssh -T user@host command args ...

It is a bit difficult to pull this off if we want to run scripts of some
complexity remotely.

First off, consider using tools like Ansible, although they typically
require some additional packages to be installed remotely (e.g. Python,
in Ansible's case).

Alternative method:

  ssh_pipe() {
    ssh -T -l "${1:?user}" "${2:?host}" sh -xe
  }

  ssh_pipe bob 192.168.1.1 <<EOF
  ... # here goes a shell script!
  EOF

Beware of variable expansions: the local shell variables are expanded within
the heredoc, unless the marker is surrounded by quotes.  To understand this
compare the following forms:

  $ cat <<EOF
  $USER
  EOF

  $ cat <<'EOF'
  $USER
  EOF

Corollary: a neat trick to pass around local variables is the following:

  { cat <<END_OF_VARS; cat <<'END_OF_SCRIPT'; } | ssh_pipe bob 192.168.1.1
    remote_var="$local_var"
  END_OF_VARS
    echo "I can use $remote_var"
  END_OF_SCRIPT
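
Before involving a remote host, the composition can be checked locally.  In
this sketch the made-up `compose_script` function prints the script that
would be sent, and a plain local `sh -e` stands in for `ssh_pipe`.

```shell
local_var="hello world"

# Hypothetical helper: prints the script that would be sent to the remote host.
compose_script() {
  { cat <<END_OF_VARS; cat <<'END_OF_SCRIPT'; }
remote_var="$local_var"
END_OF_VARS
echo "I can use $remote_var"
END_OF_SCRIPT
}

compose_script            # inspect the generated script
compose_script | sh -e    # dry run with a local shell instead of ssh_pipe
```

The inspection step is worth the effort: the value of `$local_var` is
pasted verbatim between double quotes, so values containing `"`, `$` or
backticks will not survive the trip without additional quoting.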


9. XARGS HACKS
--------------

xargs is mostly useful to turn standard input into command-line arguments,
but it also provides a quite powerful way of doing multi-processing out of
the box, without the need to install special tools.

Execute up to 20 parallel instances of COMMAND, each eating up to 10 values
from the given sequence:

  seq 1 1000 | xargs -P20 -L10 COMMAND
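
A scaled-down sketch that can be run as is, assuming an xargs that supports
`-P` (GNU and BSD implementations do).  The `batch` function is made up,
and a small `sh -c` script plays the role of COMMAND.  The output is sorted
because parallel instances may complete in any order.

```shell
# Hypothetical demo: 6 values, 2 values per invocation, up to 2 in parallel.
batch() {
  printf '%s\n' 1 2 3 4 5 6 |
    xargs -P2 -L2 sh -c 'echo "batch: $*"' sh
}

batch | sort
```

After sorting, the three lines are "batch: 1 2", "batch: 3 4" and
"batch: 5 6".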