My own best practices with (portable) shell scripting

--- * ---

Shell scripting is similar to a martial art: it is *hard*, it takes ages to
master, you are likely to do it wrong, and in the long run it gives you
incredible powers.

This document is a humble collection of best practices for shell scripting
that I have gathered over time. The focus is on the POSIX shell, and on those
extensions that are common among the various implementations.


Disclaimer:
-----------

This file is a work in progress. In the long term, the ambition is to ensure
that all proposed tricks work on the most popular shell interpreters:

 * GNU/Linux: bash, dash, busybox
 * OpenBSD: ksh
 * FreeBSD: sh

This goal has *NOT* been reached yet.

-- Thou shalt know your shit! --

This is not meant to be a guide for the novice, nor a list of "things that you
should always do". It assumes that the reader already knows and uses shell
scripting. The reader is also advised not to rely on these tricks without
understanding them deeply and testing them.

-- I am NOT thy God --

I might be wrong. Feel free to contact me if you think so.


Changelog:
----------

2022-04-05 - Updates
 * Reviewed disclaimer
 * Added section, INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
 * Modified section, PARAMETERS VIA GETOPTS

2022-03-28 - Updates
 * Disclaimer
 * Add table of contents
 * Mention "xargs" as parallelism tool

2021-10-29 - Updates
 * Add "Variable definition assertions"
 * Minor text fixes

2021-01-19 - First draft.
 * Still gathering the practices as I use them. No verification of the
   actual portability of the techniques has been achieved, although it is
   considered a goal. Things are known to work under bash and dash.


Table of Contents
-----------------

 0. INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
 1. VARIABLE DEFINITION ASSERTIONS
 2. PARAMETERS VIA GETOPTS
 3. VARIABLE LOCALITY
 4. WISE USE OF TRAPS
 5. ERROR HANDLING 1
 6. ERROR HANDLING 2
 7. BOOLEANS
 8. REMOTE COMMANDS VIA SSH
 9. XARGS HACKS


0. INTRODUCTION: THE PRINCIPLE OF ACCEPTANCE
--------------------------------------------

My educated opinion is that the shell was not meant to be a programming
language, and that it evolved into an inconsistent bunch of dialects, with a
"toupee" specification (POSIX) meant to fix the desirable common behaviours.

I don't believe in work-arounds. For example, pipefail is a very good idea,
but it is a bashism. I've seen[1] attempts at providing this feature in a
portable way, but I tend to dislike such clunky solutions.

I've also seen the desperate attempts of a C++ programmer to implement RAII in
a bash script. That dude eventually copy-pasted some terrible boilerplate he
found on Stack Overflow to achieve the desired result. I had fun instilling
some doubt by challenging him to spot an alleged defect in the boilerplate,
and he eventually gave up for good.

I think people should embrace the limitations of the tools they are using. A
hammer is the best tool to drive nails into a wall. It makes no sense to
pretend it is a sledgehammer by extending the handle with some chopsticks. Nor
does it make sense to use a sledgehammer to drive nails into a wall.

When it comes to the shell, I apply this principle by following a rule of
thumb: if you feel the urge to use hash tables, it is time to jump to a proper
programming language.

[1] https://stackoverflow.com/questions/13084352/how-to-implement-set-o-pipefail-in-a-posix-way-almost-done-expert-help-nee


1. VARIABLE DEFINITION ASSERTIONS
---------------------------------

Code that relies on a particular variable being defined should refer to it
through the `${variable:?[word]}` expansion, which effectively implements an
assertion: if the variable is unset or empty, the shell writes an error
message (word, if given) to standard error, and a non-interactive shell exits.

    make_greeting() {
        printf "hello %s %s\n" "${name:?}" "${surname:?}"
    }

See also `set -u`, which turns the expansion of any unset variable into an
error (empty variables, however, are still accepted).


2. PARAMETERS VIA GETOPTS
-------------------------

The `getopts` builtin can be used to parse single-dash flags, both at script
level and at function level.
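At script level, the loop runs directly on the script's own arguments. What
follows is a minimal sketch: the flags `-v` and `-o FILE`, the `usage` helper
and the simulated command line are hypothetical, chosen only for illustration.
Since the shell initializes `OPTIND` to 1 at startup, no reset is needed here.

```shell
#!/bin/sh
# Sketch: script-level option parsing with POSIX getopts.
# The flags -v and -o FILE are hypothetical, chosen only for illustration.

# Demo only: simulate the command line `script -v -o out.txt foo bar`.
# Remove this line in a real script, so that "$@" keeps the real arguments.
set -- -v -o out.txt foo bar

usage() {
    printf 'usage: %s [-v] [-o FILE] [ARG...]\n' "${0##*/}" >&2
    exit 2
}

verbose=    # boolean: empty means false, non-empty means true
output=     # emptied explicitly, so nothing leaks in from the environment

while getopts 'vo:' opt; do
    case "$opt" in
        v) verbose=1 ;;
        o) output="$OPTARG" ;;
        *) usage ;;    # '?': getopts has already printed a diagnostic
    esac
done
shift $((OPTIND - 1))    # drop the parsed flags, keep the operands

if [ -n "$verbose" ]; then
    printf 'output file: %s\n' "${output:-(stdout)}"
fi
printf 'operand: %s\n' "$@"
```

The catch-all `*)` branch handles the `?` that `getopts` reports for unknown
flags or missing arguments, so the script stops instead of silently going on.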
The second case is obviously more complex than plain positional arguments,
and probably makes sense only if the function is user-facing, that is, if the
script is sourced and the function is exposed for interactive use.

    foo() {
        local opt
        local a=
        local b=
        local c=

        OPTIND=1
        while getopts 'a:bc' opt; do
            case "$opt" in
                a) a="$OPTARG" ;;
                b) b=1 ;;
                c) c=1 ;;
                *) return 1 ;;   # invalid flag: getopts printed a diagnostic
            esac
        done
        shift $((OPTIND - 1))

        ...
    }

The previous example shows some useful patterns to keep in mind:

 * The `OPTIND` global must be reset to 1 before the first invocation of
   `getopts`. Failing to do so exposes the flag scanning to the side effects
   of previous `getopts` invocations. If the function is defined to run in a
   subshell, that is as `foo() ( ... )` instead of `foo() { ...; }`, its
   changes to `OPTIND` do not leak back to the caller; the subshell still
   inherits the caller's `OPTIND`, though, so the reset remains a cheap
   safety measure if the script itself also uses `getopts`.

 * The local variables `a`, `b` and `c` need to be explicitly set to the
   empty string to avoid capturing existing values. This is important even
   if the variables are declared as `local`: see VARIABLE LOCALITY about
   shadowing.

 * The `b` and `c` variables act as booleans: false is represented by an
   empty value, while true is represented by an arbitrary non-empty value.
   See BOOLEANS.

ABOUT OPTIONAL PARAMETERS:

POSIX `getopts` has no real support for flags with optional arguments. The
closest thing is the "silent" error reporting mode, enabled by prefixing the
optstring with a colon, as in the following example:

    while getopts :a:b opt; do
        # ...
    done

In this mode `getopts` prints no diagnostics, and when a flag that requires an
argument comes last on the command line, it sets `opt` to `:` (and `OPTARG` to
the flag letter) instead of failing. This can be abused to treat the argument
of `-a` as optional, but it leads to ambiguous results when the flag is
followed by another flag:

    my_script -b -a    # works: -a comes last, so getopts reports it as ':'
    my_script -a -b    # -b is taken as the argument of -a

Mitigating this problem might be possible (for instance, by rewinding
$OPTIND when $OPTARG starts with a `-`), but it would result in boilerplate.
My personal advice is to avoid the problem by not relying on this feature.


3. VARIABLE LOCALITY
--------------------

Assigning a variable within a function affects the global variable
namespace.
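A minimal sketch of the problem and of the usual fix. The function names
`bump_global` and `bump_local` are hypothetical, and `local` is the non-POSIX
extension discussed below, so this assumes a shell that supports it (bash,
dash, busybox ash).

```shell
#!/bin/sh
# Sketch: POSIX functions share the caller's variables.
# Function names are hypothetical; `local` is a widespread extension,
# not part of POSIX.

counter=10

bump_global() {
    counter=$((counter + 1))    # no `local`: this rewrites the global
}

bump_local() {
    local counter               # shadows the global from here on...
    counter=0                   # ...but its initial value is shell-dependent,
                                # so assign it explicitly before use
    counter=$((counter + 1))
    printf 'inside: %s\n' "$counter"
}

bump_global
printf 'after bump_global: %s\n' "$counter"    # prints 11
bump_local                                     # prints "inside: 1"
printf 'after bump_local: %s\n' "$counter"     # still 11
```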
Most shells support the `local` keyword, which is unfortunately not defined by
POSIX. The `local` keyword shows different behaviours in some corner cases
(which should obviously be avoided if portability matters).

Declaring a local variable within a function, thereby shadowing an existing
variable (global, or local to a caller scope), should be safe, but no
assumption should be made on the value of the local variable before
assignment: dash, for instance, makes the new local variable inherit the value
of the variable it shadows, while bash leaves it unset.

The declaration and assignment of a local variable should be distinct (see
ShellCheck SC2155): in `local var="$(cmd)"` the exit status is the one of
`local`, which masks a failure of `cmd`.


4. WISE USE OF TRAPS
--------------------

Traps are a great way to clean up residual state when the script exits.
Heads up: they can interfere with the exit status of the shell script.

    atexit() {
        # First thing: take a copy of "$?" by declaring a local variable and
        # assigning it in a single statement. It is generally not recommended
        # to assign a local variable while declaring it, but this is an
        # important exception: a separate `local ex` would succeed, resetting
        # "$?" to 0 before we get the chance to read it.
        local ex="$?"

        # Here goes the clean up, which might succeed or fail, independently
        # of the rest of the script!

        # If the script uses `set -e`, make sure that the handler is not
        # terminated prematurely by a command failure!
        false || :

        # This is the right moment to clean up multiple resources.
        rm -rf "$tempfile"

        exit "$ex"
    }
    trap atexit EXIT

    # Useful to have a clean ^C interruption
    trap exit INT

Unfortunately there is room for only one EXIT handler (a new `trap ... EXIT`
replaces the previous one), which is therefore bound to "know everything".
It is wise to keep C++ programmers away from shell scripts: I've seen clumsy
attempts at implementing RAII, and the complexity rose to infinity!


5. ERROR HANDLING 1
-------------------

pipefail is a bashism, too bad! Remember:

 * The exit status of a pipeline is the exit status of its last command (the
   tail): failures of the other commands go unnoticed. Also, each command of
   a pipeline may run in a subshell.

 * `set -e` is your friend, but a shady one! For example, it is not honoured
   within pipes, since only the status of the tail counts.

 * Within functions, always behave as if `set -e` were not in place: when a
   function is called as part of a condition (e.g. `if foo; then` or
   `foo || ...`), `set -e` is ignored in the whole function body.


6. ERROR HANDLING 2
-------------------

Default values of variables, tl;dr:

    [ "$variable" ] || variable="$(gen_variable)"

A plain assignment takes the exit status of its command substitution, so a
failure of `gen_variable` is caught by `set -e`. The following form holds
surprises, as error checking is not effective there: the command has a
command name (`:`), so its exit status is the one of `:`, and a failure of
`gen_variable` goes unnoticed, even under `set -e` (as observed with dash).

    : "${variable:="$(gen_variable)"}"


7. BOOLEANS
-----------

Perl-style: use empty strings for false, and any non-empty string for true.

Advantages: this works

    if [ -n "$variable" ]; then
        ...
    fi

Disadvantages:

 - The variables need to be emptied before use, to avoid accidentally picking
   up values that were previously assigned in the environment.

 - Heads up for `set -u`: if enabled, evaluating an unset variable will make
   the script fail. But it is always possible to use "${variable:-}".


8. REMOTE COMMANDS VIA SSH
--------------------------

It is generally possible to do

    ssh -T user@host command args ...

This becomes difficult to pull off if we want to run scripts of some
complexity remotely, not least because ssh joins the arguments into a single
string that the remote shell parses again, so quoting gets tricky.

First off, consider using tools like Ansible, although they typically require
some additional packages to be installed remotely (e.g. Python, in Ansible's
case).

Alternative method: feed the script to a remote `sh` through standard input
(`-x` traces each command, `-e` stops at the first failure).

    ssh_pipe() {
        ssh -T -l "${1:?user}" "${2:?host}" sh -xe
    }

    ssh_pipe bob 192.168.1.1 <<EOF
    ... # here goes a shell script!
    EOF

Beware of variable expansions: local shell variables are expanded within the
heredoc, unless the delimiter is quoted. To understand this, compare the
following forms:

    $ cat <<EOF
    $USER
    EOF

    $ cat <<'EOF'
    $USER
    EOF

Corollary: a neat trick to pass local variables around is the following:

    { cat <<END_OF_VARS; cat <<'END_OF_SCRIPT'; } | ssh_pipe bob 192.168.1.1
    remote_var="$local_var"
    END_OF_VARS
    echo "I can use $remote_var"
    END_OF_SCRIPT

The first heredoc is expanded locally, the second one is sent verbatim, so the
remote shell receives the assignment followed by the script. This assumes
that `$local_var` contains no double quotes, backslashes, `$` or backticks,
since its value ends up inside a double-quoted string on the remote side.


9. XARGS HACKS
--------------

xargs is mostly useful to turn standard input into command-line arguments,
but it also provides a quite powerful way of doing multi-processing out of
the box, without the need to install special tools. Note that `-P` is not
specified by POSIX, although GNU and BSD xargs support it.

Execute up to 20 parallel instances of COMMAND, each eating up to 10 values
(one per line) from the given sequence:

    seq 1 1000 | xargs -P20 -L10 COMMAND
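The same pattern with a concrete stand-in for COMMAND, as a sketch: the
function `double_all` and its inline `sh -c` worker are hypothetical, and the
worker simply prints each value it receives, doubled. Each batch of lines
arrives as the worker's positional parameters.

```shell
#!/bin/sh
# Sketch: batched, parallel processing with xargs.
# `-L` is POSIX; `-P` is an extension found in GNU and BSD xargs.
# `double_all` and its inline worker are hypothetical stand-ins for COMMAND.

double_all() {
    # Read values from stdin, one per line. Up to 4 workers run at once, and
    # each receives up to 5 lines as "$@" (the `_` becomes the worker's $0).
    xargs -P4 -L5 sh -c 'for n in "$@"; do echo "$((n * 2))"; done' _
}

# Workers finish in no particular order, so sort the merged output.
seq 1 20 | double_all | sort -n    # prints 2, 4, ..., 40, one per line
```

The order in which concurrent workers produce their output is not
deterministic, which is why the result goes through `sort -n`; drop it when
the order does not matter.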