
Error handling

Why Error Handling Matters in Shell Scripts

Shell scripts often glue many commands together. Any one of them can fail due to missing files, permissions, network issues, or user input errors. Without explicit error handling, failures go unnoticed, work is left half-finished, and later steps produce confusing follow-on errors.

The goal of error handling is not only to detect failures, but to report them clearly, clean up any partial state, and leave the system in a predictable condition.

This chapter assumes you already understand basic script structure, variables, and functions.

Exit Status and Failure Conventions

Every command returns an exit status in $?. By convention:

- 0 means success;
- any non-zero value (1-255) means failure;
- some values carry extra meaning, such as 2 for incorrect usage, 126 for "found but not executable", 127 for "command not found", and 128+N for termination by signal N.

You rarely read $? directly in advanced scripts; instead you let conditionals and the shell’s own behavior do most of the work.

Examples:

cmd
status=$?
if [ "$status" -ne 0 ]; then
    echo "cmd failed with status $status" >&2
fi

Most conditionals automatically use exit statuses:
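
For example, if runs a command and branches on its status directly, so there is no need to inspect $? by hand:

if grep -q "^root:" /etc/passwd; then
    echo "root entry found"
else
    echo "no root entry" >&2
fi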

Controlling Script Exit on Errors

`set -e` / `set -o errexit`

set -e tells the shell: “exit the script if any simple command fails (returns non‑zero), with some exceptions.”

set -e  # or: set -o errexit
cp file1 file2
echo "Copy succeeded"

If cp fails, the script terminates immediately.

Traps and Pitfalls of `set -e`

set -e is useful, but its behavior is nuanced:

- failures inside if, elif, while, and until conditions do not exit the script;
- commands on the left-hand side of && or ||, and commands negated with !, are also exempt;
- the details vary between shells (and even between Bash versions), especially around functions, subshells, and command substitutions.

Example:

set -e
if grep pattern file; then   # `grep` failure does not exit script
    echo "found"
fi
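
Two more of these exemptions, sketched briefly ($tmpfile is a placeholder):

set -e
rm "$tmpfile" || true      # deliberately ignore a failure: the || list is exempt
! grep -q pattern file     # a command negated with ! never triggers an exit
echo "still running"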

Because of these subtleties, many experienced scripters treat set -e as a safety net rather than a complete strategy: they combine it with the options below and still check the outcome of critical commands explicitly.

Making `set -e` Safer: `-u`, `-o pipefail`, and `-E`

In Bash, these options are often combined:

set -euo pipefail
# or more explicitly:
set -o errexit -o nounset -o pipefail

set -u (nounset) turns references to unset variables into errors, which catches typos early. set -E (errtrace) makes ERR traps fire inside functions, command substitutions, and subshells (see the trap section below). pipefail is especially important: without it, only the exit status of the last command in a pipeline is visible.

set -e
false | true
echo "Still running"   # script continues; pipeline exit is 0 (= true)

With set -o pipefail, the pipeline returns non‑zero if any element fails:

set -eo pipefail
false | true
echo "Never reached"   # script exits due to 'false'

Patterns for Checking Errors

Using `&&` and `||`

&& runs the next command only if the previous succeeded; || runs the next command only if the previous failed.

Common patterns:

  build_app && deploy_app
  do_critical_thing || {
      echo "do_critical_thing failed" >&2
      exit 1
  }
  value=$(some_command) || value="default"

Be careful with cmd || exit 1 in scripts that are meant to continue for non‑critical errors. Reserve this for truly fatal conditions.
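
A sketch of that distinction (load_config and update_cache are hypothetical functions):

load_config  || exit 1                                 # fatal: nothing works without configuration
update_cache || echo "WARN: cache refresh failed" >&2  # non-fatal: log and continue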

Explicit `if` Checks

For clarity, especially in longer scripts:

if ! do_step; then
    echo "ERROR: do_step failed" >&2
    # maybe clean up or skip, depending on the situation
    exit 1
fi

This is more verbose but easier to read and extend (e.g., logging, retries).

Handling Errors in Functions

Advanced scripts usually encapsulate tasks into functions. Error handling inside functions determines how failures propagate.

Returning Status vs Exiting

Two main styles:

  1. Functions return a status and caller decides:
   deploy() {
       rsync ...
       # rsync's status becomes the function's status
   }
   if ! deploy; then
       echo "Deploy failed" >&2
       exit 1
   fi
  2. Functions exit the whole script on fatal errors:
   fatal() {
       echo "FATAL: $*" >&2
       exit 1
   }
   deploy() {
       rsync ... || fatal "rsync failed"
   }
   deploy

Style 1 is more flexible and testable; style 2 is simpler for small scripts.

Consistent Return Codes

For predictable error handling, give each function a small, documented set of return codes and use the same meanings throughout the script.

Example:

# 0 = success, 1 = general error, 2 = invalid usage
backup() {
    if [ "$#" -ne 1 ]; then
        echo "Usage: backup DIR" >&2
        return 2
    fi
    # ...
}

Callers can make decisions based on specific codes.
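
For example, a caller can branch on the specific code (a sketch building on the backup function above; $dir is a placeholder):

backup "$dir" && status=0 || status=$?   # capture the status even under set -e
case "$status" in
    0) echo "Backup completed" ;;
    2) echo "Invalid arguments to backup" >&2; exit 2 ;;
    *) echo "Backup failed (status $status)" >&2; exit 1 ;;
esac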

Trapping and Handling Failures

The `trap` Builtin

trap lets you run commands when certain signals or pseudo-signals occur, including:

- EXIT: the script terminates, for any reason;
- ERR: a command fails (Bash-specific, see below);
- INT: the user presses Ctrl+C;
- TERM: the process is asked to terminate (e.g. by kill).

Cleanup on Exit

A common pattern is to do cleanup (temporary files, locks) in an EXIT trap:

tmpdir=$(mktemp -d)
cleanup() {
    rm -rf "$tmpdir"
}
trap cleanup EXIT
# use $tmpdir for work...

This cleanup runs whether the script succeeds, fails, or is interrupted (except for un-trappable terminations such as kill -9).

Using `ERR` for Centralized Error Handling

In Bash, with set -e and set -E, you can centralize error reporting:

set -Eeuo pipefail
error_handler() {
    local exit_code=$?
    local line_no=$1
    echo "ERROR on line $line_no (exit code $exit_code)" >&2
}
trap 'error_handler $LINENO' ERR

Notes:

- the ERR trap fires for the same failures that would trigger set -e;
- set -E (errtrace) is required for the trap to also fire inside functions, command substitutions, and subshells;
- the trap string is in single quotes so that $LINENO expands when the trap runs, reporting the line of the failing command.

With this pattern, any unhandled failure triggers a standardized diagnostic.

Trapping Interrupts (Ctrl+C)

To avoid leaving half-done operations:

cleanup() {
    echo "Cleaning up..." >&2
    # remove temp files, unlock things, etc.
}
trap 'cleanup; exit 1' INT TERM   # exit explicitly, otherwise the script continues after cleanup
long_running_task

If the user presses Ctrl+C, cleanup runs before the script exits.

Pipeline Error Handling

Without pipefail, only the last command’s exit status is visible:

grep pattern file | sort | uniq
echo $?   # status of 'uniq', not 'grep'

To correctly handle errors anywhere in the pipeline:

  1. Enable pipefail:
   set -o pipefail
  2. Optionally check PIPESTATUS (Bash-specific array):
   set -o pipefail
   grep pattern file | sort | uniq
   status_grep=${PIPESTATUS[0]}
   status_sort=${PIPESTATUS[1]}
   status_uniq=${PIPESTATUS[2]}
   if [ "$status_grep" -ne 0 ]; then
       echo "grep failed with $status_grep" >&2
   fi

For most scripts, set -o pipefail plus set -e (or manual if checks) is sufficient.

Validating Inputs and Preconditions

Many errors can be prevented by upfront checks. This is part of error handling too.

Checking Arguments

Use explicit argument validation:

usage() {
    echo "Usage: $0 INPUT_FILE OUTPUT_DIR" >&2
    exit 2
}
[ "$#" -eq 2 ] || usage

Checking Files and Commands

Use test operators before doing critical work:

input=$1
if [ ! -f "$input" ]; then
    echo "ERROR: $input does not exist or is not a regular file" >&2
    exit 1
fi
if ! command -v jq >/dev/null 2>&1; then
    echo "ERROR: 'jq' is required but not installed" >&2
    exit 1
fi

By failing fast here, you avoid confusing errors deeper in the script.

Retrying and Recovery

Sometimes failures are transient (network hiccups, temporary locks). Instead of exiting immediately, you may want to retry.

Simple Retry Loop

retry() {
    local attempts=$1; shift
    local delay=$1; shift
    local i
    for ((i=1; i<=attempts; i++)); do
        if "$@"; then
            return 0
        fi
        echo "Attempt $i failed; retrying in $delay seconds..." >&2
        sleep "$delay"
    done
    echo "ERROR: command failed after $attempts attempts: $*" >&2
    return 1
}
# Usage:
retry 5 2 curl -fsS "https://example.com/api"

Rollback on Failure

For operations that change state (e.g., deployment), plan a rollback:

deploy_new_version() {
    backup_existing
    if ! install_new; then
        echo "Install failed, rolling back..." >&2
        restore_backup
        return 1
    fi
}

Rollback logic can get complex; scripts should keep it as straightforward and idempotent as possible.
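
For instance, the restore step can be written so that running it twice, or when nothing was installed, is harmless (a sketch; $backup_dir and $app_dir are placeholders):

restore_backup() {
    # idempotent: nothing to do if no backup exists
    [ -d "$backup_dir" ] || return 0
    rsync -a --delete "$backup_dir"/ "$app_dir"/
}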

Logging and Diagnostics

Good error handling includes good messages: write them to stderr, include a timestamp and enough context to identify the failing step, and report the exit status.

Example helper:

log_error() {
    echo "[$(date +'%Y-%m-%d %H:%M:%S')] ERROR: $*" >&2
}
do_step || log_error "do_step failed with status $?"

When combined with trap 'error_handler $LINENO' ERR, you can get both context and line numbers.

Defensive Patterns and Best Practices

By combining exit status conventions, set options, traps, and structured patterns (retries, cleanup, validation), you can write shell scripts that behave predictably even when things go wrong.
