Overview
Linux invites you to build your own tools. Many of the commands you already know started as small utilities written by someone who needed to automate a task or make a repetitive job easier. In this chapter you will see how to think about Linux tools as building blocks, how to design your own, and how to approach implementation across three common choices (Bash, Python, and C) without repeating the detailed language material that is covered in later child chapters.
The focus here is not on full language tutorials. Instead, you will learn how Linux tools behave, how they interact with the system and with each other, and what conventions make them feel native on a Unix-like system.
What Makes a Program a “Linux Tool”?
A Linux tool is usually a command line program that follows a handful of strong conventions. These conventions make tools easy to combine, automate, and reuse.
A typical Linux tool:
Reads input from standard input, from files, or from the network.
Writes output to standard output and errors to standard error.
Uses exit status codes to signal success or failure.
Can be combined with other tools using pipes and redirection.
Is controlled by options and arguments, not interactive menus, unless it is a full-screen tool.
You already know examples like ls, grep, find, and tar. Your own tools should aim to behave in a similar way so that users can predict how to use them.
There is an informal “Unix philosophy” that guides tool design: programs should do one thing well, work together, and handle text streams. You do not need to follow this strictly in every project, but it is a good starting point for tool design on Linux.
Command Line Arguments and Options
Command line tools are configured primarily through arguments and options. Arguments are the values you pass to a command. Options change how the command behaves.
For example, in:
grep -i pattern file.txt
grep is the program name. -i is an option. pattern and file.txt are arguments.
There are common conventions you should follow:
Short options use a single dash and a single letter, such as -h or -v.
Long options use two dashes and a word, such as --help or --version.
Options that take values usually use either a space or an equals sign, such as -o output.txt or --output=output.txt.
Many commands let you combine short options, such as ls -la instead of ls -l -a.
In Bash, you typically parse options with the getopts builtin. In Python, you use standard modules such as argparse. In C, you often use getopt or getopt_long. Those implementations are covered in the child chapters, but the design questions are the same regardless of language.
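As a concrete sketch of the Bash side, a small getopts loop might look like this. The tool name mytool, its options, and the variable names are all invented for illustration:

```shell
#!/usr/bin/env bash
# Sketch of option parsing with getopts (mytool and its options are
# placeholders, not an existing program).
verbose=0
output=""
args=()

parse_args() {
  local OPTIND=1 opt
  while getopts ":vo:h" opt; do
    case "$opt" in
      v) verbose=1 ;;
      o) output="$OPTARG" ;;
      h) echo "usage: mytool [-v] [-o FILE] ARG..." ;;
      \?) echo "mytool: unknown option -$OPTARG" >&2; return 2 ;;
      :)  echo "mytool: option -$OPTARG needs a value" >&2; return 2 ;;
    esac
  done
  shift $((OPTIND - 1))
  args=("$@")   # remaining positional arguments
}
```

Calling parse_args -v -o out.txt input.txt would set verbose to 1, output to out.txt, and leave input.txt as the remaining argument.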
When you design your own tools, make sure you:
Provide -h or --help to print usage.
Provide --version if the tool will be installed system-wide.
Choose option names that match existing tools when the behavior is similar, for example -q for quiet mode, -v for verbose, -o for an output file.
Standard Streams and Pipelining
Linux tools expect to communicate using three standard streams: standard input (stdin), standard output (stdout), and standard error (stderr).
By default, stdin reads from the keyboard and stdout and stderr write to the terminal. The shell lets users redirect and combine these streams:
command > out.txt sends standard output to a file.
command 2> err.txt sends standard error to a file.
command < in.txt reads standard input from a file.
command1 | command2 passes command1’s output as command2’s input.
For your own tools, this leads to an important rule.
Important: Write normal results to standard output, diagnostics and errors to standard error, and never mix them arbitrarily.
If you print both data and error messages to standard output, other tools in a pipeline will see a corrupted stream. For example, a script that prints JSON must not print debugging messages to standard output, only to standard error.
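The rule can be sketched in a few lines of Bash; emit_record is a hypothetical function, and the JSON format is only an example:

```shell
#!/usr/bin/env bash
# Data goes to stdout, diagnostics go to stderr, so pipelines stay clean.
# emit_record is a hypothetical example function.
emit_record() {
  echo "progress: processing $1" >&2     # diagnostic: stderr only
  printf '{"name":"%s"}\n' "$1"          # data: stdout only
}
```

In a pipeline like emit_record alice | some_consumer, the consumer sees only the JSON line, while the progress message still reaches the terminal through standard error.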
All implementation languages you will see in later chapters offer APIs for reading and writing the standard streams. When designing your tool, always ask: can this tool be part of a pipeline, and if so, is its output clean and predictable?
Exit Status and Error Handling
Every program on Linux finishes with an exit status, also called a return code. By convention:
Exit code 0 means success. Nonzero values indicate some kind of problem.
The shell uses this value in conditionals. For example, && runs the next command only if the previous one had exit status 0.
command && echo "ok" || echo "failed"
Be aware that the a && b || c pattern is only an approximation of if/else: c also runs when b itself fails, so reserve it for commands, like echo, that are unlikely to fail.
Your tools should follow consistent exit code rules. Many projects follow patterns like:
0 success.
1 general error.
2 invalid arguments or incorrect usage (the shell itself uses 2 for misuse of builtins).
Other codes for more specific conditions, such as 3 for “not found” or 4 for a permission error.
The exact codes are up to you, but you must document them in the --help text or manual page if users are expected to react to them programmatically.
Rule: Always use exit status 0 for success and nonzero for any type of failure. Never silently ignore serious errors.
In Bash you use exit N. In Python you can use sys.exit(N). In C you return from main or call exit. The important part is to decide in the design of your tool what counts as “success” and what must be treated as an error.
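A sketch of such a scheme in Bash, where the function lookup and its codes 2 and 3 are assumed conventions rather than requirements:

```shell
#!/usr/bin/env bash
# Exit code sketch: 0 success, 2 invalid arguments, 3 "not found".
# lookup and its nonzero codes are illustrative choices.
lookup() {
  [ $# -eq 1 ] || { echo "usage: lookup NAME" >&2; return 2; }
  case "$1" in
    alice) echo "alice:1001" ;;              # success: data on stdout, status 0
    *)     echo "lookup: $1 not found" >&2
           return 3 ;;                       # assumed "not found" code
  esac
}
```

Scripts can then branch on the status, for example lookup bob || echo "missing".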
Input, Output, and Text vs Binary
Many traditional Unix tools operate on text, often line by line. This is why simple commands like grep and awk are so powerful. Text tools are easy to debug and combine.
For your own tools, consider this distinction early:
If your tool works with text, make sure it can read from standard input when no file is given, and print results to standard output. Write line-oriented parsing where possible, because that works well in pipelines. Clearly document the expected format, such as “one record per line” or “tab-separated values”.
If your tool works with binary data, such as images or archives, you must strictly separate data from logging and user messages. Binary tools should not print anything to standard output except the binary stream itself. All human-readable messages should go to standard error.
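For the text case, the fall-back-to-standard-input behavior can be sketched with a tiny filter; upper is a hypothetical example that relies on cat reading standard input when it gets no file arguments:

```shell
#!/usr/bin/env bash
# Filter sketch: process the named files, or standard input when no
# file is given. cat "$@" handles both cases in one line.
upper() {
  cat "$@" | tr '[:lower:]' '[:upper:]'
}
```

With this shape, echo hello | upper and upper notes.txt behave consistently, which is what pipeline users expect.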
You also need to think about buffering and performance. By default, C stdio and many language runtimes buffer output, which in a pipeline can delay results. You can flush explicitly, switch to line-buffered mode if it fits your design, or wrap a command with a utility such as stdbuf.
Discoverability, Help, and Documentation
A good Linux tool explains itself if the user asks. Most users will try command -h or command --help first. You should implement at least one of those and display a short usage summary.
A typical help message includes:
The command name and a one line description.
A usage line that shows the syntax, such as usage: mytool [options] FILE....
A listing of options with short and long forms and descriptions.
A note about how input and output work, such as reading from standard input by default.
Information about exit codes if they are special.
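Put together, a help text covering these points might look like the following sketch; mytool, its options, and its exit codes are all invented for illustration:

```shell
#!/usr/bin/env bash
# Sketch of a conventional --help message for a hypothetical tool.
usage() {
  cat <<'EOF'
mytool - example tool skeleton

usage: mytool [options] FILE...

options:
  -h, --help       show this help and exit
  -o, --output F   write results to F instead of standard output
  -q, --quiet      suppress progress messages

Reads standard input when no FILE is given.
Exit status: 0 on success, 1 on error, 2 on invalid arguments.
EOF
}
```

The quoted heredoc keeps the text literal, so shell metacharacters in the usage line need no escaping.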
In addition to this quick help, you can provide a manual page. Traditional man pages are structured into sections like NAME, SYNOPSIS, DESCRIPTION, OPTIONS, EXAMPLES, EXIT STATUS, and so on. Tools that are intended for wide use should have at least a basic man page so that man mytool works.
From a design perspective, always keep the help text updated as you add options. Small inconsistencies in documentation quickly make a tool frustrating to learn.
Designing for Composability
One of the main reasons to write Linux tools is to use them as building blocks inside scripts and bigger systems. Composability means that your tool is:
Predictable.
Noninteractive by default.
Controllable through options.
Suitable for pipelines.
Avoid forcing interactive prompts unless explicitly requested with a flag like --interactive. In automated systems there is often no one there to respond to prompts, so they will hang.
Prefer “filter” style behavior when it fits. A filter reads from standard input and writes to standard output. For example:
generate | myfilter | sort | uniq
Your own tool can take on different roles:
A data source, like ps or dmesg.
A transformer, like sed or tr.
A sink, like wc or xargs, which consumes data and acts on it.
When you design the tool, ask: what streams does it consume and what does it produce? If you keep the interface clean, your tool may be useful far beyond what you originally intended.
Configuration, Defaults, and Environment Variables
Linux tools often get configuration from several layers: command line options, environment variables, and configuration files.
A common priority order is:
Command line options override everything else.
Environment variables provide user-specific defaults.
Configuration files provide system-wide or per-user defaults.
If nothing is configured, use reasonable built in defaults.
For example, a tool might read a default directory from an environment variable like MYTOOL_DIR, but let the user override it with --dir. The tool might also have a system configuration file in /etc/mytool.conf and a per-user configuration file in $HOME/.config/mytool/config.
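That layering can be sketched as a small resolution function; MYTOOL_DIR, the --dir value, and the default path are the assumptions from the example above:

```shell
#!/usr/bin/env bash
# Configuration layering sketch: command line beats environment
# variable, which beats a built-in default.
resolve_dir() {
  local cli_dir="$1"                  # value parsed from --dir, may be empty
  if [ -n "$cli_dir" ]; then
    echo "$cli_dir"                   # 1. command line option wins
  elif [ -n "${MYTOOL_DIR:-}" ]; then
    echo "$MYTOOL_DIR"                # 2. environment variable
  else
    echo "$HOME/mytool"               # 3. built-in default
  fi
}
```

Keeping the resolution in one function makes the priority order easy to document and to test.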
You do not have to implement all of these in every tool, but understanding the pattern helps you design tools that integrate well in Unix-like environments. The child chapters will show how to read environment variables and files in Bash, Python, and C.
Installing and Packaging Your Tools
To turn a private script into a “real” Linux tool, you must make it easy to install and run.
At the simplest level, you can place an executable file somewhere in the user’s PATH, such as /usr/local/bin for system-wide tools or $HOME/bin for your own tools. Make sure the file has the executable bit set and, for scripts, a proper shebang line like #!/usr/bin/env bash or #!/usr/bin/env python3.
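The basic steps can be sketched as follows; the sketch installs into a temporary directory so it is self-contained, where a real install would target $HOME/bin or /usr/local/bin:

```shell
#!/usr/bin/env bash
# Install sketch: copy a script onto PATH and make it executable.
# A temporary directory stands in for $HOME/bin here.
bindir=$(mktemp -d)
printf '%s\n' '#!/usr/bin/env bash' 'echo "hello from mytool"' > "$bindir/mytool"
chmod +x "$bindir/mytool"            # the executable bit is required
PATH="$bindir:$PATH"                 # now mytool resolves like any command
```

Note that the installed name has no .sh extension: users should not need to know what language a command is written in.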
For larger tools, you may want to:
Install support files in appropriate directories, for example libraries in /usr/local/lib, data in /usr/local/share/mytool, configuration in /etc/mytool.
Provide a man page in /usr/local/share/man.
Add shell completions for Bash, Zsh, or Fish so that command line options are easier to discover.
Eventually you might package your tool for distributions. This involves creating .deb packages for Debian-based systems, .rpm packages for Red Hat-based systems, or other formats. Packaging itself is a broader topic, but when you design a tool, think ahead about where its files will live, and avoid hard-coding paths that will not work on other systems.
Security and Safety Considerations
Linux tools often run on servers and in automation where security matters. Even small utilities can become attack surfaces if they handle input from untrusted sources.
Some key principles apply no matter which language you use:
Never trust input from users or from the network. Always validate and sanitize it.
Avoid calling external programs with unsanitized data, especially through the shell, because this can lead to command injection.
Handle files carefully. Be cautious with temporary files, race conditions, and following symlinks. Prefer secure temporary file APIs instead of rolling your own file names in /tmp.
Check permissions and fail clearly if your tool needs privileges it does not have, instead of trying to work around security.
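The temporary-file advice in particular is easy to apply: mktemp creates a file with an unpredictable name and owner-only permissions, and a trap cleans it up:

```shell
#!/usr/bin/env bash
# Secure temporary file sketch: mktemp picks a safe, unpredictable
# name; the trap removes it even if the script exits early.
tmpfile=$(mktemp)
trap 'rm -f "$tmpfile"' EXIT
echo "scratch data" > "$tmpfile"
```

This avoids the classic predictable-name race in /tmp, where an attacker pre-creates or symlinks the file your script is about to write.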
If your tool needs elevated privileges, consider separating a small privileged helper from an unprivileged main component, and keep the privileged part as simple as possible. These structures are more common in advanced tools but the mindset is worth developing early.
Performance and Scalability
At first, your tools may run on small datasets. Over time, scripts grow into critical utilities used in cron jobs and production systems. Thinking about performance early can save you from large rewrites.
Some general approaches apply regardless of language:
Prefer streaming processing to loading everything into memory. For text, read one line at a time, process it, and write it out.
Avoid unnecessary external processes inside loops. For example, running grep 10,000 times in a loop is usually worse than restructuring the problem so that one grep processes many lines.
Measure before optimizing. Use timing tools and profilers instead of guessing where the bottleneck is.
Consider concurrency only when single-threaded performance is insufficient and you understand the implications.
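The point about external processes in loops can be made concrete. Both hypothetical functions below count lines containing a word, but the first starts one grep per line while the second starts grep once:

```shell
#!/usr/bin/env bash
# Counting lines that contain "error" in two ways.
slow_count() {                # anti-pattern: one grep process per line
  local n=0 line
  while IFS= read -r line; do
    if printf '%s\n' "$line" | grep -q error; then
      n=$((n + 1))
    fi
  done
  echo "$n"
}

fast_count() {                # one grep process for the whole stream
  grep -c error || true       # grep -c exits 1 when the count is 0
}
```

On large inputs the difference is dramatic, because process creation costs far more than scanning a line of text.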
Bash is convenient but slower and less suited to heavy computation. Python is good for higher-level logic and moderate datasets. C offers the best control and performance at the cost of complexity. In practice, many Linux utilities are hybrids, for example Bash scripts that call C commands or Python tools that rely on C libraries.
Choosing the Right Language for Your Tool
You have three subchapters focused on Bash, Python, and C. From a design perspective, you can think about the tradeoffs like this:
Bash-based tools are ideal for gluing existing commands together. They are excellent for short automation tasks, system administration helpers, and small filter-like utilities. They are less suitable for complex data structures, large datasets, or long-running services.
Python command line tools are a good choice when you need more structure, libraries, or portability while still writing relatively concise code. Python has rich libraries for networking, parsing, and APIs, and produces readable tools that others can maintain.
C-based tools match traditional Unix utilities. They have the closest access to Linux system calls and can be highly efficient. They are more verbose and require careful handling of memory and low-level details, but they are often chosen when performance and close integration with the kernel and C libraries are critical.
You do not need to commit to a single language forever. Sometimes the right solution is to prototype in Bash, reimplement in Python as the logic grows, and finally write performance-sensitive parts in C if needed. The important part is to keep the interface of your tool stable so that users and scripts are not broken by internal changes.
Testing and Maintenance
Writing a Linux tool is not only about its first version. Maintenance and testing are what turn a quick script into a dependable utility.
You can test tools at several levels:
Manual tests with sample input and output.
Automated tests that run commands and compare their output and exit codes to expected values.
Integration tests that combine your tool with others in realistic workflows.
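The second level can be sketched with a small helper that runs a command and checks both its output and its exit status; assert_cmd is a hypothetical helper, not a standard utility:

```shell
#!/usr/bin/env bash
# Tiny test-harness sketch: compare a command's stdout and exit code
# against expected values, counting failures.
failures=0
assert_cmd() {
  local expected_out="$1" expected_rc="$2"; shift 2
  local out rc=0
  out=$("$@" 2>/dev/null) || rc=$?
  if [ "$out" != "$expected_out" ] || [ "$rc" -ne "$expected_rc" ]; then
    echo "FAIL: $* (rc=$rc, out=$out)" >&2
    failures=$((failures + 1))
  fi
}
```

For example, assert_cmd "hi" 0 echo hi passes, while assert_cmd "" 0 false would record a failure because false exits with status 1.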
Pay special attention to edge cases. Test empty input, huge input, invalid options, missing files, insufficient permissions, and non-UTF-8 data if you process text. A robust tool fails gracefully and explains why.
Over time, you will need to refactor and extend tools. Clear code structure, a small set of well-defined responsibilities, and good documentation will make this much easier. Even short Bash scripts benefit from comments and consistent style when they grow beyond a few lines.
Putting It Together
Writing Linux tools is about much more than learning a programming language. It is about understanding and respecting the environment they run in: the shell, the filesystem, the process model, and the expectations of users and other programs.
In the following subchapters you will see how to put these principles into practice in Bash, Python, and C. As you explore the examples, pay attention to how they:
Parse arguments and options.
Interact with standard streams.
Return meaningful exit codes.
Compose with other tools.
Follow conventions that make them feel like native parts of the system.
If you design with these ideas in mind, even your earliest tools will be useful, composable, and easy to adopt in real Linux workflows.