
7.4.3 Writing tools in C

Why Write Linux Tools in C

C is the native language of Unix and Linux systems. The kernel, the standard C library, and most classic command line utilities are written in C. When you write tools in C, you work close to the system while still using a portable, well supported language. This is useful when you need performance, fine grained control over resources, or direct access to low level system calls and interfaces.

This chapter focuses on what is specific about writing Linux command line tools in C. It assumes you already understand general C syntax and basic compilation, and it does not repeat broader topics about Linux system APIs or generic programming patterns that appear elsewhere in this course.

Basic Structure of a C Command Line Tool

A Linux command line tool written in C is usually a small program organized around a main function, argument parsing, one or more core operations, and a clean exit. The simplest possible tool that does something observable prints to standard output and returns a status code to the shell.

A minimal example looks like this:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    printf("Hello from a C tool!\n");
    return EXIT_SUCCESS;
}

The signature int main(int argc, char *argv[]) is standard. argc is the argument count, including the program name itself, and argv is an array of C strings terminated by a null pointer, so argv[0] is the program name and argv[argc] is NULL. The exit status is an integer where 0 usually means success and any nonzero value indicates some form of error.

On Linux, you turn this source file into an executable with gcc:

gcc -Wall -Wextra -O2 -o mytool mytool.c

Here -Wall and -Wextra enable useful warnings, -O2 enables a reasonable optimization level, and -o specifies the output filename. Using these flags from the start helps you catch mistakes early and produce more reliable tools.

Parsing Command Line Arguments

Linux users expect tools to accept arguments and options so they can be composed in scripts and pipelines. In C, you access raw arguments through argc and argv, but you usually want a friendlier interface that understands options like -v or --help.

For simple tools, you can manually inspect argv:

#include <stdio.h>
#include <string.h>
#include <stdlib.h>
int main(int argc, char *argv[]) {
    int verbose = 0;
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "-v") == 0 || strcmp(argv[i], "--verbose") == 0) {
            verbose = 1;
        } else if (strcmp(argv[i], "--help") == 0) {
            printf("Usage: %s [-v|--verbose] [name]\n", argv[0]);
            return EXIT_SUCCESS;
        } else {
            printf("Hello, %s!\n", argv[i]);
        }
    }
    if (verbose) {
        fprintf(stderr, "Verbose mode enabled\n");
    }
    return EXIT_SUCCESS;
}

For more structured option parsing and to match the behavior of many standard utilities, you can use getopt, which is specified by POSIX, or getopt_long, a GNU extension provided by the GNU C library. These functions scan argv for options like -f or --file and handle combined short options such as -abc. When you design your tool, it is good practice to follow Unix conventions: short options for common actions, long options for clarity, and a --help option that prints usage information and exits successfully.

A command line tool should always provide clear usage output for invalid arguments, and it should exit with a nonzero status when argument parsing fails.

Working with Standard Input and Output

Linux tools are often linked together in pipelines, so correct use of standard input, standard output, and standard error is critical. In C, the standard I/O system uses the file streams stdin, stdout, and stderr, which are associated with file descriptors 0, 1, and 2.

You typically read from stdin when no filename is provided and write results to stdout. Messages about errors or diagnostics go to stderr so they do not interfere with data processing in a pipeline.

A simple filter that converts text to uppercase demonstrates this pattern:

#include <stdio.h>
#include <ctype.h>
int main(void) {
    int ch;
    while ((ch = fgetc(stdin)) != EOF) {
        if (ch == '\n') {
            fputc('\n', stdout);
        } else {
            fputc(toupper(ch), stdout);
        }
    }
    if (ferror(stdin)) {
        perror("read error");
        return 1;
    }
    return 0;
}

This program reads characters from stdin until EOF, transforms them, and writes to stdout. If you use this tool in a shell, you can connect it into a pipeline:

cat file.txt | ./toupper

Using stderr is as simple as passing it to fprintf:

fprintf(stderr, "warning: something unusual happened\n");

Keeping data output and diagnostics separate allows users to redirect streams independently, for example ./tool >out.txt 2>errors.log.

Simple File Handling for Tools

Many Linux tools read files, process their content, and write to files or standard output. In C you can use the standard I/O library functions such as fopen, fread, fwrite, and fclose. For simple tools this is often enough and keeps code portable and easy to read.

A common pattern is to write a function that processes a single file pointer, and then reuse it for both regular files and stdin:

#include <stdio.h>
#include <stdlib.h>
static int process_stream(FILE *in, FILE *out) {
    char buffer[4096];
    size_t n;
    while ((n = fread(buffer, 1, sizeof(buffer), in)) > 0) {
        if (fwrite(buffer, 1, n, out) != n) {
            perror("write");
            return -1;
        }
    }
    if (ferror(in)) {
        perror("read");
        return -1;
    }
    return 0;
}
int main(int argc, char *argv[]) {
    if (argc == 1) {
        if (process_stream(stdin, stdout) != 0) {
            return EXIT_FAILURE;
        }
    } else {
        for (int i = 1; i < argc; i++) {
            FILE *f = fopen(argv[i], "rb");
            if (!f) {
                perror(argv[i]);
                continue;
            }
            if (process_stream(f, stdout) != 0) {
                fclose(f);
                return EXIT_FAILURE;
            }
            fclose(f);
        }
    }
    return EXIT_SUCCESS;
}

This design mirrors classic utilities, where the absence of filenames means “read from standard input”. Passing stdin or a file pointer into a shared processing function keeps the code small and testable.

For tools that must manage large files efficiently or use Linux specific features like sparse files or memory mapping, you would use lower level system calls through the Unix API. Those calls are discussed elsewhere, so this chapter focuses on the higher level patterns around them.

Exit Codes and Error Conventions

Exit codes are a key part of how Linux tools communicate success or failure to the shell and to scripts. In C the exit code is the integer returned from main or passed to exit. On Linux, only the lowest 8 bits of this integer are reported to the parent process, so the usable values range from 0 to 255.

By convention, 0 means success and any nonzero value indicates some kind of error. Different values can distinguish error types. For example, you might use 1 for general errors and 2 for misuse of command line arguments.

Always return 0 on success and a small nonzero value on failure. Avoid negative values or values above 255, since only the low 8 bits are reported and the result may not mean what you expect. Scripts expect the status to follow the standard convention where 0 means success.
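The 8 bit truncation is easy to observe from C. The sketch below runs a command through the shell with system and extracts the reported status using the standard WEXITSTATUS macro; shell_exit_status is an illustrative helper. A command that exits with 300 is reported as 44, since 300 mod 256 is 44:

```c
#include <stdlib.h>
#include <sys/wait.h>

/* Run a command through the shell and return its exit status,
 * or -1 if it could not be run or did not exit normally. */
static int shell_exit_status(const char *cmd)
{
    int status = system(cmd);
    if (status == -1 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);  /* only the low 8 bits survive */
}
```

For example, on Linux shell_exit_status("true") returns 0, while shell_exit_status("exit 300") returns 44.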

The standard header <stdlib.h> defines the EXIT_SUCCESS and EXIT_FAILURE macros. Using them improves clarity:

if (something_failed) {
    return EXIT_FAILURE;
}
return EXIT_SUCCESS;

When an error occurs, it is good practice to both print a useful message and set an appropriate exit code. On Linux, if a system call or library function fails, it usually sets the errno variable. You can print a message that describes the error using perror or strerror:

#include <errno.h>
#include <string.h>
FILE *f = fopen("data.txt", "r");
if (!f) {
    fprintf(stderr, "could not open data.txt: %s\n", strerror(errno));
    return EXIT_FAILURE;
}

This pattern makes your tool behave like standard Unix programs that report the failed operation and the diagnostic string provided by the system.

Leveraging the Standard C Library on Linux

Although Linux gives you many low level system calls, much of the time you can and should work through the standard C library. Functions like fgets, fprintf, snprintf, strtol, qsort, and many others provide building blocks for data processing, input validation, and formatting.

A small tool that sums integers from standard input illustrates the use of library functions to build predictable behavior:

#include <stdio.h>
#include <stdlib.h>
#include <errno.h>
#include <string.h>
int main(void) {
    char line[1024];
    long long sum = 0;
    while (fgets(line, sizeof(line), stdin)) {
        char *endptr;
        errno = 0;
        long value = strtol(line, &endptr, 10);
        if (errno != 0) {
            perror("conversion");
            return EXIT_FAILURE;
        }
        while (*endptr == ' ' || *endptr == '\t') {
            endptr++;
        }
        if (*endptr != '\n' && *endptr != '\0') {
            fprintf(stderr, "invalid input: %s", line);
            return EXIT_FAILURE;
        }
        sum += value;
    }
    if (ferror(stdin)) {
        perror("read");
        return EXIT_FAILURE;
    }
    printf("%lld\n", sum);
    return EXIT_SUCCESS;
}

Using strtol and errno here allows the tool to distinguish between valid and invalid input and to report meaningful errors. This same pattern shows up in many Unix programs.

Organizing Code into Multiple Files

Even small Linux tools can benefit from some structure. As soon as a program grows beyond a few hundred lines, splitting it into multiple source files and headers helps you maintain it. This also makes it easier to test individual components.

A common organization is to have a main file that handles argument parsing and high level logic, and one or more files that implement specific operations. For example, you might have main.c and process.c with a shared header:

/* process.h */
#ifndef PROCESS_H
#define PROCESS_H
int process_stream(const char *name);
#endif
/* process.c */
#include "process.h"
#include <stdio.h>
int process_stream(const char *name) {
    FILE *f = fopen(name, "r");
    if (!f) {
        perror(name);
        return -1;
    }
    /* work with f */
    fclose(f);
    return 0;
}
/* main.c */
#include "process.h"
#include <stdlib.h>
int main(int argc, char *argv[]) {
    if (argc < 2) {
        return EXIT_FAILURE;
    }
    return process_stream(argv[1]) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}

You can compile and link these with gcc in a single command:

gcc -Wall -Wextra -O2 -o mytool main.c process.c

For more complex tools you can use make or another build system, but that topic belongs to build automation rather than the core of writing C tools themselves.

Using Linux Specific Extensions Carefully

On Linux, the GNU C library and the system headers expose many extensions beyond the C standard. These include functions, macros, and constants that let you access Linux specific behaviors. When you write tools in C for Linux, you can use these extensions to integrate more closely with the system.

However, doing so reduces portability to other Unix like systems. To control this trade off, many projects define feature test macros before including any system headers. For example, you might see:

#define _GNU_SOURCE
#include <stdio.h>
#include <string.h>

The macro _GNU_SOURCE enables a broad set of GNU extensions. If your goal is to write a Linux only tool that uses features such as strcasestr or asprintf, this can be appropriate; getline, once a GNU extension, has since been standardized in POSIX.1-2008 and is more widely available. If you prefer to keep your tool portable, you can avoid GNU specific features or guard them with conditional compilation.

Decide early whether your tool must be portable across Unix like systems or whether it targets Linux only. Once you use GNU or Linux specific APIs, other systems may no longer build or run your code.

Using extensions can simplify implementations. For example, getline handles dynamic line buffering, and asprintf allocates formatted strings automatically. When you use such functions, always consult their documentation on your Linux system so you understand their behavior and return value conventions.
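As a sketch of how getline simplifies line handling, the helper below counts lines without imposing any maximum line length; the function name count_lines is illustrative. getline allocates and grows the buffer itself, and the caller must free it:

```c
#define _GNU_SOURCE        /* getline is also available via POSIX.1-2008 */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/* Count the lines in a stream; getline resizes the buffer as needed. */
static long count_lines(FILE *in)
{
    char *line = NULL;
    size_t cap = 0;
    long count = 0;

    while (getline(&line, &cap, in) != -1)
        count++;

    free(line);            /* getline allocated this buffer for us */
    return count;
}
```

The same buffer is reused across iterations, so a long line only triggers reallocation once rather than on every read.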

Testing and Integrating C Tools with the Shell

Linux command line tools written in C are usually tested from the shell. You can create small input files, run your tool with various arguments, and compare its output to expected results. This interactive feedback loop is part of the Unix development style.

One simple approach uses shell redirection and the diff command. Suppose your tool reads from standard input and writes to standard output. You can prepare an input file, store the expected output, then run:

./mytool <input.txt >output.txt
diff -u expected.txt output.txt

If diff reports no differences, the test passes. For more complex behavior, you can write shell scripts that call your C tool multiple times with different arguments and inspect exit codes via $?. Because exit codes and text output are the primary interfaces for Unix commands, this kind of testing closely reflects how real users and scripts will interact with your tool.
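As a sketch of this style of testing, the following POSIX shell function checks both a filter's output and its exit status against expected values. Here tr stands in for a compiled C filter such as the earlier toupper program; you would substitute the path to your own tool:

```shell
#!/bin/sh
# run_case INPUT EXPECTED: feed INPUT to the filter under test and
# verify both its output and its exit status.
run_case() {
    actual=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
    status=$?
    if [ "$status" -ne 0 ] || [ "$actual" != "$2" ]; then
        echo "FAIL: got '$actual' (status $status)" >&2
        return 1
    fi
    echo "PASS"
}

run_case "hello" "HELLO"
run_case "mixed Case 123" "MIXED CASE 123"
```

Each failing case returns 1, so the script's overall exit status can gate a build or a larger test run.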

When you iterate on a C tool, this loop is straightforward. Edit the code, compile with gcc, rerun your test script, and refine as needed. Over time, you can collect these tests into a lightweight regression suite that guards against accidental behavior changes.

Summary

Writing Linux tools in C involves organizing your program around main, parsing command line arguments, handling standard input and output carefully, managing errors through messages and exit codes, and using the C library and, when appropriate, Linux specific extensions. By following Unix conventions for options, streams, and status codes, your tools will fit naturally into the Linux environment and interact well with other commands and scripts.
