Kahibaro

2.2.4 Wildcards and globbing

Understanding Wildcards and Globbing

Working in the shell, you will often want to act on many files at once without typing every name. Wildcards and globbing are the tools that let you describe groups of filenames with patterns instead of listing each file manually.

Globbing is the name for the shell feature that expands these patterns into matching filenames before a command runs. The patterns themselves are called wildcards. This chapter focuses on how these patterns work, how the shell expands them, and how to avoid common mistakes.

How Globbing Works

When you type a command that contains wildcard characters, the shell examines each word that includes those characters. Before it runs the command, it replaces each pattern with a list of matching pathnames from the filesystem.

For example, if your directory contains a.txt, b.txt, and c.log, and you type:

bash
ls *.txt

the shell does not literally run ls *.txt. Instead, it searches for filenames that match *.txt, finds a.txt and b.txt, and actually runs:

bash
ls a.txt b.txt

This replacement step is called glob expansion, or globbing, and it happens before the program (ls in this case) even starts. The program never sees the wildcard characters; it only sees the expanded list of paths.

Important rule: Globbing is done by the shell before the command executes. Programs usually do not interpret *, ?, or [] themselves.

Because the shell performs this expansion, different shells can have slightly different globbing behavior or extra features, but the basic patterns are the same across all common shells.
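One way to observe this is to count the words a command would receive after expansion. The following is only a sketch: it works in a scratch directory created with mktemp -d, and the variable names (start_dir, demo, expanded_count, expanded_words) are made up for this illustration.

```bash
# Illustrative sketch: count the words the shell produces from a glob.
start_dir=$PWD
demo=$(mktemp -d)              # scratch directory for the demonstration
cd "$demo" || exit 1
touch a.txt b.txt c.log

set -- *.txt                   # the shell expands the pattern right here
expanded_count=$#              # how many words a command would receive
expanded_words="$*"            # those words, joined with spaces

echo "words: $expanded_count"  # prints "words: 2"
echo "list:  $expanded_words"  # prints "list:  a.txt b.txt"

cd "$start_dir" && rm -rf "$demo"
```

The set -- builtin stands in for any command here: whatever it stores as positional parameters is what ls or rm would have received.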

The Asterisk `*`

The asterisk wildcard is the most frequently used pattern. It matches any sequence of characters, including no characters at all, within a single path component.

If a directory contains report.txt, report-final.txt, image.png, and data.csv, then:

bash
ls *

matches all visible entries in the current directory, that is, everything except hidden entries, whose names begin with a dot. Note that for each matched directory, ls lists the directory's contents rather than its name.

bash
ls *.txt

matches report.txt and report-final.txt, because they both end with .txt.

bash
ls report*

matches any file whose name starts with report, for example report, report.txt, and report-final.txt.

The * character never crosses a directory separator / when globbing. This means that:

bash
ls *

matches entries in the current directory, but it does not automatically match files inside subdirectories.

bash
ls dir/*

matches everything directly inside the directory dir, but not deeper levels.
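The rule that * stops at each / can be checked with a small experiment. This is a sketch in a scratch directory; the file and directory names (notes.txt, todo.txt, dir/inner.txt, dir/sub/deep.txt) and the variable names are invented for the example.

```bash
# Illustrative sketch: each * stays inside one path component.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p dir/sub
touch notes.txt todo.txt dir/inner.txt dir/sub/deep.txt

top_level=$(echo *.txt)        # current directory only
one_down=$(echo */*.txt)       # exactly one directory level down
two_down=$(echo */*/*.txt)     # exactly two levels down

echo "$top_level"              # notes.txt todo.txt
echo "$one_down"               # dir/inner.txt
echo "$two_down"               # dir/sub/deep.txt

cd "$start_dir" && rm -rf "$demo"
```

Each additional */ in the pattern descends exactly one more level, which is why none of the three patterns overlap.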

The Question Mark `?`

The question mark wildcard matches exactly one character within a single path component. It never matches zero characters and, like *, it does not cross a /.

If a directory contains a.txt, b.txt, ab.txt, and abc.txt, then:

bash
ls ?.txt

matches a.txt and b.txt, because they are exactly one character followed by .txt.

bash
ls ??*.txt

matches filenames that have at least two characters before .txt: the two question marks each require one character, and the * allows any number of additional characters. In the set above, it would match ab.txt and abc.txt.

Question marks are useful when you know the exact length of a part of the filename or want to distinguish between names that only differ by one character.
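The difference between ?, ?? and ??* can be tried out on the same four filenames. This is a minimal sketch in a scratch directory; the variable names are assumptions made for the illustration.

```bash
# Illustrative sketch: ? always consumes exactly one character.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch a.txt b.txt ab.txt abc.txt

set -- ?.txt;   one_char="$*"      # a.txt b.txt
set -- ??.txt;  two_chars="$*"     # ab.txt
set -- ??*.txt; two_or_more=$#     # 2 (ab.txt and abc.txt)

echo "$one_char"
echo "$two_chars"
echo "$two_or_more"

cd "$start_dir" && rm -rf "$demo"
```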

Bracket Expressions `[...]`

Bracket expressions allow you to match one character from a specific set or range of characters. You place the allowed characters between square brackets.

In a directory with file1.txt, file2.txt, fileA.txt, and fileB.txt, the pattern:

bash
ls file[12].txt

matches file1.txt and file2.txt because the bracket expression [12] matches either 1 or 2.

You can define ranges using a hyphen between two characters. For example:

bash
ls file[0-9].txt

matches any file with a single digit from 0 to 9 in that position, for instance file1.txt or file7.txt.

Similarly:

bash
ls file[a-z].txt

matches names where that position is any lowercase letter from a to z, such as filea.txt or filem.txt.

You can combine individual characters and ranges in the same brackets. For example:

bash
ls file[0-3A-C].txt

matches file0.txt, file1.txt, file2.txt, file3.txt, fileA.txt, fileB.txt, and fileC.txt.

The bracket expression always matches one character at that position, never more and never zero.
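The sets and ranges above can be exercised together. The sketch below runs in a scratch directory with invented filenames; file10.txt is included on purpose to show that a bracket expression never stretches over two characters.

```bash
# Illustrative sketch: one bracket expression matches one character.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch file0.txt file1.txt file2.txt file7.txt fileA.txt fileB.txt file10.txt

set -- file[12].txt;     either_1_or_2="$*"   # file1.txt file2.txt
set -- file[0-9].txt;    single_digit=$#      # 4 (file10.txt is not included)
set -- file[0-3A-C].txt; mixed=$#             # 5 (file0, file1, file2, fileA, fileB)

echo "$either_1_or_2"
echo "$single_digit"
echo "$mixed"

cd "$start_dir" && rm -rf "$demo"
```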

Negated Bracket Expressions `[^...]` or `[!...]`

You can invert a bracket expression to match any character except the ones listed. In many shells, placing a caret ^ or an exclamation mark ! immediately after the opening bracket turns it into a negated pattern.

For example:

bash
ls file[!0-9].txt

matches files like filea.txt or fileX.txt, which have a single character in that position that is not a digit.

The [!...] form is the one defined by POSIX, while [^...] is an extension that some shells do not support, so:

bash
ls file[^0-9].txt

and

bash
ls file[!0-9].txt

may behave the same, depending on the shell. The important point is that the character right after the opening [ determines whether the bracket list is normal or negated.
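A quick check of negation, written with the [!...] form so that it also works in shells that do not accept the caret. This is a sketch with invented filenames and variable names.

```bash
# Illustrative sketch: [!0-9] matches any single character that is not a digit.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch file1.txt file2.txt filea.txt fileX.txt

set -- file[0-9].txt;  digits="$*"       # file1.txt file2.txt
set -- file[!0-9].txt; not_digits=$#     # 2 (filea.txt and fileX.txt)

echo "$digits"
echo "$not_digits"

cd "$start_dir" && rm -rf "$demo"
```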

Hidden Files and Globbing

Files whose names begin with a dot are considered hidden. Basic glob patterns like *, ?, and [...] do not match those names unless the pattern itself starts with a dot.

If a directory contains .config, .bashrc, and notes.txt, then:

bash
ls *

matches notes.txt but not .config or .bashrc.

To match hidden files, your pattern must explicitly begin with a dot. For example:

bash
ls .*

matches .config and .bashrc and, in many shells, also . and .., which represent the current directory and the parent directory. To avoid matching . and .., you can use a more specific pattern such as:

bash
ls .[^.]*

which matches names that start with a dot and whose second character is not a dot.

Important rule: Normal wildcard patterns do not match hidden files. Use a pattern that starts with . if you need to include them.
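The following sketch contrasts * with a pattern that begins with a dot. It uses the portable [!.] spelling of the negated bracket, and the variable names are made up for the example. The plain .* pattern is left out because whether it includes . and .. depends on the shell and its settings.

```bash
# Illustrative sketch: hidden names need a pattern that starts with a dot.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch .config .bashrc notes.txt

set -- *;      visible="$*"       # notes.txt
set -- .[!.]*; hidden_count=$#    # 2 (.bashrc and .config)

echo "$visible"
echo "$hidden_count"

cd "$start_dir" && rm -rf "$demo"
```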

Recursive Globbing and `**`

Some shells support a special pattern ** that can match files recursively in subdirectories when combined with certain options.

In Bash, if the shell option globstar is enabled, the pattern:

bash
ls **/*.txt

can match all .txt files in the current directory and all its subdirectories at any depth. Without that option, ** usually behaves like *.

Since recursive globbing is a shell feature, its availability and exact behavior can differ between shell types and configuration. When you rely on it, you should be aware that not all systems or shells have it enabled by default.
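To see the effect of the option, the sketch below counts matches with globstar switched off and on. It assumes Bash 4.0 or newer is installed as bash, and it runs each case in a child bash process so that the option does not leak into your current session. The directory layout and variable names are invented.

```bash
# Illustrative sketch (assumes bash >= 4.0): ** with and without globstar.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir -p a/b
touch top.txt a/one.txt a/b/two.txt

# Option off: ** acts like *, so only a/one.txt matches.
without_globstar=$(bash -c 'shopt -u globstar; set -- **/*.txt; echo "$#"')
# Option on: **/ spans zero or more directory levels.
with_globstar=$(bash -c 'shopt -s globstar; set -- **/*.txt; echo "$#"')

echo "off: $without_globstar"   # off: 1
echo "on:  $with_globstar"      # on:  3

cd "$start_dir" && rm -rf "$demo"
```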

Globbing and Directories

Globbing patterns can include directory components. The shell tries to match each component in order, treating / as a separator that globs do not cross.

If you have a directory structure like:

text
logs/
logs/app1.log
logs/app2.log
logs/old/app1-2022.log
logs/old/app2-2022.log

then:

bash
ls logs/*.log

matches logs/app1.log and logs/app2.log but not the files inside logs/old.

To match the files inside logs/old using simple globbing, you must include that directory explicitly:

bash
ls logs/old/*.log

You can use patterns in directory names as well. For example, if you had logs1/, logs2/, and each contained .log files, you could write:

bash
ls logs*/app*.log

This pattern matches files whose paths start with logs, followed by any characters, followed by /app, then any characters, then .log.

Globbing vs Regular Expressions

Glob patterns and regular expressions both describe patterns, but they are not the same thing and they are handled by different tools.

Globs are interpreted by the shell to match filenames. They use *, ?, and [...]. Regular expressions are used by tools such as grep, sed, or awk and have their own syntax, including characters like ., +, *, ?, |, and parentheses with special meanings.

Typing a regular expression directly into the shell without quoting it does not cause the shell to interpret it as a regular expression. Instead, the shell may treat parts of it as glob patterns or as ordinary characters.

Understanding that globbing is filename expansion done by the shell, while regular expressions are text matching done by individual programs, helps avoid confusion.
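The same two characters, a*, mean different things in the two systems. The sketch below tests the word banana against a* once as a shell pattern and once as a regular expression. It uses a case statement for the shell side, which applies the same pattern syntax as globbing but to a string instead of filenames; the variable names are assumptions for the example.

```bash
# Illustrative sketch: "a*" as a shell pattern versus as a regular expression.
word=banana

# Shell pattern: a literal "a" followed by anything, anchored to the whole word.
case $word in
  a*) glob_result=match ;;
  *)  glob_result=no-match ;;
esac

# Regular expression: zero or more "a" characters, found anywhere in the line.
if printf '%s\n' "$word" | grep -q 'a*'; then
  regex_result=match
else
  regex_result=no-match
fi

echo "glob:  $glob_result"    # glob:  no-match
echo "regex: $regex_result"   # regex: match
```

The regular expression succeeds even though banana does not start with a, because zero occurrences of a already satisfy a*.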

When Patterns Do Not Match

Behavior when a glob pattern matches no files can differ between shells and settings. In many default configurations, if a pattern does not match any filenames, the shell leaves the pattern unexpanded and passes it unchanged to the command.

For instance, if there are no .bak files and you run:

bash
rm *.bak

some shells will have rm see the literal argument *.bak. The rm command then tries to remove a file named literally *.bak, which does not exist, and prints an error.

Other shells or options can treat unmatched patterns differently and may remove them from the argument list completely or might treat this situation as an error.

When you rely on wildcard patterns, it is useful to remember that no matches is a distinct case and it might not behave as you expect for destructive commands.
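One portable way to cope with the no-match case in a script is to check that each word really names an existing file before acting on it. This is a sketch, assuming a shell that passes unmatched patterns through unchanged (the default in sh and Bash); the variable names are invented.

```bash
# Illustrative sketch: guard a loop against an unmatched pattern.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch keep.txt                 # note: no .bak files exist here

set -- *.bak
unmatched_word=$1              # the pattern itself, since nothing matched

removed=0
for f in *.bak; do
  [ -e "$f" ] || continue      # skip the literal pattern when nothing matched
  rm -- "$f"
  removed=$((removed + 1))
done

echo "word seen by the command: $unmatched_word"   # *.bak
echo "files removed: $removed"                     # 0

cd "$start_dir" && rm -rf "$demo"
```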

Quoting to Disable Globbing

Sometimes you want to prevent the shell from doing globbing expansion and instead pass a pattern unchanged to a command, especially when you are working with tools that have their own pattern systems.

You can do this by quoting the pattern with single quotes or double quotes. For example:

bash
grep "*.txt" filelist.txt

passes the literal string *.txt to grep, because the quotes prevent the shell from expanding the *. The grep program then interprets *.txt as its own search pattern and looks for it in the lines of filelist.txt (add the -F option to match the characters literally), instead of searching for text in all .txt files.

If you remove the quotes:

bash
grep *.txt

the shell expands *.txt into a list of all files that match, and grep then treats the first filename as its search pattern and the remaining names as files to search, which is rarely what you want.

Quoting is therefore essential when you want a program to interpret special characters itself instead of letting the shell expand them.

Important rule: Use quotes around wildcard characters when you want to prevent the shell from expanding them and pass them literally to a command.
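The find command is a common case where the quotes matter, because find applies the -name pattern itself in every directory it visits. The sketch below compares the two spellings in a scratch directory; the filenames and variable names are made up for the illustration.

```bash
# Illustrative sketch: quoted versus unquoted pattern passed to find.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir sub
touch a.txt sub/b.txt

# Unquoted: the shell rewrites the pattern to a.txt before find starts.
unquoted_hits=$(( $(find . -name *.txt | wc -l) ))
# Quoted: find receives *.txt and matches it against every name it visits.
quoted_hits=$(( $(find . -name '*.txt' | wc -l) ))

echo "unquoted: $unquoted_hits"   # unquoted: 1
echo "quoted:   $quoted_hits"     # quoted:   2

cd "$start_dir" && rm -rf "$demo"
```

The unquoted form silently misses sub/b.txt, and with several .txt files in the current directory it would hand find a malformed argument list instead.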

Escaping Wildcard Characters

Another way to stop the shell from treating a wildcard character specially is to escape it with a backslash \. Placing a backslash before *, ?, or [ causes the shell to remove the backslash and treat the next character as ordinary.

For example:

bash
printf 'pattern: %s\n' \*

prints pattern: * because the backslash stops the shell from expanding * as a glob.

Similarly:

bash
ls file\?.txt

matches a file named literally file?.txt, because the shell does not treat ? as a wildcard here.

Escaping is useful when you want to use wildcard characters in text, patterns for other tools, or when working with filenames that literally contain those characters.
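The effect of the backslash can be verified with a file whose name really contains a question mark. This is a sketch in a scratch directory, assuming a filesystem that allows ? in filenames; the names used are invented.

```bash
# Illustrative sketch: an escaped ? only matches a literal question mark.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch 'file?.txt' fileA.txt

set -- file?.txt;  wildcard_count=$#   # 2: ? matches both "?" and "A"
set -- file\?.txt; escaped_count=$#    # 1: only the literal name
escaped_name=$1

echo "$wildcard_count"
echo "$escaped_count $escaped_name"    # 1 file?.txt

cd "$start_dir" && rm -rf "$demo"
```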

Practical Uses and Common Patterns

Globbing is often used to give commands many filenames at once. There are several practical patterns that appear frequently in everyday work.

One common case is acting on all files of a particular type. For instance:

bash
rm *.tmp

removes all files in the current directory whose names end with .tmp.

When organizing images, you might copy all .jpg files into a directory named images with:

bash
cp *.jpg images/

You can also combine patterns to be more selective. Suppose you have report-2021.txt, report-2022.txt, and summary-2022.txt. The pattern:

bash
ls report-202?.txt

matches both report-2021.txt and report-2022.txt, because the ? stands for one digit.

Bracket expressions can handle file names containing numeric sequences, like:

bash
ls photo_[01][0-9].jpg

which might match photo_00.jpg through photo_19.jpg and skip files that do not fit that numeric pattern.

Understanding how these patterns expand allows you to construct useful combinations to save typing and reduce mistakes.

Safety Considerations with Globbing

Using wildcards, especially with commands that delete or overwrite files, requires care. Once the shell expands a glob, the command does not know which arguments were produced by a pattern and which were typed explicitly.

Because * matches everything visible, the command:

bash
rm *

deletes all non-hidden files in the current directory (subdirectories are skipped unless you add -r). If you run this in an important directory by mistake, it can cause significant data loss.

It is often safer to test your patterns first with a harmless command such as ls or echo to see which filenames they will match. For example:

bash
echo *.log

shows you which log files match before you run a command that modifies or removes them.

Another useful approach is to combine globbing with interactive options of commands, like rm -i, so that you confirm each deletion even when many files are involved.

Important rule: Test wildcard patterns with a non-destructive command first, especially before using them with rm or other destructive operations.
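The preview-then-act habit might look like the following sketch. It runs in a scratch directory with invented filenames; the -- after rm marks the end of options, so that a matched name starting with a dash is not mistaken for an option.

```bash
# Illustrative sketch: preview the expansion, then run the destructive command.
start_dir=$PWD
demo=$(mktemp -d)
cd "$demo" || exit 1
touch keep.txt old1.tmp old2.tmp

preview=$(printf '%s\n' *.tmp)   # step 1: one matched name per line
echo "$preview"

rm -- *.tmp                      # step 2: same pattern, now for real
remaining=$(echo *)
echo "left over: $remaining"     # left over: keep.txt

cd "$start_dir" && rm -rf "$demo"
```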

Summary

Wildcards and globbing allow you to work efficiently with groups of files by describing patterns of names rather than listing each name manually. Globbing is a feature of the shell that expands patterns using *, ?, and bracket expressions into lists of matching pathnames.

The asterisk matches any sequence of characters, the question mark matches exactly one character, and bracket expressions match specific sets or ranges of characters, including their negated forms. Hidden files require patterns that explicitly start with a dot. Quoting or escaping wildcard characters prevents globbing when you want to pass patterns literally to commands.

By combining these basic elements, you can express rich sets of filenames in simple commands while staying aware of how the shell performs expansion and how to avoid unintended matches.
