Why Batch Processing Matters
Many tasks in MATLAB involve repeating the same operation on many files. For example, you might need to read all CSV files in a folder, apply the same calculation to each, and save the results. Doing this by hand is slow and error prone. Batch processing is the idea of writing code that automatically loops over files and performs the same steps on each one without manual intervention.
In this chapter, you focus on how to find sets of files, move through them systematically, and combine this with MATLAB code you already know to automate repetitive work.
Finding Multiple Files with `dir`
The starting point for batch processing of files is to obtain a list of files that match some pattern. MATLAB provides the dir function for this. The simplest form returns information about all files and folders in the current folder:
listing = dir;
The variable listing is a structure array. Each element corresponds to one file or folder and contains fields like name, folder, date, bytes, and isdir.
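To get a feel for this structure array, the following sketch lists a scratch folder and pulls individual fields out of every element at once. The folder and file names (demoFolder, a.txt, b.txt) are made up for illustration.

```matlab
% Create a scratch folder with two small files (hypothetical names).
demoFolder = tempname;
mkdir(demoFolder);
fid = fopen(fullfile(demoFolder, 'a.txt'), 'w'); fprintf(fid, 'hello'); fclose(fid);
fid = fopen(fullfile(demoFolder, 'b.txt'), 'w'); fprintf(fid, 'hi');    fclose(fid);

listing = dir(demoFolder);          % one element per file or folder
disp(fieldnames(listing));          % shows the available field names

% Collect one field from every element into a single array.
allNames = {listing.name};          % cell array of character vectors
allSizes = [listing.bytes];         % numeric row vector
isFolder = [listing.isdir];         % logical row vector

fileNames = allNames(~isFolder);    % names of the regular files only
disp(fileNames);
```

The comma-separated-list syntax ({listing.name}, [listing.bytes]) is often handy when you want to inspect or filter the whole listing before looping over it.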
For batch processing, you almost always use a pattern argument with dir, such as all files with a certain extension:
txtFiles = dir('*.txt');
csvFiles = dir('*.csv');
matFiles = dir('*.mat');
Now txtFiles contains only files whose names end with .txt. You then process each element in this array in a loop.
If your files are in a different folder, you include the folder in the pattern:
dataFiles = dir('C:\data\measurements\*.csv');
or build the pattern from separate pieces:
dataFolder = 'C:\data\measurements';
pattern = fullfile(dataFolder, '*.csv');
dataFiles = dir(pattern);
Using fullfile is important for building paths in a platform-independent way, because it automatically inserts the correct file separator for the operating system.
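As a small check of this behavior, the sketch below builds a pattern from relative pieces and compares it with the same path assembled by hand using filesep. The folder names data and measurements are placeholders, not real locations.

```matlab
dataFolder = fullfile('data', 'measurements');   % hypothetical relative folder
pattern    = fullfile(dataFolder, '*.csv');

% filesep is '\' on Windows and '/' on Linux and macOS.
expected = ['data' filesep 'measurements' filesep '*.csv'];

disp(pattern);
disp(strcmp(pattern, expected));   % the two strings should match
```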
Building Full File Paths
The elements returned by dir include the file name but, for many operations, you need the full path. A frequent pattern in batch scripts is:
files = dir('*.csv');
for k = 1:numel(files)
    thisName = files(k).name;
    thisFolder = files(k).folder;
    fullFileName = fullfile(thisFolder, thisName);
    % Use fullFileName with read or write functions
end
Using fullfile(files(k).folder, files(k).name) ensures that your code works even if you change the current folder before or during processing, or if the files are not in the current folder to begin with.
You can check whether a name corresponds to a file or a folder by using the isdir field:
if ~files(k).isdir
    % Process only entries that are files
end
This is often useful if you used a broad pattern, or if you want to skip . and .. directory entries when listing all contents.
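Instead of testing isdir inside the loop, you can also drop all folder entries up front with logical indexing. The following is a minimal sketch on a scratch folder. The names sub and data.csv are invented for the example.

```matlab
% Scratch folder containing one subfolder and one file (hypothetical names).
demoFolder = tempname;
mkdir(demoFolder);
mkdir(fullfile(demoFolder, 'sub'));
fid = fopen(fullfile(demoFolder, 'data.csv'), 'w');
fprintf(fid, '1,2\n');
fclose(fid);

entries   = dir(demoFolder);               % typically '.', '..', 'sub', 'data.csv'
filesOnly = entries(~[entries.isdir]);     % keep regular files only

for k = 1:numel(filesOnly)
    fprintf('%s\n', fullfile(filesOnly(k).folder, filesOnly(k).name));
end
```

Because . and .. are reported as folders, this single filter removes them together with any real subfolders.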
Looping Over Files
Once you have a list of files and know how to form the full path, you can combine this with control flow to perform batch operations. A very common pattern is a for loop over the index of the structure array returned by dir.
For example, suppose you want to read each CSV file, compute the mean of some column, and print it:
csvFiles = dir('*.csv');
for k = 1:numel(csvFiles)
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    T = readtable(fullName);
    colMean = mean(T.Value); % assuming table has column 'Value'
    fprintf('File: %s, mean(Value) = %.3f\n', csvFiles(k).name, colMean);
end
This structure is the core of many batch processing scripts. Inside the loop body, you perform whatever operations you need. You might read data using readtable, readmatrix, imread, audioread, or load, then perform calculations or transformations, then save results.
You can also preallocate space to save results from each file. For instance, if you want to store one numeric result per file:
csvFiles = dir('*.csv');
nFiles = numel(csvFiles);
means = zeros(nFiles, 1);
for k = 1:nFiles
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    T = readtable(fullName);
    means(k) = mean(T.Value);
end
Now means contains the mean for each file in the same order as csvFiles.
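Since the results line up with the file list, one option is to pair them in a table so that each mean stays attached to its file name. The sketch below first writes two small sample files. The names run1.csv and run2.csv and the columns Time and Value are assumptions made for the demonstration.

```matlab
% Write two sample CSV files into a scratch folder (hypothetical data).
demoFolder = tempname;
mkdir(demoFolder);
T1 = table([0; 1; 2], [1; 2; 3], 'VariableNames', {'Time', 'Value'});
T2 = table([0; 1],    [10; 20],  'VariableNames', {'Time', 'Value'});
writetable(T1, fullfile(demoFolder, 'run1.csv'));
writetable(T2, fullfile(demoFolder, 'run2.csv'));

% Same preallocated loop as above, pointed at the scratch folder.
csvFiles = dir(fullfile(demoFolder, '*.csv'));
nFiles = numel(csvFiles);
means = zeros(nFiles, 1);
for k = 1:nFiles
    T = readtable(fullfile(csvFiles(k).folder, csvFiles(k).name));
    means(k) = mean(T.Value);
end

% Pair each file name with its result.
summary = table({csvFiles.name}', means, 'VariableNames', {'File', 'MeanValue'});
disp(summary);
```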
Applying Different Operations in a Batch
Batch processing is not limited to reading numeric data. The principle is always the same: obtain a list of relevant files, iterate over that list, and apply an operation to each file.
You might, for example, resize all images in a folder:
imageFiles = dir('*.png');
for k = 1:numel(imageFiles)
    inName = fullfile(imageFiles(k).folder, imageFiles(k).name);
    I = imread(inName);
    I2 = imresize(I, 0.5);
    [~, baseName, ~] = fileparts(imageFiles(k).name);
    outName = fullfile(imageFiles(k).folder, [baseName '_small.png']);
    imwrite(I2, outName);
end
In this example, fileparts is used to separate the base file name from its extension so that you can construct a new output name.
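The three outputs of fileparts can be seen in isolation in the short sketch below. The path images/photo.png is a made-up example and does not need to exist, because fileparts only works on the text of the name.

```matlab
inName = fullfile('images', 'photo.png');       % hypothetical input path
[folder, baseName, ext] = fileparts(inName);    % 'images', 'photo', '.png'

% Reuse the original extension instead of hard-coding it.
outName = fullfile(folder, [baseName '_small' ext]);
disp(outName);
```

Keeping ext in the output name is one possible design choice. It lets the same code handle .png, .jpg, or other extensions without changes.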
You could also convert all .mat files to .csv:
matFiles = dir('*.mat');
for k = 1:numel(matFiles)
    inName = fullfile(matFiles(k).folder, matFiles(k).name);
    S = load(inName); % loads into structure S
    varNames = fieldnames(S);
    data = S.(varNames{1}); % use first variable for simplicity
    [~, baseName, ~] = fileparts(matFiles(k).name);
    outName = fullfile(matFiles(k).folder, [baseName '.csv']);
    writematrix(data, outName);
end
The exact operations inside the loop depend on your task, but the structure of listing files, looping, reading, transforming, and saving is the same.
Controlling Which Files to Process
Sometimes you do not want to process every file in a folder. You might need to skip certain files by name, by size, or by some pattern inside the name. The structure from dir gives you the information you need to add simple checks inside the loop.
For instance, to skip files whose names start with "test_":
files = dir('*.csv');
for k = 1:numel(files)
    name = files(k).name;
    if startsWith(name, 'test_')
        continue;
    end
    fullName = fullfile(files(k).folder, name);
    % Process fullName here
end
You can filter by size using the bytes field:
files = dir('*.csv');
for k = 1:numel(files)
    if files(k).bytes == 0
        continue; % skip empty files
    end
    fullName = fullfile(files(k).folder, files(k).name);
    % Process fullName
end
This type of conditional logic lets you adapt your batch script to the specific data you have.
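The same checks can also be applied to the whole list before the loop starts, which keeps the loop body free of continue statements. The following sketch combines the name and size filters. The file names run1.csv, test_run.csv, and empty.csv are invented for the example.

```matlab
% Scratch folder with one normal file, one test file, and one empty file.
demoFolder = tempname;
mkdir(demoFolder);
writematrix([1 2; 3 4], fullfile(demoFolder, 'run1.csv'));
writematrix([5 6],      fullfile(demoFolder, 'test_run.csv'));
fclose(fopen(fullfile(demoFolder, 'empty.csv'), 'w'));   % zero-byte file

files = dir(fullfile(demoFolder, '*.csv'));

% Build one logical mask per condition, then keep what passes both.
isTest  = startsWith({files.name}, 'test_');
isEmpty = [files.bytes] == 0;
files   = files(~isTest & ~isEmpty);

for k = 1:numel(files)
    fprintf('Will process: %s\n', files(k).name);
end
```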
Recursively Processing Subfolders
In many projects, files are stored in multiple subfolders. You may want to apply the same processing to all relevant files in a folder tree. MATLAB allows recursive directory listing by using dir with the ** wildcard (available in R2016b and later), or by writing your own recursion using dir and checking isdir.
A simple modern approach that searches within the current folder and all subfolders for .csv files is:
csvFiles = dir('**/*.csv');
for k = 1:numel(csvFiles)
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    % Process fullName
end
Here, csvFiles(k).folder is the path to the folder that contains the file, regardless of how deep it is in the folder tree.
If you want to keep track of what folder each file came from, you can use csvFiles(k).folder directly in your output naming or logging.
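If the ** wildcard is not available, the folder tree can be walked by hand using dir and isdir. The sketch below is one way to do this. Note that it uses an explicit list of folders still to visit rather than a recursive function, so that it fits in a single script. The folder names a and b and the file names top.csv and deep.csv are made up.

```matlab
% Build a small folder tree to search (hypothetical names).
rootFolder = tempname;
mkdir(fullfile(rootFolder, 'a', 'b'));
writematrix(1, fullfile(rootFolder, 'top.csv'));
writematrix(2, fullfile(rootFolder, 'a', 'b', 'deep.csv'));

toVisit = {rootFolder};   % folders that still need to be listed
found   = {};             % full paths of matching files

while ~isempty(toVisit)
    folder = toVisit{end};      % take the next folder off the list
    toVisit(end) = [];

    % Collect matching files in this folder.
    hits = dir(fullfile(folder, '*.csv'));
    hits = hits(~[hits.isdir]);
    for k = 1:numel(hits)
        found{end+1} = fullfile(folder, hits(k).name); %#ok<AGROW>
    end

    % Queue every real subfolder, skipping the '.' and '..' entries.
    entries = dir(folder);
    for k = 1:numel(entries)
        name = entries(k).name;
        if entries(k).isdir && ~strcmp(name, '.') && ~strcmp(name, '..')
            toVisit{end+1} = fullfile(folder, name); %#ok<AGROW>
        end
    end
end

for k = 1:numel(found)
    disp(found{k});
end
```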
Naming and Saving Output Files
Batch processing almost always produces new files. A common requirement is that each input file corresponds to one output file, with a related name. You have already seen that you can use fileparts to split a file name into parts.
For example, if you are reading data1.csv, data2.csv, and so on, and you want to save processed results as data1_processed.csv, data2_processed.csv, you might write:
files = dir('*.csv');
for k = 1:numel(files)
    inFull = fullfile(files(k).folder, files(k).name);
    data = readmatrix(inFull);
    result = mean(data, 2);
    [~, baseName, ~] = fileparts(files(k).name);
    outName = [baseName '_processed.csv'];
    outFull = fullfile(files(k).folder, outName);
    % Note: on a second run, '*.csv' also matches these *_processed.csv outputs
    writematrix(result, outFull);
end
You can also choose to save output in a different folder. Ensure that the output folder exists before writing to it. If necessary, you can create it in your script:
inputFolder = pwd;
outputFolder = fullfile(inputFolder, 'results');
if ~exist(outputFolder, 'dir')
    mkdir(outputFolder);
end
files = dir(fullfile(inputFolder, '*.csv'));
for k = 1:numel(files)
    inFull = fullfile(files(k).folder, files(k).name);
    data = readmatrix(inFull);
    % Perform processing...
    [~, baseName, ~] = fileparts(files(k).name);
    outFull = fullfile(outputFolder, [baseName '_processed.csv']);
    writematrix(data, outFull);
end
Separating input and output folders reduces the risk of overwriting original data and keeps the folder structure more organized.
Dealing with Errors in Batch Scripts
When processing many files, it is common for some of them to have problems. The format might be slightly different, the file might be corrupted, or the file might be in use. You usually do not want an entire batch to stop because of one bad file. You can handle such situations by wrapping your per-file processing code in try and catch blocks so that the script continues even if one file causes an error.
A typical pattern is:
files = dir('*.csv');
for k = 1:numel(files)
    fullName = fullfile(files(k).folder, files(k).name);
    try
        data = readmatrix(fullName);
        % Process data here
        fprintf('Processed: %s\n', files(k).name);
    catch ME
        fprintf('Error in file %s: %s\n', files(k).name, ME.message);
        % Optionally log ME for later inspection
    end
end
This pattern gives you robustness. You can review error messages afterward to see which files failed and why, while successfully processed files still complete.
You can also decide to skip certain known problematic patterns by combining try and catch with conditional checks.
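To make the later review easier, the catch block can record each failure in a list instead of only printing it. The sketch below shows one way to do this, combining try and catch with a conditional check. The column-count check, the error identifier batchDemo:tooFewColumns, and the file names are hypothetical and stand in for whatever validation your data needs.

```matlab
% Scratch folder with one acceptable file and one that fails the check.
demoFolder = tempname;
mkdir(demoFolder);
writematrix([1 2; 3 4], fullfile(demoFolder, 'good.csv'));
writematrix([5; 6],     fullfile(demoFolder, 'narrow.csv'));

files  = dir(fullfile(demoFolder, '*.csv'));
failed = cell(0, 2);    % one row per failure: {file name, error message}

for k = 1:numel(files)
    fullName = fullfile(files(k).folder, files(k).name);
    try
        data = readmatrix(fullName);
        % Hypothetical validation: this demo expects at least two columns.
        if size(data, 2) < 2
            error('batchDemo:tooFewColumns', ...
                'Expected at least 2 columns, found %d.', size(data, 2));
        end
        fprintf('Processed: %s\n', files(k).name);
    catch ME
        failed(end+1, :) = {files(k).name, ME.message}; %#ok<AGROW>
    end
end

fprintf('%d of %d files failed.\n', size(failed, 1), numel(files));
```

After the loop, failed holds the name and message for every file that could not be processed, so you can inspect or save it in one place.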
Logging Progress and Results
Batch processes can run for a long time, especially when many files are involved. It is helpful to log progress so that you can see what the script is doing and how far it has progressed. You can use fprintf in the Command Window, or write logs to a text file.
A simple on screen progress message might be:
files = dir('*.csv');
nFiles = numel(files);
for k = 1:nFiles
    fprintf('Processing file %d of %d: %s\n', k, nFiles, files(k).name);
    fullName = fullfile(files(k).folder, files(k).name);
    % Process the file...
end
If you prefer to log to a file, you can open a file handle with fopen, write with fprintf, and close it at the end. This creates a record of what your batch run did.
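The fopen, fprintf, and fclose steps described above might look like the following minimal sketch. The log file name batch_log.txt, the timestamp format, and the sample files are assumptions chosen for the example.

```matlab
% Scratch folder with two sample files (hypothetical names).
demoFolder = tempname;
mkdir(demoFolder);
writematrix([1 2 3], fullfile(demoFolder, 'run1.csv'));
writematrix([4 5 6], fullfile(demoFolder, 'run2.csv'));

% Open the log file for writing and stop early if that fails.
logName = fullfile(demoFolder, 'batch_log.txt');
logId = fopen(logName, 'w');
if logId == -1
    error('batchDemo:logOpenFailed', 'Could not open log file %s.', logName);
end

files = dir(fullfile(demoFolder, '*.csv'));
for k = 1:numel(files)
    stamp = char(datetime('now', 'Format', 'yyyy-MM-dd HH:mm:ss'));
    fprintf(logId, '%s  processing %s (%d bytes)\n', ...
        stamp, files(k).name, files(k).bytes);
    % Process the file...
end

fclose(logId);               % close the log so all lines are written to disk
disp(fileread(logName));     % show what was logged
```

Passing the file identifier as the first argument to fprintf is what sends the text to the file instead of the Command Window.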
Combining Batch Processing with Scripts
Batch processing is especially effective when you place your code in separate script files. One script might define settings, such as folders and patterns, and call another script or function that performs the actual work for each file. This keeps your batch processing code organized and easier to maintain.
For example, you can write a function processSingleFile that takes a file name and performs the processing, and a script runBatch that uses dir to loop over all files and call processSingleFile. This separation lets you test processSingleFile on one file before running it on many files.
A simple example of such a setup is:
% In processSingleFile.m
function processSingleFile(fullFileName)
    data = readmatrix(fullFileName);
    result = mean(data, 1);
    [folder, baseName, ~] = fileparts(fullFileName);
    outFull = fullfile(folder, [baseName '_summary.csv']);
    writematrix(result, outFull);
end

% In runBatch.m
files = dir('*.csv');
for k = 1:numel(files)
    fullName = fullfile(files(k).folder, files(k).name);
    processSingleFile(fullName);
end
With this structure, you can quickly change how a single file is processed without rewriting the batch loop.
Important points to remember for batch processing of files:
Use dir with patterns, such as '*.csv' or '**/*.png', to obtain lists of files for batch operations.
Always build full file paths with fullfile(files(k).folder, files(k).name) before reading or writing to avoid issues with the current folder.
Loop over files with for and apply the same sequence of operations inside the loop body to automate repetitive tasks.
Use fileparts to construct meaningful output file names and consider writing outputs to a separate folder to protect original data.
Add simple checks and try and catch blocks so that one problematic file does not stop the entire batch process.
Log progress and results, for example with fprintf, so you can monitor long batch jobs and inspect any errors afterward.