Kahibaro

Batch Processing of Files

Why Batch Processing Matters

Many tasks in MATLAB involve repeating the same operation on many files. For example, you might need to read all CSV files in a folder, apply the same calculation to each, and save the results. Doing this by hand is slow and error-prone. Batch processing means writing code that loops over files automatically and performs the same steps on each one without manual intervention.

In this chapter, you learn how to find sets of files, step through them systematically, and combine this with MATLAB code you already know to automate repetitive work.

Finding Multiple Files with `dir`

The starting point for batch processing of files is to obtain a list of files that match some pattern. MATLAB provides the dir function for this. The simplest form returns information about all files and folders in the current folder:

matlab
listing = dir;

The variable listing is a structure array. Each element corresponds to one file or folder and contains fields like name, folder, date, bytes, and isdir.
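To get a feel for this structure array, a minimal sketch like the following prints a few fields for every entry in the current folder. The variable names kind and names are only illustrative.

```matlab
% List the current folder and print name, size, and type for each entry
listing = dir;
for k = 1:numel(listing)
    if listing(k).isdir
        kind = 'folder';
    else
        kind = 'file';
    end
    fprintf('%-30s %10d bytes  (%s)\n', listing(k).name, listing(k).bytes, kind);
end

% Collect one field from all elements at once
names = {listing.name};   % cell array with one name per entry
```

The expression {listing.name} is a compact way to gather a field from every element, which becomes handy later when you filter file lists.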

For batch processing, you almost always use a pattern argument with dir, such as all files with a certain extension:

matlab
txtFiles = dir('*.txt');
csvFiles = dir('*.csv');
matFiles = dir('*.mat');

Now txtFiles contains only files whose names end with .txt. You then process each element in this array in a loop.

If your files are in a different folder, you include the folder in the pattern:

matlab
dataFiles = dir('C:\data\measurements\*.csv');

or build the pattern from separate pieces:

matlab
dataFolder = 'C:\data\measurements';
pattern = fullfile(dataFolder, '*.csv');
dataFiles = dir(pattern);

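Using fullfile is important for building paths in a platform-independent way, because it inserts the correct file separator for the operating system. Note that the drive-letter folder 'C:\data\measurements' used here is itself specific to Windows; on macOS or Linux the data folder would be a path such as '/home/user/data'.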
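As a quick check of what fullfile produces, the following sketch joins three path pieces. The folder and file names are hypothetical and do not need to exist, because fullfile only builds the text of the path.

```matlab
% Join path pieces with the separator of the current operating system
p = fullfile('data', 'measurements', 'run1.csv');
disp(p)         % data\measurements\run1.csv on Windows, data/measurements/run1.csv on macOS and Linux
disp(filesep)   % the separator character that fullfile inserted
```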

Building Full File Paths

The elements returned by dir include the file name but, for many operations, you need the full path. A frequent pattern in batch scripts is:

matlab
files = dir('*.csv');
for k = 1:numel(files)
    thisName = files(k).name;
    thisFolder = files(k).folder;
    fullFileName = fullfile(thisFolder, thisName);
    % Use fullFileName with read or write functions
end

Using fullfile(files(k).folder, files(k).name) ensures that your code works even if you change the current folder before or during processing, or if the files are not in the current folder to begin with.

You can check whether a name corresponds to a file or a folder by using the isdir field:

matlab
if ~files(k).isdir
    % Process only entries that are files
end

This is often useful if you used a broad pattern, or if you want to skip . and .. directory entries when listing all contents.
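If you prefer to filter the whole list before looping, one possible sketch uses logical indexing on the isdir field. The variable names isDots, filesOnly, and foldersOnly are only illustrative.

```matlab
listing = dir;                                    % everything in the current folder

% Drop the special '.' and '..' entries
isDots  = ismember({listing.name}, {'.', '..'});
listing = listing(~isDots);

% Split the remaining entries into files and subfolders
filesOnly   = listing(~[listing.isdir]);
foldersOnly = listing([listing.isdir]);

fprintf('%d file(s), %d subfolder(s)\n', numel(filesOnly), numel(foldersOnly));
```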

Looping Over Files

Once you have a list of files and know how to form the full path, you can combine this with control flow to perform batch operations. A very common pattern is a for loop over the index of the structure array returned by dir.

For example, suppose you want to read each CSV file, compute the mean of some column, and print it:

matlab
csvFiles = dir('*.csv');
for k = 1:numel(csvFiles)
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    T = readtable(fullName);
    colMean = mean(T.Value);  % assuming table has column 'Value'
    fprintf('File: %s, mean(Value) = %.3f\n', csvFiles(k).name, colMean);
end

This structure is the core of many batch processing scripts. Inside the loop body, you perform whatever operations you need. You might read data using readtable, readmatrix, imread, audioread, or load, then perform calculations or transformations, then save results.

You can also preallocate space to save results from each file. For instance, if you want to store one numeric result per file:

matlab
csvFiles = dir('*.csv');
nFiles = numel(csvFiles);
means = zeros(nFiles, 1);
for k = 1:nFiles
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    T = readtable(fullName);
    means(k) = mean(T.Value);
end

Now means contains the mean for each file in the same order as csvFiles.

Applying Different Operations in a Batch

Batch processing is not limited to reading numeric data. The principle is always the same: obtain a list of relevant files, iterate over that list, and apply an operation to each file.

You might, for example, resize all images in a folder:

matlab
imageFiles = dir('*.png');
for k = 1:numel(imageFiles)
    inName = fullfile(imageFiles(k).folder, imageFiles(k).name);
    I = imread(inName);
    I2 = imresize(I, 0.5);
    [~, baseName, ~] = fileparts(imageFiles(k).name);
    outName = fullfile(imageFiles(k).folder, [baseName '_small.png']);
    imwrite(I2, outName);
end

In this example, fileparts separates the base file name from its extension so that you can construct a new output name. Note that imresize is part of Image Processing Toolbox, so this example needs that toolbox to be installed.

You could also convert all .mat files to .csv:

matlab
matFiles = dir('*.mat');
for k = 1:numel(matFiles)
    inName = fullfile(matFiles(k).folder, matFiles(k).name);
    S = load(inName);  % loads into structure S
    varNames = fieldnames(S);
    data = S.(varNames{1});  % use first variable for simplicity
    [~, baseName, ~] = fileparts(matFiles(k).name);
    outName = fullfile(matFiles(k).folder, [baseName '.csv']);
    writematrix(data, outName);
end

The exact operations inside the loop depend on your task, but the structure of listing files, looping, reading, transforming, and saving is the same.

Controlling Which Files to Process

Sometimes you do not want to process every file in a folder. You might need to skip certain files by name, by size, or by some pattern inside the name. The structure from dir gives you the information you need to add simple checks inside the loop.

For instance, to skip files whose names start with "test_":

matlab
files = dir('*.csv');
for k = 1:numel(files)
    name = files(k).name;
    if startsWith(name, 'test_')
        continue;
    end
    fullName = fullfile(files(k).folder, name);
    % Process fullName here
end

You can filter by size using the bytes field:

matlab
files = dir('*.csv');
for k = 1:numel(files)
    if files(k).bytes == 0
        continue;  % skip empty files
    end
    fullName = fullfile(files(k).folder, files(k).name);
    % Process fullName
end

This type of conditional logic lets you adapt your batch script to the specific data you have.
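Several conditions can also be combined into a single logical mask that is applied before the loop. The sketch below is self-contained: it creates a temporary folder with a few sample files whose names are made up for the demonstration, then keeps only the non-empty files that do not start with test_ and that contain 2024 in the name.

```matlab
% Set up a throwaway folder with sample files (all names are hypothetical)
demoFolder = tempname;
mkdir(demoFolder);
writematrix(magic(3), fullfile(demoFolder, 'run_2024_a.csv'));
writematrix(magic(3), fullfile(demoFolder, 'test_2024_b.csv'));
writematrix(magic(3), fullfile(demoFolder, 'run_2023_c.csv'));
fclose(fopen(fullfile(demoFolder, 'run_2024_empty.csv'), 'w'));   % zero-byte file

files = dir(fullfile(demoFolder, '*.csv'));
names = {files.name};

% Build one logical mask from several conditions
keep = ~startsWith(names, 'test_') ...   % not a test file
     & contains(names, '2024') ...       % name mentions 2024
     & [files.bytes] > 0;                % file is not empty

files = files(keep);
disp({files.name})   % only run_2024_a.csv passes all three checks
```

Filtering first keeps the loop body focused on the actual processing, at the cost of making the skipped files less visible than an explicit continue with a message.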

Recursively Processing Subfolders

In many projects, files are stored in multiple subfolders. You may want to apply the same processing to all relevant files in a folder tree. In recent MATLAB versions (R2016b and later), dir accepts the ** wildcard for recursive listing. Alternatively, you can write your own traversal using dir and the isdir field.

A simple modern approach that searches within the current folder and all subfolders for .csv files is:

matlab
csvFiles = dir('**/*.csv');
for k = 1:numel(csvFiles)
    fullName = fullfile(csvFiles(k).folder, csvFiles(k).name);
    % Process fullName
end

Here, csvFiles(k).folder is the path to the folder that contains the file, regardless of how deep it is in the folder tree.

If you want to keep track of what folder each file came from, you can use csvFiles(k).folder directly in your output naming or logging.
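If you need more control than the ** wildcard offers, for example to leave out certain subfolders, you can walk the tree yourself with dir and isdir. The sketch below uses an explicit list of folders still to visit instead of a recursive function, which does the same job and fits in a single script. The folder names, including the skipped archive folder, are assumptions made for the demonstration.

```matlab
% Build a small demo tree in a temporary location (names are hypothetical)
root = tempname;
mkdir(fullfile(root, 'keep', 'deep'));
mkdir(fullfile(root, 'archive'));
writematrix(1, fullfile(root, 'top.csv'));
writematrix(2, fullfile(root, 'keep', 'deep', 'nested.csv'));
writematrix(3, fullfile(root, 'archive', 'old.csv'));

toVisit = {root};   % folders that still have to be searched
found   = {};       % full paths of matching files

while ~isempty(toVisit)
    thisFolder = toVisit{end};   % take the last folder from the list
    toVisit(end) = [];
    entries = dir(thisFolder);
    for k = 1:numel(entries)
        name = entries(k).name;
        if entries(k).isdir
            % Skip the special entries and any folder named 'archive'
            if ~ismember(name, {'.', '..', 'archive'})
                toVisit{end+1} = fullfile(thisFolder, name); %#ok<AGROW>
            end
        elseif endsWith(name, '.csv')
            found{end+1} = fullfile(thisFolder, name); %#ok<AGROW>
        end
    end
end

disp(found(:))   % top.csv and nested.csv, but not archive/old.csv
```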

Naming and Saving Output Files

Batch processing almost always produces new files. A common requirement is that each input file corresponds to one output file, with a related name. You have already seen that you can use fileparts to split a file name into parts.

For example, if you are reading data1.csv, data2.csv, and so on, and you want to save processed results as data1_processed.csv, data2_processed.csv, you might write:

matlab
files = dir('*.csv');
for k = 1:numel(files)
    inFull = fullfile(files(k).folder, files(k).name);
    data = readmatrix(inFull);
    result = mean(data, 2);
    [~, baseName, ~] = fileparts(files(k).name);
    outName = [baseName '_processed.csv'];
    outFull = fullfile(files(k).folder, outName);
    writematrix(result, outFull);
end

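You can also save output in a different folder. This has a practical benefit: in the example above, the _processed.csv files are written next to the inputs, so a second run of dir('*.csv') would pick them up as inputs too. Make sure the output folder exists before writing to it. If necessary, create it in your script: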

matlab
inputFolder = pwd;
outputFolder = fullfile(inputFolder, 'results');
if ~exist(outputFolder, 'dir')
    mkdir(outputFolder);
end
files = dir(fullfile(inputFolder, '*.csv'));
for k = 1:numel(files)
    inFull = fullfile(files(k).folder, files(k).name);
    data = readmatrix(inFull);
    % Perform processing...
    [~, baseName, ~] = fileparts(files(k).name);
    outFull = fullfile(outputFolder, [baseName '_processed.csv']);
    writematrix(data, outFull);
end

Separating input and output folders reduces the risk of overwriting original data and keeps the folder structure more organized.

Dealing with Errors in Batch Scripts

When processing many files, some of them will often have problems. The format might be slightly different, the file might be corrupted, or the file might be in use. You usually do not want an entire batch to stop because of one bad file. You can handle such situations by wrapping your per-file processing code in try and catch blocks so that the script continues even if one file causes an error.

A typical pattern is:

matlab
files = dir('*.csv');
for k = 1:numel(files)
    fullName = fullfile(files(k).folder, files(k).name);
    try
        data = readmatrix(fullName);
        % Process data here
        fprintf('Processed: %s\n', files(k).name);
    catch ME
        fprintf('Error in file %s: %s\n', files(k).name, ME.message);
        % Optionally log ME for later inspection
    end
end

This pattern gives you robustness. You can review error messages afterward to see which files failed and why, while successfully processed files still complete.

You can also decide to skip certain known problematic patterns by combining try and catch with conditional checks.
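One way to combine both ideas is sketched below: known problem cases (here, empty files) are skipped with a condition before any reading happens, while unexpected problems are caught and collected in a list for a summary at the end. The demo files, the four-column requirement, and the error identifier demo:badShape are all assumptions invented for this example.

```matlab
% Demo data in a temporary folder (file names are hypothetical)
demoFolder = tempname;
mkdir(demoFolder);
writematrix(magic(4), fullfile(demoFolder, 'good.csv'));      % 4 columns, passes
writematrix([1 2; 3 4], fullfile(demoFolder, 'narrow.csv'));  % 2 columns, fails the check
fclose(fopen(fullfile(demoFolder, 'empty.csv'), 'w'));        % zero bytes, skipped

files  = dir(fullfile(demoFolder, '*.csv'));
failed = {};   % names of files that raised an error

for k = 1:numel(files)
    % Known problem: skip empty files without trying to read them
    if files(k).bytes == 0
        fprintf('Skipping empty file: %s\n', files(k).name);
        continue;
    end
    fullName = fullfile(files(k).folder, files(k).name);
    try
        data = readmatrix(fullName);
        if size(data, 2) ~= 4
            error('demo:badShape', 'expected 4 columns, found %d', size(data, 2));
        end
        fprintf('Processed: %s\n', files(k).name);
    catch ME
        fprintf('Error in file %s: %s\n', files(k).name, ME.message);
        failed{end+1} = files(k).name; %#ok<AGROW>
    end
end

fprintf('%d file(s) failed.\n', numel(failed));
```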

Logging Progress and Results

Batch processes can run for a long time, especially when many files are involved. It is helpful to log progress so that you can see what the script is doing and how far it has progressed. You can use fprintf in the Command Window, or write logs to a text file.

A simple on-screen progress message might be:

matlab
files = dir('*.csv');
nFiles = numel(files);
for k = 1:nFiles
    fprintf('Processing file %d of %d: %s\n', k, nFiles, files(k).name);
    fullName = fullfile(files(k).folder, files(k).name);
    % Process the file...
end

If you prefer to log to a file, you can open a file handle with fopen, write with fprintf, and close it at the end. This creates a record of what your batch run did.
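A minimal sketch of such a file log, assuming a log file named batch_log.txt (a hypothetical name) and two demo input files created in a temporary folder:

```matlab
% Demo inputs in a temporary folder (names are hypothetical)
demoFolder = tempname;
mkdir(demoFolder);
writematrix(magic(3), fullfile(demoFolder, 'a.csv'));
writematrix(magic(3), fullfile(demoFolder, 'b.csv'));

% Open the log file for appending, so repeated runs extend the same log
logName = fullfile(demoFolder, 'batch_log.txt');
logId = fopen(logName, 'a');
if logId == -1
    error('Could not open log file: %s', logName);
end

files = dir(fullfile(demoFolder, '*.csv'));
for k = 1:numel(files)
    % Passing the file identifier as first argument writes to the file
    fprintf(logId, '%s  processing %d of %d: %s\n', ...
        char(datetime('now')), k, numel(files), files(k).name);
    % Process the file...
end

fclose(logId);   % close the log so all lines are written to disk
```

Checking the result of fopen before writing avoids a confusing error later if the log file cannot be created, for example because the folder is read-only.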

Combining Batch Processing with Scripts

Batch processing is especially effective when you place your code in separate script files. One script might define settings, such as folders and patterns, and call another script or function that performs the actual work for each file. This keeps your batch processing code organized and easier to maintain.

For example, you can write a function processSingleFile that takes a file name and performs the processing, and a script runBatch that uses dir to loop over all files and call processSingleFile. This separation lets you test processSingleFile on one file before running it on many files.

A simple example of such a setup is:

matlab
% In processSingleFile.m
function processSingleFile(fullFileName)
    data = readmatrix(fullFileName);
    result = mean(data, 1);
    [folder, baseName, ~] = fileparts(fullFileName);
    outFull = fullfile(folder, [baseName '_summary.csv']);
    writematrix(result, outFull);
end

matlab
% In runBatch.m
files = dir('*.csv');
for k = 1:numel(files)
    fullName = fullfile(files(k).folder, files(k).name);
    processSingleFile(fullName);
end

With this structure, you can quickly change how a single file is processed without rewriting the batch loop.

Important points to remember for batch processing of files:
Use dir with patterns, such as '*.csv' or '**/*.png', to obtain lists of files for batch operations.
Always build full file paths with fullfile(files(k).folder, files(k).name) before reading or writing to avoid issues with the current folder.
Loop over files with for and apply the same sequence of operations inside the loop body to automate repetitive tasks.
Use fileparts to construct meaningful output file names and consider writing outputs to a separate folder to protect original data.
Add simple checks and try and catch blocks so that one problematic file does not stop the entire batch process.
Log progress and results, for example with fprintf, so you can monitor long batch jobs and inspect any errors afterward.
