Reading Text and CSV Files

Understanding Text and CSV Data in MATLAB

When you work with MATLAB, you will often need to bring in data that is stored in plain text files or comma separated values (CSV) files. These files usually come from other programs such as spreadsheets, data loggers, or other software. In this chapter you will focus on how MATLAB reads this kind of data, which functions are commonly used, and how to handle some typical situations such as files with headers or mixed types.

Text and CSV files are simply files that store information as readable characters. A CSV file is just a special case of a text file where values are usually separated by commas. MATLAB provides several functions to read these files into variables that you can analyze and visualize.

Common File Formats and Structure

Before choosing a MATLAB function, it is useful to understand the structure of the file you want to read. Many text and CSV files share some common patterns.

Often data is stored in rows and columns, in a rectangular layout. Each row usually represents one observation or record, and each column represents a variable, such as time, temperature, or category. Many files start with one or more header lines that contain column names or comments. In CSV files, columns are usually separated by commas, for example:

time,temperature,city
0,20.5,Berlin
1,21.0,Berlin
2,19.8,Hamburg

In other text files, you may see spaces, tabs, semicolons, or other characters used as delimiters between columns. Missing values may appear as empty fields, or as some special marker such as NaN, -999, or NA. Some files include comment lines that start with a special character such as # or %.

Knowing how the file is organized helps you choose the right function and options to read it correctly.
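
One quick way to check a file's structure before importing it is to look at its raw text. The following is a minimal sketch, assuming a small sample file that the code writes itself; the file name sample_structure.csv is a hypothetical choice made only for this illustration.

```matlab
% Write a small sample file so the example is self-contained.
% The name 'sample_structure.csv' is hypothetical.
fname = 'sample_structure.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'time,temperature,city\n');
fprintf(fid, '0,20.5,Berlin\n');
fprintf(fid, '1,21.0,Berlin\n');
fclose(fid);

% Show the raw contents in the Command Window.
type(fname)

% Read only the first line to inspect the header and the delimiter.
fid = fopen(fname, 'r');
firstLine = fgetl(fid);
fclose(fid);
disp(firstLine)

% Remove the temporary sample file.
delete(fname);
```

Looking at the first line this way shows whether a header is present and which character separates the columns.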

High Level Import with readtable and readmatrix

For beginners, the easiest way to read text and CSV files is to use the high level functions readtable and readmatrix. These functions automatically inspect the file and try to detect the format.

The function readtable reads data into a table. A table can store mixed data types, such as numbers, text, and dates, with column names. This is very convenient when your file has a header row and columns that are not all numeric. For a typical CSV file with column names, you can write:

T = readtable('data.csv');

MATLAB reads the file data.csv, uses the first nonempty line as variable names when it looks like a header, and stores each column as a variable in the table T. You can then refer to columns by name, for example T.temperature.

The function readmatrix reads data into a numeric matrix, which is useful when the file only contains numeric data, with or without a header. You can read a numeric file with:

A = readmatrix('numbers.csv');

MATLAB will attempt to ignore any text header and return only numeric values in the matrix A. If the file contains some text fields in the middle of numeric data, readmatrix may give missing values (for example NaN) or may need extra options to handle the format.

These two functions have many optional name value arguments to control details such as delimiters, header lines, and data types, but as a beginner you can usually start with their default behavior and only add options when something does not look right.
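
As an illustration of default behavior versus an explicit option, here is a small sketch. It assumes a sample file numbers_demo.csv (a hypothetical name) that the code creates first, and it uses the 'Delimiter' name-value argument as one example of an option.

```matlab
% Create a small numeric CSV file with a header (hypothetical file name).
fname = 'numbers_demo.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'x,y\n1,10\n2,20\n3,30\n');
fclose(fid);

% Default behavior: MATLAB detects the delimiter and the header row.
T = readtable(fname);     % table with variables x and y
A = readmatrix(fname);    % numeric matrix, header row skipped

% Same file, with the delimiter stated explicitly.
T2 = readtable(fname, 'Delimiter', ',');

disp(T)
disp(A)

% Remove the temporary sample file.
delete(fname);
```

For a well-formed file like this one, the explicit option should give the same result as the defaults; it becomes useful when detection guesses wrong.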

Basic Example of Reading a CSV File

Consider a simple comma separated file, stored as measurements.csv, with the content:

Time,Voltage,Current
0.0,1.5,0.1
0.1,1.7,0.12
0.2,1.8,0.11

If this file is in the current folder, you can read it as:

data = readtable('measurements.csv');

After this call, data is a table with three variables: Time, Voltage, and Current. You can check what MATLAB inferred by calling, for example, head(data) to see the first rows, or by opening the variable from the Workspace browser.

If you know that all values are numeric and you do not care about the column names, you can use:

M = readmatrix('measurements.csv');

In this case, MATLAB will attempt to skip the header row with text and return a numeric matrix with 3 columns that correspond to time, voltage, and current.
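
The two calls can be compared side by side. This is a sketch that assumes the same content as measurements.csv, written to a hypothetical file measurements_demo.csv so that the code runs on its own.

```matlab
% Recreate the example content in a temporary file (hypothetical name).
fname = 'measurements_demo.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'Time,Voltage,Current\n');
fprintf(fid, '0.0,1.5,0.1\n0.1,1.7,0.12\n0.2,1.8,0.11\n');
fclose(fid);

% Table import: column names are kept.
data = readtable(fname);
head(data)               % show the first rows
v = data.Voltage;        % access one column by name

% Matrix import: numeric values only.
M = readmatrix(fname);
disp(size(M))            % number of rows and columns

% Remove the temporary sample file.
delete(fname);
```

With the table you address a column as data.Voltage, while with the matrix you would use M(:,2) and have to remember which column is which.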

Handling Delimiters and Headers

Not all files use commas as delimiters. Sometimes a file uses tabs, spaces, semicolons, or other characters. High level functions often detect the delimiter automatically, especially for standard formats, but you can also specify it yourself.

For example, suppose you have a text file results.txt that uses semicolons as separators:

subject;age;score
1;18;87
2;20;92
3;19;78

You can tell readtable to use a semicolon as the delimiter:

T = readtable('results.txt','Delimiter',';');

If your file does not have a header row with variable names, MATLAB will assign default names such as Var1, Var2, and so on. For example, for a file without headers:

10 20 30
11 21 31
12 22 32

you can tell MATLAB not to treat the first line as variable names:

T = readtable('noheader.txt','ReadVariableNames',false);

The table T will then have variables called Var1, Var2, and Var3. Later, you can rename these variables if you want.
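
Renaming the default variables can be sketched as follows. The file name noheader_demo.txt and the new names A, B, and C are hypothetical choices for this illustration.

```matlab
% Create a space-delimited file without a header (hypothetical name).
fname = 'noheader_demo.txt';
fid = fopen(fname, 'w');
fprintf(fid, '10 20 30\n11 21 31\n12 22 32\n');
fclose(fid);

% Read without variable names; MATLAB assigns Var1, Var2, Var3.
T = readtable(fname, 'ReadVariableNames', false);
defaultNames = T.Properties.VariableNames;
disp(defaultNames)

% Rename the variables afterwards (names chosen only for illustration).
T.Properties.VariableNames = {'A', 'B', 'C'};
disp(T)

% Remove the temporary sample file.
delete(fname);
```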

With readmatrix, you can influence how header lines are skipped. MATLAB tries to guess, but for complex files you can state this explicitly, for example with the 'NumHeaderLines' name-value argument, which gives the number of lines to skip before the data starts. The full list of options is in the readmatrix documentation, and you will explore the documentation in another chapter.
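
As a rough sketch of skipping header lines explicitly, the example below uses the 'NumHeaderLines' name-value argument of readmatrix. The file logger_demo.txt and its two-line header are assumptions made up for this illustration.

```matlab
% Create a file with two free-text header lines (hypothetical content).
fname = 'logger_demo.txt';
fid = fopen(fname, 'w');
fprintf(fid, 'Logger export\n');
fprintf(fid, 'Units: seconds, volts\n');
fprintf(fid, '0,1.5\n1,1.7\n2,1.8\n');
fclose(fid);

% Skip the first two lines and read the rest as comma-separated numbers.
A = readmatrix(fname, 'NumHeaderLines', 2, 'Delimiter', ',');
disp(A)

% Remove the temporary sample file.
delete(fname);
```

Stating the header count yourself avoids relying on automatic detection when the header itself contains commas or numbers.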

Dealing with Mixed Text and Numeric Data

Many text files contain a mix of numbers and text values. For example, a file may contain city names together with temperature readings. For such cases, a table is usually a better choice than a numeric matrix because each column can have its own data type.

Suppose you have a file weather.csv:

City,Day,Temperature
Berlin,1,20.5
Berlin,2,21.0
Hamburg,1,19.8
Hamburg,2,18.9

Reading it with:

weather = readtable('weather.csv');

will store City as a text column (by default a cell array of character vectors) and Day and Temperature as numeric columns. You can then work with each column according to its type.
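
Working with columns according to their type might look like the sketch below, which selects rows by city and averages their temperatures. It assumes the weather.csv content shown above, written to a hypothetical file weather_demo.csv.

```matlab
% Recreate the weather data in a temporary file (hypothetical name).
fname = 'weather_demo.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'City,Day,Temperature\n');
fprintf(fid, 'Berlin,1,20.5\nBerlin,2,21.0\n');
fprintf(fid, 'Hamburg,1,19.8\nHamburg,2,18.9\n');
fclose(fid);

weather = readtable(fname);

% Text column: compare against a city name to build a logical index.
isBerlin = strcmp(weather.City, 'Berlin');

% Numeric column: compute the mean over the selected rows.
meanBerlin = mean(weather.Temperature(isBerlin));
disp(meanBerlin)

% Remove the temporary sample file.
delete(fname);
```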

If you use readmatrix on such a file, the text entries cannot be represented in a numeric matrix. In that situation, readmatrix typically returns NaN for the text fields or may not import the file as you expect, so prefer readtable for mixed-type data.

Sometimes files contain special text for missing values, such as NA. High level import functions support options to treat such markers as missing values. For example, the 'TreatAsMissing' name-value argument lets you list the text that represents missing data, so that MATLAB imports it as NaN in numeric columns of the resulting table.
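
A minimal sketch of handling missing-value markers follows. It assumes a hypothetical file missing_demo.csv in which NA marks a missing reading and -999 is a numeric sentinel. The text marker is handled with the 'TreatAsMissing' argument of readtable, and the numeric sentinel is replaced after import with ordinary indexing.

```matlab
% Create a file with a text marker (NA) and a numeric sentinel (-999).
fname = 'missing_demo.csv';   % hypothetical name
fid = fopen(fname, 'w');
fprintf(fid, 'City,Temperature\n');
fprintf(fid, 'Berlin,20.5\nHamburg,NA\nMunich,18.9\nCologne,-999\n');
fclose(fid);

% Treat the text NA as missing in numeric columns.
T = readtable(fname, 'TreatAsMissing', 'NA');

% Replace the numeric sentinel with NaN after import.
T.Temperature(T.Temperature == -999) = NaN;

disp(T)

% Remove the temporary sample file.
delete(fname);
```

Converting both kinds of marker to NaN means later calculations can skip them consistently, for example with mean(T.Temperature, 'omitnan').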

Using readcell for Heterogeneous Data

If your file is highly irregular or contains many different kinds of entries that do not fit nicely into a table or numeric matrix, you can use readcell. This function reads the entire file into a cell array, where each cell can hold its own type.

For example:

C = readcell('mixedfile.txt');

returns a cell array C. Each row of the file becomes a row in C, and each field becomes one cell. Cells can contain numbers, character vectors, strings, or other types depending on what MATLAB detects. Later, you can manually convert or extract parts of this cell array as needed.

readcell is useful when automatic typing into tables or matrices is not straightforward, or when you plan to do custom parsing that does not fit into standard patterns.
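
To illustrate extracting parts of the cell array, here is a small sketch. The file mixed_demo.csv and its contents are assumptions made up for this example.

```matlab
% Create a small file with text and numbers (hypothetical name).
fname = 'mixed_demo.csv';
fid = fopen(fname, 'w');
fprintf(fid, 'id,label,value\n1,alpha,3.5\n2,beta,4.0\n');
fclose(fid);

% Every field, including the header row, becomes one cell.
C = readcell(fname);
disp(C)

% Find which cells hold numbers.
isNum = cellfun(@isnumeric, C);

% Pull the numeric values of the third column out of the cell array.
vals = cell2mat(C(2:end, 3));
disp(vals)

% Remove the temporary sample file.
delete(fname);
```

Note that readcell does not treat the first row as a header, so the column names arrive as ordinary cells in row 1.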

Managing File Paths and Current Folder

To read a file, MATLAB must know where the file is stored. If the file is in the current folder or in a folder that is on the MATLAB path, you can simply pass the file name to readtable, readmatrix, or readcell. If the file is in another location, you must specify either a relative path or an absolute path.

For example, if your file is in a subfolder named data inside the current folder, you can write:

T = readtable('data/experiment.csv');

On Windows, both forward slashes and backslashes can be used. MATLAB treats a backslash in a character vector as an ordinary character, but functions such as sprintf and fprintf interpret it as the start of an escape sequence, so take care when a path passes through them. To avoid confusion, many users prefer forward slashes in MATLAB file paths.

If the file is somewhere else on your system, you can provide the full path, for example:

T = readtable('C:/Users/YourName/Documents/MATLAB/datafile.csv');

The choice between relative and absolute paths affects portability, which you will examine in more detail in later chapters on file management and projects.
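
One way to sidestep separator issues is to build paths with fullfile, which inserts the separator appropriate for the current platform. The sketch below assumes a hypothetical subfolder data_demo and file experiment_demo.csv, both created by the code itself.

```matlab
% Build a relative path with the correct separator for this platform.
folder = 'data_demo';                 % hypothetical subfolder name
if ~exist(folder, 'dir')
    mkdir(folder);
end
fname = fullfile(folder, 'experiment_demo.csv');

% Write a small sample file at that location.
fid = fopen(fname, 'w');
fprintf(fid, 'trial,result\n1,0.5\n2,0.7\n');
fclose(fid);

% Check that the file exists before reading it.
if isfile(fname)
    T = readtable(fname);
    disp(T)
else
    error('File not found: %s', fname);
end

% Remove the temporary sample file (the folder is left in place).
delete(fname);
```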

Working with Import Options

High level import functions can use import options that describe the structure of a file. For more complicated files, you can create an options object, review how MATLAB interprets each column, and adjust the types or rules before reading the data.

For example, you can create options for a text file and then pass them to readtable:

opts = detectImportOptions('largefile.csv');
T = readtable('largefile.csv',opts);

The function detectImportOptions inspects the file and builds an object that stores information such as variable names, data types, delimiters, and rules for handling missing values. You can then modify opts before calling readtable, for instance by changing a variable type from text to numeric, or by specifying additional missing value strings. This two step process is useful when default detection does not match what you expect from the file.

Although using import options is more advanced than a single line command, it gives you better control and repeatability when working with complex or inconsistent data sources.
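
The two-step process can be sketched as follows, using setvartype and setvaropts to adjust the detected options before reading. The file options_demo.csv and its column names are hypothetical.

```matlab
% Create a sample file in which Score contains the marker NA.
fname = 'options_demo.csv';   % hypothetical name
fid = fopen(fname, 'w');
fprintf(fid, 'ID,Score,Group\n1,87,A\n2,NA,B\n3,78,A\n');
fclose(fid);

% Step 1: let MATLAB detect the structure and review it.
opts = detectImportOptions(fname);
disp(opts.VariableNames)
disp(opts.VariableTypes)

% Step 2: adjust types and missing-value rules before reading.
opts = setvartype(opts, 'Score', 'double');
opts = setvartype(opts, 'Group', 'categorical');
opts = setvaropts(opts, 'Score', 'TreatAsMissing', 'NA');

T = readtable(fname, opts);
disp(T)

% Remove the temporary sample file.
delete(fname);
```

Because the options live in an object, the same opts can be reused for other files with the same layout.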

Simple Reading with Lower Level Functions

In some situations, you might only need to read a small part of a text file or you may want very direct control over how to interpret each line. MATLAB provides lower level text I/O functions such as fopen, fgetl, fscanf, and textscan. These functions work closer to the file contents, and you provide explicit formats for reading.

For example, to read all lines of a file as raw text, you can do:

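fid = fopen('notes.txt','r');
if fid == -1
    error('Could not open notes.txt');
end
tline = fgetl(fid);        % fgetl returns -1 (not text) at end of file
while ischar(tline)
    disp(tline)
    tline = fgetl(fid);    % tline avoids shadowing the built-in function line
end
fclose(fid);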

This approach is helpful if your file is not in a simple columnar structure, or if you need to skip certain lines based on their content. However, it requires more careful programming, such as making sure that you close the file with fclose. For most ordinary CSV and text data, the high level functions readtable, readmatrix, and readcell are more convenient and reduce the chance of errors.
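
Skipping lines based on their content could look like the sketch below. It assumes a hypothetical file notes_demo.txt in which comment lines start with #, and it parses the remaining lines as rows of numbers with sscanf.

```matlab
% Create a file that mixes comment lines and numeric rows.
fname = 'notes_demo.txt';   % hypothetical name
fid = fopen(fname, 'w');
fprintf(fid, '# first comment\n1 2 3\n# second comment\n4 5 6\n');
fclose(fid);

fid = fopen(fname, 'r');
if fid == -1
    error('Could not open %s', fname);
end

rows = [];
tline = fgetl(fid);
while ischar(tline)
    % Keep only nonempty lines that do not start with the comment marker.
    if ~isempty(tline) && tline(1) ~= '#'
        rows(end+1, :) = sscanf(tline, '%f').';  % one numeric row per line
    end
    tline = fgetl(fid);
end
fclose(fid);

disp(rows)

% Remove the temporary sample file.
delete(fname);
```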

Verifying Imported Data

After importing data from a text or CSV file, it is important to check that MATLAB interpreted the file correctly. If you use readtable, you can look at the first few rows to verify that each column has the correct type and values. If numeric columns appear as text, or if the number of rows is not what you expect, you may need to adjust import options or delimiters.

For matrix data, you can check the size of the resulting matrix with size, and inspect a few rows or columns using indexing. If you see NaN values where you expected numbers, the file may contain unexpected text, missing values, or inconsistent delimiters.

By making a quick inspection of the imported data, you can catch problems early before you spend time analyzing or plotting incorrect values.
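
A quick inspection can be as short as the following sketch. It assumes a hypothetical file verify_demo.csv with one empty field, so that the checks have something to find.

```matlab
% Create a file with one missing value in the Voltage column.
fname = 'verify_demo.csv';   % hypothetical name
fid = fopen(fname, 'w');
fprintf(fid, 'Time,Voltage\n0,1.5\n1,\n2,1.8\n');
fclose(fid);

T = readtable(fname);

% Check the size and the data type of each column.
disp(size(T))
disp(varfun(@class, T, 'OutputFormat', 'cell'))

% Count missing entries per column.
nMissing = sum(ismissing(T));
disp(nMissing)

% The same check for a numeric matrix.
M = readmatrix(fname);
nNaN = nnz(isnan(M));
disp(nNaN)

% Remove the temporary sample file.
delete(fname);
```

An unexpected type or a nonzero count of missing values is a signal to revisit the delimiter, header, or missing-value settings before analysis.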

Important points to remember:
Use readtable for most CSV and text files, especially when they have headers or mixed text and numeric data.
Use readmatrix when the file contains mainly numeric data and you want a numeric array.
Use readcell when the file is very heterogeneous or irregular and does not fit neatly into a table or matrix.
Pay attention to delimiters, headers, and missing value markers, since they affect how MATLAB interprets your data.
Always verify the imported result before further analysis to ensure that the file was read as intended.
