9.6 Choosing the Right Container Type

Overview

When you work with collections of data in MATLAB, you can choose from several container types. The most common general-purpose containers are numeric arrays, tables, structures, and cell arrays. Each of these is designed for a different style of data and a different style of work. Choosing the right one makes your code clearer, easier to maintain, and often faster.

This chapter focuses on how to decide which container type to use in typical beginner situations, and how to move between them when necessary. It assumes you already know at a basic level what tables, structures, and cell arrays are and how to create and index them.

Thinking about your data

The most useful first question is: “What does my data look like conceptually?”

There are three very common patterns.

The first pattern is a uniform grid, usually numeric, where every element has the same meaning and the same type. Examples include pixel intensities in an image, temperature at different times and positions, or the real and imaginary parts of signals. This pattern often fits naturally into standard numeric arrays.

The second pattern is a rectangular data set where rows represent observations or cases, and columns represent variables with names. Different columns may have different types, for example numbers, logical values, strings, or datetimes. This pattern fits naturally into tables.

The third pattern is a collection of heterogeneous pieces of information, where different fields or slots may have different sizes or shapes, or may be optional. This is often the case for configuration data, results of complex computations, or hierarchical information. This pattern fits naturally into structures, sometimes combined with cell arrays.

Once you recognize the pattern, it becomes easier to choose.

When to use plain numeric arrays

Use numeric arrays when all of the following are true: every element has the same data type, such as double, single, int32, or logical; the data fits into a regular grid, so it can be represented by an $m \times n \times p \dots$ array without gaps; and you will mainly perform numerical operations such as arithmetic, linear algebra, or elementwise functions.

Numeric arrays work very well with MATLAB’s vectorized computations and built-in mathematical functions. If you do not need labels for rows or columns, and you do not need different types in different columns, arrays are the natural and efficient choice.

If you later discover that you need variable names or a mix of types, you can often convert columns of arrays into a table. A simple pattern is to keep your heavy numerical calculations in arrays, then convert the final results into a table for reporting or exporting.
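The pattern of computing in arrays and converting only at the end might look like the following sketch. The sensor data and the variable names SensorA, SensorB, and SensorC are invented for illustration.

```matlab
% Hypothetical data: 3 sensors (columns) sampled at 4 times (rows), in Celsius.
temps = [20.1 21.0 19.8;
         20.4 21.3 20.0;
         20.9 21.9 20.3;
         21.2 22.4 20.7];

% Vectorized math on the whole grid, no loops needed.
tempsF        = temps * 9/5 + 32;   % elementwise conversion to Fahrenheit
meanPerSensor = mean(temps, 1);     % 1-by-3 row vector of column means

% Convert the final result to a table only for reporting.
% The variable names are illustrative assumptions.
report = array2table(tempsF, ...
    'VariableNames', {'SensorA', 'SensorB', 'SensorC'});
disp(report)
```

The heavy computation stays in plain arrays, and the names are attached only at the point where a person or another program will read the result.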

When to use tables

Use a table when your data is conceptually tabular. In tabular data, each row is one observation, case, or record. Each column is one variable with a name and a defined meaning. Columns may have different types. A typical example is data loaded from CSV or Excel: a column with IDs as strings, another with dates, another with numeric measurements, and perhaps a logical column indicating a condition.

Tables are particularly helpful when you care about variable names and want to write code using those names. For example, you can write T.Pressure instead of remembering that pressure is column 7. This makes code easier to read and to change later. Tables also make it easy to store metadata such as units in T.Properties.VariableUnits, or more detailed descriptions in T.Properties.VariableDescriptions.

Tables are also a good choice when you plan to join, filter, or group by variables, or when you will export the data to files for use in other software. Many MATLAB data analysis and statistics functions work directly with tables.
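As a minimal sketch of these ideas, the following builds a small mixed-type table, attaches units, and filters rows by name. The variables ID, Pressure, and Valid and all values are made up for the example.

```matlab
% Hypothetical measurements; all names and values are invented.
ID       = ["A01"; "A02"; "A03"; "A04"];     % string column
Pressure = [101.2; 99.8; 102.5; 100.1];      % numeric column
Valid    = [true; true; false; true];        % logical column
T = table(ID, Pressure, Valid);

% Attach units as metadata, one entry per variable.
T.Properties.VariableUnits = {'', 'kPa', ''};

% Access a column by name instead of by position.
meanP = mean(T.Pressure);

% Filter rows with a logical condition written in terms of names.
good = T(T.Valid & T.Pressure > 100, :);
disp(good)
```

Here good keeps only the rows that are marked valid and have a pressure above 100 kPa, and the condition reads almost like the sentence that describes it.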

Use a table instead of a numeric matrix when at least one of these is true: you have a mix of types across columns; you want meaningful variable names, possibly using dot notation to access them; or you expect to modify the set of variables by adding or removing columns as you work.

On the other hand, if your data is purely numeric and you will mainly perform matrix operations, a table usually adds overhead without benefit. In that case, keep the core numerical part as an array, and keep any descriptive labels in separate containers or in a companion table.

When to use structures

Structures are a good fit when your data is naturally described as named fields rather than uniform rows and columns. A common pattern is to bundle related results together. For example, after a computation you might produce a structure result with fields coeffs, residuals, and settings. Each field can have a different size or type, such as a vector, a table, or a string.
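A result structure of this kind could be assembled as in the sketch below, which uses a straight-line fit with polyfit purely as a stand-in computation. The data and the contents of settings are assumptions for illustration.

```matlab
% Hypothetical data for a straight-line fit.
x = (1:5)';
y = [2.1; 3.9; 6.2; 7.8; 10.1];

% Bundle related results of different sizes and types in one structure.
result.coeffs    = polyfit(x, y, 1);                  % 1-by-2 row vector
result.residuals = y - polyval(result.coeffs, x);     % 5-by-1 column vector
result.settings  = struct('degree', 1, ...
                          'method', 'least squares'); % nested structure

disp(result)
```

Each field keeps its own natural shape, which would not be possible if the same values were forced into one matrix.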

Structures are especially useful for hierarchical data where fields themselves are structures. This matches many real world descriptions, for example a patient structure with fields Demographics, Measurements, and History, where Demographics is another structure with fields like Name and DOB.

Another good use of structures is configuration settings. A structure allows you to store many named options with default values. You can pass the whole configuration as a single variable into functions, and update particular fields as needed.
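The following sketch shows a configuration structure with defaults and a small nested structure. All field names and values here are hypothetical.

```matlab
% Configuration with default values (field names are assumptions).
config = struct('tolerance', 1e-6, 'maxIterations', 100, 'method', 'linear');
config.maxIterations = 500;          % override a single option

% Nested structure: Demographics is itself a structure.
patient.Demographics.Name = 'Jane Doe';
patient.Demographics.DOB  = datetime(1980, 5, 17);
patient.Measurements      = [72 118 76];                       % numeric vector
patient.History           = {'2019 checkup', '2021 checkup'};  % cell array

% Inspect what a structure contains.
disp(fieldnames(patient))
hasTol = isfield(config, 'tolerance');   % true
```

Because config is one variable, it can be passed to a function as a single argument, and the function can read only the fields it needs.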

Choose a structure when the important thing is the meaning of each named field, and you do not need to treat the fields as columns in a table. Structures are less convenient than tables when you want to store many similar records. For multiple similar records, a struct array may be suitable, but for tabular data with variables, a table is usually more flexible.

When to use cell arrays

Cell arrays are for collections where each element can have a completely different type and size, but you want to index them with numeric indices instead of names. Each cell can hold any MATLAB value: a number, a string, a matrix, a table, a structure, or even a function handle.

Use a cell array when you want to keep an ordered list of items that are not homogeneous. A typical example is a list of file names along with associated data, where each cell might hold a numeric array read from a file. Another example is variable length sequences, where each cell contains a vector of different length.
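A short sketch of variable-length sequences in a cell array follows. The values are arbitrary, and the example also shows the difference between parentheses and curly braces when indexing.

```matlab
% Variable-length sequences: each cell holds a vector of different length.
sequences = {[1 2 3], [4 5], [6 7 8 9]};

% Parentheses return a cell; curly braces return the contents.
firstCell = sequences(1);    % 1-by-1 cell array
firstVec  = sequences{1};    % 1-by-3 double

% Apply a function to the contents of every cell.
lengths = cellfun(@numel, sequences);   % [3 2 4]
sums    = cellfun(@sum, sequences);     % [6 9 30]

% Cells can also hold completely different types.
mixed = {42, "text", magic(3), @sin};
```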

Cell arrays are also useful as intermediate containers when you build more complex structures. For instance, you can read several sheets from an Excel file into a cell array, work on them in a loop, then combine some into a table.

You should not use a cell array if your data is actually regular and homogeneous, since the extra layer of cells makes your code harder to read and keeps you from applying vectorized operations to the data directly. In such cases, prefer arrays or tables.

Comparing tables, structures, and cell arrays

When choosing between tables, structures, and cell arrays, consider three questions: How do you want to access the data? What operations will you perform? How important are names and types?

Tables provide column-oriented access by variable names, and row-oriented operations for tabular data. Structures provide field-oriented access by name, and are flexible for nested and irregular shapes. Cell arrays provide index-oriented access where positions matter more than names, and are best when elements differ widely in type or size.

If you mostly think of “records” with fixed variables, and you use names heavily, a table is often best. If you mostly think of “components” with different roles and shapes, a structure is usually clearer. If you mostly think of “items in a sequence” where each item is arbitrary, a cell array is appropriate.

You can also combine these containers. For example, a table column can be a cell array if each row contains a different length vector. A structure field can contain a table, allowing you to group related tabular results under one logical name. A cell array can hold multiple tables or structures when you have a list of similar but separate data sets.
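The three combinations just described can be sketched as follows. The names Trial, Samples, and results are hypothetical.

```matlab
% A table whose second column is a cell array of different-length vectors.
Trial   = [1; 2; 3];
Samples = {[0.1 0.2]; [0.3 0.4 0.5]; 0.6};
T = table(Trial, Samples);
nSamples = cellfun(@numel, T.Samples);   % [2; 3; 1]

% A structure field that contains a table.
results.Summary = T;
results.Label   = 'run 1';

% A cell array that holds several separate tables.
tables  = {T, T(1:2, :)};
heights = cellfun(@height, tables);      % [3 2]
```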

Converting between container types

Sometimes you start with one container and later discover that another would have been more suitable. MATLAB allows you to convert and reorganize data between containers.

From arrays to tables, you can build a table by supplying one column per variable and then assigning names, or convert a whole matrix at once with array2table. From tables to arrays, you can extract numeric columns into a matrix with curly-brace indexing (for example, T{:, 2:3}) or with table2array, provided the columns are compatible in type and size.
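A minimal round trip between columns, a table, and a matrix might look like this. The variable names Height and Weight and the data are assumptions.

```matlab
% Hypothetical column vectors.
heightCm = [170; 165; 180];
weightKg = [65; 58; 82];

% Arrays to table: one column per variable, with names assigned.
T = table(heightCm, weightKg, 'VariableNames', {'Height', 'Weight'});

% Table to arrays: curly braces extract the contents as a matrix.
M1 = T{:, {'Height', 'Weight'}};   % 3-by-2 double
M2 = table2array(T);               % same result for all-numeric tables
h  = T.Height;                     % a single column as a vector
```

Note that M1 and M2 no longer carry the variable names, so keep T around if you still need the labels.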

From structures to tables, struct2table interprets the fields of a structure as table variables. The reverse is also possible with table2struct, where variables become fields. These conversions are useful when you want to fit a structure of variables into functions that expect a table, or when you want to flatten a table into a structure of arrays.
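As a sketch, the conversion in both directions can be done with struct2table and table2struct. The field names Name and Score and their values are invented for the example.

```matlab
% Scalar structure whose fields are columns of equal length.
s.Name  = ["Ann"; "Bob"; "Cy"];
s.Score = [90; 85; 77];

% Structure to table: each field becomes a table variable.
T = struct2table(s);

% Table to structure, in two flavors.
sCols = table2struct(T, 'ToScalar', true);  % one structure, fields hold columns
sRows = table2struct(T);                    % 3-by-1 struct array, one per row
```

The 'ToScalar' form is convenient for column-wise computation, while the struct array form is closer to a list of records.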

Between cell arrays and other containers, you often use indexing and functions that operate over cells. For example, you might gather the contents of several cells into one array with cell2mat or with concatenation such as [C{:}], or split a matrix into cells with mat2cell. You can then place these cells in a table or a structure field as needed.
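The following sketch gathers cell contents into one array and splits a matrix into blocks. The block sizes passed to mat2cell are an arbitrary choice for illustration.

```matlab
% Gather the contents of several cells into one row vector.
C  = {[1 2], [3 4 5], 6};
v  = [C{:}];          % [1 2 3 4 5 6]
v2 = cell2mat(C);     % same result for compatible numeric cells

% Split a 4-by-4 matrix into a 2-by-2 grid of blocks.
M = magic(4);
blocks = mat2cell(M, [2 2], [3 1]);   % row heights 2 and 2, column widths 3 and 1

% Store the cells in a structure field for later use.
s.blocks = blocks;

% Reassembling the blocks recovers the original matrix.
Mback = cell2mat(blocks);
```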

When you convert, pay attention to whether you are losing information such as variable names or field names. Sometimes you will want to keep both a numeric representation for computation and a more descriptive container for organization and labeling.

Choosing for performance and clarity

As a beginner, you should prioritize clarity over small performance differences. Choose the container that expresses your intent most clearly. Using a table for natural tabular data, a structure for related but different elements, and arrays for dense numeric computation will usually give both readable and efficient code.

Performance concerns may arise if you use many small cell arrays or structures in tight loops. In such cases prefer contiguous numeric arrays where possible. Tables add some overhead compared to plain arrays, but they can significantly reduce bugs by making variables explicit and by preventing misaligned columns.

A helpful habit is to sketch on paper how you want to access the data in code. If you find yourself writing conceptual code like “for each row” and “by variable name,” that suggests a table. If you write “config.parameterName” and “result.partName,” that suggests structures. If you write “item 1,” “item 2,” with no natural names, a cell array or numeric array may be enough, depending on type uniformity.

Practical decision patterns

Several recurring situations have straightforward choices.

If you read from a spreadsheet where columns have headers and mixed types, keep the data as a table. Use the variable names from the file as table variable names so your code matches the file.

If you store output from a function that calculates several related results, use a structure to return them all together. Each field can hold a numeric array, a table, or another structure. This keeps the function interface clean and easy to extend.

If you accumulate a list of items in a loop where each item is a variable size vector or array, store them in a cell array. Later, if they turn out to be compatible, you can concatenate them into a single larger array or convert them into a table.
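A minimal sketch of this accumulate-then-combine pattern follows. The loop body is a placeholder that simply produces a vector of length k.

```matlab
nItems = 4;
items = cell(1, nItems);          % preallocate the cell array

for k = 1:nItems
    items{k} = (1:k) * 10;        % hypothetical item: a vector of length k
end

% The items are all row vectors, so they can be concatenated afterwards.
allValues = [items{:}];                 % 1-by-10 row vector
lengths   = cellfun(@numel, items);     % [1 2 3 4]
```

Preallocating with cell avoids growing the container on every iteration, and the final concatenation only works because every item happens to be a row vector.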

If you perform linear algebra or heavy numeric computations on data that is conceptually a matrix, store it in numeric arrays. Only convert to a table if you need to attach variable names or export it.

Over time, you will develop an intuition for these patterns, but starting with these simple rules can save time and confusion.

Important things to remember:
Choose numeric arrays for homogeneous numeric data on a regular grid where you mainly do math.
Choose tables for row and column oriented data with named variables, especially when types differ across columns.
Choose structures to group related pieces of information under named fields, especially for hierarchical or configuration data.
Choose cell arrays for ordered collections of items that can differ in size or type but are indexed by position.
You can convert between containers when your needs change, but try to design around how you intend to access and manipulate the data.
