Modifying and Joining Tables

Editing Table Contents

Once you have a table, you often need to adjust its contents. You can modify data inside existing variables using normal indexing. For numeric or logical table variables, you can treat them much like matrices, but you access them through the table.

If T is a table, you access one variable as T.VarName. This gives you the entire column, which you can then index. For example, T.Age(3) = 25; changes the third element of the variable Age. You can assign entire columns at once, such as T.Age = T.Age + 1; to increase all ages by one. When you modify part of a variable, the right side of the assignment must have a size compatible with the left side, for example, the same number of rows you are indexing.

Row indexing also works directly on the table. The expression T(2,:) selects the second row. You can then assign to it, for instance T(2,:) = T(1,:); to copy the first row over the second. Remember that every row in a table must have exactly one value for every variable, so you cannot create an incomplete row.
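The edits described above can be tried on a small table. This is a minimal sketch; the table T and its values are hypothetical sample data, not taken from a real data set.

```matlab
% Hypothetical sample data; names and values are illustrative only.
Name = {'Ann'; 'Bob'; 'Cy'};
Age = [31; 45; 19];
Height = [1.62; 1.80; 1.75];
T = table(Name, Age, Height);

T.Age(3) = 25;        % change one element of the Age variable
T.Age = T.Age + 1;    % update the whole column: Age is now [32; 46; 26]
T(2,:) = T(1,:);      % overwrite row 2 with a copy of row 1
```

After the last line, rows 1 and 2 hold the same values, while row 3 is unchanged apart from the updated age.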

Adding and Removing Variables

You add a new variable to a table by assigning to a new variable name with dot notation. For example, if T has 10 rows, then you can write T.BMI = weight ./ (height.^2); as long as weight and height each have 10 elements. The number of rows must match the existing table, and MATLAB raises an error if you try to assign a variable with a different row count. The new variable becomes the last column of the table and appears in the table display.

You can also create a completely new table variable from a constant or from a function of existing variables. For example, T.IsAdult = T.Age >= 18; adds a logical variable that marks adults. If you want to insert a variable at a specific position, you can construct a new table from parts using horizontal concatenation of table slices, for example T = [T(:,1:2) table(newVar,'VariableNames',{'NewVar'}) T(:,3:end)];. This keeps all rows aligned while placing the new variable between existing columns.

To remove variables, use parentheses to select all variables except the ones you want to drop, or use the function removevars. For example, T = removevars(T,{'Temp','Pressure'}); deletes the variables Temp and Pressure from the table. Another option is logical or numeric indexing in the second dimension, for instance T(:,3) = []; removes the third variable altogether. Deleting a variable reduces the table width but leaves all rows intact.
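The following sketch puts these steps together. The measurements and the variable Pulse are hypothetical, and removevars is assumed to be available (it was introduced in R2018a).

```matlab
% Hypothetical measurements; variable names are illustrative only.
Age = [34; 16; 52];
Weight = [70; 55; 82];        % kilograms
Height = [1.75; 1.60; 1.80];  % meters
Temp = [36.6; 36.9; 37.1];
T = table(Age, Weight, Height, Temp);

T.BMI = T.Weight ./ (T.Height.^2);   % new numeric variable, one value per row
T.IsAdult = T.Age >= 18;             % new logical variable

% Insert a variable between columns 2 and 3 by concatenating table slices.
Pulse = [60; 72; 65];
T = [T(:,1:2) table(Pulse) T(:,3:end)];

T = removevars(T, 'Temp');   % drop a variable by name
T(:,'IsAdult') = [];         % drop a variable by empty assignment
```

The final table has the variables Age, Weight, Pulse, Height, and BMI, with the same three rows as at the start.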

Adding and Removing Rows

Rows are added or removed using indexing on the first dimension of the table. If you have another table T2 with exactly the same variable names and compatible variable types, you can append its rows to T using vertical concatenation T = [T; T2];. Both tables must have the same variables with the same names and compatible types. The result is a table with more rows and the same set of variables.

To add a single new row, create a table with one row that has the same variables and names, then append it in the same way. For example:

newRow = T(1,:);               % copy the first row to reuse its variables and types
newRow(1,:) = {42,'M',true};   % overwrite the values from a cell array, one cell per variable
T = [T; newRow];               % append the new row

You can also assign to a row index past the current end, such as T(end+1,:) = T(1,:);, then overwrite the values in that row. This grows the table by one row.

To remove rows, you use row indexing with empty assignment. For example, T(5,:) = []; removes the fifth row and shifts all later rows up by one. You can remove multiple rows with a vector of indices or a logical vector, such as T(T.Age < 0,:) = []; to delete rows with invalid ages. The total number of variables stays the same, but the table becomes shorter.
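As a minimal sketch of growing and shrinking a table, the example below appends rows in two ways and then deletes invalid ones. The data is hypothetical; the cell array on the right side must supply one value per variable.

```matlab
% Hypothetical data; values are illustrative only.
Age = [30; -1; 41];
Sex = {'F'; 'M'; 'F'};
Smoker = [false; true; false];
T = table(Age, Sex, Smoker);

T(end+1,:) = {42, 'M', true};   % append one row from a cell array

T2 = table(28, {'F'}, false, ...
    'VariableNames', {'Age','Sex','Smoker'});
T = [T; T2];                    % append another table with the same variables

T(T.Age < 0,:) = [];            % delete rows with invalid ages
```

Four rows remain, because the row with the negative age is removed after the two new rows are added.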

Renaming Variables and Rows

You can control how variables and rows are labeled. Variable names are stored in the table property VariableNames. To rename variables programmatically, you can assign a new cell array of character vectors or a string array to this property. For example, if T has three variables, you can write T.Properties.VariableNames = {'Height','Weight','Age'};. The number of names must match the number of variables. You can also rename a single variable by indexing the property, for example T.Properties.VariableNames{'Var1'} = 'NewName';.

For row labels, tables use RowNames. These are optional names for each row. You set them by assigning to T.Properties.RowNames. For instance, if T has five rows, you can do T.Properties.RowNames = {'A','B','C','D','E'};. When row names exist, you can index by name as well as by number, for example T('C',:) to select the row called C. If you change the number of rows, you must keep the row names consistent, either by updating or clearing them.
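A short sketch of both kinds of labels follows. The table is hypothetical and starts with the default names Var1, Var2, and Var3 that the table function assigns when none are given.

```matlab
% Hypothetical table with default variable names Var1..Var3.
T = table([1.70; 1.82], [65; 80], [29; 41]);

T.Properties.VariableNames = {'Height','Weight','Age'};  % rename all variables
T.Properties.VariableNames{3} = 'AgeYears';              % rename one by position
T.Properties.RowNames = {'A','B'};                       % label the rows

rowB = T('B',:);   % select a row by its name
```

In recent releases (R2020a or later) the function renamevars offers another way to rename selected variables.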

Row and variable names are very useful when you later join tables, since some join operations can use variable names or row names as keys. Clear, consistent names make these operations easier to manage.

Sorting and Reordering Tables

Tables often need to be sorted to make them easier to read or to prepare them for joining. To sort rows based on one or more variables, you use sortrows. For example, T = sortrows(T,'Age'); sorts the table by the Age variable in ascending order. If two rows have the same age, they keep their relative order unless you supply more sort keys. You can sort by multiple variables by passing a cell array of variable names such as sortrows(T,{'Age','Height'});. In this case, rows are sorted first by age and then by height within each age.

You can specify the direction for each key. Provide a third input that is 'ascend' or 'descend', or a cell array with one direction per key. For example, sortrows(T,{'Age','Height'},{'descend','ascend'}); sorts age from largest to smallest, and height from smallest to largest for equal ages. Sorting changes the order of rows but never changes the values within a row.

Reordering variables is different from sorting rows. You might want to move important variables to the front of the table. You can change the order by reindexing the second dimension of the table. For instance, if T has variables {'Name','Age','Height','Weight'}, you can reorder them as T = T(:,{'Name','Height','Weight','Age'});. This operation keeps all the data and all rows but changes how columns are arranged. You can also use numeric indices if you find it easier to think in positions instead of names.
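The sketch below sorts a small hypothetical table on two keys with different directions and then rearranges its columns. Directions are given as 'ascend' and 'descend', the form that sortrows accepts for tables.

```matlab
% Hypothetical sample data; names and values are illustrative only.
Name = {'Ann'; 'Bob'; 'Cy'; 'Di'};
Age = [30; 25; 30; 25];
Height = [1.60; 1.80; 1.75; 1.70];
T = table(Name, Age, Height);

% Age descending, then Height ascending within equal ages.
S = sortrows(T, {'Age','Height'}, {'descend','ascend'});
% S.Name is {'Ann'; 'Cy'; 'Di'; 'Bob'}

S = S(:, {'Name','Height','Age'});   % reorder variables by name
```

Sorting produced a new row order in S, while the column reordering in the last line left the rows untouched.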

Filtering and Selecting Subsets

Filtering a table means selecting a subset of rows that meet some condition. You typically build a logical index using one or more table variables and use it in the first dimension of the table. For example, you can select all rows for adults with idx = T.Age >= 18; and then TAdults = T(idx,:);. The table TAdults contains only the rows where Age is at least 18, and all variables for those rows. The same pattern can be used for more complex conditions, combining relational and logical operators.

You can also filter on categorical or string variables. For example, if T.Gender is categorical, idx = T.Gender == 'female'; creates a logical index of rows that have Gender equal to 'female'. When you apply this index with T(idx,:), you get a filtered table. Keeping these filtered tables separate can simplify further analysis.

Subsetting variables is done using the second dimension. You can select just a few variables that you want to work with, such as TSmall = T(:,{'Age','Height'});. You can also select variables by type with vartype, for example T(:,vartype('numeric')) to keep only the numeric variables. Creating these smaller tables is useful before joining tables that have many unrelated variables.
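Row and variable selection can be combined in one indexing expression, as in this minimal sketch. The table is hypothetical, and vartype is assumed to be available (R2016b or later).

```matlab
% Hypothetical sample data; names and values are illustrative only.
Age = [17; 34; 52; 12];
Gender = categorical({'female'; 'male'; 'female'; 'male'});
Height = [1.65; 1.80; 1.70; 1.50];
T = table(Age, Gender, Height);

idx = T.Age >= 18 & T.Gender == 'female';  % combine two conditions
TSub = T(idx, {'Age','Height'});           % matching rows, two variables only

TNum = T(:, vartype('numeric'));           % keep only the numeric variables
```

Only one row satisfies both conditions here, so TSub has a single row. TNum keeps Age and Height and drops the categorical Gender.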

Joining Tables by Key Variables

Joining tables lets you align and combine data from two different tables that share some common key information. A key variable is a variable that identifies each row, such as an ID, a date, or a code. To join tables, you specify which variables are keys in each table, then choose how you want rows from the two tables to be matched and included.

The basic function is join, which works like a left join with a stricter matching rule. Suppose you have a table Customers that has one row per customer with a variable CustomerID, and another table Orders that has one row per order, also with CustomerID as a variable. If you write

R = join(Orders,Customers,'Keys','CustomerID');

the result R contains all rows from Orders, in their original order, with the matching information from Customers added as new variables. Each key value in Customers must be unique, and every customer ID in Orders must appear in Customers. If an ID has no match, join raises an error instead of filling in missing values. To keep unmatched rows and pad them with missing values such as NaN or <missing>, use outerjoin with 'Type','left'. When both tables have variables with the same names that are not keys, MATLAB renames them using suffixes to keep them distinct, unless you tell it otherwise using name-value options.

The innerjoin function behaves similarly but only keeps rows where matching keys exist in both tables. Any row in either table that lacks a match in the other is removed in the result. This is useful when you only want complete records that are present in both data sources. For instance, R = innerjoin(Orders,Customers,'Keys','CustomerID'); returns orders only for customers that exist in the Customers table.
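The difference between the two functions shows up on a small example. This is a sketch with hypothetical Customers and Orders tables. Note that join expects each key value in the left table to match exactly one row in the right table.

```matlab
% Hypothetical tables; IDs, names, and totals are illustrative only.
Customers = table([1; 2; 3], {'Ann'; 'Bob'; 'Cy'}, ...
    'VariableNames', {'CustomerID','Name'});
Orders = table([101; 102; 103], [2; 1; 2], [9.5; 20; 4.25], ...
    'VariableNames', {'OrderID','CustomerID','Total'});

% Every CustomerID in Orders occurs exactly once in Customers, so join works.
R = join(Orders, Customers, 'Keys', 'CustomerID');
% R keeps the three rows of Orders, in the same order, plus Name

% Add an order for an ID that is missing from Customers.
Orders(end+1,:) = {104, 9, 1.00};
R2 = innerjoin(Orders, Customers, 'Keys', 'CustomerID');  % order 104 is dropped
```

Customer 3 has no orders, so it appears in neither result. Order 104 has no matching customer, so innerjoin leaves it out.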

On the other hand, outerjoin can keep rows that exist in either table. The type of outer join is controlled by options. For example, a full outer join keeps all rows from both tables, padding missing values where there is no match. You can write

R = outerjoin(T1,T2,'Keys','ID','MergeKeys',true);

to get a result that contains every ID that appears in either T1 or T2, along with variables from both tables. Nonmatching rows get missing values in variables from the other table. To keep only the rows of the left or the right table, add the name-value pair 'Type','left' or 'Type','right'. The default is 'Type','full'.

When the key variable has different names in the two tables, you can still join them by specifying the LeftKeys and RightKeys separately. The values in the key variables must have compatible types and represent the same identifiers. It is often a good idea to sort tables on the key variable and inspect them before join operations to catch any unexpected duplicates or missing keys.
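As a sketch of keys with different names, the example below joins two hypothetical tables whose identifiers are stored in ID and Code. The option names LeftKeys, RightKeys, Type, and MergeKeys are those of outerjoin; the data is illustrative only.

```matlab
% Hypothetical tables; the key is called ID in T1 and Code in T2.
T1 = table([1; 2; 3], [10; 20; 30], 'VariableNames', {'ID','A'});
T2 = table([2; 3; 4], [0.2; 0.3; 0.4], 'VariableNames', {'Code','B'});

% Full outer join: identifiers 1 to 4 all appear; gaps are filled with NaN.
F = outerjoin(T1, T2, 'LeftKeys', 'ID', 'RightKeys', 'Code', ...
    'MergeKeys', true);

% Left outer join: keeps only the rows of T1 (identifiers 1 to 3).
L = outerjoin(T1, T2, 'LeftKeys', 'ID', 'RightKeys', 'Code', ...
    'Type', 'left', 'MergeKeys', true);
```

In F, the row for identifier 1 has NaN in B and the row for identifier 4 has NaN in A, which makes unmatched keys easy to spot with ismissing or isnan.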

Concatenating Tables Horizontally and Vertically

Concatenation is a simpler way to combine tables when you do not need to match rows by key. Vertical concatenation increases the number of rows, while horizontal concatenation increases the number of variables. Both operations rely on consistent structure.

To concatenate tables vertically, you use square brackets with a semicolon between tables, such as Tall = [T1; T2];. This stacks T2 below T1. For this to succeed, both tables must have the same variable names, and the variables must have compatible types. The variables may appear in a different order in T2, because MATLAB matches them by name and uses the order of the first table. If you have tables that share some variables but not all, joining by keys is usually more appropriate than concatenation.

Horizontal concatenation uses a comma or a space between tables, such as Twide = [T1 T2];. In this case, both tables must have the same number of rows. If the tables have row names, the names must match, and MATLAB aligns the rows by name rather than by position. The result has all the variables of T1 plus all the variables of T2. If variable names collide, the concatenation fails, so you need to rename variables beforehand or select a subset of variables to avoid duplicates.

Sometimes, you only want to add a few new variables from another table that shares the same rows in the same order. In that case, selecting specific variables before concatenation can be efficient. For example, T = [T Orders(:,{'Total','Status'})]; adds only the Total and Status variables from Orders to T without using any keys.
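Both directions of concatenation can be sketched with a few small tables. The tables T1, T2, and Extra are hypothetical and only illustrate the shape of the results.

```matlab
% Hypothetical tables; names and values are illustrative only.
T1 = table([1; 2], {'a'; 'b'}, 'VariableNames', {'ID','Label'});
T2 = table([3; 4], {'c'; 'd'}, 'VariableNames', {'ID','Label'});
Tall = [T1; T2];     % 4 rows, same two variables

Extra = table([0.5; 0.7], [true; false], 'VariableNames', {'Score','Flag'});
Twide = [T1 Extra];  % 2 rows, four variables

% Take only one variable from Extra to avoid unwanted columns.
Tpick = [T1 Extra(:,{'Score'})];
```

Horizontal concatenation pairs rows purely by position here, so it is only safe when both tables are known to list the same entities in the same order.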

Key points to remember:
Use T.VarName and row indexing to modify table contents.
Add or remove variables by assigning fields or using removevars, and keep row counts consistent.
Use sortrows to sort by one or more variables, and index T(:,vars) to reorder columns.
Filter rows with logical indexing on table variables, for example T(T.Age > 30,:).
Use join, innerjoin, and outerjoin to combine tables by key variables, and choose keys carefully.
Concatenate tables vertically with [T1; T2] when variables match, and horizontally with [T1 T2] when rows align.
