Kahibaro

Categorical Arrays Basics

Introduction

Categorical arrays in MATLAB represent data that takes on a limited, fixed set of possible values called categories. They are useful for labels, groups, or qualitative data such as colors, regions, or experimental conditions. Unlike plain strings, categorical arrays know which values are valid categories, can define a meaningful order among those categories, and make grouping and statistical operations easier and more efficient.

When to Use Categorical Arrays

Categorical arrays are appropriate whenever your data represents kinds or groups rather than quantities or arbitrary text. Typical examples include gender, day of the week, product type, patient group, or test condition. Even if the data starts as strings or numbers, if you are using those values to indicate membership in a set of groups, a categorical representation is often clearer and more convenient.

You will often encounter categorical arrays when importing data into tables, especially from spreadsheets or statistical datasets whose columns contain labels.

Creating Categorical Arrays

The basic function to create them is categorical. You can convert from many types, such as numeric arrays, character arrays, or string arrays.

For example, if you have a string array of colors, you can write:

colors = ["red" "blue" "red" "green" "blue"];
C = categorical(colors);

The variable C now holds a categorical array. MATLAB scans the input, finds the unique values, and creates one category for each, listed in sorted order. You can see the categories with:

categories(C)
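
As a quick check of the example above, the following sketch lists the categories and counts how many elements fall into each. The call to countcats is an addition for illustration only and is not part of the original example.

```matlab
colors = ["red" "blue" "red" "green" "blue"];
C = categorical(colors);

% Category names are the sorted unique input values, as a column cell array
cats = categories(C);      % {'blue'; 'green'; 'red'}

% Number of elements in each category, in the same order as cats
counts = countcats(C);     % 2, 1, 2
```

The counts line up with the category list, so the first count belongs to blue, the second to green, and the third to red.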

If you create a categorical array from numeric data, the values are treated as labels, not as numbers to be added or averaged. For example:

codes = [1 2 1 3 2 2];
G = categorical(codes);

Here, 1, 2, and 3 are category labels, stored as the names '1', '2', and '3'. The elements of G no longer behave as numeric magnitudes.
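
To make the difference from numeric data concrete, here is a minimal sketch. The variable names arithmeticFailed and isTwo are introduced only for this illustration. It assumes that arithmetic operators are not defined for categorical arrays, so the addition is wrapped in try/catch.

```matlab
codes = [1 2 1 3 2 2];
G = categorical(codes);

% The category names are text labels derived from the numbers
cats = categories(G);          % {'1'; '2'; '3'}

% Arithmetic is not defined for categorical arrays, so this call errors
arithmeticFailed = false;
try
    total = G + 1; %#ok<NASGU>
catch
    arithmeticFailed = true;
end

% Equality tests against a category name still work
isTwo = (G == "2");            % [0 1 0 0 1 1]
```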

You can also explicitly specify which categories, and which order, you want. This is often useful when some categories do not appear yet in the data or when you want to control their meaning and ordering:

sizes = ["M" "S" "L" "M" "XL"];
C = categorical(sizes, ["S" "M" "L" "XL"]);

The second input to categorical is the list of values that you allow. MATLAB uses exactly these as the set of categories, in the order you give them. Any values in sizes that are not in this list become undefined elements, which MATLAB displays as <undefined>.
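
The following sketch shows what happens to a value outside the allowed list. The value "XXL" and the variable name isUndef are hypothetical and used only for illustration.

```matlab
% "XXL" is a hypothetical value that is not in the allowed list
sizes = ["M" "S" "XXL" "M"];
C = categorical(sizes, ["S" "M" "L" "XL"]);

% Categories keep the order given, including ones with no elements yet
cats = categories(C);          % {'S'; 'M'; 'L'; 'XL'}

% isundefined flags the elements that did not match any category
isUndef = isundefined(C);      % [0 0 1 0]
```

Checking isundefined after a conversion like this is a simple way to catch typos or unexpected labels in the input.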

Categories, Levels, and Underlying Codes

A categorical array stores each element as a category index internally, along with the list of category names. Each element refers to one of the categories or is undefined.

The function categories returns the list of category names as a cell array of character vectors. For example:

C = categorical(["red" "blue" "red" "green"]);
cats = categories(C)

The order of the categories inside cats is the internal category order. This order matters for ordered categorical arrays and can matter when you display or summarize the data.

If you want to see the internal integer codes, you can use double:

codes = double(C);

Here, codes is a numeric array of indices into the category list, with NaN for any undefined elements. The exact mapping from indices to category names should be treated as an internal detail, and you normally work with the category names themselves rather than the codes.
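
As an illustration of how the codes relate to the category list, this sketch indexes the list with the codes to recover the original labels. The variable name labels is introduced only for this example.

```matlab
C = categorical(["red" "blue" "red" "green"]);

cats = categories(C);      % {'blue'; 'green'; 'red'}
codes = double(C);         % [3 1 3 2]

% Indexing the category list with the codes recovers the labels
labels = reshape(string(cats(codes)), 1, []);   % ["red" "blue" "red" "green"]
```

This lookup only works when no element is undefined, because a NaN code cannot be used as an index.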

Ordered vs Unordered Categories

Categorical arrays can be unordered or ordered. An unordered categorical array treats all categories as just different labels with no meaningful order. This is suitable for things like colors or product IDs.

An ordered categorical array defines a meaningful sequence among categories, such as "low", "medium", "high", or "small", "medium", "large". Ordered categoricals let you use relational operators like < and > to compare categories according to their order.

To create an ordered categorical array, use the Ordinal name-value argument:

levels = ["low" "medium" "high" "medium"];
C = categorical(levels, ["low" "medium" "high"], "Ordinal", true);

Here, "low" is less than "medium", and "medium" is less than "high". You can then use expressions such as:

C > "low"

to obtain a logical array indicating which elements are above "low" in the defined order.

If you convert a string or character array to a categorical without specifying "Ordinal", true, the result is unordered, and comparisons with < or > are not allowed. Equality tests with == and ~= still work on unordered categoricals.
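
The sketch below puts the ordered and unordered cases side by side. The variable names aboveLow, highest, U, and comparisonFailed are introduced only for this illustration, and the failing comparison is wrapped in try/catch so the script runs to the end.

```matlab
levels = ["low" "medium" "high" "medium"];
C = categorical(levels, ["low" "medium" "high"], "Ordinal", true);

isOrd = isordinal(C);          % true
aboveLow = C > "low";          % [0 1 1 1]
highest = max(C);              % high

% The same data without "Ordinal", true is unordered
U = categorical(levels);
comparisonFailed = false;
try
    tmp = U > "low"; %#ok<NASGU>
catch
    comparisonFailed = true;   % relational comparison is rejected
end

% Equality is still allowed on the unordered array
isMedium = (U == "medium");    % [0 1 0 1]
```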

Inspecting and Modifying Categories

You can query the categories that exist and modify them without changing the underlying data values. Basic inspection uses:

cats = categories(C);

To rename categories, use renamecats. This is useful if you want more descriptive labels but keep the same grouping:

C = categorical(["S" "M" "L" "M"]);
C = renamecats(C, ["S" "M" "L"], ["Small" "Medium" "Large"]);
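
To check that renaming changes only the labels and not the grouping, this sketch compares the category codes before and after the call. The variable names before, after, and sameGrouping are introduced only for this illustration.

```matlab
C = categorical(["S" "M" "L" "M"]);
before = double(C);                     % codes before renaming

C = renamecats(C, ["S" "M" "L"], ["Small" "Medium" "Large"]);
after = double(C);                      % codes after renaming

% Each element still points at the same category position
sameGrouping = isequal(before, after);  % true

% Names change in place; the category order is not re-sorted
cats = categories(C);                   % {'Large'; 'Medium'; 'Small'}
```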

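If the categorical array has many categories, you do not need to list them all. renamecats identifies categories by name, not by numeric index, so to rename a single category you pass its current name and the new name:

C = renamecats(C, "Small", "Very Small");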

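You can add new categories that are not currently present in the data using addcats. For an unordered categorical array, the new categories are appended to the end of the category list, and no existing elements change.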
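
A minimal sketch of addcats follows. The category "XL" is a hypothetical addition chosen for illustration, and countcats is used only to show that the new category starts out empty.

```matlab
C = categorical(["S" "M" "L" "M"]);

% "XL" is a hypothetical new category with no elements yet
C = addcats(C, "XL");

cats = categories(C);      % {'L'; 'M'; 'S'; 'XL'}
counts = countcats(C);     % 1, 2, 1, 0
```

Once the category exists, assigning "XL" to an element of C yields a valid category rather than an unknown label.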
