Strings, Characters, and Categorical Data

Overview

In MATLAB, text is data, just like numbers. You can create, store, and process text, labels, codes, and categories in variables, pass them to functions, and use them in plots, tables, and user interfaces. For beginners, the most important text-related data types are character arrays, string arrays, and categorical arrays. This chapter gives you a high-level picture of these three, explains how they fit into everyday MATLAB work, and prepares you for the more detailed chapters that follow.

Text as Data in MATLAB

When you type words in quotes in MATLAB, you are working with text. Unlike many programming environments, which have only one main text type, MATLAB has two primary ways to represent text: character arrays and string arrays. At first they can look very similar, but they differ in behavior, advantages, and common uses.

On top of these, MATLAB provides categorical arrays. These are not general text containers, but they are closely related because they use text labels to represent discrete categories such as "Low", "Medium", and "High". Categorical arrays are important in data analysis, especially with tables, because they let you store qualitative information compactly and work with it in an organized way.

Understanding, at a conceptual level, when to use each of these types will help you avoid confusion later when you are manipulating text or importing data that includes names, labels, or codes.

Character Arrays vs String Arrays

Character arrays are the older, traditional way of representing text in MATLAB. A character array is an array in which every element stores a single character, for example a letter or a space. If you have a word like Hello, MATLAB stores it internally as a sequence of character codes, one per letter. Text surrounded by single quotes, for example 'Hello', is a character array.

String arrays are a newer, higher-level text type that is often more convenient. A string array stores text as string objects. You create them using double quotes, for example "Hello". A scalar string array element can contain an entire piece of text such as a sentence. A string array with multiple elements is then an array of separate pieces of text, for example a list of names or labels.
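As a minimal sketch of the two quoting styles described above (the variable names are illustrative only), the following creates the same word as a character array and as a string, then builds a string array with several elements:

```matlab
c = 'Hello';     % single quotes: a 1-by-5 character array
s = "Hello";     % double quotes: a 1-by-1 string array

class(c)         % 'char'
class(s)         % 'string'

% A string array with three elements, one piece of text per element
names = ["Ada", "Grace", "Katherine"];
numel(names)     % 3
```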

One key conceptual difference is that a character array behaves like an ordinary array whose elements are individual characters. Its size is tied to the number of characters, and if you want to store text of different lengths in one character array, you must handle the fact that arrays in MATLAB are rectangular. A string array, in contrast, treats each element as one text entity. Individual elements can have different lengths without affecting the shape of the array itself.
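The size difference can be checked directly. The sketch below uses arbitrary example words, and uses char with several inputs as one way to build a rectangular character array by padding shorter rows with spaces:

```matlab
c = 'Hello';
s = "Hello";

size(c)          % [1 5]: one element per character
size(s)          % [1 1]: one element holding the whole text
strlength(s)     % 5: number of characters inside the string
double(c)        % character codes: 72 101 108 108 111

% Texts of different lengths fit in a string array without padding
words = ["cat"; "giraffe"];        % 2-by-1 string array
strlength(words)                   % [3; 7]

% A character array must be rectangular, so every row needs the same length.
% ['cat'; 'giraffe'] raises a dimension error; char() pads with spaces instead.
padded = char('cat', 'giraffe');   % 2-by-7 character array
size(padded)                       % [2 7]
```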

For new code, MathWorks recommends using string arrays for most text processing tasks because they are easier to combine, compare, and store in tables. Character arrays remain important because many older functions use them, and because they are convenient for some low-level operations such as working with individual characters.
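To illustrate what "easier to combine and compare" can look like in practice, here is a short sketch with made-up names. It shows the + and == operators on strings next to the bracket concatenation and strcmp typically used with character arrays:

```matlab
first = "Ada";
last  = "Lovelace";

% Strings: + appends text, == compares whole pieces of text
full   = first + " " + last;     % "Ada Lovelace"
isSame = (first == "Ada");       % true

% Character arrays: concatenate with brackets, compare with strcmp
cFull = ['Ada', ' ', 'Lovelace'];
cSame = strcmp(cFull, 'Ada Lovelace');   % true

% Indexing a character array reaches individual characters
firstLetter = cFull(1);          % 'A'
```

Using == on two character arrays compares them element by element and requires compatible sizes, which is why strcmp is the usual choice there.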

Text in Everyday MATLAB Use

Text appears in many parts of a typical MATLAB workflow. When you label axes on a plot, you give MATLAB text to display. When you create a table of experimental results, column names are text, and some columns may hold text values like names or codes. When you interact with files, file paths and file names are text. Even error messages and function options often use text values.
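A couple of these everyday uses can be sketched briefly. The sample names, values, and file name below are hypothetical and chosen only for illustration:

```matlab
% A small table whose first column holds text and whose column names are text
sample = ["A1"; "B2"; "C3"];
value  = [1.2; 3.4; 5.6];
T = table(sample, value, 'VariableNames', {'Sample', 'Value'});

class(T.Sample)      % 'string'

% File paths are text too; fullfile joins parts with the platform's separator
p = fullfile("data", "run1.csv");
```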

Because of this, it is common to convert between numbers and text, or between different text-related types. For example, you might read numeric data from a file along with a column of text labels, and store the labels in a string array. Later you might convert some of those labels to a categorical array so that you can group or filter by category efficiently.
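The conversions mentioned above might look like the following sketch. The values and labels are invented for the example:

```matlab
% Number to text and back
x   = 3.75;
txt = string(x);               % "3.75"
num = str2double("42.5");      % 42.5

% Text labels to categories
labels = ["low" "high" "low" "medium"];
cats   = categorical(labels);

% Filter by category
isLow  = (cats == "low");      % logical [1 0 1 0]
nnz(isLow)                     % 2
```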

At this stage, it is enough to recognize that text is first-class data, that it has several representations, and that MATLAB functions often accept and return both numbers and text.

Categorical Data as Labeled Categories

Categorical arrays are designed for data that takes one of a fixed set of values, often described with text labels. For instance, suppose you record "Red", "Green", or "Blue" as a property. These are not numbers, but they repeat and form a natural set of categories. Storing them as free text is possible, but using a categorical array gives MATLAB more structure. Internally, a categorical variable stores integer codes and a list of category names. To you, it still looks like a column of labels.
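One way to peek at this structure is sketched below with the color labels from the text. Converting a categorical array with double exposes the position of each element's category in the category list, which is a convenient view of the idea of codes rather than a statement about the exact internal storage:

```matlab
colors = categorical(["Red" "Green" "Blue" "Red" "Red"]);

categories(colors)   % {'Blue'; 'Green'; 'Red'}, sorted alphabetically by default
double(colors)       % [3 2 1 3 3]: index of each element's category
countcats(colors)    % [1 1 3]: how many times each category occurs
```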

This structure is helpful when you perform grouping operations, summaries by category, or when you want to specify an ordering of categories. For example, "Small", "Medium", and "Large" have a natural order that is important in analysis even though they are not numeric measurements.
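An ordering like this can be declared when the categorical array is created. In the sketch below, the data values are invented; the second argument lists the categories from smallest to largest, and the 'Ordinal' option tells MATLAB that this order is meaningful:

```matlab
sizes = categorical(["Medium" "Small" "Large" "Small"], ...
                    ["Small" "Medium" "Large"], 'Ordinal', true);

isordinal(sizes)             % true

% Relational operators follow the declared order, not the alphabet
isBigger = sizes > "Small";  % logical [1 0 1 0]

% Sorting also follows the declared order
sorted = sort(sizes);        % Small  Small  Medium  Large
```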

Categorical arrays sit conceptually between text and numbers. You see them as labeled categories, but MATLAB uses the labels and category definitions to support specialized operations that are not as natural for plain strings or character arrays.

Where These Types Show Up in the Course

In later chapters of this section you will explore how to create and manipulate character arrays and string arrays in detail. You will see how to build strings from other data, how to change their contents, and how to search and compare them. You will also see how to convert between text and other data types, and how to use categories to represent qualitative data in a structured way.

Other parts of the course will build upon this knowledge. For example, when you work with tables you will often store string or categorical variables as columns. When you import data, the import function and its options determine whether text columns are stored as text or as categories, and you will need to recognize the difference. When you create plots, you will use strings for titles, axis labels, and legends. Understanding the basic roles of strings, characters, and categorical arrays now will make all of those tasks easier to follow.

Remember that MATLAB uses single quotes for character arrays and double quotes for string arrays, that text is treated as data you can store and manipulate like numbers, and that categorical arrays are meant for repeated labeled categories rather than arbitrary free text.
