Comparing and Searching Strings

Comparing Strings in MATLAB

In MATLAB you will often need to decide whether two strings are the same, whether one contains another, or where a particular substring appears. This chapter focuses on how to compare and search strings using both character arrays and string arrays, and what results you can expect.

Equality and Inequality of Strings

To check whether two strings are equal, use the dedicated comparison functions. Avoid the plain == operator for whole character arrays: it compares character by character, returns a logical array rather than a single true or false, and raises an error when the two arrays have different lengths.

For character arrays, use strcmp and strcmpi. The function strcmp is case sensitive, while strcmpi ignores case. For example:

a = 'Hello';
b = 'Hello';
c = 'HELLO';
tf1 = strcmp(a, b)    % returns logical 1
tf2 = strcmp(a, c)    % returns logical 0
tf3 = strcmpi(a, c)   % returns logical 1
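
To see why == is discouraged for character arrays, the following minimal sketch contrasts it with strcmp. The variable names are illustrative only and are not part of the examples above.

```matlab
a = 'Hello';
b = 'Help!';

mask = (a == b);        % 1x5 logical array, one entry per character
same = strcmp(a, b);    % single logical value for the whole text

% Character arrays of different lengths cannot be compared with ==.
sizeError = false;
try
    tmp = ('Hello' == 'Hi'); %#ok<NASGU>
catch
    sizeError = true;   % MATLAB reports incompatible array sizes
end

% strcmp accepts different lengths and simply returns false.
sameLen = strcmp('Hello', 'Hi');
```

Here mask is [1 1 1 0 0], while same and sameLen are both logical 0.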

For string arrays, you can use the same functions or the == and ~= operators directly. These operators compare whole string elements, so they return a logical scalar when both inputs are string scalars:

s1 = "cat";
s2 = "cat";
s3 = "Cat";
tf1 = s1 == s2        % returns logical 1
tf2 = s1 == s3        % returns logical 0
tf3 = s1 ~= s3        % returns logical 1

When you work with string arrays or cell arrays of character vectors, strcmp and strcmpi compare element by element and return a logical array of the same size as the inputs. For example:

A = ["red","green","blue"];
B = ["red","Green","blue"];
eqCase = strcmp(A, B)    % [1 0 1]
eqNoCase = strcmpi(A, B) % [1 1 1]
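
The same functions also accept a cell array of character vectors compared against a single target. A short sketch, using a hypothetical colors list:

```matlab
colors = {'red', 'green', 'blue'};           % hypothetical cell array

isGreen = strcmp(colors, 'green');           % [0 1 0]
isGreenAnyCase = strcmpi(colors, 'GREEN');   % [0 1 0]
```

The single character vector is compared with every cell, so the result has one logical value per element of colors.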

Ordering and Sorting Comparisons

Sometimes you do not only want to know whether two strings are equal, but also which one comes first in dictionary order. The strcmp family (strcmp, strcmpi, strncmp, strncmpi) returns only true or false and gives no ordering information.

For ordering, use sort or the relational operators on string arrays. Sorting a string array is a simple way to see the order:

S = ["apple", "banana", "apricot"];
sorted = sort(S)

The result is ["apple","apricot","banana"]. Elements are ordered by the UTF-16 code values of their characters, which matches alphabetical order for plain English text written in a single case. The sort is case sensitive, so uppercase letters come before lowercase ones.

You can also apply relational operators such as < and > to string scalars to test ordering. For example:
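
Because sorting follows character codes, mixed-case text may not come out in dictionary order. The sketch below shows one possible workaround, sorting on a lowercase copy; the variable names are illustrative.

```matlab
S = ["banana", "Cherry", "apple"];

% Default sort: uppercase "C" has a smaller code than lowercase letters.
byCode = sort(S);                 % ["Cherry","apple","banana"]

% Sort a lowercase copy and reuse the resulting order on the original.
[~, order] = sort(lower(S));
byLetter = S(order);              % ["apple","banana","Cherry"]
```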

"apple" < "banana"    % returns logical 1
"dog" > "cat"         % returns logical 1

These operations compare character codes from left to right until a difference is found.

With character arrays, the same operators compare character by character, so the result is a logical array rather than a scalar, and the operands must have compatible sizes. To check the ordering of full character vectors, convert them to strings first:

'a' < 'b'                              % logical 1 (scalar, single character)
string('apple') < string('apricot')    % logical 1 (whole-string comparison)

Prefix and Suffix Comparisons

To test whether two strings match in their first characters, use strncmp and strncmpi. You specify how many leading characters to compare. For character arrays:

a = 'abcdef';
b = 'abcXYZ';
tf1 = strncmp(a, b, 3)     % compares 'abc' with 'abc', returns 1
tf2 = strncmp(a, b, 4)     % compares 'abcd' with 'abcX', returns 0

With string arrays the behavior is similar:

s1 = "Matlab";
s2 = "Matrix";
strncmp(s1, s2, 3)         % returns 1
strncmpi(s1, s2, 3)        % also returns 1, case insensitive

To test suffixes you can either use searching functions such as endsWith or compare substrings taken from the ends. For example:

s = "document.txt";
tfEnd = endsWith(s, ".txt")     % returns logical 1
% Character array alternative
c = 'document.txt';
suffix = '.txt';
tfEndChar = strcmp(c(end-numel(suffix)+1:end), suffix)

The functions startsWith and endsWith accept string arrays, character vectors, and cell arrays of character vectors, and are convenient for checking fixed prefixes and suffixes.
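
For completeness, here is a brief sketch of startsWith, and of passing several candidate suffixes at once. The file names are made up for illustration.

```matlab
s = "document.txt";
tfStart = startsWith(s, "doc");               % logical 1

% With several patterns, an element matches if it ends with any of them.
files = ["notes.md", "data.csv", "readme.txt"];
isText = endsWith(files, [".txt", ".md"]);    % [1 0 1]

% Character vectors are accepted as well.
tfChar = startsWith('document.txt', 'doc');   % logical 1
```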

Searching Within Strings

Searching inside strings lets you locate words, letters, or patterns. MATLAB offers several functions that behave slightly differently depending on whether you use character arrays or string arrays.

For simple substring searches, use strfind to get the positions of matches or contains to get a true or false answer. With character vectors:

txt = 'The quick brown fox';
idx = strfind(txt, 'quick')   % returns starting index 5

If the pattern appears more than once, strfind returns all starting indices. If the pattern is not found, it returns an empty array.
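
A small sketch of both cases, using an illustrative sentence:

```matlab
txt = 'the cat sat on the mat';

idxThe = strfind(txt, 'the');    % [1 16], two occurrences
idxDog = strfind(txt, 'dog');    % empty, no occurrence

% isempty turns the "not found" case into a logical test.
foundDog = ~isempty(idxDog);     % logical 0
countThe = numel(idxThe);        % 2
```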

With string arrays, strfind returns positions inside each string element. However, for most modern code using string arrays, contains, startsWith, and endsWith are often simpler. For example:

s = "The quick brown fox";
tf = contains(s, "quick")     % returns logical 1

When you apply contains to a string array, MATLAB searches each element and returns a logical array of the same size:

S = ["red apple", "green pear", "blue berry"];
hasApple = contains(S, "apple")    % [1 0 0]

By default, contains is case sensitive. You can ignore case using a name-value argument:

txt = "Hello World";
contains(txt, "hello")                          % 0
contains(txt, "hello", 'IgnoreCase', true)      % 1

The functions startsWith and endsWith accept the same IgnoreCase option.
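
As an illustration, the sketch below applies the option to a hypothetical list of file names with mixed-case extensions:

```matlab
files = ["Report.PDF", "notes.txt", "summary.pdf"];

isPdfStrict = endsWith(files, ".pdf");                     % [0 0 1]
isPdf = endsWith(files, ".pdf", 'IgnoreCase', true);       % [1 0 1]
startsRe = startsWith(files, "re", 'IgnoreCase', true);    % [1 0 0]
```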

Matching Whole Words

Sometimes you only want to find whole words instead of any occurrence of a substring. For example, searching for "art" inside "cart" and "art" should treat only the standalone "art" as a match.

For basic whole word matching without regular expressions, you can split text into words, then compare:

txt = "art in the cart";
words = split(txt);          % ["art","in","the","cart"]'
isArt = words == "art";      % [1 0 0 0]'

For more flexible matching, you can use pattern objects (introduced in R2020b) with string arrays. Boundary patterns such as alphanumericBoundary let you define whole word searches. For example:

txt = "art in the cart";
pat = alphanumericBoundary + "art" + alphanumericBoundary;
tf = contains(txt, pat)          % returns logical 1

Patterns are very useful when you want to avoid matching substrings inside longer words while still using a single function call.
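
If pattern objects are not available in your release, a regular expression is one possible alternative. This sketch assumes the word anchors \< and \> of MATLAB regular expressions and compares the result with a plain substring search.

```matlab
txt = 'art in the cart';

% \< and \> anchor the match to the start and end of a word.
wholeWordIdx = regexp(txt, '\<art\>');   % 1, only the standalone word

% A plain substring search also finds the "art" inside "cart".
substringIdx = strfind(txt, 'art');      % [1 13]
```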

Locating All Occurrences and Indices

When you need exact positions of all matches, use strfind for simple substrings or regexp for more advanced patterns.

For character vectors:

txt = 'banana';
idx = strfind(txt, 'an')      % [2 4]

For string arrays, strfind applied to a scalar string returns a numeric row vector of indices. For arrays of strings, it returns a cell array where each cell contains indices for the corresponding string:

S = ["banana","bandana"];
idxAll = strfind(S, "an")
% idxAll is a 1x2 cell array:
% idxAll{1} is [2 4]
% idxAll{2} is [2 5]

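If you need more control, such as matching character classes or optional parts, regexp and related functions provide detailed outputs, including start and end indices and matched substrings. Note that regexp reports non-overlapping matches, whereas strfind also reports overlapping ones. For example: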

txt = 'abracadabra';
idx = regexp(txt, 'a')    % positions of 'a'
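
To illustrate the extra outputs, the following sketch requests start indices, end indices, and matched text in one call, and then contrasts how strfind and regexp treat overlapping occurrences. The pattern 'a[bcd]' is only an example.

```matlab
txt = 'abracadabra';

% Ask for several outputs by naming them as options.
[startIdx, endIdx, matches] = regexp(txt, 'a[bcd]', 'start', 'end', 'match');
% startIdx is [1 4 6 8], endIdx is [2 5 7 9]
% matches is {'ab','ac','ad','ab'}

% strfind reports overlapping occurrences; regexp continues after each match.
overlapStrfind = strfind('aaaa', 'aa');   % [1 2 3]
overlapRegexp  = regexp('aaaa', 'aa');    % [1 3]
```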

For many beginner tasks, strfind and contains are sufficient. Use regexp when you need structured patterns, or when you must handle more complex text rules.

Searching in Arrays of Strings

When you store many strings in one array, you often need to find which elements contain a particular word or exactly match a target. Logical indexing combined with comparison or contains is the usual pattern.

For exact matches in a string array:

names = ["Alice","Bob","Charlie","Bob"];
isBob = names == "Bob";          % [0 1 0 1]
bobNames = names(isBob);         % ["Bob","Bob"]
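
When you need positions rather than a logical mask, find converts the mask to indices. A short sketch, with a slightly different hypothetical list that mixes letter case:

```matlab
names = ["Alice", "Bob", "Charlie", "bob"];

exactIdx = find(names == "Bob");             % 2, case sensitive
anyCaseIdx = find(strcmpi(names, "bob"));    % [2 4], case ignored
hasDave = any(names == "Dave");              % logical 0, no such element
```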

To search using substrings:

S = ["file1.txt","file2.csv","notes.txt"];
isTxt = endsWith(S, ".txt");     % [1 0 1]
txtFiles = S(isTxt);             % ["file1.txt","notes.txt"]

For other string collections, such as cell arrays of character vectors, you can either convert them to string arrays using string, or apply comparison functions that support cell arrays. For example:

C = {'pear','peach','apple'};
S = string(C);
hasPe = startsWith(S, "pe");    % [1 1 0]
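
The second option, calling the functions on the cell array directly, might look like the sketch below. It reuses the same fruit names; the other variable names are illustrative.

```matlab
C = {'pear', 'peach', 'apple'};

hasPeCell = startsWith(C, 'pe');   % [1 1 0]
isApple = strcmp(C, 'apple');      % [0 0 1]
hasEa = contains(C, 'ea');         % [1 1 0]

% cellstr converts a string array back to a cell array when needed.
backToCell = cellstr(string(C));
```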

This approach often simplifies the code and makes the behavior of comparison and searching operations more predictable.

Important points to remember:
Using strcmp and strcmpi is the reliable way to compare whole strings for equality.
For string arrays, == compares whole elements and returns logical results that you can use directly for indexing.
Use contains, startsWith, and endsWith for searching strings, and add 'IgnoreCase', true when you want case insensitive searches.
strfind returns positions of substring matches, and regexp gives more control for complex patterns.
Convert character vectors or cell arrays of character vectors to string arrays when you want consistent comparison and search behavior across many elements.
