16.5 Basic Time Series Handling

Understanding Time Series in MATLAB

A time series is a sequence of data values ordered in time. In MATLAB, you can treat a time series as an ordinary numeric array together with a matching time axis, or you can use the dedicated time-related data types introduced earlier. Here the focus is on how to work practically with simple time series once you already have time values and data in MATLAB.

Typical examples of time series are daily stock prices, hourly temperature readings, or sensor measurements sampled every millisecond. The key idea is that each data point has a corresponding time stamp, and most operations must respect that ordering and spacing in time.

Pairing Time Vectors with Data

In many basic workflows you store time information in one vector and the corresponding values in another. For example, you might have a datetime vector t and a numeric vector x with the same length. Element x(i) represents the value at time t(i).

You can construct such a pair directly. For equally spaced time points, you often build the time vector with the colon operator (:) or with date and time construction functions such as datetime. For example, a series of hourly times can be created and then associated with random data values:

t = datetime(2024,1,1,0,0,0):hours(1):datetime(2024,1,1,23,0,0);
x = rand(size(t));

After this, t(k) and x(k) always belong together. Many plotting and analysis functions accept this style of input, with the time vector as the first argument and the data values as the second, for example plot(t, x).

If your time series is multivariate, such as temperature, pressure, and humidity at the same times, you store one time vector and a matrix of data. Then each column is a variable and each row corresponds to one time point:

t = datetime(2024,1,1,0,0,0):minutes(10):datetime(2024,1,1,3,0,0);
T = rand(length(t), 3);    % 3 variables

In this case T(i,1) is the value of variable 1 at time t(i), and so on.

Ensuring Proper Time Ordering

Many operations on time series assume that the time vector is sorted in ascending order and that there are no duplicated times. If your time points arrive out of order, you should sort them together with the corresponding data.
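
Before sorting, it can help to check whether a time vector already meets these assumptions. The following is a minimal sketch on a small made-up datetime vector; the names isIncreasing and hasDuplicates are illustrative and not part of any MATLAB API.

```matlab
% Made-up example: times are out of order and one time stamp is repeated
t = datetime(2024,1,1,0,0,0) + minutes([0 2 1 2]);

% A strictly increasing time vector has only positive differences
isIncreasing = all(diff(t) > seconds(0));

% Duplicates exist if unique() returns fewer elements than the original
hasDuplicates = numel(unique(t)) < numel(t);
```

For this vector both checks report a problem, so the data would need sorting and duplicate handling before further analysis.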

A simple pattern for sorting uses the sort function to obtain sorting indices and then reorders both the time vector and the data array in the same way. For example, if t is a datetime vector and x is a numeric vector:

[tSorted, idx] = sort(t);
xSorted = x(idx);

After this, tSorted is in increasing time order, and xSorted is reordered to match. You then work with tSorted and xSorted for further analysis.

You should avoid sorting time and data separately. If you used sort on t and another sort on x independently, you would break the mapping between each time and its value. Always use the same index vector when you reorder related arrays.
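
A small worked example, using made-up values, shows the difference between the two approaches. The variable names xPaired and xWrong are chosen here only for illustration.

```matlab
% Three samples recorded out of order: x(i) belongs to t(i)
t = datetime(2024,1,1,0,0,0) + hours([2 0 1]);
x = [5 9 7];

% Correct: sort the times once and reuse the index vector for the data
[tSorted, idx] = sort(t);   % idx is [2 3 1]
xPaired = x(idx);           % [9 7 5], each value stays with its time

% Incorrect: sorting the data on its own orders it by value, not by time
xWrong = sort(x);           % [5 7 9], no longer matches tSorted
```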

When your series has multiple variables stored in columns, you use the same approach and apply the sorting indices to all columns:

[tSorted, idx] = sort(t);
TSorted = T(idx, :);

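If your time series has duplicate time stamps and you need unique times, one simple approach is to sort first and then use logical indexing to keep only the first occurrence of each time, that is, every element whose time differs from the one before it. Apply the same logical mask to the data so that times and values stay paired. This is a basic step that prepares the data for later analysis.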
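
As a sketch of this idea, assuming the series is already sorted and stored as row vectors (the mask name keep is illustrative):

```matlab
% Sorted made-up series in which the time 00:01 appears twice
tSorted = datetime(2024,1,1,0,0,0) + minutes([0 1 1 2]);
xSorted = [10 20 25 30];

% Keep the first element, plus every element later than its predecessor
keep = [true, diff(tSorted) > seconds(0)];

tUnique = tSorted(keep);
xUnique = xSorted(keep);   % [10 20 30], the second 00:01 sample is dropped
```

Whether to drop the repeated samples, as here, or to combine them in some other way depends on what the duplicates mean in your data.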

Resampling and Changing Sampling Frequency

In practice, time series data rarely comes in exactly the time spacing you need. You might have measurements every second but want averages every minute, or you might have irregularly spaced data and want values at regular intervals. These tasks are examples of resampling.

A simple way to resample is to create a new time vector with the desired spacing, then compute new values based on the original series. For example, to obtain minute averages from seconds data, you can group indices that fall into each minute and take the mean for each group.

Suppose t is a datetime vector with many time stamps and x is the corresponding data. You can construct a new time vector tNew with a regular spacing and then compute values for it. There are many ways to do this; a straightforward approach uses indexing by ranges of time:

tNew = (t(1)):minutes(1):(t(end));   % new regular times
xNew = zeros(size(tNew));
for k = 1:numel(tNew)
    if k < numel(tNew)
        idx = t >= tNew(k) & t < tNew(k+1);
    else
        idx = t >= tNew(k) & t <= t(end);
    end
    xNew(k) = mean(x(idx));
end

This simple loop computes the average of all values in each one-minute bin. A bin that contains no samples yields NaN, because the mean of an empty selection is NaN. In real work you might use higher-level tools, such as timetables with the retime function, to accomplish similar tasks more compactly, but the pattern is the same: define target times, then compute values for those times from the original data.
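
For comparison, here is one possible compact version of the same minute averaging using a timetable and retime. It is a sketch on made-up data, it assumes a MATLAB release that includes timetables (R2016b or later), and the variable name 'x' inside the timetable is a choice made for this example.

```matlab
% One sample per second for two minutes (made-up data)
t = datetime(2024,1,1,0,0,0) + seconds(0:119);
x = [ones(1,60), 3*ones(1,60)];

% timetable expects column vectors for the row times and the data
TT = timetable(t(:), x(:), 'VariableNames', {'x'});

% Average all samples that fall into each calendar minute
TTmin = retime(TT, 'minutely', 'mean');

tNew = TTmin.Properties.RowTimes;  % start time of each one-minute bin
xNew = TTmin.x;                    % [1; 3] for this data
```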

You can also resample by interpolation rather than averaging. Interpolation estimates data values at new time points by using neighboring samples. A basic tool for this is interp1, which takes original times, original values, and new times, and returns interpolated values. For example:

xInterp = interp1(t, x, tNew, 'linear');

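In this call, 'linear' specifies linear interpolation. Other methods, such as 'nearest', 'previous', 'pchip', or 'spline', behave differently between samples, but the calling pattern is the same. With 'linear', query times outside the range of t return NaN unless you explicitly request extrapolation. interp1 also requires the sample times in t to be distinct, so remove duplicate time stamps first.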
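
The following sketch illustrates linear interpolation and the treatment of query times beyond the last sample. The data are made up, and the times are converted to elapsed seconds (an optional step chosen here so that interp1 receives plain numeric input).

```matlab
t = datetime(2024,1,1,0,0,0) + seconds([0 10 20]);
x = [0 10 20];
tNew = t(1) + seconds([5 15 25]);   % the last query lies beyond t(end)

% Express all times as elapsed seconds since the first sample
ts    = seconds(t - t(1));
tsNew = seconds(tNew - t(1));

xLin = interp1(ts, x, tsNew, 'linear');            % [5 15 NaN]
xExt = interp1(ts, x, tsNew, 'linear', 'extrap');  % [5 15 25]
```

Extrapolated values are guesses beyond the measured range, so leaving them as NaN is often the safer default.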

Handling Missing Data in Time Series

In many time series, some values are missing. In MATLAB, missing numeric data is often represented by NaN. Before calculating statistics or plotting, you must decide how to handle these missing values.

A basic step is simply to remove missing values entirely, which you can do with logical indexing. For instance, if x contains NaN values and you want to work only with the subset of valid points, you can construct a logical index valid that is true wherever the value is not NaN and then keep only those elements of both the time vector and the data:

valid = ~isnan(x);
tValid = t(valid);
xValid = x(valid);

Alternatively, you might want to fill missing values with estimates. One simple approach uses interpolation. For example, to replace NaN values in x with linear interpolation in time, you can first identify indices of valid elements and then interpolate only over them:

valid = ~isnan(x);
xFilled = x;
xFilled(~valid) = interp1(t(valid), x(valid), t(~valid), 'linear');

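This code leaves existing values unchanged and replaces only the missing values by estimates based on neighboring times. Missing values before the first valid sample or after the last one remain NaN, because linear interpolation does not extrapolate by default.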

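Sometimes you might prefer to fill with a fixed value, such as zero, or with the last observed value carried forward. Filling with a fixed value is a single assignment with logical indexing, and carrying the last observation forward can be done with the fillmissing function and its 'previous' method. Neither requires interpolation.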
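
Both fills can be sketched on a short made-up vector. This assumes a MATLAB release that provides fillmissing (R2016b or later); the names xZero and xPrev are illustrative.

```matlab
x = [NaN 1 NaN NaN 4 NaN];
valid = ~isnan(x);

% Fixed value: overwrite the missing entries with zero
xZero = x;
xZero(~valid) = 0;                   % [0 1 0 0 4 0]

% Last observation carried forward; a leading NaN has no predecessor
xPrev = fillmissing(x, 'previous');  % [NaN 1 1 1 4 4]
```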

Regardless of the method, the key is to keep time and values aligned and to apply the same logical mask to both the time vector and any data arrays.

Aligning Multiple Time Series

Often you need to compare or combine two or more time series with different time vectors. For example, one sensor might report every second and another every five seconds. To correlate or plot them together in a meaningful way, you must align them in time.

A simple alignment strategy is to choose a common time axis and resample each time series onto that axis. For instance, suppose t1, x1 represent series 1 and t2, x2 represent series 2. You can define a time grid that covers the span shared by both series, then interpolate both series onto that grid.
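
The alignment strategy described above can be sketched as follows. The two series, the five-second grid spacing, and the helper names (t0, s1, s2, sC, x1c, x2c) are assumptions made for this illustration; times are converted to elapsed seconds so that the interpolation works on plain numeric vectors.

```matlab
% Two made-up series with different sampling intervals
t0 = datetime(2024,1,1,0,0,0);
t1 = t0 + seconds(0:1:20);   x1 = 2*(0:1:20);       % every second
t2 = t0 + seconds(5:5:30);   x2 = (5:5:30) + 100;   % every five seconds

% Elapsed seconds relative to a shared reference time
s1 = seconds(t1 - t0);
s2 = seconds(t2 - t0);

% Common grid restricted to the span covered by both series
sC = max(s1(1), s2(1)):5:min(s1(end), s2(end));   % [5 10 15 20]
tCommon = t0 + seconds(sC);

% Interpolate both series onto the common grid
x1c = interp1(s1, x1, sC, 'linear');   % [10 20 30 40]
x2c = interp1(s2, x2, sC, 'linear');   % [105 110 115 120]
```

After this step x1c(k) and x2c(k) both refer to tCommon(k), so the two series can be plotted against the same axis or compared element by element.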
