stomp_euclidean_matrix_profile

stomp_euclidean_matrix_profile(X: ndarray | List, T: ndarray, L: int, mask: ndarray, k: int = 1, threshold: float = inf, inverse_distance: bool = False, exclusion_size: int | None = None)

Compute a Euclidean matrix profile using STOMP [1].

This improves on the naive matrix profile by updating the dot products for each successive query in T instead of recomputing them from scratch.
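The dot-product update can be illustrated on a single channel of a single case. The sketch below is a plain NumPy illustration, not the library's implementation, and the function names are hypothetical. Row 0 and column 0 are computed directly; every other entry is derived from its upper-left neighbour in O(1) instead of O(L), and distances then follow from ||a - b||^2 = ||a||^2 + ||b||^2 - 2 a.b.

```python
import numpy as np


def sliding_dot_products_naive(q, x):
    """Dot product of q with every window of x of length len(q), O(len(q)) each."""
    L = len(q)
    return np.array([q @ x[j : j + L] for j in range(len(x) - L + 1)])


def stomp_dot_products(t, x, L):
    """Dot products of every length-L window of t with every length-L window of x.

    Entry [i, j] is derived from [i - 1, j - 1]: subtract the product of the two
    values leaving the windows and add the product of the two entering them.
    """
    n_q, n_x = len(t) - L + 1, len(x) - L + 1
    qt = np.empty((n_q, n_x))
    qt[0] = sliding_dot_products_naive(t[:L], x)       # first query, computed directly
    first_col = sliding_dot_products_naive(x[:L], t)   # first window of x vs every query
    for i in range(1, n_q):
        qt[i, 1:] = (
            qt[i - 1, :-1]
            - t[i - 1] * x[: n_x - 1]
            + t[i + L - 1] * x[L:]
        )
        qt[i, 0] = first_col[i]
    return qt


def window_sq_norms(x, L):
    """Squared norm of every length-L window of x, via a cumulative sum."""
    c = np.concatenate(([0.0], np.cumsum(x ** 2)))
    return c[L:] - c[:-L]


def stomp_distance_matrix(t, x, L):
    """Euclidean distances between all windows of t and all windows of x."""
    qt = stomp_dot_products(t, x, L)
    sq = window_sq_norms(t, L)[:, None] + window_sq_norms(x, L)[None, :] - 2.0 * qt
    return np.sqrt(np.maximum(sq, 0.0))  # clip small negative rounding errors
```

Comparing `stomp_dot_products` with `sliding_dot_products_naive` applied to each query gives the same values up to floating-point rounding, while the cost per query drops from O(L * n) to O(n).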

Parameters:
X : np.ndarray, 3D array of shape (n_cases, n_channels, n_timepoints)

The input samples. If X is an unequal-length collection, expect a TypedList of 2D arrays of shape (n_channels, n_timepoints).

T : np.ndarray, 2D array of shape (n_channels, series_length)

The series used for similarity search. series_length can be equal to, greater than, or less than n_timepoints.

L : int

The length of the subsequences considered during the search. It cannot be larger than either n_timepoints or series_length.

mask : np.ndarray, 2D array of shape (n_cases, n_timepoints - L + 1)

Boolean mask with the shape of the distance profiles, indicating at which positions the distance should be computed. Here it is the mask for the first query of size L in T. This mask is updated during the algorithm.
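A small illustration of the expected shape of mask, with arbitrary sizes. It assumes that positions where the mask is False are treated as ineligible (set to infinity) before the best match is searched; the actual handling inside the function may differ.

```python
import numpy as np

n_cases, n_timepoints, L = 3, 20, 5
rng = np.random.default_rng(0)
# One distance profile per case: n_timepoints - L + 1 candidate start positions
dist_profiles = rng.random((n_cases, n_timepoints - L + 1))

# All candidates are searchable by default
mask = np.ones((n_cases, n_timepoints - L + 1), dtype=bool)
# Hypothetical restrictions: skip case 1 entirely and the first 3 candidates of case 0
mask[1, :] = False
mask[0, :3] = False

# Assumption: masked-out positions can never be selected
masked = np.where(mask, dist_profiles, np.inf)
best_case, best_t = np.unravel_index(np.argmin(masked), masked.shape)
```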

k : int, default=1

The number of best matches to return for each query subsequence.

threshold : float, default=np.inf

The maximum distance a candidate may have to be returned as a match. The default, np.inf, applies no threshold.

inverse_distance : bool, default=False

If True, the matching will be made on the inverse of the distance, and thus, the worst matches to the query will be returned instead of the best ones.
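Why inverting works: for non-negative distances, 1 / (d + eps) is strictly decreasing, so the smallest inverted values belong to the farthest candidates. The sketch below assumes this particular transform and a small hypothetical constant eps that avoids division by zero on exact matches; the library's exact formula is not stated here.

```python
import numpy as np


def to_inverse_distance(dist_profile, eps=1e-8):
    """Map distances so that minimising the result selects the worst matches.

    eps is a hypothetical small constant guarding against d == 0.
    """
    return 1.0 / (dist_profile + eps)


d = np.array([0.5, 3.0, 0.0, 1.2])
inv = to_inverse_distance(d)
best_under_inverse = int(np.argmin(inv))  # index 1, the largest distance
```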

exclusion_size : int, optional

The size of the exclusion zone, used to prevent returning as top k candidates matches that are close to each other (for example i and i+1). It defines a region between id_timestamp - exclusion_size and id_timestamp + exclusion_size from which no further best match can be returned once id_timestamp has been selected. By default, the value None means that no exclusion zone is used.
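The interaction of k, threshold and exclusion_size on a single 1D distance profile can be sketched as a greedy selection. This is an illustrative helper with a hypothetical name, not the library's code; it assumes the exclusion region includes both endpoints and that candidates with a distance above threshold are discarded.

```python
import numpy as np


def top_k_with_exclusion(dist_profile, k, threshold=np.inf, exclusion_size=None):
    """Greedily return up to k indices with the smallest distances.

    After index i is selected, indices in [i - exclusion_size, i + exclusion_size]
    (inclusive, an assumption) can no longer be selected.
    """
    d = dist_profile.astype(float).copy()
    d[d > threshold] = np.inf  # candidates beyond the threshold are ineligible
    selected = []
    for _ in range(k):
        i = int(np.argmin(d))
        if not np.isfinite(d[i]):
            break  # fewer than k eligible candidates remain
        selected.append(i)
        if exclusion_size is None:
            d[i] = np.inf  # only the selected index itself is removed
        else:
            lo = max(0, i - exclusion_size)
            d[lo : i + exclusion_size + 1] = np.inf
    return selected


profile = np.array([0.9, 0.1, 0.2, 0.15, 5.0, 0.3, 0.8])
top_k_with_exclusion(profile, 3)                    # [1, 3, 2]: neighbours allowed
top_k_with_exclusion(profile, 3, exclusion_size=1)  # [1, 3, 5]: 2 and 4 excluded
top_k_with_exclusion(profile, 3, threshold=0.15)    # [1, 3]: fewer than k matches
```

Without an exclusion zone, the top k matches often cluster around a single good alignment (indices 1, 2 and 3 above); the exclusion zone spreads them out.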

Returns:
Tuple(ndarray, ndarray)

The first array, of shape (series_length - L + 1, n_matches), contains the distances between each query of size L in T and its best matches in X. The second array, of shape (series_length - L + 1, n_matches, 2), contains the indexes of these matches as (id_sample, id_timepoint). The corresponding match can be retrieved as X[id_sample, :, id_timepoint : id_timepoint + L].
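To make the output layout concrete, here is a slow brute-force reference with a hypothetical name that produces arrays of the documented shapes for small equal-length inputs. It assumes the multichannel distance is the Euclidean norm over all channels and timepoints of the subsequence, and it ignores mask, threshold, inverse_distance and exclusion_size.

```python
import numpy as np


def brute_force_profile(X, T, L, k=1):
    """Reference version of the documented outputs, for small inputs only.

    X: (n_cases, n_channels, n_timepoints), T: (n_channels, series_length).
    Returns distances (n_queries, k) and indexes (n_queries, k, 2).
    """
    n_cases, _, n_timepoints = X.shape
    n_queries = T.shape[1] - L + 1
    dists = np.empty((n_queries, k))
    idx = np.empty((n_queries, k, 2), dtype=int)
    for q in range(n_queries):
        query = T[:, q : q + L]
        # Distance from the query to every candidate (id_sample, id_timepoint)
        cand = np.array([
            [np.linalg.norm(X[s, :, j : j + L] - query) for j in range(n_timepoints - L + 1)]
            for s in range(n_cases)
        ])
        for m, flat in enumerate(np.argsort(cand, axis=None)[:k]):
            s, j = np.unravel_index(flat, cand.shape)
            dists[q, m] = cand[s, j]
            idx[q, m] = (s, j)
    return dists, idx


rng = np.random.default_rng(42)
X = rng.standard_normal((3, 2, 30))
T = rng.standard_normal((2, 25))
L = 6
dists, idx = brute_force_profile(X, T, L, k=2)
# Retrieve the best match of the first query, as described above
id_sample, id_timepoint = idx[0, 0]
match = X[id_sample, :, id_timepoint : id_timepoint + L]
```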

References

[1]

Matrix Profile II: Exploiting a Novel Algorithm and GPUs to Break the One Hundred Million Barrier for Time Series Motifs and Joins. Yan Zhu, Zachary Zimmerman, Nader Shakibay Senobari, Chin-Chia Michael Yeh, Gareth Funning, Abdullah Mueen, Philip Brisk and Eamonn Keogh. IEEE ICDM 2016.