The Canonical Time-series Characteristics (catch22) transform
Catch22 [1] is a set of 22 time series features selected from the 7000+ features in the hctsa toolbox [2][3]. To remove redundancy, the features that performed better than random chance were hierarchically clustered using their correlation matrix. The resulting 22 clusters were sorted by balanced accuracy (obtained with a decision tree classifier), and a single feature was chosen from each cluster, taking into account balanced accuracy, computational efficiency and interpretability. More about the individual features of catch22 can be found in the GitBook of the original creators.
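The redundancy-reduction step described above can be sketched on synthetic data. This is a minimal illustration, not the authors' pipeline: the distance 1 - |correlation|, average linkage, the 0.2 stopping threshold and the per-feature scores are all assumptions made for the example, and `cluster_features` / `pick_representatives` are hypothetical helpers. The original selection also weighed computational efficiency and interpretability by hand, which the sketch does not model; it simply keeps the highest-scoring feature in each cluster.

```python
import numpy as np


def cluster_features(F, threshold=0.2):
    """Group the columns of F (n_samples, n_features) by average-linkage
    agglomerative clustering on the distance 1 - |Pearson correlation|.
    Merging stops once the closest pair of clusters is farther apart than
    `threshold`. Returns a list of clusters (lists of column indices)."""
    dist = 1.0 - np.abs(np.corrcoef(F, rowvar=False))
    clusters = [[j] for j in range(F.shape[1])]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the members
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def pick_representatives(clusters, scores):
    """Keep the single highest-scoring feature from each cluster."""
    return sorted(max(c, key=lambda j: scores[j]) for c in clusters)


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 3))  # three independent underlying signals
# Six candidate features: each signal appears twice, the copy slightly noisy
F = np.column_stack([
    base[:, 0], base[:, 0] + 0.01 * rng.normal(size=200),
    base[:, 1], base[:, 1] + 0.01 * rng.normal(size=200),
    base[:, 2], base[:, 2] + 0.01 * rng.normal(size=200),
])
scores = np.array([0.70, 0.72, 0.60, 0.55, 0.81, 0.80])  # hypothetical balanced accuracies
clusters = cluster_features(F)
print(sorted(sorted(c) for c in clusters))  # [[0, 1], [2, 3], [4, 5]]
print(pick_representatives(clusters, scores))  # [1, 2, 4]
```

Each redundant pair collapses into one cluster, and one feature per cluster survives, which is the same idea that reduced thousands of hctsa features to 22.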
In this notebook, we demonstrate how to use aeon's Catch22 transformer on the univariate ItalyPowerDemand and multivariate BasicMotions datasets. We go through the parameters of Catch22 and show how changing their default values can change the results. Catch22 is also used within aeon's classification module, for example by the Catch22Classifier.
1. Transformation
Catch22 is a feature-based transformer that extracts 22 features from each time series. It accepts both univariate and multivariate input without any reshaping. It is commonly used because each of its features has an interpretable meaning. In addition, because each series (or each channel of a multivariate series) is reduced to 22 values regardless of its length, downstream machine learning tasks such as clustering and classification become computationally cheaper.
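The key property here is that every series, whatever its length, is mapped to a vector of the same length. The sketch below illustrates this with three simple stand-in features (mean, standard deviation and lag-1 autocorrelation); these are hypothetical placeholders, not catch22 features, and `to_feature_vector` is not part of aeon.

```python
import numpy as np


def to_feature_vector(x):
    """Map a 1D series of any length to a fixed-length vector of 3 features.

    The features are illustrative stand-ins, not catch22 features."""
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    std = x.std()
    centred = x - mean
    # Lag-1 autocorrelation; undefined (NaN) for a constant series
    denom = np.sum(centred**2)
    acf1 = np.sum(centred[:-1] * centred[1:]) / denom if denom > 0 else np.nan
    return np.array([mean, std, acf1])


short_series = np.sin(np.linspace(0, 2 * np.pi, 24))  # 24 time points
long_series = np.sin(np.linspace(0, 2 * np.pi, 1000))  # 1000 time points
X_feat = np.vstack([to_feature_vector(s) for s in (short_series, long_series)])
print(X_feat.shape)  # (2, 3): same width regardless of series length
```

Catch22 does the same with 22 carefully chosen features, so a classifier or clusterer downstream always sees a fixed number of columns.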
1.1 Import Data and Catch22
[7]:
import numpy as np
from aeon.datasets import load_basic_motions, load_italy_power_demand
from aeon.transformations.collection.feature_based import Catch22
1.2 Load Data
[3]:
IPD_X_train, IPD_y_train = load_italy_power_demand(split="train")
IPD_X_test, IPD_y_test = load_italy_power_demand(split="test")
print(
"Italy Power Demand (Univariate): ",
IPD_X_train.shape,
IPD_y_train.shape,
IPD_X_test.shape,
IPD_y_test.shape,
)
BM_X_train, BM_y_train = load_basic_motions(split="train")
BM_X_test, BM_y_test = load_basic_motions(split="test")
print(
"Basic Motions (Multivariate): ",
BM_X_train.shape,
BM_y_train.shape,
BM_X_test.shape,
BM_y_test.shape,
)
Italy Power Demand (Univariate): (67, 1, 24) (67,) (1029, 1, 24) (1029,)
Basic Motions (Multivariate): (40, 6, 100) (40,) (40, 6, 100) (40,)
1.3 Transform the Data
Univariate
[4]:
c22_uv = Catch22()
c22_uv.fit(IPD_X_train, IPD_y_train)
transformed_data_uv = c22_uv.transform(IPD_X_train)
print(transformed_data_uv.shape)
(67, 22)
Multivariate
Note that the output shape will not be (n_cases, 22). Because BasicMotions is multivariate, the features are extracted from every channel, so the feature vector has 22 × n_channels values (22 × 6 = 132 here).
[5]:
c22_mv = Catch22()
# Fit the multivariate transformer and return the extracted features
transformed_data_mv = c22_mv.fit_transform(BM_X_train, BM_y_train)
print(transformed_data_mv.shape)
(40, 132)
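The 132 columns above come from applying the 22 features to each of the 6 channels and concatenating the results. A minimal sketch of that layout, using two per-channel stand-in features (mean and standard deviation) instead of catch22. The channel-by-channel column ordering is an assumption of this sketch; check aeon's documentation before relying on specific column positions. `per_channel_features` is a hypothetical helper.

```python
import numpy as np


def per_channel_features(X, feature_fns):
    """X has shape (n_cases, n_channels, n_timepoints). Each function in
    feature_fns maps a 1D series to a scalar. Returns an array of shape
    (n_cases, n_channels * len(feature_fns)), laid out channel by channel."""
    n_cases, n_channels, _ = X.shape
    n_feats = len(feature_fns)
    out = np.empty((n_cases, n_channels * n_feats))
    for i in range(n_cases):
        for c in range(n_channels):
            for f, fn in enumerate(feature_fns):
                out[i, c * n_feats + f] = fn(X[i, c])
    return out


rng = np.random.default_rng(1)
X = rng.normal(size=(40, 6, 100))  # same shape as the BasicMotions train split
feats = per_channel_features(X, [np.mean, np.std])
print(feats.shape)  # (40, 12): 6 channels x 2 features
# Column c * n_feats + f holds feature f of channel c, e.g. the mean of channel 3
print(np.allclose(feats[:, 3 * 2 + 0], X[:, 3].mean(axis=1)))  # True
```

With 22 features instead of 2, the same layout gives 6 × 22 = 132 columns.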
2. Parameters
Compared with the original catch22 implementation (discussed in section 2.4), aeon's Catch22 offers several additional options. A few of the parameters that affect the output are shown below with examples; the rest are described in the Catch22 documentation.
2.1 Features
Catch22 extracts 22 distinct features from a time series. Sometimes you only need a few specific features. Passing a list of feature names to the features parameter extracts only those features. The order of the list matters, because it determines the order of the output columns. aeon's Catch22 documentation lists the 22 feature names that can be used (integer indices 0-21 are also accepted). Features must be given by their full names: short aliases such as "mode_5" are rejected with ValueError: Invalid feature selection.
[6]:
features_long = ["DN_HistogramMode_5", "CO_f1ecac", "FC_LocalSimple_mean3_stderr"]
# Short aliases are not valid feature names for aeon's Catch22
features_short = ["mode_5", "acf_timescale", "forecast_error"]
c22_long = Catch22(features=features_long)
c22_long.fit(IPD_X_train, IPD_y_train)
transformed_data_long = c22_long.transform(IPD_X_train)
print(transformed_data_long.shape)
c22_short = Catch22(features=features_short)
c22_short.fit(IPD_X_train, IPD_y_train)
try:
    transformed_data_short = c22_short.transform(IPD_X_train)
except ValueError as err:
    # Feature names are validated when transform is called
    print(f"ValueError: {err}")
(67, 3)
ValueError: Invalid feature selection.
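Selecting features can be thought of as mapping each requested name (or integer index) to a column position, in the order given, and rejecting anything unknown. A hypothetical sketch of that idea: `resolve_features` is not an aeon function, and the three-name registry stands in for aeon's full list of 22 names.

```python
import numpy as np

# Hypothetical registry: the real catch22 list has 22 entries
FEATURE_NAMES = ["DN_HistogramMode_5", "CO_f1ecac", "FC_LocalSimple_mean3_stderr"]


def resolve_features(features, names=FEATURE_NAMES):
    """Translate feature names or integer indices into column indices,
    preserving the requested order. Unknown entries raise ValueError."""
    idx = []
    for f in features:
        if isinstance(f, str) and f in names:
            idx.append(names.index(f))
        elif isinstance(f, int) and 0 <= f < len(names):
            idx.append(f)
        else:
            raise ValueError(f"Invalid feature selection: {f!r}")
    return idx


all_features = np.array([[10.0, 20.0, 30.0]])  # one case, three computed features
order = resolve_features(["FC_LocalSimple_mean3_stderr", "DN_HistogramMode_5"])
print(order)  # [2, 0]
print(all_features[:, order])  # [[30. 10.]]: columns follow the requested order
try:
    resolve_features(["mode_5"])
except ValueError as err:
    print(err)  # Invalid feature selection: 'mode_5'
```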
2.2 Catch24
Catch24 extracts 24 features from a time series: the 22 catch22 features plus the mean and standard deviation of the series. More features do not necessarily give better results: they increase run time and may lead to overfitting in some time series tasks. In other tasks, however, catch24 can outperform catch22; for example, in [4] catch24 significantly outperformed catch22 in cross-domain anomaly detection.
Because catch22 was selected to contain the most informative features for machine learning tasks, it remains the more widely used of the two.
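Conceptually, catch24 appends two columns to the catch22 output. A sketch of that relationship using a placeholder catch22 matrix: `add_mean_std` is hypothetical, placing the two extra columns after the 22 catch22 columns is an assumption, and so is using the population standard deviation (ddof=0).

```python
import numpy as np


def add_mean_std(X, c22_features):
    """Append per-series mean and standard deviation to a catch22 matrix.

    X: (n_cases, n_timepoints) univariate collection.
    c22_features: (n_cases, 22) catch22 output for X.
    Returns (n_cases, 24). Population std (ddof=0) is an assumption."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return np.hstack([c22_features, mean, std])


rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=(67, 24))  # same shape as ItalyPowerDemand train
c22_placeholder = np.zeros((67, 22))  # stand-in for real catch22 output
c24 = add_mean_std(X, c22_placeholder)
print(c24.shape)  # (67, 24)
```

The two extra columns carry level and scale information, which is useful when the raw magnitude of a series matters for the task.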
[ ]:
c24 = Catch22(catch24=True)
data_c24 = c24.fit_transform(IPD_X_train)
print(data_c24.shape)
(67, 24)
2.3 Replace NaNs
For some time series, certain features cannot be computed, for example when a calculation divides by zero, as can happen with a series of all zeros. In that case the feature value is NaN. If downstream calculations need a number, setting replace_nans=True replaces every NaN in the output with zero.
[ ]:
# One univariate series of six zeros, shape (n_cases=1, n_timepoints=6)
training_data = np.array([[0, 0, 0, 0, 0, 0]])
c22_nan = Catch22()
data_nan = c22_nan.fit_transform(training_data)
print(f"Data with NaN: {data_nan[0]}\n")
c22_no_nan = Catch22(replace_nans=True)
data_no_nan = c22_no_nan.fit_transform(training_data)
print("Data with no NaN: ", data_no_nan[0])
Data with NaN: [ nan nan 1. 0. 0. 6.
6. 0. nan 0. 0. 0.
3. 0. 1. 1.60943791 1. nan
nan nan 0.08 0. ]
Data with no NaN: [0. 0. 1. 0. 0. 6.
6. 0. 0. 0. 0. 0.
3. 0. 1. 1.60943791 1. 0.
0. 0. 0.08 0. ]
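The outputs above illustrate the division-by-zero case. One common way it arises: z-normalising a constant series divides zero by zero. Which features actually hit this (or a similar) case depends on their definitions, so the `zscore` helper below is only an illustration, not aeon's internal code. Replacing NaN with zero, as replace_nans=True does, is shown with np.where so that only NaN values are touched.

```python
import numpy as np


def zscore(x):
    """Z-normalise a series; a constant series gives 0/0 = NaN."""
    x = np.asarray(x, dtype=float)
    with np.errstate(invalid="ignore", divide="ignore"):
        return (x - x.mean()) / x.std()


z = zscore([0, 0, 0, 0, 0, 0])
print(z)  # [nan nan nan nan nan nan]

features = np.array([np.nan, 1.0, 6.0, np.nan])  # hypothetical feature vector
cleaned = np.where(np.isnan(features), 0.0, features)  # same idea as replace_nans=True
print(cleaned)  # [0. 1. 6. 0.]
```

Replacing with zero keeps the matrix usable by models that reject NaN, but a zero can look like a genuine feature value, so it is worth knowing which features were undefined.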
2.4 Pycatch22
pycatch22 is the original implementation of catch22 from [1], packaged for Python. aeon can use it by setting the parameter use_pycatch22=True. The difference between the two is the backend: pycatch22 wraps C code, whereas aeon's implementation is written in Python and compiled with Numba, which just-in-time compiles Python functions to machine code. aeon's implementation is actively maintained, but its output does not always match pycatch22 exactly, as the example below shows. pycatch22 can also be difficult to run on some platforms, for example Windows. If you are working in aeon but want the features computed by pycatch22, use aeon's Catch22 with use_pycatch22=True. If pycatch22 is not installed, you may see a warning that aeon's own implementation will be used instead; in that case, install the pycatch22 package.
At the time of writing, pycatch22 had an issue where features extracted through its Python interface differed from those computed by the native C code, while aeon's Catch22 is intended to produce the same results as the C code. The outputs of the two implementations may therefore differ.
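The fallback described above (use pycatch22 if it is installed, otherwise warn and use aeon's own implementation) follows a common optional-dependency pattern. The sketch below shows that general pattern only; `choose_backend` is hypothetical and is not how aeon is implemented internally.

```python
import importlib.util
import warnings


def choose_backend(use_pycatch22, module_name="pycatch22"):
    """Return which backend would be used, warning if the optional
    dependency was requested but is not installed."""
    if use_pycatch22:
        if importlib.util.find_spec(module_name) is not None:
            return module_name
        warnings.warn(
            f"{module_name} is not installed; falling back to the built-in implementation.",
            UserWarning,
        )
    return "builtin"


print(choose_backend(use_pycatch22=False))  # builtin
```

Checking with importlib.util.find_spec avoids importing the package just to test whether it exists.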
[ ]:
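# pycatch22 backend (aeon falls back to its own implementation if pycatch22 is missing)
py22 = Catch22(use_pycatch22=True)
data_py22 = py22.fit_transform(IPD_X_test)
print(f"Pycatch22 : {data_py22[667]}\n")
# aeon's own Numba implementation (the default)
c22_aeon = Catch22()
data_aeon = c22_aeon.fit_transform(IPD_X_test)
print("aeon catch22 : ", data_aeon[667])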
Pycatch22 : [-0.57058807 -0.73624268 4. 0.625 -0.45833333 2.45190656
6. 0.42507544 0.58904862 0.92048041 0.11344743 0.37262397
3. 0.86956522 6. 1.81200059 0.75 0.15104572
0. 0. 0.04 0. ]
aeon catch22 : [ 0.09203038 -0.73624265 7. 0.625 -0.45833333 3.
6. 0.42507544 0.58904862 0.8982969 0.11344743 0.37262397
3. 0.86956522 4. 1.83902118 0.75 0.15104572
nan nan 0.06666667 0. ]
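To see exactly where the two outputs above disagree, the printed vectors can be compared element-wise with np.isclose, treating matching NaNs as equal. The values are copied from the outputs above (rounded, as printed), and the default np.isclose tolerances are an arbitrary choice for this comparison.

```python
import numpy as np

# Values copied from the printed outputs for IPD_X_test[667]
py22_vals = np.array([
    -0.57058807, -0.73624268, 4.0, 0.625, -0.45833333, 2.45190656,
    6.0, 0.42507544, 0.58904862, 0.92048041, 0.11344743, 0.37262397,
    3.0, 0.86956522, 6.0, 1.81200059, 0.75, 0.15104572,
    0.0, 0.0, 0.04, 0.0,
])
aeon_vals = np.array([
    0.09203038, -0.73624265, 7.0, 0.625, -0.45833333, 3.0,
    6.0, 0.42507544, 0.58904862, 0.8982969, 0.11344743, 0.37262397,
    3.0, 0.86956522, 4.0, 1.83902118, 0.75, 0.15104572,
    np.nan, np.nan, 0.06666667, 0.0,
])
# equal_nan=True only treats NaN-vs-NaN as equal; NaN-vs-0 still counts as different
same = np.isclose(py22_vals, aeon_vals, equal_nan=True)
print(np.flatnonzero(~same))  # [ 0  2  5  9 14 15 18 19 20]
```

For this series, 9 of the 22 features differ, while the remaining 13 agree to within the tolerance.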
3. References
[1] Lubba, C. H., Sethi, S. S., Knaute, P., Schultz, S. R., Fulcher, B. D., & Jones, N. S. (2019). catch22: CAnonical Time-series CHaracteristics. Data Mining and Knowledge Discovery, 33(6), 1821-1852.
[2] Fulcher, B. D., & Jones, N. S. (2017). hctsa: A computational framework for automated time-series phenotyping using massive feature extraction. Cell Systems, 5(5), 527-531.
[3] Fulcher, B. D., Little, M. A., & Jones, N. S. (2013). Highly comparative time-series analysis: the empirical structure of time series and their methods. Journal of the Royal Society Interface, 10(83), 20130048.
[4] Agrahari, R., Nicholson, M., Conran, C., Assem, H., & Kelleher, J. D. (2022). Assessing feature representations for instance-based cross-domain anomaly detection in cloud services univariate time series data. IoT, 3(1), 123-144.
Generated using nbsphinx. The Jupyter notebook can be found here.