kasba_average

kasba_average(X: ndarray, init_barycenter: ndarray, previous_cost: float, previous_distance_to_centre: ndarray, distance: str = 'msm', max_iters: int = 50, tol=1e-05, ba_subset_size: float = 0.5, initial_step_size: float = 0.05, decay_rate: float = 0.1, verbose: bool = False, random_state: int | None = None, **kwargs) → tuple[ndarray, ndarray]

KASBA average [1].

The KASBA clusterer proposed in [1] uses an adapted version of the Stochastic Subgradient Elastic Barycentre Average. The algorithm iterates over the time series in X in a random order. On the first iteration every series is used; on subsequent iterations only a subset is used. The size of that subset is set by ba_subset_size, the proportion of the data to use. If there are fewer than 10 data points, all available data are used on every iteration.
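The per-iteration choice of which series to visit can be sketched as follows. This is a minimal illustration of the rules stated above, not aeon's implementation: the helper name `select_indices` is hypothetical, and rounding the subset size down with `int()` (while keeping at least one series) is an assumption.

```python
import numpy as np


def select_indices(n_cases, iteration, ba_subset_size, rng):
    """Return the indices of X visited in one iteration, in random order.

    Hypothetical helper: follows the documented rules, but the rounding of
    the subset size is an assumption.
    """
    order = rng.permutation(n_cases)  # visit the series of X in a random order
    # First iteration, or fewer than 10 series: use all the data.
    if iteration == 0 or n_cases < 10:
        return order
    # Later iterations: use a proportion ba_subset_size of the data.
    subset_size = max(1, int(n_cases * ba_subset_size))
    return order[:subset_size]


rng = np.random.default_rng(0)
print(len(select_indices(100, 0, 0.5, rng)))  # 100: first iteration uses all
print(len(select_indices(100, 1, 0.5, rng)))  # 50: half the data afterwards
print(len(select_indices(8, 1, 0.5, rng)))    # 8: fewer than 10 series
```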

Parameters:

X: np.ndarray, of shape (n_cases, n_channels, n_timepoints) or (n_cases, n_timepoints): A collection of time series instances to take the average from.
init_barycenter: np.ndarray, of shape (n_channels, n_timepoints): The initial barycentre to refine.
previous_cost: float: The total distance from all time series in X to init_barycenter, i.e. the sum of previous_distance_to_centre.
previous_distance_to_centre: np.ndarray, of shape (n_cases,): The distance between each time series in X and init_barycenter.
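How previous_cost relates to previous_distance_to_centre can be sketched as below. Squared Euclidean distance is a stand-in for the elastic distance (such as MSM) that would actually be selected through distance, and the helper name `distances_to_centre` is hypothetical; the point is only that previous_cost is the sum of the per-series distances.

```python
import numpy as np


def distances_to_centre(X, barycenter):
    """Squared Euclidean distance from each series in X to barycenter.

    Hypothetical stand-in for an elastic distance such as MSM.
    X has shape (n_cases, n_channels, n_timepoints); barycenter has shape
    (n_channels, n_timepoints).
    """
    diff = X - barycenter  # broadcasts over the n_cases axis
    return np.sum(diff ** 2, axis=(1, 2))


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1, 20))   # 5 univariate series of length 20
init_barycenter = X.mean(axis=0)  # shape (1, 20)

previous_distance_to_centre = distances_to_centre(X, init_barycenter)  # shape (5,)
previous_cost = previous_distance_to_centre.sum()
```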
distance: str, default='msm': The distance used to compare time series during averaging. Valid strings are listed in the documentation for aeon.distances.get_distance_function.
max_iters: int, default=50: Maximum number of iterations of the barycentre update.
tol: float, default=1e-5: Tolerance for early stopping: if the decrease in cost between iterations is lower than this value, the averaging procedure stops.
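The early-stopping rule can be sketched with a toy gradient descent on the summed squared Euclidean cost. Squared Euclidean is used only because its gradient is simple; this is not the KASBA update, and the function name `average_with_tol` and its step_size value are hypothetical.

```python
import numpy as np


def average_with_tol(X, barycenter, step_size=0.25, max_iters=50, tol=1e-5):
    """Toy gradient descent on the summed squared Euclidean cost with early stopping.

    Hypothetical helper. Returns the refined barycentre and the number of
    iterations run.
    """
    previous_cost = np.sum((X - barycenter) ** 2)
    for i in range(max_iters):
        # Gradient of the mean squared Euclidean cost with respect to the barycentre.
        gradient = -2.0 * np.mean(X - barycenter, axis=0)
        barycenter = barycenter - step_size * gradient
        cost = np.sum((X - barycenter) ** 2)
        # Stop early once the decrease in cost falls below tol.
        if previous_cost - cost < tol:
            return barycenter, i + 1
        previous_cost = cost
    return barycenter, max_iters


rng = np.random.default_rng(0)
X = rng.normal(size=(5, 1, 20))
barycenter, n_iters = average_with_tol(X, np.zeros((1, 20)))
```

With this toy cost the optimum is the arithmetic mean, so the loop stops well before max_iters once successive updates barely change the cost.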
ba_subset_size: float, default=0.5: The proportion of the data to use in the barycentre average step. The first iteration uses all the data; subsequent iterations use a subset whose size is this fraction of the data passed (e.g. 0.5 = 50%). If there are fewer than 10 data points, all available data are used on every iteration.
initial_step_size: float, default=0.05: The initial step size for the gradient descent.
decay_rate: float, default=0.1: The decay rate for the step size in the barycentre average step. At iteration i, the step size is initial_step_size * np.exp(-decay_rate * i).
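The resulting step-size schedule for the default parameters can be computed as below. This sketch assumes the iteration counter i starts at 0, so the first iteration uses initial_step_size unchanged.

```python
import numpy as np

initial_step_size = 0.05
decay_rate = 0.1
max_iters = 5

# Step size at iteration i (assumed to start at 0): initial_step_size * exp(-decay_rate * i)
step_sizes = initial_step_size * np.exp(-decay_rate * np.arange(max_iters))
print(np.round(step_sizes, 4))
```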
verbose: bool, default=False: Boolean that controls the verbosity.
random_state: int or None, default=None: Random state to use for the barycenter averaging.

Returns:

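tuple[np.ndarray, np.ndarray]: The first element, of shape (n_channels, n_timepoints), is the time series that is the KASBA average of the collection of instances provided. The second element is likely, by analogy with previous_distance_to_centre, the distance from each time series in X to the returned barycentre, of shape (n_cases,).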

References

[1] Holder, Christopher & Bagnall, Anthony. (2024). Rock the KASBA: Blazingly Fast and Accurate Time Series Clustering. doi:10.48550/arXiv.2411.17838.