load_anomaly_detection

load_anomaly_detection(name: tuple[str, str], split: Literal['train', 'test'] = 'test', extract_path: PathLike | None = None, return_metadata: bool = False) → tuple[ndarray, ndarray] | tuple[ndarray, ndarray, dict[str, Any]]

Load an anomaly detection dataset.

This function loads time series anomaly detection (TSAD) problems into memory. If the data is not available at the specified extract_path, it is downloaded from the TimeEval archive (https://timeeval.github.io/evaluation-paper/notebooks/Datasets.html) [1]. To load a problem from a local file, specify its location in extract_path. The data is assumed to be stored in the TimeEval format.

If extract_path is not specified, it defaults to aeon/datasets/local_data.

The problem name is a tuple of collection name and dataset name. For example, ("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40") is a univariate unsupervised problem and ("CATSv2", "CATSv2") is a multivariate supervised problem.

Parameters:

name: tuple of str, str: Name of the dataset, given as (collection name, dataset name). If the dataset is listed in tsad_datasets, the function looks in extract_path first and, if the data is not present there, attempts to download it from the TimeEval archive and save it to extract_path.
split: str {"train", "test"}, default="test": Whether to load the train or test partition of the problem. By default, the test partition is loaded.
extract_path: str or PathLike, default=None: The path to look for the data. If no path is provided, the function looks in aeon/datasets/local_data/. If a path is given, it can be an absolute (e.g., C:/Temp/) or relative (e.g., Temp/ or ./Temp/) path to an existing CSV file.
return_metadata: bool, default=False: If True, returns a tuple (X, y, metadata).
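
The parameters above can be combined as in the following minimal sketch. The helper name describe_problem is hypothetical, and the code assumes that the metadata dictionary uses "learning_type" as a key (the name is taken from the metadata fields listed under Returns). The import is placed inside the function so that nothing is downloaded until the helper is called.

```python
def describe_problem(name, split="test"):
    """Load one TSAD problem and return a small summary dictionary.

    `describe_problem` is a hypothetical helper, not part of aeon.
    """
    # Lazy import: loading (and any download) happens only when called.
    from aeon.datasets import load_anomaly_detection

    X, y, meta = load_anomaly_detection(
        name=name,
        split=split,
        return_metadata=True,  # request the (X, y, metadata) form
    )
    return {
        "name": name,
        "split": split,
        # len() of a 1d or 2d array is the size of its first axis.
        "n_points": len(X),
        # y holds binary labels, so the sum counts anomalous points.
        "n_anomalous": sum(int(label) for label in y),
        # Assumed key name; .get() returns None if it is absent.
        "learning_type": meta.get("learning_type"),
    }
```

Calling describe_problem(("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40")) would load the test partition of that problem, downloading it first if it is not available locally.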

Returns:

X: np.ndarray: The univariate (1d) or multivariate (2d) time series with shape (n_instances, n_channels).
y: np.ndarray: The binary anomaly labels with shape (n_instances,).
metadata: dict, optional: Returned only if return_metadata is True. Contains the following metadata: problemname, timestamps, dimensions, learning_type, contamination, num_anomalies.
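
Since y holds one binary label per entry of X, contiguous anomalous segments can be recovered with a short loop. The sketch below is illustrative only: anomaly_intervals is a hypothetical helper, not an aeon function, and it assumes that a label of 1 marks an anomalous point. It accepts any sequence of labels, including the ndarray returned by this function.

```python
def anomaly_intervals(y):
    """Return (start, end) index pairs for runs of anomalous labels.

    `end` is exclusive, so y[start:end] contains only anomalous points.
    Hypothetical helper; assumes a label of 1 marks an anomaly.
    """
    intervals = []
    start = None  # index where the current anomalous run began
    for i, label in enumerate(y):
        if label and start is None:
            start = i  # a new run begins
        elif not label and start is not None:
            intervals.append((start, i))  # the run ended just before i
            start = None
    if start is not None:
        # The series ended while still inside an anomalous run.
        intervals.append((start, len(y)))
    return intervals
```

For instance, anomaly_intervals([0, 1, 1, 0, 0, 1]) gives [(1, 3), (5, 6)].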

Raises:

URLError or HTTPError: If the website is not accessible.
ValueError: If the given dataset name does not exist in the repository, or if the requested dataset does not exist in the archive.
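
A caller that wants to continue after a failed load can catch the exceptions listed above. The following is a minimal sketch, with try_load as a hypothetical wrapper name. It relies on the fact that urllib.error.HTTPError is a subclass of urllib.error.URLError, so a single except clause covers both.

```python
from urllib.error import URLError


def try_load(name, split="test"):
    """Return (X, y) for a TSAD problem, or None if loading fails.

    `try_load` is a hypothetical wrapper, not part of aeon.
    """
    # Lazy import: nothing is loaded or downloaded until the call.
    from aeon.datasets import load_anomaly_detection

    try:
        return load_anomaly_detection(name=name, split=split)
    except ValueError as err:
        # Raised for unknown dataset names.
        print(f"Unknown problem {name!r}: {err}")
    except URLError as err:
        # Also catches HTTPError, which subclasses URLError.
        print(f"Could not reach the archive: {err}")
    return None
```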

References

[1] Sebastian Schmidl, Phillip Wenig, Thorsten Papenbrock: Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB 15(9), 2022, DOI:10.14778/3538598.3538602.

Examples

>>> from aeon.datasets import load_anomaly_detection
>>> X, y = load_anomaly_detection(
...     name=("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40")
... )