load_anomaly_detection
- load_anomaly_detection(name: tuple[str, str], split: Literal['train', 'test'] = 'test', extract_path: PathLike | None = None, return_metadata: bool = False) → tuple[ndarray, ndarray] | tuple[ndarray, ndarray, dict[str, Any]]
Load an anomaly detection dataset.
This function loads time series anomaly detection (TSAD) problems into memory. If the data is not available at the specified extract_path, it downloads the data from the TimeEval archive (https://timeeval.github.io/evaluation-paper/notebooks/Datasets.html) [1]. To load a problem from a local file, specify its location in extract_path. This function assumes the data is stored in the TimeEval format. If you do not specify extract_path, it defaults to aeon/datasets/local_data. If the problem is not present in extract_path, the function attempts to download it.
The problem name is a tuple of the collection name and the dataset name. For example, ("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40") is a univariate unsupervised problem, and ("CATSv2", "CATSv2") is a multivariate supervised problem.
- Parameters:
- name : tuple of (str, str)
Name of the dataset as a (collection name, dataset name) tuple. If a dataset listed in tsad_datasets is given, this function looks in extract_path first and, if the data is not present, attempts to download it from the TimeEval archive, saving it to extract_path.
- split : str {"train", "test"}, default="test"
Whether to load the train or test partition of the problem. By default, it loads the test partition.
- extract_path : str or PathLike, default=None
The path to look for the data. If no path is provided, the function looks in aeon/datasets/local_data/. If a path is given, it can be an absolute path, e.g., C:/Temp/, or a relative path, e.g., Temp/ or ./Temp/, pointing to where the existing CSV data is stored.
- return_metadata : bool, default=False
If True, returns a tuple (X, y, metadata).
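The lookup-then-download behavior described for name, split and extract_path can be sketched in plain Python. This is a minimal, hypothetical illustration, not aeon's implementation: resolve_dataset, DEFAULT_DIR and the download hook are invented names, and the <extract_path>/<collection>/<dataset> directory layout is an assumption made only for this example.

```python
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for the documented default, aeon/datasets/local_data.
DEFAULT_DIR = Path("aeon") / "datasets" / "local_data"


def resolve_dataset(
    name: Tuple[str, str],
    split: str = "test",
    extract_path: Optional[str] = None,
    download: Optional[Callable[[Tuple[str, str], Path], None]] = None,
) -> Path:
    """Validate arguments and return the directory that should hold `name`.

    Sketch only: look in extract_path (or the default directory) first and
    call the hypothetical `download` hook only if the data is missing.
    """
    if not (
        isinstance(name, tuple)
        and len(name) == 2
        and all(isinstance(part, str) for part in name)
    ):
        raise ValueError("name must be a (collection_name, dataset_name) tuple")
    if split not in ("train", "test"):
        raise ValueError(f"split must be 'train' or 'test', got {split!r}")
    base = Path(extract_path) if extract_path is not None else DEFAULT_DIR
    collection, dataset = name
    # Assumed layout <base>/<collection>/<dataset>; split would pick a file inside it.
    target = base / collection / dataset
    if not target.exists():
        if download is None:
            raise FileNotFoundError(f"{target} not found and no download hook given")
        download(name, target)
    return target
```

Passing the download step in as a callable keeps the sketch testable offline; a second call with the same extract_path finds the data locally and skips the download.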
- Returns:
- X: np.ndarray
The univariate (1d) or multivariate (2d) time series, with shape (n_instances,) or (n_instances, n_channels), respectively.
- y: np.ndarray
The binary anomaly labels with shape (n_instances,).
- metadata: dict, optional
Only returned if return_metadata is True. A dictionary containing the following metadata: problemname, timestamps, dimensions, learning_type, contamination, num_anomalies.
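To sanity-check what the loader returns, the following sketch verifies the documented shapes of X and y and derives two label statistics. It assumes, without this page confirming it, that contamination is the fraction of anomalous time points and that num_anomalies counts contiguous runs of anomalous labels; describe_anomaly_data and its output keys are hypothetical.

```python
import numpy as np


def describe_anomaly_data(X: np.ndarray, y: np.ndarray) -> dict:
    """Check the documented shapes of (X, y) and derive simple label statistics.

    Hypothetical helper: assumes y holds binary labels (1 = anomalous).
    """
    X = np.asarray(X)
    y = np.asarray(y)
    if X.ndim not in (1, 2):
        raise ValueError(f"X must be 1d or 2d, got {X.ndim}d")
    if y.ndim != 1 or y.shape[0] != X.shape[0]:
        raise ValueError("y must be 1d with one label per row of X")
    if not np.isin(y, (0, 1)).all():
        raise ValueError("y must contain only binary labels 0 and 1")
    labels = y.astype(int)
    # An anomalous run starts wherever the label steps from 0 to 1;
    # prepending a 0 also counts a run that starts at index 0.
    run_starts = np.diff(np.concatenate(([0], labels))) == 1
    return {
        "n_instances": X.shape[0],
        "n_channels": 1 if X.ndim == 1 else X.shape[1],
        # Assumed meaning: fraction of anomalous time points.
        "contamination": float(labels.mean()),
        # Assumed meaning: number of contiguous anomalous regions.
        "num_anomaly_runs": int(run_starts.sum()),
    }
```

With return_metadata=True, these derived values can be compared against the returned contamination and num_anomalies entries, keeping in mind that the definitions above are assumptions.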
- Raises:
- URLError or HTTPError
If the website is not accessible.
- ValueError
If the given dataset name does not exist in the repository, or if the requested dataset does not exist in the archive.
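Because both download failures and unknown names surface as exceptions, callers may want to handle them explicitly. The sketch below wraps any loader callable (for example load_anomaly_detection) and returns None on the documented errors; try_load is a hypothetical helper, and logging a warning is just one possible policy.

```python
import logging
from typing import Any, Callable, Optional
from urllib.error import URLError

logger = logging.getLogger(__name__)


def try_load(loader: Callable[..., Any], name: tuple, **kwargs: Any) -> Optional[Any]:
    """Call `loader(name=name, **kwargs)`; return None on the documented errors."""
    try:
        return loader(name=name, **kwargs)
    except URLError as err:
        # HTTPError is a subclass of URLError, so this also catches HTTP failures.
        logger.warning("Could not download %s: %s", name, err)
    except ValueError as err:
        logger.warning("Dataset %s does not exist: %s", name, err)
    return None
```

For example, try_load(load_anomaly_detection, ("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40"), split="train") returns either the loaded tuple or None.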
References
[1] Sebastian Schmidl, Phillip Wenig, Thorsten Papenbrock: Anomaly Detection in Time Series: A Comprehensive Evaluation. PVLDB 15(9), 2022, DOI: 10.14778/3538598.3538602.
Examples
>>> from aeon.datasets import load_anomaly_detection
>>> X, y = load_anomaly_detection(
...     name=("KDD-TSAD", "001_UCR_Anomaly_DISTORTED1sddb40")
... )