energy_fault_detector.evaluation.care2compare

Care to Compare (https://doi.org/10.5281/zenodo.10958774) dataset loader.

class Care2CompareDataset(path='./CARE_To_Compare', download_dataset=False)

Bases: object

Loads the Care to Compare Dataset (accompanying paper https://doi.org/10.3390/data9120138).

The data can be downloaded manually from https://doi.org/10.5281/zenodo.14958989 (in which case set path to the dataset directory) or automatically by setting download_dataset to True.

All data is loaded into memory, which might be problematic for large datasets (consider using the data-loading utilities of TensorFlow or PyTorch in that case).

By default, only the averages are read. See the statistics argument of the data loading methods.

get_event_info(): Returns event info for a given event ID

iter_datasets(): Reads datasets and yields the resulting training and test DataFrames while iterating over event IDs.

format_event_dataset(): Extracts normal_index from a loaded dataset and returns normal_index and sensor_data.

iter_formatted_datasets(): Reads datasets, extracts normal_index and yields the resulting train and test DataFrames as well as the normal_indexes while iterating over event IDs.

load_event_dataset(): Reads dataset specified by event_id and returns training and test data.

load_and_format_event_dataset(): Reads dataset specified by event_id and returns training and test data as well as the corresponding normal indexes.

iter_train_datasets_per_asset(): Reads datasets and yields the resulting training DataFrames while iterating over asset IDs and aggregating event IDs for the same assets.

update_c2c_config(): Updates a specified FaultDetector config based on provided feature descriptions.

Parameters:

path (Path) – The directory path where the dataset is located.
download_dataset (bool) – If True the Care to Compare dataset is automatically downloaded and unzipped.

Initialize the Care2CompareDataset class.
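A minimal construction sketch, assuming the package is installed and the import path matches the module name at the top of this page. The helper name `open_care2compare` and the early path check are illustrative assumptions, not part of the library.

```python
from pathlib import Path


def open_care2compare(path="./CARE_To_Compare", download=False):
    """Create a Care2CompareDataset for a local dataset directory.

    Hypothetical helper: it only wraps the documented constructor and
    adds an early check that the directory exists.
    """
    path = Path(path)
    if not download and not path.is_dir():
        # Fail early with a clear message when nothing will be downloaded.
        raise FileNotFoundError(
            f"{path} does not exist; pass download=True or fix the path"
        )

    # Deferred import so this helper can be defined without the package.
    from energy_fault_detector.evaluation.care2compare import Care2CompareDataset

    return Care2CompareDataset(path=path, download_dataset=download)
```

With the dataset in place, `open_care2compare()` would return an object exposing the methods documented below.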

static format_event_dataset(data)

Splits a given dataset into numerical sensor data and a boolean normal_index.

Return type: Tuple[DataFrame, Series]

get_event_info(event_id)

Get event info of provided event ID.

Return type: Series

iter_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all datasets, optionally for a specific wind farm.

Parameters:

wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Statistics to select for each feature. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple] –

If test_only=False, yields a tuple of train data, test data and event ID. If test_only=True, yields a tuple of test data and event ID.

Return type:

Iterator[Tuple]
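The iteration pattern described above can be sketched as follows. `summarize_events` is a hypothetical helper; `dataset` is assumed to behave like a Care2CompareDataset instance and to yield (train, test, event ID) tuples when test_only is left at False.

```python
def summarize_events(dataset, wind_farm=None):
    """Return {event_id: (n_train_rows, n_test_rows)} for each event.

    Hypothetical helper that relies only on the documented yield order
    of iter_datasets with test_only=False.
    """
    summary = {}
    for train, test, event_id in dataset.iter_datasets(wind_farm=wind_farm):
        # len() of a DataFrame is its number of rows.
        summary[event_id] = (len(train), len(test))
    return summary
```

Because every event is read into memory while iterating, keeping only small summaries like row counts avoids holding all DataFrames at once.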

iter_formatted_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all datasets, optionally for a specific wind farm, and format each dataset by splitting it into a boolean normal_index and numerical sensor data.

Parameters:

wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Statistics to select for each feature. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple] –

If test_only=False, yields a tuple of train_sensor_data, train_normal_index, test_sensor_data, test_normal_index and event ID. If test_only=True, yields a tuple of test_sensor_data, test_normal_index and event ID.

Return type:

Iterator[Tuple]
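As an illustration of the test_only=True yield shape, the sketch below computes the share of test rows whose normal_index is False for each event. `not_normal_share_per_event` is a hypothetical helper, and `dataset` is assumed to behave like a Care2CompareDataset instance.

```python
def not_normal_share_per_event(dataset, wind_farm=None):
    """Return {event_id: fraction of test rows with normal_index False}.

    Hypothetical helper; it only uses the documented
    (test_sensor_data, test_normal_index, event_id) yield order.
    """
    shares = {}
    for _test_data, test_normal, event_id in dataset.iter_formatted_datasets(
        wind_farm=wind_farm, test_only=True
    ):
        n_rows = len(test_normal)
        # Iterating a boolean Series (or list) yields one flag per row.
        n_normal = sum(bool(flag) for flag in test_normal)
        shares[event_id] = (n_rows - n_normal) / n_rows if n_rows else 0.0
    return shares
```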

iter_train_datasets_per_asset(wind_farm=None, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all asset IDs to generate a training dataset, optionally for a specific wind farm.

Parameters:

wind_farm (str, optional) – Wind farm name. If not provided, all assets will be considered.
statistics (list[str], optional) – Statistics to select for each feature. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple[pd.DataFrame, int, List[int]]] –

Yields a tuple containing the training dataset, the asset ID and the list of event IDs for this asset.

Return type:

Iterator[Tuple[DataFrame, int, List[int]]]
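The per-asset iteration can be sketched like this. `events_by_asset` is a hypothetical helper, and `dataset` is assumed to behave like a Care2CompareDataset instance yielding (training dataset, asset ID, event IDs) tuples.

```python
def events_by_asset(dataset, wind_farm=None, statistics=None):
    """Map each asset ID to its event IDs and number of training rows.

    Hypothetical helper built on the documented yield order of
    iter_train_datasets_per_asset.
    """
    result = {}
    for train_data, asset_id, event_ids in dataset.iter_train_datasets_per_asset(
        wind_farm=wind_farm, statistics=statistics
    ):
        result[asset_id] = {
            "event_ids": sorted(event_ids),
            "n_train_rows": len(train_data),
        }
    return result
```

A mapping like this is one way to decide which events can share a model trained once per asset.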

load_and_format_event_dataset(event_id, statistics=None, test_only=False, index_column='id', use_readable_columns=True)

Load train and test datasets for a specific event ID and split them into a boolean normal index and numerical sensor data.

Parameters:

event_id (int) – The event ID for which to retrieve datasets.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Statistics to select for each feature. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Returns:

If test_only=False, returns a tuple of train_sensor_data, train_status, test_sensor_data and test_status.

If test_only=True, returns a tuple of test_sensor_data and test_status.

Return type:

Tuple[pd.DataFrame, pd.Series, pd.DataFrame, pd.Series]
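Since the tuple length depends on test_only, callers may prefer a fixed shape. The sketch below wraps the call in a named tuple; `FormattedEvent` and `load_formatted` are hypothetical names, and `dataset` is assumed to behave like a Care2CompareDataset instance.

```python
from typing import Any, NamedTuple, Optional


class FormattedEvent(NamedTuple):
    """Fixed-shape container; train fields are None when test_only=True."""

    train_sensor_data: Optional[Any]
    train_status: Optional[Any]
    test_sensor_data: Any
    test_status: Any


def load_formatted(dataset, event_id, test_only=False, statistics=None):
    """Call load_and_format_event_dataset and normalize the result shape."""
    result = dataset.load_and_format_event_dataset(
        event_id, statistics=statistics, test_only=test_only
    )
    if test_only:
        # Documented 2-tuple: (test_sensor_data, test_status).
        test_sensor_data, test_status = result
        return FormattedEvent(None, None, test_sensor_data, test_status)
    # Documented 4-tuple: train data and status, then test data and status.
    return FormattedEvent(*result)
```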

load_event_dataset(event_id, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Load train and test datasets for a specific event ID.

Parameters:

event_id (int) – The event ID for which to retrieve datasets.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Statistics to select for each feature. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Returns:

If test_only is False, returns a tuple of training and testing datasets. If test_only is True, returns only the test dataset.

Return type:

Union[Tuple[pd.DataFrame, pd.DataFrame], pd.DataFrame]
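A short sketch of requesting more than the default averages for a single event. The set of allowed statistics is taken from the parameter description above; `load_test_data` and `ALLOWED_STATISTICS` are hypothetical names, and how the loader itself reacts to unknown values is not documented here, so the check is done up front.

```python
ALLOWED_STATISTICS = ("avg", "min", "max", "std")


def load_test_data(dataset, event_id, statistics=None):
    """Load only the test DataFrame for one event.

    Hypothetical helper: validates the requested statistics against the
    documented values before delegating to load_event_dataset.
    """
    if statistics is not None:
        unknown = sorted(set(statistics) - set(ALLOWED_STATISTICS))
        if unknown:
            raise ValueError(f"unknown statistics: {unknown}")
        statistics = list(statistics)
    # With test_only=True a single DataFrame is returned, not a tuple.
    return dataset.load_event_dataset(
        event_id, test_only=True, statistics=statistics
    )
```

For example, `load_test_data(dataset, event_id, statistics=["avg", "std"])` would request averages and standard deviations for the chosen event.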

update_c2c_config(config, wind_farm, use_readable_columns=True)

Update config based on provided feature descriptions. Updates the lists of features to exclude and of angle features in the data preprocessor steps.

Parameters:

config (Config) – Configuration object.
wind_farm (str) – Name of the wind farm (A, B or C).
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Return type:

None
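Because the method returns None, it presumably modifies the passed config in place. The sketch below works on a copy so that one base configuration can be reused for several wind farms. `config_for_wind_farm` is a hypothetical helper, and whether a real Config object supports `copy.deepcopy` is an assumption.

```python
import copy

WIND_FARMS = ("A", "B", "C")


def config_for_wind_farm(dataset, base_config, wind_farm):
    """Return a copy of base_config updated for one wind farm.

    Hypothetical helper; assumes update_c2c_config mutates its config
    argument and that the config object can be deep-copied.
    """
    if wind_farm not in WIND_FARMS:
        raise ValueError(f"wind_farm must be one of {WIND_FARMS}")
    config = copy.deepcopy(base_config)
    dataset.update_c2c_config(config, wind_farm)
    return config
```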