energy_fault_detector.evaluation.care2compare

Care to compare (https://doi.org/10.5281/zenodo.10958774) dataset loader.

class Care2CompareDataset(path='./CARE_To_Compare', download_dataset=False)

Bases: object

Loads the Care to Compare Dataset (accompanying paper https://doi.org/10.3390/data9120138).

The data can be downloaded manually from https://doi.org/10.5281/zenodo.14958989 (in that case, set path to the download location) or automatically by setting download_dataset to True.

By default, only the averages are read. See the statistics argument of the data loading methods.

Method overview:

  • get_event_info: Returns event info for a given event ID.

  • iter_datasets: Reads datasets and yields the resulting training and test DataFrames while iterating over event IDs.

  • format_event_dataset: Extracts normal_index from a loaded dataset and returns sensor_data and normal_index.

  • iter_formatted_datasets: Reads datasets, extracts normal_index and yields the resulting train and test DataFrames as well as the normal_indexes while iterating over event IDs.

  • load_event_dataset: Reads dataset specified by event_id and returns training and test data.

  • load_and_format_event_dataset: Reads dataset specified by event_id and returns training and test data as well as the corresponding normal indexes.

  • iter_train_datasets_per_asset: Reads datasets and yields the resulting training DataFrames while iterating over asset IDs and aggregating event IDs for the same assets.

  • update_c2c_config: Updates a specified FaultDetector config based on provided feature descriptions.

Parameters:
  • path (Path) – The directory path where the dataset is located.

  • download_dataset (bool) – If True, the Care to Compare dataset is downloaded and unzipped automatically.

Initialize the Care2CompareDataset class.

static format_event_dataset(data)

Splits a given dataset into numerical sensor data (DataFrame) and the boolean normal_index (Series).

Return type:

Tuple[DataFrame, Series]
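The split can be pictured with plain pandas. The following is a minimal sketch, not the library's implementation: the column names status_type_id, asset_id and train_test, the set of status codes treated as normal, and the helper name split_normal_index are all assumptions made for illustration.

```python
from typing import Tuple

import pandas as pd

# Assumption: status codes that count as normal behaviour (hypothetical choice).
NORMAL_STATUSES = {0, 2}
# Assumption: non-sensor columns to drop (hypothetical names).
META_COLUMNS = ["asset_id", "train_test", "status_type_id"]


def split_normal_index(data: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Return (sensor_data, normal_index) for a loaded event dataset."""
    # Boolean Series: True where the status code is considered normal.
    normal_index = data["status_type_id"].isin(NORMAL_STATUSES)
    normal_index.name = "normal_index"
    # Drop metadata columns, then keep numeric columns only.
    sensor_data = data.drop(columns=[c for c in META_COLUMNS if c in data.columns])
    sensor_data = sensor_data.select_dtypes(include="number")
    return sensor_data, normal_index


# Tiny stand-in for a loaded dataset (made-up values).
example = pd.DataFrame(
    {
        "asset_id": [1, 1, 1],
        "train_test": ["train", "train", "prediction"],
        "status_type_id": [0, 4, 2],
        "wind_speed_avg": [5.1, 0.0, 7.3],
    },
    index=pd.Index([10, 11, 12], name="id"),
)
sensor_data, normal_index = split_normal_index(example)
```

Both results share the index of the input, so the boolean Series can be used directly as a row mask on the sensor data.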

get_event_info(event_id)

Get event info of provided event ID.

Return type:

Series

iter_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all datasets, optionally for a specific wind farm.

Parameters:
  • wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.

  • test_only (bool, optional) – If True, only the test dataset is yielded.

  • statistics (list[str], optional) – Statistics to select. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].

  • index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple]

If test_only=False, yields a tuple of (train data, test data, event ID).

If test_only=True, yields a tuple of (test data, event ID).

Return type:

Iterator[Tuple]
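The effect of the statistics argument can be illustrated on a list of column names. This is a sketch under an assumption: that statistic features are recognisable by a column-name suffix such as _avg or _max. The helper select_statistic_columns and the example column names are hypothetical and not part of the library.

```python
from typing import Iterable, List, Optional

VALID_STATISTICS = ("avg", "min", "max", "std")


def select_statistic_columns(
    columns: Iterable[str], statistics: Optional[List[str]] = None
) -> List[str]:
    """Keep the columns whose name ends in one of the requested statistics."""
    if statistics is None:
        # Mirrors the documented default: only the averages.
        statistics = ["avg"]
    unknown = set(statistics) - set(VALID_STATISTICS)
    if unknown:
        raise ValueError(f"Unknown statistics: {sorted(unknown)}")
    # Assumption: columns are named '<feature>_<statistic>'.
    suffixes = tuple(f"_{stat}" for stat in statistics)
    return [col for col in columns if col.endswith(suffixes)]


columns = ["sensor_0_avg", "sensor_0_max", "sensor_1_std", "sensor_1_avg"]
only_averages = select_statistic_columns(columns)
extremes = select_statistic_columns(columns, ["min", "max"])
```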

iter_formatted_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all datasets, optionally for a specific wind farm, and format each dataset by splitting it into a boolean normal_index and numerical sensor data.

Parameters:
  • wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.

  • test_only (bool, optional) – If True, only the test dataset is yielded.

  • statistics (list[str], optional) – Statistics to select. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].

  • index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple]

If test_only=False, yields a tuple of train_sensor_data, train_normal_index, test_sensor_data, test_normal_index and event id.

If test_only=True, yields a tuple of test_sensor_data, test_normal_index and event id.

Return type:

Iterator[Tuple]
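Because the tuple length depends on test_only, consuming code has to unpack accordingly. The sketch below shows one way to do that; the helper normal_fraction_per_event is hypothetical, and plain lists stand in for the yielded pandas objects so the example runs without the dataset.

```python
from typing import Dict, Iterable, Tuple


def normal_fraction_per_event(
    formatted: Iterable[Tuple], test_only: bool = False
) -> Dict[int, float]:
    """Map each event id to the share of test samples flagged as normal."""
    fractions: Dict[int, float] = {}
    for item in formatted:
        if test_only:
            # (test_sensor_data, test_normal_index, event id)
            _test_data, test_normal, event_id = item
        else:
            # (train_sensor_data, train_normal_index,
            #  test_sensor_data, test_normal_index, event id)
            _train_data, _train_normal, _test_data, test_normal, event_id = item
        flags = [bool(flag) for flag in test_normal]
        fractions[event_id] = sum(flags) / len(flags) if flags else float("nan")
    return fractions


# Stand-in tuples with made-up values; real items would hold DataFrames and Series.
full_items = [(None, [True, True], None, [True, False, False, True], 7)]
test_items = [(None, [True, False], 3)]
full_result = normal_fraction_per_event(full_items)
test_result = normal_fraction_per_event(test_items, test_only=True)
```

Passing the same test_only value to the iterator and to the consuming helper keeps the unpacking in step with the yielded tuple shape.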

iter_train_datasets_per_asset(wind_farm=None, statistics=None, index_column='id', use_readable_columns=True)

Iterate over all asset IDs to generate a training dataset, optionally for a specific wind farm.

Parameters:
  • wind_farm (str, optional) – Wind farm name. If not provided, all assets will be considered.

  • statistics (list[str], optional) – Statistics to select. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].

  • index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Yields:

Iterator[Tuple[pd.DataFrame, int, List[int]]]

Yields a tuple containing the training dataset, asset ID, and list of event IDs for this asset.

Return type:

Iterator[Tuple[DataFrame, int, List[int]]]
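The aggregation of event IDs per asset can be pictured as a simple grouping step. This is an illustrative sketch only: the helper group_events_by_asset and the (event ID, asset ID) pairs are hypothetical, and how the library actually merges the training data of several events is not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_events_by_asset(
    event_assets: Iterable[Tuple[int, int]]
) -> Dict[int, List[int]]:
    """Collect event ids per asset id, preserving the input order."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for event_id, asset_id in event_assets:
        grouped[asset_id].append(event_id)
    return dict(grouped)


# Made-up (event id, asset id) pairs.
events_per_asset = group_events_by_asset([(1, 10), (2, 11), (3, 10)])
```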

load_and_format_event_dataset(event_id, statistics=None, test_only=False, index_column='id', use_readable_columns=True)

Load train and test datasets for a specific event ID and split them into a boolean normal index and numerical sensor data.

Parameters:
  • event_id (int) – The event ID for which to retrieve datasets.

  • test_only (bool, optional) – If True, only the test dataset will be returned.

  • statistics (list[str], optional) – Statistics to select. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].

  • index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Returns:

  • If test_only=False, returns a tuple of train_sensor_data, train_status, test_sensor_data and test_status.

  • If test_only=True, returns a tuple of test_sensor_data and test_status.

Return type:

Tuple[pd.DataFrame, pd.Series, pd.DataFrame, pd.Series]
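A common use of the returned boolean Series is to restrict the sensor data to rows recorded during normal operation. The sketch below uses plain pandas; keep_normal_rows is a hypothetical helper and the small frame is made-up data, not output of the loader.

```python
import pandas as pd


def keep_normal_rows(sensor_data: pd.DataFrame, normal_index: pd.Series) -> pd.DataFrame:
    """Return only the rows whose normal-index flag is True."""
    # Align on the index first so the mask matches row by row;
    # rows without a flag are treated as not normal.
    mask = normal_index.reindex(sensor_data.index, fill_value=False).astype(bool)
    return sensor_data.loc[mask]


train_sensor_data = pd.DataFrame({"power_avg": [1.0, 0.2, 1.4]}, index=[0, 1, 2])
train_status = pd.Series([True, False, True], index=[0, 1, 2])
normal_only = keep_normal_rows(train_sensor_data, train_status)
```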

load_event_dataset(event_id, test_only=False, statistics=None, index_column='id', use_readable_columns=True)

Load train and test datasets for a specific event ID.

Parameters:
  • event_id (int) – The event ID for which to retrieve datasets.

  • test_only (bool, optional) – If True, only the test dataset will be returned.

  • statistics (list[str], optional) – Statistics to select. Possible values are ‘avg’, ‘min’, ‘max’ and ‘std’. If None, defaults to [‘avg’].

  • index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Returns:

If test_only is False, returns a tuple of training and testing datasets. If test_only is True, returns only the test dataset.

Return type:

Union[Tuple[pd.DataFrame, pd.DataFrame], pd.DataFrame]
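Since the return value is either a tuple or a single DataFrame, calling code may want to normalise it into one shape. A minimal sketch, with a hypothetical helper as_train_test and strings standing in for DataFrames:

```python
from typing import Any, Optional, Tuple


def as_train_test(result: Any) -> Tuple[Optional[Any], Any]:
    """Normalise the documented return value to a (train, test) pair.

    A tuple is taken to be (train, test); anything else is treated as
    the test dataset alone, with None in place of the training data.
    """
    if isinstance(result, tuple):
        train, test = result
        return train, test
    return None, result


# Strings stand in for the DataFrames the loader would return.
both = as_train_test(("train_df", "test_df"))
test_alone = as_train_test("test_df")
```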

update_c2c_config(config, wind_farm, use_readable_columns=True)

Update config based on the provided feature descriptions. Updates the features-to-exclude and angle lists of the data preprocessor steps.

Parameters:
  • config (Config) – Configuration object.

  • wind_farm (str) – Name of the wind farm (A, B or C).

  • use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.

Return type:

None