energy_fault_detector.evaluation.care2compare
Care to Compare (https://doi.org/10.5281/zenodo.10958774) dataset loader.
- class Care2CompareDataset(path='./CARE_To_Compare', download_dataset=False)
Bases: object
Loads the Care to Compare dataset (accompanying paper: https://doi.org/10.3390/data9120138).
The data can be downloaded either manually from https://doi.org/10.5281/zenodo.14958989 (in that case, specify path) or automatically by setting download_dataset to True.
By default, only the averages are read. See the statistics argument of the data loading methods.
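To illustrate what selecting statistics could look like, here is a minimal pandas sketch, not the library's implementation. It assumes (hypothetically) that sensor columns follow a `<sensor>_<statistic>` naming pattern such as `sensor_0_avg`; the helper name `select_statistics` is also hypothetical. Only the allowed values and the `['avg']` default are taken from this page.

```python
import pandas as pd

# Allowed statistic suffixes, as listed in this documentation.
ALLOWED_STATISTICS = {"avg", "min", "max", "std"}


def select_statistics(data: pd.DataFrame, statistics=None) -> pd.DataFrame:
    """Hypothetical helper: keep sensor columns whose suffix is a requested statistic.

    Assumes sensor columns are named '<sensor>_<statistic>'; columns without a
    known statistic suffix (e.g. a status column) are always kept.
    """
    if statistics is None:
        statistics = ["avg"]  # documented default: only averages
    unknown = set(statistics) - ALLOWED_STATISTICS
    if unknown:
        raise ValueError(f"Unknown statistics: {sorted(unknown)}")

    keep = []
    for column in data.columns:
        suffix = column.rsplit("_", 1)[-1]
        if suffix in ALLOWED_STATISTICS:
            # Sensor column: keep it only if its statistic was requested.
            if suffix in statistics:
                keep.append(column)
        else:
            # Not a sensor statistic column: always keep.
            keep.append(column)
    return data[keep]


frame = pd.DataFrame({
    "status_type_id": [0, 0],
    "sensor_0_avg": [1.0, 2.0],
    "sensor_0_max": [1.5, 2.5],
    "sensor_0_std": [0.1, 0.2],
})
print(list(select_statistics(frame).columns))  # ['status_type_id', 'sensor_0_avg']
```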
Method overview:
get_event_info: Returns event info for a given event ID.
iter_datasets: Reads datasets and yields the resulting training and test DataFrames while iterating over event IDs.
format_event_dataset: Extracts normal_index from a loaded dataset and returns normal_index and sensor_data.
iter_formatted_datasets: Reads datasets, extracts normal_index and yields the resulting train and test DataFrames as well as the normal_indexes while iterating over event IDs.
load_event_dataset: Reads dataset specified by event_id and returns training and test data.
load_and_format_event_dataset: Reads dataset specified by event_id and returns training and test data as well as the corresponding normal indexes.
iter_train_datasets_per_asset: Reads datasets and yields the resulting training DataFrames while iterating over asset IDs and aggregating event IDs for the same assets.
update_c2c_config: Updates a specified FaultDetector config based on provided feature descriptions.
- Parameters:
path (Path) – The directory path where the dataset is located.
download_dataset (bool) – If True, the Care to Compare dataset is downloaded and unzipped automatically.
Initialize the Care2CompareDataset class.
- static format_event_dataset(data)
Splits a given dataset into a boolean normal_index and numerical sensor data.
- Return type:
Tuple[DataFrame, Series]
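The split performed by format_event_dataset can be sketched in plain pandas as follows. This is an illustration under stated assumptions, not the library's code: the status column name `status_type_id` and the set of status codes treated as normal (`0` and `2`) are hypothetical, and the tuple is returned in the (DataFrame, Series) order suggested by the documented return type.

```python
from typing import Tuple

import pandas as pd

# Hypothetical: name of the status column and the status codes treated as "normal".
STATUS_COLUMN = "status_type_id"
NORMAL_STATUSES = (0, 2)


def format_event_dataset(data: pd.DataFrame) -> Tuple[pd.DataFrame, pd.Series]:
    """Sketch: split a loaded dataset into numerical sensor data and a boolean normal index."""
    # True where the row's (hypothetical) status code counts as normal operation.
    normal_index = data[STATUS_COLUMN].isin(NORMAL_STATUSES).rename("normal_index")
    # Drop the status column and keep only numeric columns as sensor data.
    sensor_data = data.drop(columns=[STATUS_COLUMN]).select_dtypes(include="number")
    return sensor_data, normal_index


frame = pd.DataFrame(
    {
        "status_type_id": [0, 4, 2],
        "label": ["a", "b", "c"],  # hypothetical non-numeric column, dropped below
        "sensor_0_avg": [1.0, 2.0, 3.0],
    },
    index=pd.Index([10, 11, 12], name="id"),
)
sensor_data, normal_index = format_event_dataset(frame)
print(normal_index.tolist())      # [True, False, True]
print(list(sensor_data.columns))  # ['sensor_0_avg']
```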
- get_event_info(event_id)
Get event info of provided event ID.
- Return type:
Series
- iter_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)
Iterate over all datasets, optionally for a specific wind farm.
- Parameters:
wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Describes which statistic features are selected. Possible statistics are ‘avg’, ‘min’, ‘max’ and ‘std’. If None is provided, it defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.
- Yields:
Iterator[Tuple] – If test_only=False, yields a tuple of train data, test data and event ID.
If test_only=True, yields a tuple of test data and event ID.
- Return type:
Iterator[Tuple]
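The index_column parameter chooses which column becomes the DataFrame index. A minimal sketch of that choice, assuming (hypothetically) that the raw data contains both an `id` and a `time_stamp` column; the helper `set_index_column` is illustrative and not part of the library.

```python
import pandas as pd

VALID_INDEX_COLUMNS = ("id", "time_stamp")


def set_index_column(data: pd.DataFrame, index_column: str = "id") -> pd.DataFrame:
    """Hypothetical helper: use either the row ID or the timestamp as the index."""
    if index_column not in VALID_INDEX_COLUMNS:
        raise ValueError(f"index_column must be one of {VALID_INDEX_COLUMNS}")
    data = data.copy()
    if index_column == "time_stamp":
        # Parse timestamps so the resulting index supports time-based slicing.
        data["time_stamp"] = pd.to_datetime(data["time_stamp"])
    return data.set_index(index_column)


raw = pd.DataFrame({
    "id": [0, 1],
    "time_stamp": ["2023-01-01 00:00:00", "2023-01-01 00:10:00"],
    "sensor_0_avg": [1.0, 2.0],
})
by_id = set_index_column(raw)                     # index named 'id'
by_time = set_index_column(raw, "time_stamp")     # DatetimeIndex
```

The column not used as the index stays in the frame as an ordinary column.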
- iter_formatted_datasets(wind_farm=None, test_only=False, statistics=None, index_column='id', use_readable_columns=True)
Iterate over all datasets, optionally for a specific wind farm, and format each dataset by splitting it into a boolean normal_index and numerical sensor data.
- Parameters:
wind_farm (str, optional) – Wind farm name. If not provided, all datasets will be loaded.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Describes which statistic features are selected. Possible statistics are ‘avg’, ‘min’, ‘max’ and ‘std’. If None is provided, it defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.
- Yields:
Iterator[Tuple] – If test_only=False, yields a tuple of train_sensor_data, train_normal_index, test_sensor_data, test_normal_index and event ID.
If test_only=True, yields a tuple of test_sensor_data, test_normal_index and event ID.
- Return type:
Iterator[Tuple]
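One way to use a yielded pair of train_sensor_data and train_normal_index, for instance when a model should only see normal operation, is to filter the rows with the boolean index. The sketch below uses synthetic data; the helper `normal_training_rows` is hypothetical and not part of the library.

```python
import pandas as pd


def normal_training_rows(sensor_data: pd.DataFrame, normal_index: pd.Series) -> pd.DataFrame:
    """Hypothetical helper: keep only the rows flagged as normal operation."""
    # Align the mask to the sensor data so a mismatched index fails loudly.
    mask = normal_index.reindex(sensor_data.index)
    if mask.isna().any():
        raise ValueError("normal_index does not cover every row of sensor_data")
    return sensor_data[mask.astype(bool)]


train_sensor_data = pd.DataFrame({"sensor_0_avg": [1.0, 2.0, 3.0]}, index=[10, 11, 12])
train_normal_index = pd.Series([True, False, True], index=[10, 11, 12])
normal_rows = normal_training_rows(train_sensor_data, train_normal_index)
print(normal_rows.index.tolist())  # [10, 12]
```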
- iter_train_datasets_per_asset(wind_farm=None, statistics=None, index_column='id', use_readable_columns=True)
Iterate over all asset IDs to generate a training dataset, optionally for a specific wind farm.
- Parameters:
wind_farm (str, optional) – Wind farm name. If not provided, all assets will be considered.
statistics (list[str], optional) – Describes which statistic features are selected. Possible statistics are ‘avg’, ‘min’, ‘max’ and ‘std’. If None is provided, it defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.
- Yields:
Iterator[Tuple[pd.DataFrame, int, List[int]]] – Yields a tuple containing the training dataset, the asset ID and the list of event IDs for this asset.
- Return type:
Iterator[Tuple[pd.DataFrame, int, List[int]]]
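Aggregating the training data of all events that belong to the same asset can be sketched as below. This is an illustration, not the library's implementation: the event-info columns `event_id` and `asset_id`, the dictionary of per-event training frames, and the decision to drop rows with a repeated index (on the assumption that overlapping training periods share row IDs) are all hypothetical. The yielded tuple shape matches the documented (DataFrame, asset ID, event IDs).

```python
from typing import Dict, Iterator, List, Tuple

import pandas as pd


def iter_train_data_per_asset(
    event_info: pd.DataFrame,
    train_data: Dict[int, pd.DataFrame],
) -> Iterator[Tuple[pd.DataFrame, int, List[int]]]:
    """Hypothetical sketch: yield (combined training data, asset ID, event IDs) per asset."""
    for asset_id, group in event_info.groupby("asset_id"):
        event_ids = group["event_id"].tolist()
        combined = pd.concat([train_data[event_id] for event_id in event_ids])
        # Assumed: overlapping training periods share index values, so keep the first copy.
        combined = combined[~combined.index.duplicated(keep="first")].sort_index()
        yield combined, int(asset_id), event_ids


event_info = pd.DataFrame({"event_id": [1, 2, 3], "asset_id": [10, 10, 11]})
train_data = {
    1: pd.DataFrame({"sensor_0_avg": [1.0, 2.0]}, index=[0, 1]),
    2: pd.DataFrame({"sensor_0_avg": [2.0, 3.0]}, index=[1, 2]),
    3: pd.DataFrame({"sensor_0_avg": [5.0]}, index=[0]),
}
results = list(iter_train_data_per_asset(event_info, train_data))
```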
- load_and_format_event_dataset(event_id, statistics=None, test_only=False, index_column='id', use_readable_columns=True)
Load train and test datasets for a specific event ID and split each of them into a boolean normal index and numerical sensor data.
- Parameters:
event_id (int) – The event ID for which to retrieve datasets.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Describes which statistic features are selected. Possible statistics are ‘avg’, ‘min’, ‘max’ and ‘std’. If None is provided, it defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.
- Returns:
If test_only=False, returns a tuple of train_sensor_data, train_status, test_sensor_data and test_status.
If test_only=True, returns a tuple of test_sensor_data and test_status.
- Return type:
Tuple[pd.DataFrame, pd.Series, pd.DataFrame, pd.Series]
- load_event_dataset(event_id, test_only=False, statistics=None, index_column='id', use_readable_columns=True)
Load train and test datasets for a specific event ID.
- Parameters:
event_id (int) – The event ID for which to retrieve datasets.
test_only (bool, optional) – If True, only the test dataset is returned.
statistics (list[str], optional) – Describes which statistic features are selected. Possible statistics are ‘avg’, ‘min’, ‘max’ and ‘std’. If None is provided, it defaults to [‘avg’].
index_column (str) – The name of the index column, either ‘time_stamp’ or ‘id’. Defaults to ‘id’.
use_readable_columns (bool) – Use human-readable columns based on the feature descriptions. Default: True.
- Returns:
If test_only is False, returns a tuple of training and testing datasets. If test_only is True, returns only the test dataset.
- Return type:
Union[Tuple[pd.DataFrame, pd.DataFrame], pd.DataFrame]
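The use_readable_columns option renames sensor columns using the feature descriptions. How the library builds the readable names is not documented here, so the following is only a hypothetical sketch: it assumes a description table with `sensor_name` and `description` columns and sensor columns named `<sensor>_<statistic>`, and produces names such as `Wind speed (avg)`. The helper `make_readable` and the example descriptions are illustrative.

```python
import pandas as pd


def make_readable(data: pd.DataFrame, feature_descriptions: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical sketch: rename '<sensor>_<statistic>' to '<description> (<statistic>)'."""
    names = dict(zip(feature_descriptions["sensor_name"], feature_descriptions["description"]))
    renamed = {}
    for column in data.columns:
        # Split off the last '_'-separated part as the statistic suffix.
        sensor, _, statistic = column.rpartition("_")
        if sensor in names:
            renamed[column] = f"{names[sensor]} ({statistic})"
    # Columns without a matching description keep their original name.
    return data.rename(columns=renamed)


descriptions = pd.DataFrame({
    "sensor_name": ["sensor_0", "sensor_1"],
    "description": ["Wind speed", "Rotor speed"],  # illustrative descriptions
})
data = pd.DataFrame(columns=["sensor_0_avg", "sensor_1_max", "status_type_id"])
print(list(make_readable(data, descriptions).columns))
# ['Wind speed (avg)', 'Rotor speed (max)', 'status_type_id']
```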
- update_c2c_config(config, wind_farm, use_readable_columns=True)
Update config based on the provided feature descriptions. Updates the lists of features to exclude and of angle features in the data preprocessor steps.
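The kind of update update_c2c_config performs can be sketched on a plain dictionary. Everything specific here is an assumption for illustration: the boolean description columns `is_angle` and `is_counter`, the choice to exclude counter features, and the key names `features_to_exclude` and `angles` are hypothetical, and the real FaultDetector config is not a plain dict of this shape.

```python
import copy
from typing import Any, Dict

import pandas as pd


def update_preprocessor_lists(
    step_config: Dict[str, Any], feature_descriptions: pd.DataFrame
) -> Dict[str, Any]:
    """Hypothetical sketch: fill exclusion and angle lists from feature descriptions."""
    updated = copy.deepcopy(step_config)  # leave the caller's config untouched
    is_angle = feature_descriptions["is_angle"].astype(bool)
    is_counter = feature_descriptions["is_counter"].astype(bool)
    angles = feature_descriptions.loc[is_angle, "sensor_name"].tolist()
    counters = feature_descriptions.loc[is_counter, "sensor_name"].tolist()
    # Merge with existing entries, removing duplicates while preserving order.
    updated["features_to_exclude"] = list(
        dict.fromkeys(updated.get("features_to_exclude", []) + counters)
    )
    updated["angles"] = list(dict.fromkeys(updated.get("angles", []) + angles))
    return updated


descriptions = pd.DataFrame({
    "sensor_name": ["sensor_0", "sensor_1", "sensor_2"],
    "is_angle": [False, True, False],
    "is_counter": [False, False, True],
})
config = {"features_to_exclude": ["sensor_9"]}
new_config = update_preprocessor_lists(config, descriptions)
print(new_config)
```

Returning a copy rather than mutating in place keeps the original config reusable, for example when preparing configs for several wind farms.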