energy_fault_detector

The anomaly-detection-iee package

class Config(config_filename=None, config_dict=None)

Bases: BaseConfig

Configuration class. Either config_filename or config_dict must be provided. Reads a YAML file with the anomaly detection configuration and applies the corresponding settings.

property angle_columns: List[str]: List of angle columns.

property arcana_params: Dict[str, Any]: Get the ARCANA parameters.

property data_clipping: bool: Whether to clip training data.

property data_clipping_params: Dict[str, Any]: Data clipping parameters.

property data_split_params: Dict[str, Any]: DataSplitter or train_test_split parameters.

property fit_threshold_on_val: bool: Whether to fit threshold on validation data only.

property max_criticality: int | None: Max criticality value.

property root_cause_analysis: bool: Whether to run ARCANA.

property verbose: int: Verbosity level of the autoencoder.

class FaultDetector(config=None, model_directory='fault_detector_model', model_subdir=None)

Bases: FaultDetectionModel

Main class for fault detection in renewable energy assets and power grids.

Parameters:

config (Optional[Config]) – Config object with fault detection configuration. Defaults to None. If None, the models need to be loaded from a path using the load_models method.
model_directory (str, optional) – Directory to save models to. Defaults to ‘fault_detector_model’.
model_subdir (Optional[Any], optional) – Deprecated. This argument will be removed in future versions. Defaults to None.

anomaly_score: AnomalyScore object.

autoencoder: Autoencoder object.

threshold_selector: ThresholdSelector object.

data_preprocessor: DataPreprocessorPipeline object.

save_timestamps: a list of string timestamps indicating when the model was saved.

fit(sensor_data, normal_index=None, save_models=True, overwrite_models=False, fit_autoencoder_only=False, fit_preprocessor=True, **kwargs)

Fit the models on the given sensor_data, save them locally (unless save_models is False), and return the metadata.

Parameters:

sensor_data (pd.DataFrame) – DataFrame with the sensor data of one asset for a specific time window. The timestamp should be the index and the sensor values as columns.
normal_index (Optional[pd.Series]) – Series indicating normal behavior as boolean with the timestamp as index. Optional; if not provided, assumes all sensor_data represents normal behavior.
save_models (bool, optional) – Whether to save models. Defaults to True.
overwrite_models (bool, optional) – If True, existing model directories can be overwritten. Defaults to False.
fit_autoencoder_only (bool, optional) – If True, only fit the data preprocessor and autoencoder objects. Defaults to False.
fit_preprocessor (bool, optional) – If True, the preprocessor is fitted. Defaults to True.

Returns:

metadata of the trained model: model_date, model_path, model reconstruction errors of the training and validation data.

Return type:

ModelMetadata

predict(sensor_data, model_path=None, root_cause_analysis=False, track_losses=False, track_bias=False)

Predict with the given models for a specific asset.

Parameters:

sensor_data (pd.DataFrame) – DataFrame with the sensor data of one asset for a specific time window. The timestamp should be the index and the sensor values as columns.
model_path (Optional[str], optional) – Path to the models to be applied. If None, assumes the attributes data_preprocessor, autoencoder, anomaly_score, and threshold_selector contain fitted instances.
root_cause_analysis (bool, optional) – Whether to run ARCANA. Defaults to False.
track_losses (bool, optional) – Optional; if True, ARCANA losses will be tracked over the iterations. Defaults to False.
track_bias (bool, optional) – Optional; if True, ARCANA bias will be tracked over the iterations. Defaults to False.

Returns:

FaultDetectionResult object with the following DataFrames:

predicted_anomalies: DataFrame with a column ‘anomaly’ (bool).
reconstruction: DataFrame with reconstruction of the sensor data with timestamp as index.
deviations: DataFrame with reconstruction errors.
anomaly_score: DataFrame with anomaly scores for each timestamp.
bias_data: DataFrame with ARCANA results with timestamp as index. None if ARCANA was not run.
arcana_losses: DataFrame containing recorded values for all losses in ARCANA. None if ARCANA was not run.
tracked_bias: List of DataFrames. None if ARCANA was not run.

Return type:

FaultDetectionResult

predict_anomalies(scores, x_prepped=None)

Predict anomalies based on anomaly scores.

Return type:

Series
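As a rough illustration of how anomaly scores can be turned into boolean flags, the sketch below compares each score with a fixed threshold taken as a quantile of the scores observed on normal data. The helper names quantile and flag_anomalies and the plain-list data are assumptions made for illustration; they are not part of energy_fault_detector, and the package's ThresholdSelector may work differently.

```python
import math
from typing import List, Sequence


def quantile(values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of a non-empty sequence, with 0 <= q <= 1."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must lie in [0, 1]")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)       # fractional position in the sorted data
    lower, upper = math.floor(pos), math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def flag_anomalies(scores: Sequence[float], threshold: float) -> List[bool]:
    """Flag every score strictly above the threshold (hypothetical rule)."""
    return [score > threshold for score in scores]


# Toy anomaly scores from data labeled as normal (made-up values).
normal_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
threshold = quantile(normal_scores, 0.95)

# Toy scores for new data; only the last one exceeds the threshold.
new_scores = [0.2, 0.45, 0.9]
flags = flag_anomalies(new_scores, threshold)
```

With these toy values, the threshold lands between the two largest normal scores, so only the last new score is flagged.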

predict_anomaly_score(sensor_data)

Predict the anomaly score.

Return type:

Series

preprocess_train_data(sensor_data, normal_index, fit_preprocessor=True)

Preprocesses the training data using the configured data_preprocessor

Parameters:

sensor_data (pd.DataFrame) – Unprocessed training data.
normal_index (pd.Series) – Unprocessed normal index.
fit_preprocessor (bool, optional) – If True, the preprocessor is fitted. If False, the preprocessor is not fitted, and the user must provide a ready-to-use preprocessor by loading models before calling this function. Defaults to True.

Return type:

Tuple[DataFrame, DataFrame, Series]

Returns:

Tuple of (pd.DataFrame, pd.DataFrame, pd.Series):

x_prepped (pd.DataFrame): preprocessed normal training data.
x (pd.DataFrame): ordered training data (unprocessed), needed for _fit_threshold.
y (pd.Series): ordered normal_index (unprocessed), needed for _fit_threshold.

run_root_cause_analysis(sensor_data, track_losses=False, track_bias=False)

Run the ARCANA root cause analysis.

Parameters:

sensor_data (pd.DataFrame) – DataFrame containing the sensor data to be analyzed.
track_losses (bool, optional) – If True, the ARCANA losses are tracked over the iterations. Defaults to False.
track_bias (bool, optional) – If True, the ARCANA bias is tracked over the iterations. Defaults to False.

Return type:

Tuple[DataFrame, DataFrame, List[DataFrame]]

Returns:

Tuple of (pd.DataFrame, pd.DataFrame, List[pd.DataFrame]):

df_arcana_bias: DataFrame containing the ARCANA bias.
arcana_losses: DataFrame containing the recorded values for all ARCANA losses.
tracked_bias: List of DataFrames containing the ARCANA bias recorded over the iterations.

tune(sensor_data, normal_index=None, pretrained_model_path=None, new_learning_rate=0.0001, tune_epochs=10, tune_method='full', save_models=True, overwrite_models=False, data_preprocessor=None)

Fine-tune the FaultDetector using one of the following methods:

‘full’: all autoencoder weights are adapted, along with the threshold and anomaly-score scaling.
‘decoder’: only the decoder weights and the threshold are adapted.
‘threshold’: only the threshold and anomaly-score scaling are adapted.

Note: Choose tune_epochs and new_learning_rate carefully. Depending on how similar the tuning data is to the training data, fine-tuning can overfit.

Parameters:

sensor_data (pd.DataFrame) – DataFrame with the sensor data of one asset for a specific time window. The timestamp should be the index and the sensor values as columns.
normal_index (pd.Series, optional) – Series indicating normal behavior as boolean with the timestamp as index. If not provided, it is assumed all data in sensor_data represents normal behaviour. Defaults to None.
pretrained_model_path (Optional[str], optional) – Path to pretrained model. If None, assumes attributes data_preprocessor, autoencoder, anomaly_score, and threshold_selector contain fitted instances.
tune_epochs (int, optional) – Number of epochs to fine-tune. Defaults to 10.
new_learning_rate (float, optional) – Learning rate to tune the autoencoder with. Defaults to 0.0001.
tune_method (str, optional) – Possible options: ‘full’ (all autoencoder weights, the threshold, and anomaly-score scaling are adapted), ‘decoder’ (only the decoder weights and the threshold are adapted), ‘threshold’ (only the threshold and anomaly-score scaling are adapted). Defaults to ‘full’.
save_models (bool, optional) – Whether to save models. Defaults to True.
overwrite_models (bool, optional) – If True, existing model directories can be overwritten. Defaults to False.
data_preprocessor (Optional[DataPreprocessor], optional) – Optional prefitted data preprocessor. Useful when using a generic preprocessor for all models.
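The three tune_method options differ only in which components are adapted. The lookup below restates the descriptions above as data. The TUNE_METHODS table, the component labels, and the components_to_adapt helper are hypothetical names chosen for illustration and are not part of the package; "all autoencoder weights" is spelled out here as encoder plus decoder weights.

```python
from typing import Dict, FrozenSet

# Hypothetical summary of what each tune_method adapts, per the descriptions above.
TUNE_METHODS: Dict[str, FrozenSet[str]] = {
    "full": frozenset(
        {"encoder_weights", "decoder_weights", "threshold", "anomaly_score_scaling"}
    ),
    "decoder": frozenset({"decoder_weights", "threshold"}),
    "threshold": frozenset({"threshold", "anomaly_score_scaling"}),
}


def components_to_adapt(tune_method: str = "full") -> FrozenSet[str]:
    """Return the components adapted by a tune_method, rejecting unknown names."""
    try:
        return TUNE_METHODS[tune_method]
    except KeyError:
        valid = ", ".join(sorted(TUNE_METHODS))
        raise ValueError(
            f"unknown tune_method {tune_method!r}; expected one of: {valid}"
        ) from None


decoder_parts = components_to_adapt("decoder")
threshold_parts = components_to_adapt("threshold")
```

Note that, per the descriptions, ‘decoder’ adapts the threshold but not the anomaly-score scaling, while ‘threshold’ leaves all autoencoder weights untouched.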

Returns:

metadata of the trained model with model_date, model_path, model reconstruction errors of the training and validation data.

Return type:

ModelMetadata

quick_fault_detector(csv_data_path, csv_test_data_path=None, train_test_column_name=None, train_test_mapping=None, time_column_name=None, status_data_column_name=None, status_mapping=None, status_label_confidence_percentage=0.95, features_to_exclude=None, angle_features=None, automatic_optimization=True, enable_debug_plots=False, min_anomaly_length=18, save_dir=None)

Analyzes the provided data using an autoencoder-based approach that identifies anomalies relative to learned normal behavior. Anomalies are then aggregated into events and analyzed further. Runs the entire fault detection module chain in one function call. The function proceeds through the following steps:

1. Data loading and verification
2. Config selection and optimization
3. AnomalyDetector training
4. AnomalyDetector prediction
5. Event aggregation
6. ARCANA analysis of detected events
7. Visualization of output
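To make the event aggregation step more concrete, the sketch below groups consecutive anomaly flags into events and keeps only runs of at least min_anomaly_length points. It is a minimal stand-in operating on a plain list of booleans; the function name aggregate_events is hypothetical, and the package's own aggregation may differ in detail (for example in how it treats short gaps).

```python
from typing import List, Optional, Sequence, Tuple


def aggregate_events(
    flags: Sequence[bool], min_anomaly_length: int = 18
) -> List[Tuple[int, int]]:
    """Return (start, end) positions, end inclusive, of sufficiently long anomaly runs."""
    events: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    for i, flag in enumerate(flags):
        if flag and run_start is None:
            run_start = i                      # a new run of anomalies begins
        elif not flag and run_start is not None:
            if i - run_start >= min_anomaly_length:
                events.append((run_start, i - 1))
            run_start = None                   # the run ended at position i - 1
    # Handle a run that extends to the end of the data.
    if run_start is not None and len(flags) - run_start >= min_anomaly_length:
        events.append((run_start, len(flags) - 1))
    return events


# Toy flags: a run of 2 anomalies (too short) and a run of 4 (kept).
flags = [False, True, True, False, True, True, True, True]
events = aggregate_events(flags, min_anomaly_length=3)
```

In this toy input only the second run, covering positions 4 through 7, is long enough to count as an event.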

Parameters:

csv_data_path (str) – Path to a CSV file containing tabular data, which must include training data for the autoencoder. The file may also contain test data for evaluation; in that case train_test_column_name and, if needed, train_test_mapping must be provided.
csv_test_data_path (Optional[str]) – Path to a CSV file containing test data for evaluation. If test data is provided both ways (via csv_test_data_path and via csv_data_path with train_test_column_name), both test data sets are merged into one. Default is None.
train_test_column_name (Optional[str]) – Name of the column that marks which rows of csv_data_path are training data and which are test data. If this column does not contain boolean values or values that can be cast to booleans, train_test_mapping must be provided. True values are interpreted as training data. Default is None.
train_test_mapping (Optional[dict]) – Dictionary mapping all non-boolean values in the train/test column to booleans. Keys must be values from that column, with a datatype that can be cast to the column's datatype. Values must be booleans or castable to booleans. Default is None.
time_column_name (Optional[str]) – Name of the column containing timestamp information. Default is None.
status_data_column_name (Optional[str]) – Name of the column that specifies the status of each row in csv_data_path. The status defines which rows represent normal behavior (and can be used for autoencoder training) and which rows contain anomalous behavior. If this column does not contain boolean values, status_mapping must be provided. If status_data_column_name is not provided, all rows in csv_data_path are assumed to be normal and a warning is logged. Default is None.
status_mapping (Optional[dict]) – Dictionary mapping all non-boolean values in the status column to booleans. Keys must be values from that column, with a datatype that can be cast to the column's datatype. Values must be booleans or castable to booleans. True values are interpreted as normal status. Default is None.
status_label_confidence_percentage (Optional[float]) – How confident the user is that the provided status labels and the derived normal index are correct. This value sets the quantile used by the quantile threshold method. Default is 0.95.
features_to_exclude (Optional[List[str]]) – List of column names which are present in the csv-files but which should be ignored for this failure detection run. Default is None.
angle_features (Optional[List[str]]) – List of column names which represent angle-features. This enables a specialized preprocessing of angle features, which might otherwise hinder the failure detection process. Default is None.
automatic_optimization (bool) – If True, an automatic hyperparameter optimization is done based on the dimension of the provided dataset. Default is True.
enable_debug_plots (bool) – If True, advanced debugging information is added to the result plots. Default is False.
min_anomaly_length (int) – Minimum number of consecutive anomalies (i.e. data points with an anomaly score above the FaultDetector threshold) required to define an anomaly event. Default is 18.
save_dir (Optional[str]) – Directory to save the output plots. If not provided, the plots are not saved. Defaults to None.
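The status_mapping parameter can be pictured as a lookup that turns raw status labels into the boolean normal/anomalous flags described above. The sketch below applies such a mapping to a plain list. The function name to_normal_index and the status labels are made up for illustration, and raising an error on unmapped values is an assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Sequence


def to_normal_index(
    status_values: Sequence[Any], status_mapping: Dict[Any, bool]
) -> List[bool]:
    """Map raw status values to booleans, where True means normal behavior."""
    normal: List[bool] = []
    for value in status_values:
        if isinstance(value, bool):
            normal.append(value)               # already boolean, keep as-is
        elif value in status_mapping:
            normal.append(bool(status_mapping[value]))
        else:
            # Assumption: unmapped, non-boolean values are treated as an error.
            raise KeyError(f"no mapping for status value {value!r}")
    return normal


# Hypothetical status labels; only "ok" counts as normal behavior.
status_column = ["ok", "fault", True, "maintenance", "ok"]
status_mapping = {"ok": True, "fault": False, "maintenance": False}
normal_index = to_normal_index(status_column, status_mapping)
```

The same pattern applies to train_test_mapping, with True meaning training data instead of normal status.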

Returns:

FaultDetectionResult object with the following DataFrames:

predicted_anomalies: DataFrame with a column ‘anomaly’ (bool).

reconstruction: DataFrame with reconstruction of the sensor data with timestamp as index.

deviations: DataFrame with reconstruction errors.

anomaly_score: DataFrame with anomaly scores for each timestamp.

bias_data: DataFrame with ARCANA results with timestamp as index. None if ARCANA was not run.

arcana_losses: DataFrame containing recorded values for all losses in ARCANA. None if ARCANA was not run.

tracked_bias: List of DataFrames. None if ARCANA was not run.

and the detected anomaly events as a DataFrame.

Return type:

Tuple[FaultDetectionResult, pd.DataFrame]