Usage examples

To see interactive demonstrations of the energy fault detection package, refer to the example notebooks in the repository’s notebooks folder.

Energy Fault Detection

The main interface for the energy-fault-detector package is the FaultDetector class, which requires a Config configuration object.

To create a new FaultDetector model, create a configuration, as described below in the Configuration section, and run:

from energy_fault_detector.fault_detector import FaultDetector
from energy_fault_detector.config import Config

config = Config('configs/base_config.yaml')
fault_detector = FaultDetector(config=config, model_directory='model_directory')

To train new models, you need to provide the input data and call the fit method:

# get data from database / csv / API ...
sensor_data = ...  # a pandas DataFrame with timestamp as index and numerical sensor values as columns
normal_index = ...  # a pandas Series with timestamp as index and booleans indicating normal behaviour
# NOTE: The normal_index is optional; it is used to select training data for the autoencoder.
# If not provided, all data is assumed to represent normal behaviour. The remaining (non-normal) data points are used
# to set a threshold for the fault detection.

# If you do not use the models for time series, the index can also be a standard RangeIndex, as long as the
# sensor_data dataframe and the normal_index series have the same index.

model_data = fault_detector.fit(sensor_data=sensor_data, normal_index=normal_index, save_models=True)

# to save model manually:
# fault_detector.save_models('model_name')  # model_name is optional

The trained models are saved locally in the provided model_directory. The fit method returns a ModelMetadata object with metadata such as the model date and the model path.
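The returned metadata can be logged or used to locate the saved model later. A minimal sketch, assuming ModelMetadata exposes attributes along the lines of model_date and model_path (check the class for the exact attribute names):

# inspect the returned metadata (attribute names assumed, see the ModelMetadata class)
print(model_data.model_date)  # when the model was trained
print(model_data.model_path)  # where the model was saved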

To predict using the trained model, use the predict method:

results = fault_detector.predict(sensor_data=test_sensor_data)

The result is a FaultDetectionResult object with the following information (a short usage example follows the list):

  • predicted_anomalies: DataFrame with a column ‘anomaly’ (bool).

  • reconstruction: DataFrame with reconstruction of the sensor data with timestamp as index.

  • deviations: DataFrame with reconstruction errors.

  • anomaly_score: DataFrame with anomaly scores for each timestamp.

  • bias_data: DataFrame with ARCANA results with timestamp as index. None if ARCANA was not run.

  • arcana_losses: DataFrame containing recorded values for all losses in ARCANA. None if ARCANA was not run.

  • tracked_bias: List of DataFrames. None if ARCANA was not run.
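As a quick check, the detected anomalies and their scores can be read directly from the result object; a minimal sketch using only the attributes listed above:

# count detected anomalies and look at the scores of the flagged timestamps
anomalies = results.predicted_anomalies['anomaly']
print(f'{anomalies.sum()} of {len(anomalies)} timestamps flagged as anomalous')

# anomaly scores and reconstruction errors of the flagged timestamps
flagged_scores = results.anomaly_score[anomalies]
flagged_deviations = results.deviations[anomalies]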

You can also create a FaultDetector object and load trained models using the load_models method. In this case, you do not need to provide a model_path in the predict method.

from energy_fault_detector.fault_detector import FaultDetector

fault_detector = FaultDetector()
fault_detector.load_models('path_to_trained_models')

# get data from database / csv / API ...
sensor_data = ...
results = fault_detector.predict(sensor_data=sensor_data)

Configuration

The training configuration is set with a YAML file that contains a train section with the model settings used to train new models, and an optional root_cause_analysis section if you want to analyse the model predictions with the ARCANA algorithm. An example:

train:
  data_clipping:  # (optional) if not specified, not applied.
    # clip training data to remove outliers
    lower_percentile: 0.01
    upper_percentile: 0.99
    features_to_exclude:
      - do_not_clip_this_feature

  data_preprocessor:
    # only imputation and scaling are done by default, other steps can be skipped.
    params:
      imputer_strategy: 'mean'
      scale: 'standardize'  # 'standardize' for standard scaling or 'minmax' for min-max scaling
      include_column_selector: true  # whether to apply the ColumnSelector
      max_nan_frac_per_col: 0.05  # ColumnSelector option - drop columns where more than 5% of the values are NaN.
      features_to_exclude: # ColumnSelector option - features to always exclude
        - feature1
        - feature2
      angles:  # list of angles to transform to sine/cosine values; skipped if none provided
        - angle1
        - angle2
      include_low_unique_value_filter: true  # whether to apply the LowVarianceFilter
      min_unique_value_count: 2  # LowVarianceFilter option - drop columns with fewer unique values than this
      max_col_zero_frac: 0.99  # LowVarianceFilter option - drop columns where the fraction of zeros exceeds this value
      include_duplicate_value_to_nan: false  # whether to apply the DuplicateValuesToNan step
      value_to_replace: 0  # DuplicateValuesToNan option - value that, when repeated, is replaced with NaN
      n_max_duplicates: 6  # DuplicateValuesToNan option - keep up to n_max_duplicates consecutive occurrences, replace the rest with NaN
      duplicate_features_to_exclude:  # DuplicateValuesToNan option - list of features not to transform with DuplicateValuesToNan
        - do_not_replace_value_with_nan

  data_splitter:  # (optional) define the block sizes of the train and validation blocks; if not specified, the defaults below are used
    # defaults:
    type: DataSplitter  # or sklearn
    train_block_size: 5040
    val_block_size: 1680  # set val_block_size = 0 to use all data for training

  autoencoder:
    name: 'MultilayerAutoencoder'
    params:
      batch_size: 128
      decay_rate: 0.001  # remove decay_rate+decay_steps for a fixed learning rate
      decay_steps: 10000
      epochs: 10
      layers:
        - 200  # Size of the first and last hidden layer
        - 100  # Size of the second and second to last hidden layer
        - 50  # Size of the third and third to last hidden layer
      code_size: 20  # Size of the bottleneck
      learning_rate: 0.001
      loss_name: 'mean_squared_error'

  anomaly_score:
    name: 'rmse'
    params:
      scale: false

  threshold_selector:
    name: 'fbeta'
    params:
      beta: 0.5

root_cause_analysis:  # (optional) if not specified, no root_cause_analysis (ARCANA) is run
  alpha: 0.8
  init_x_bias: recon
  num_iter: 200

To update the configuration ‘on the fly’ (for example, for hyperparameter optimization), provide a new configuration dictionary via the update_config method:

from energy_fault_detector.fault_detector import FaultDetector
from energy_fault_detector.config import Config
from copy import deepcopy

config = Config('configs/base_config.yaml')

# update some parameters:
new_config_dict = deepcopy(config.config_dict)
new_config_dict['train']['anomaly_score']['name'] = 'mahalanobis'
config.update_config(new_config_dict)

# or create a new configuration object and model
new_model = FaultDetector(Config(config_dict=new_config_dict))
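The same mechanism can be used for a simple hyperparameter sweep; a minimal sketch that reuses the training data from above and assumes both anomaly score names shown in this document ('rmse' and 'mahalanobis') are registered:

# illustrative sweep over anomaly score types (sensor_data and normal_index as in the training example)
for score_name in ['rmse', 'mahalanobis']:
    candidate_dict = deepcopy(config.config_dict)
    candidate_dict['train']['anomaly_score']['name'] = score_name
    candidate = FaultDetector(Config(config_dict=candidate_dict), model_directory=f'models_{score_name}')
    metadata = candidate.fit(sensor_data=sensor_data, normal_index=normal_index, save_models=True)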

You can look up the names for the available model classes in the class registry:

from energy_fault_detector import registry

registry.print_available_classes()

Evaluation

Please check the example notebooks for evaluation examples.
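If you have ground-truth fault labels aligned to the same index as your test data, standard classification metrics are a quick starting point; a minimal sketch using scikit-learn (the true_labels variable is an assumption, not part of the package):

from sklearn.metrics import precision_score, recall_score, fbeta_score

true_labels = ...  # boolean Series: True where a fault actually occurred, same index as test_sensor_data

results = fault_detector.predict(sensor_data=test_sensor_data)
predicted = results.predicted_anomalies['anomaly']

print('precision:', precision_score(true_labels, predicted))
print('recall:', recall_score(true_labels, predicted))
print('F0.5:', fbeta_score(true_labels, predicted, beta=0.5))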

Creating new model classes

You can extend the framework by creating new model classes based on the templates in the core module and registering the new classes. Examples are shown in the notebook Example - Create new model classes.ipynb.

Creating your own pipeline

If you want to create your own energy fault detection pipeline with the building blocks of this package, you can import the data preprocessor, autoencoder, anomaly score and threshold selection classes as follows:

from energy_fault_detector.data_preprocessing import DataPreprocessor, DataClipper
from energy_fault_detector.autoencoders import MultilayerAutoencoder
from energy_fault_detector.anomaly_score import MahalanobisScore
from energy_fault_detector.threshold_selectors import FbetaSelector

This allows you to add additional steps or use different data preprocessing pipelines.

An example training pipeline (similar to the FaultDetector class) would be:

x = ...  # i.e. sensor data
y = ...  # normal behaviour indicator

x_normal = x[y]
# fit data preprocessor on normal data
data_preprocessor = DataPreprocessor(...)
x_normal_prepped = data_preprocessor.fit_transform(x_normal)

# fit autoencoder on normal data
ae = MultilayerAutoencoder(...)
ae.fit(x_normal_prepped)

# create and fit score
anomaly_score = MahalanobisScore(...)
x_prepped = data_preprocessor.transform(x)

# fit on normal data
recon_error_normal = ae.get_reconstruction_error(x_normal_prepped)
anomaly_score.fit(recon_error_normal)
# get scores of all data points
recon_error = ae.get_reconstruction_error(x_prepped)
scores = anomaly_score.transform(recon_error)

# set the threshold and get predictions to evaluate
threshold_selector = FbetaSelector(beta=1.0)  # sets optimal threshold based on F1 score
threshold_selector.fit(scores, y)
# NOTE: the fit-method of the AdaptiveThreshold has slightly different arguments!
anomalies = threshold_selector.predict(scores)

And the inference:

x = ...

x_prepped = data_preprocessor.transform(x)
x_recon = ae.predict(x_prepped)  # reconstruction
x_recon_error = ae.get_reconstruction_error(x_prepped)
scores = anomaly_score.transform(x_recon_error)
anomalies = threshold_selector.predict(scores)  # boolean series indicating anomaly detected