energy_fault_detector.quick_fault_detection.quick_fault_detector
Quick energy fault detection, to try out the EnergyFaultDetector model on a specific dataset.
- analyze_event(anomaly_detector, event_data, track_losses)
Runs root cause analysis for detected anomaly events.
- Parameters:
anomaly_detector (FaultDetector) – Trained FaultDetector instance.
event_data (pd.DataFrame) – Data from a detected anomaly event.
track_losses (bool) – If True, the ARCANA losses are tracked.
- Returns:
importances (pd.Series): Series of importance values for every feature in event_data.
tracked_losses (pd.DataFrame): Potentially empty DataFrame containing the recorded ARCANA losses.
- Return type:
Tuple[pd.Series, pd.DataFrame]
- quick_fault_detector(csv_data_path, csv_test_data_path=None, train_test_column_name=None, train_test_mapping=None, time_column_name=None, status_data_column_name=None, status_mapping=None, status_label_confidence_percentage=0.95, features_to_exclude=None, angle_features=None, automatic_optimization=True, enable_debug_plots=False, min_anomaly_length=18, save_dir=None)
Analyzes the provided data using an autoencoder-based approach that identifies anomalies relative to a learned normal behavior. Anomalies are then aggregated into events and analyzed further. Runs the entire fault detection module chain in one function call. The function proceeds in the following steps:
1. Data loading and verification
2. Config selection and optimization
3. AnomalyDetector training
4. AnomalyDetector prediction
5. Event aggregation
6. ARCANA analysis of detected events
7. Visualization of output
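Steps 4 and 5 can be pictured with a small standard-library sketch. It is an illustration under assumptions, not the package's implementation: the helper names `quantile_threshold` and `aggregate_events` are hypothetical, the threshold is assumed to be a quantile of the anomaly scores of normal data, and the toy scores are made up.

```python
from typing import List, Tuple


def quantile_threshold(normal_scores: List[float], quantile: float = 0.95) -> float:
    """Return the given quantile of the scores, using linear interpolation."""
    if not normal_scores:
        raise ValueError("normal_scores must not be empty")
    ordered = sorted(normal_scores)
    position = quantile * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


def aggregate_events(flags: List[bool], min_anomaly_length: int = 18) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs, end inclusive, for every run of
    consecutive True flags that is at least min_anomaly_length long."""
    events = []
    run_start = None
    for i, flag in enumerate(flags):
        if flag and run_start is None:
            run_start = i  # a new run of anomalies begins
        elif not flag and run_start is not None:
            if i - run_start >= min_anomaly_length:
                events.append((run_start, i - 1))
            run_start = None
    # Close a run that lasts until the end of the data.
    if run_start is not None and len(flags) - run_start >= min_anomaly_length:
        events.append((run_start, len(flags) - 1))
    return events


# Toy anomaly scores (assumed values): normal data first, then data to evaluate.
normal_scores = [0.1, 0.2, 0.15, 0.3, 0.25, 0.2, 0.1, 0.35, 0.2, 0.4]
test_scores = [0.2, 0.9, 0.8, 0.1, 0.7, 0.95, 0.85, 0.9, 0.2]

threshold = quantile_threshold(normal_scores, quantile=0.95)
flags = [score > threshold for score in test_scores]
events = aggregate_events(flags, min_anomaly_length=3)
print(threshold, events)
```

With a minimum length of 3, the two-point run at indices 1 to 2 is discarded and only the four-point run at indices 4 to 7 is kept as an event.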
- Parameters:
csv_data_path (str) – Path to a CSV file containing tabular data, which must include the training data for the autoencoder. The file may also contain test data for evaluation; in that case train_test_column_name and, optionally, train_test_mapping must be provided.
csv_test_data_path (Optional str) – Path to a CSV file containing test data for evaluation. If test data is provided in both ways (i.e. via csv_test_data_path and via csv_data_path together with train_test_column_name), both test data sets are merged into one. Default is None.
train_test_column_name (Optional str) – Name of the column which specifies which part of the data in csv_data_path is training data and which is test data. If this column does not contain boolean values or values which can be cast into boolean values, then train_test_mapping must be provided. True values are interpreted as training data. Default is None.
train_test_mapping (Optional dict) – Dictionary that maps all non-boolean values in the column given by train_test_column_name to booleans. Keys must be values that occur in that column and must have a datatype that can be cast to the column's datatype. Values must be booleans or castable to booleans. Default is None.
time_column_name (Optional str) – Name of the column containing timestamp information. Default is None.
status_data_column_name (Optional str) – Name of the column which specifies the status of each row in csv_data_path. The status is used to define which rows represent normal behavior (i.e. which rows can be used for the autoencoder training) and which rows contain anomalous behavior. If this column does not contain boolean values, status_mapping must be provided. If status_data_column_name is not provided, all rows in csv_data_path are assumed to be normal and a warning will be logged. Default is None.
status_mapping (Optional dict) – Dictionary that maps all non-boolean values in the column given by status_data_column_name to booleans. Keys must be values that occur in that column and must have a datatype that can be cast to the column's datatype. Values must be booleans or castable to booleans. True values are interpreted as normal status. Default is None.
status_label_confidence_percentage (Optional float) – Specifies how confident the user is that the provided status labels and the derived normal_indexes are correct. This value determines the quantile used by the quantile threshold method. Default is 0.95.
features_to_exclude (Optional[List[str]]) – List of column names which are present in the csv-files but which should be ignored for this failure detection run. Default is None.
angle_features (Optional[List[str]]) – List of column names which represent angle-features. This enables a specialized preprocessing of angle features, which might otherwise hinder the failure detection process. Default is None.
automatic_optimization (bool) – If True, an automatic hyperparameter optimization is done based on the dimension of the provided dataset. Default is True.
enable_debug_plots (bool) – If True, advanced debugging information is added to the result plots. Default is False.
min_anomaly_length (int) – Minimum number of consecutive anomalies (i.e. data points with an anomaly score above the FaultDetector threshold) required to define an anomaly event. Default is 18.
save_dir (Optional[str]) – Directory in which to save the output plots. If not provided, the plots are not saved. Default is None.
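As an illustration of how train_test_mapping and status_mapping are meant to be read, the following standard-library sketch applies both to a handful of rows. The helper `to_bool`, the column names `split` and `status`, and the values are all hypothetical; the package itself works on the CSV data and may handle casting differently.

```python
from typing import Any, Dict, Optional


def to_bool(value: Any, mapping: Optional[Dict[Any, bool]] = None) -> bool:
    """Map a raw column value to a boolean, using the mapping for non-boolean values."""
    if isinstance(value, bool):
        return value
    if mapping is not None and value in mapping:
        return bool(mapping[value])
    raise ValueError(f"No boolean mapping for value {value!r}")


# Hypothetical rows standing in for the content of csv_data_path.
rows = [
    {"power": 1.0, "split": "train", "status": "OK"},
    {"power": 1.1, "split": "train", "status": "FAULT"},
    {"power": 0.9, "split": "train", "status": "OK"},
    {"power": 0.2, "split": "test", "status": "FAULT"},
    {"power": 1.0, "split": "test", "status": "OK"},
]

train_test_mapping = {"train": True, "test": False}  # True means training data
status_mapping = {"OK": True, "FAULT": False}        # True means normal status

train_rows = [r for r in rows if to_bool(r["split"], train_test_mapping)]
test_rows = [r for r in rows if not to_bool(r["split"], train_test_mapping)]

# Only rows with normal status are candidates for autoencoder training.
normal_train_rows = [r for r in train_rows if to_bool(r["status"], status_mapping)]
print(len(train_rows), len(test_rows), len(normal_train_rows))
```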
- Returns:
A FaultDetectionResult object together with a DataFrame of the detected anomaly events. The FaultDetectionResult contains:
predicted_anomalies: DataFrame with a column ‘anomaly’ (bool).
reconstruction: DataFrame with the reconstruction of the sensor data, with timestamp as index.
deviations: DataFrame with the reconstruction errors.
anomaly_score: DataFrame with the anomaly score for each timestamp.
bias_data: DataFrame with the ARCANA results, with timestamp as index. None if ARCANA was not run.
arcana_losses: DataFrame containing the recorded values of all ARCANA losses. None if ARCANA was not run.
tracked_bias: List of DataFrames. None if ARCANA was not run.
- Return type:
Tuple[FaultDetectionResult, pd.DataFrame]
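A call might look like the sketch below. The keyword names come from the signature above and the import path from the module name at the top of this page; the wrapper `run_quick_detection`, the column names, and the mapping values are placeholders that must be adapted to the actual CSV file.

```python
from typing import Any, Dict

# Keyword arguments from the documented signature; every value is a placeholder.
QUICK_KWARGS: Dict[str, Any] = {
    "time_column_name": "timestamp",                 # hypothetical column name
    "status_data_column_name": "status",             # hypothetical column name
    "status_mapping": {"OK": True, "FAULT": False},  # True means normal status
    "features_to_exclude": ["turbine_id"],           # hypothetical column name
    "angle_features": ["wind_direction"],            # hypothetical column name
    "min_anomaly_length": 18,
    "save_dir": None,                                # plots are not saved
}


def run_quick_detection(csv_data_path: str):
    """Run the full detection chain on one CSV file and return (result, events)."""
    # Imported lazily so the sketch can be loaded without the package installed.
    from energy_fault_detector.quick_fault_detection.quick_fault_detector import (
        quick_fault_detector,
    )

    result, events = quick_fault_detector(csv_data_path, **QUICK_KWARGS)
    return result, events
```

Calling `run_quick_detection` with the path to a CSV file returns the FaultDetectionResult and the events DataFrame described above.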