Quick fault detection (CLI)

The quick_fault_detector command-line tool runs the complete fault detection pipeline on a CSV file:

  • load and split data (train / test),

  • configure a normal-behavior model (optionally optimize hyperparameters),

  • train the autoencoder and threshold,

  • predict anomalies on test data,

  • aggregate anomalies into events,

  • optionally run ARCANA on each event,

  • save plots and CSV results to a directory.

Basic usage

quick_fault_detector path/to/train_and_test.csv

Expected CSV format

Internally, the CLI converts your CSV into:

  • a training sensor_data DataFrame (numeric, wide format), and

  • a normal_index Series indicating normal behaviour during training, and

  • a test sensor_data DataFrame used for prediction.

This is done by energy_fault_detector.quick_fault_detection.data_loading.

Internally, the timestamp column is converted to a DatetimeIndex on the training and test DataFrames before passing them to the models.

Your CSV should contain:

  • A timestamp column (optional but recommended), e.g. time_stamp

    • Parsed to a DatetimeIndex if time_column_name is provided in the options.

    • Timestamps are sorted and duplicate rows are dropped upstream by the user if needed.

  • An optional train/test column, e.g. train_test

    • If train_test_column_name is given, it is converted to a boolean mask (True = training, False = test).

    • If the column is not already boolean, the mapping in train_test_mapping is applied.

  • An optional status column, e.g. status or status_type_id

    • If status_data_column_name is given, it is converted to a boolean mask (True = normal, False = non‑normal / faulty / maintenance).

    • If the column is not boolean, the mapping in status_mapping is applied.

    • This becomes the normal_index used for training.

    • If no status column is provided, all training samples are assumed normal, and a warning is logged.

  • All other columns are treated as sensor features:

    • Numeric columns are used directly.

    • Non‑numeric columns that can be cast to numeric are converted (e.g. strings of numbers).

    • Remaining non‑numeric columns are ignored for the model input.

Conceptually:

  • Training data: rows where the train/test mask is True (or all rows if no split is provided).

  • Test data: rows where the train/test mask is False, plus any separate test file provided via csv_test_data_path (these are concatenated).

Example layout:

time_stamp,train_test,status,power,wind_speed,pitch,asset_id
2024-01-01 00:00:00,train,production,  500,  7.3,  2.0, 1234
2024-01-01 00:10:00,train,production,  520,  7.5,  2.1, 1234
2024-01-02 00:00:00,prediction,error,  100,  3.0, 15.0, 1234

With an options YAML like:

time_column_name: "time_stamp"
train_test_column_name: "train_test"
train_test_mapping:
  train: true
  prediction: false
status_data_column_name: "status"
status_mapping:
  production: true
  service: false
  error: false

Options

You can pass an options YAML file to control how data is interpreted:

quick_fault_detector path/to/data.csv --options path/to/options.yaml

The options are defined in energy_fault_detector.main and correspond to the Options dataclass. An example options file:

csv_test_data_path: "path/to/separate_test.csv"
train_test_column_name: "train_test"      # True = training data
train_test_mapping:                       # mapping if train test column is not boolean
  train: true
  prediction: false
time_column_name: "time_stamp"
status_data_column_name: "status"        # True = normal behaviour
status_mapping:                          # mapping if status column is not boolean
  production: true
  service: false
  error: false
status_label_confidence_percentage: 0.95
min_anomaly_length: 18
features_to_exclude:
  - do_not_use_this_feature
angle_features:
  - wind_direction
automatic_optimization: true
enable_debug_plots: false

The underlying helper functions are implemented in:

  • energy_fault_detector.quick_fault_detection.data_loading

  • energy_fault_detector.quick_fault_detection.configuration

  • energy_fault_detector.quick_fault_detection.pipeline

Output

The CLI writes:

  • a combined results figure results.png,

  • CSV files for FaultDetectionResult via FaultDetectionResult.save(),

  • an events.csv file with aggregated anomaly events.

See the notebook Example - Quick Fault Detection.ipynb for an interactive walkthrough.