.. _configuration_guide:

Configuration
================================

This page explains how to configure training, prediction, and optional root cause analysis (ARCANA).

.. contents:: Table of Contents
   :depth: 3
   :local:

Quick start: minimal configuration
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

A minimal configuration that clips outliers, imputes missing values, and scales features:

.. include:: basic_config.yaml
   :literal:

This setup:

- Applies DataClipper if specified.
- Builds a DataPreprocessor with:

  - ColumnSelector that drops columns with more than 20% NaNs (configurable).
  - LowUniqueValueFilter that removes constant features by default (configurable).
  - SimpleImputer (mean) and a scaler (StandardScaler by default). If you do not add an imputer/scaler explicitly, the pipeline ensures mean-imputation and StandardScaler are added.

- Trains a default autoencoder (with provided architecture, otherwise default values), with an RMSE anomaly score and a quantile threshold selector.
- Runs ARCANA with provided parameters when calling :py:obj:`FaultDetector.predict(..., root_cause_analysis=True)`. If not provided, default ARCANA parameters are used (see :py:obj:`ARCANA docs`).

If you leave the data_preprocessor configuration empty (i.e., ``data_preprocessor: {}``), a default preprocessing pipeline is generated. It drops constant features and features where more than 5% of the data is missing, imputes the remaining missing values with the mean, and scales the data to zero mean and unit standard deviation.

Detailed configuration
^^^^^^^^^^^^^^^^^^^^^^

Below is a more thorough configuration. It shows how to specify preprocessing steps and more model parameters.

.. include:: advanced_config.yaml
   :literal:

DataPreprocessor specification
""""""""""""""""""""""""""""""

A steps-based preprocessing pipeline can be configured under ``train.data_preprocessor.steps``. Each step is a dict with the following keys:

- ``name`` (str): the registered step name (see table below).
- ``enabled`` (bool, optional): default ``True``; set to ``False`` to skip a step.
- ``params`` (dict, optional): constructor arguments for the step.
- ``step_name`` (str, optional): custom key for the sklearn pipeline; useful if a step is repeated.

Allowed step names and aliases:

+-------------------------+-----------------------------------------------+------------------------------------------------+
| Step name               | Purpose                                       | Aliases                                        |
+=========================+===============================================+================================================+
| column_selector         | Drop columns with too many NaNs               | \-                                             |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| low_unique_value_filter | Drop columns with low variance/many zeros     | \-                                             |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| angle_transformer       | Convert angles to sin/cos pairs               | angle_transform                                |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| counter_diff_transformer| Convert counters to differences/rates         | counter_diff, counter_diff_transform           |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| simple_imputer          | Impute missing values                         | imputer                                        |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| standard_scaler         | Standardize features (z-score)                | standardize, standardscaler, standard          |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| minmax_scaler           | Scale to [0, 1]                               | minmax                                         |
+-------------------------+-----------------------------------------------+------------------------------------------------+
| duplicate_to_nan        | Replace consecutive duplicate values with NaN | duplicate_value_to_nan, duplicate_values_to_nan|
+-------------------------+-----------------------------------------------+------------------------------------------------+

For detailed documentation of the data preprocessor pipeline, refer to the :py:obj:`DataPreprocessor` docs.

Other training configuration sections
"""""""""""""""""""""""""""""""""""""

- Data clipping: :py:obj:`DataClipper` supports ``features_to_exclude`` and ``features_to_clip`` for fine-grained control.
- Data splitter (``train.data_splitter``):

  - ``type``: one of ``BlockDataSplitter`` (aliases: ``blocks``, ``DataSplitter``), or ``sklearn`` (alias ``train_test_split``).
  - For sklearn: ``validation_split`` (float in (0, 1)) and ``shuffle`` (bool).
  - For :py:obj:`BlockDataSplitter`: ``train_block_size`` and ``val_block_size``.
  - Early stopping guard: if ``train.autoencoder.params.early_stopping`` is true, you must either set a valid ``validation_split`` in (0, 1), or use :py:obj:`BlockDataSplitter` with a positive ``val_block_size``.

- Autoencoder (``train.autoencoder``):

  - ``name``: class name in the registry.
  - ``params``: architecture and training args (e.g., ``layers``, ``epochs``, ``learning_rate``, ``early_stopping``). Refer to the autoencoder class docs (:py:obj:`autoencoders`) for specific params and their defaults.

- Anomaly score (``train.anomaly_score``):

  - ``name``: score name (e.g., ``rmse``, ``mahalanobis``).
  - ``params``: score-specific parameters. Refer to the :py:obj:`anomaly_scores` docs.

- Threshold selector (``train.threshold_selector``):

  - ``name``: e.g., ``quantile``, ``fbeta``, etc.
  - ``fit_on_val``: fit the threshold on validation only.
  - ``params``: selector-specific parameters (e.g., ``quantile`` for the quantile selector). See the :py:obj:`threshold_selectors` docs for more info on the settings.

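
To make the step format and the early stopping guard described in this section more concrete, the following is a minimal sketch in plain Python. It is not the library's own validation code: ``normalize_steps`` and ``early_stopping_is_valid`` are hypothetical helpers, the alias map is transcribed from the table above, and two details are assumptions made for illustration: ``step_name`` falls back to the canonical step name, and ``validation_split`` and ``val_block_size`` sit directly under ``train.data_splitter``.

```python
from typing import Any, Dict, List

# Alias -> canonical step name, transcribed from the table in this guide.
STEP_ALIASES: Dict[str, str] = {
    "angle_transform": "angle_transformer",
    "counter_diff": "counter_diff_transformer",
    "counter_diff_transform": "counter_diff_transformer",
    "imputer": "simple_imputer",
    "standardize": "standard_scaler",
    "standardscaler": "standard_scaler",
    "standard": "standard_scaler",
    "minmax": "minmax_scaler",
    "duplicate_value_to_nan": "duplicate_to_nan",
    "duplicate_values_to_nan": "duplicate_to_nan",
}
# The two steps without aliases plus every alias target.
CANONICAL_STEPS = {"column_selector", "low_unique_value_filter", *STEP_ALIASES.values()}

# Splitter type names and aliases as listed under ``train.data_splitter``.
BLOCK_SPLITTERS = {"BlockDataSplitter", "blocks", "DataSplitter"}
SKLEARN_SPLITTERS = {"sklearn", "train_test_split"}


def normalize_steps(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Resolve aliases, fill in the documented defaults, and drop disabled steps."""
    normalized = []
    for step in steps:
        name = STEP_ALIASES.get(step["name"], step["name"])
        if name not in CANONICAL_STEPS:
            raise ValueError(f"Unknown preprocessing step: {step['name']!r}")
        if not step.get("enabled", True):  # ``enabled`` defaults to True
            continue
        normalized.append({
            "name": name,
            "params": step.get("params") or {},
            # Assumption: without ``step_name`` the canonical name is the pipeline key.
            "step_name": step.get("step_name", name),
        })
    return normalized


def early_stopping_is_valid(train_cfg: Dict[str, Any]) -> bool:
    """Return False if early stopping is enabled but no validation data is configured."""
    params = train_cfg.get("autoencoder", {}).get("params", {})
    if not params.get("early_stopping", False):
        return True  # the guard only applies when early stopping is enabled
    splitter = train_cfg.get("data_splitter", {})
    kind = splitter.get("type")
    if kind in BLOCK_SPLITTERS:
        return (splitter.get("val_block_size") or 0) > 0
    if kind in SKLEARN_SPLITTERS:
        return 0.0 < (splitter.get("validation_split") or 0.0) < 1.0
    # Missing or unknown type: treated as invalid in this sketch; the library's
    # default splitter behaviour is not modelled here.
    return False
```

Running a check like this on a loaded config dict before training is one way to catch a misconfigured splitter early; the actual checks performed by the library may differ.
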
Prediction options
^^^^^^^^^^^^^^^^^^

Under ``predict``, you can set:

- ``criticality.max_criticality``: cap the calculated criticality (anomaly counter) to this value.

Root cause analysis (ARCANA)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If ``root_cause_analysis`` is provided, ARCANA will attempt to attribute anomalies to specific features using the provided settings. If not provided, default settings are used. For detailed documentation refer to :py:obj:`ARCANA docs`.

Old params data preprocessing configuration (for older versions)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Older configurations use params under ``train.data_preprocessor.params``. These remain supported but are deprecated in favor of steps mode. When both ``steps`` and legacy params are present, ``steps`` take precedence and legacy params are ignored with a warning.

.. include:: old_config.yaml
   :literal:
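
The precedence rule between ``steps`` and legacy ``params`` can be illustrated with a short sketch. This is only an approximation of the behaviour described above, not the library's implementation: ``select_preprocessor_config`` is a hypothetical helper, the use of ``DeprecationWarning`` is an assumption (the text only says a warning is issued), and an empty ``steps`` list is treated as not provided.

```python
import warnings
from typing import Any, Dict, Tuple


def select_preprocessor_config(dp_cfg: Dict[str, Any]) -> Tuple[str, Any]:
    """Return which configuration mode applies and the settings to use.

    ``dp_cfg`` stands for the content of ``train.data_preprocessor``.
    """
    steps = dp_cfg.get("steps")
    legacy = dp_cfg.get("params")

    if steps:
        if legacy:
            # Both present: steps win, legacy params are ignored with a warning.
            warnings.warn(
                "Both 'steps' and legacy 'params' are set; 'params' is ignored.",
                DeprecationWarning,  # assumed category
                stacklevel=2,
            )
        return "steps", steps
    if legacy:
        return "legacy", legacy
    # Nothing configured: the default preprocessing pipeline is generated.
    return "default", {}
```

With this helper, a config containing only ``params`` resolves to the ``"legacy"`` mode, while an empty ``data_preprocessor: {}`` resolves to ``"default"``.
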